11
Developing Classroom Process Data for the Improvement of Teaching

James W. Stigler and Michelle Perry

Of the many factors that determine student academic achievement, classroom instruction is but one. Yet it is surely an important one. Indeed, all attempts to improve education must of necessity at some point be mediated through the classroom. This is obvious because classroom practice represents the most direct means for affecting student outcomes. However, there has been surprisingly little research on this link in the chain affecting student outcomes. As a nation, we collect very little data on what happens inside classrooms.

As Mandel (1996:3-29) wrote, "The national conversation about teaching has always been compromised by a dearth of information about the quality of practice and practitioners. . . . When dismal or promising results about student performance are reported, a new chain reaction of suppositions is often set off about the degree to which teachers are to be blamed or praised. But these suppositions are just that: hypotheses disconnected from much of a factual base that might shed some light on what is occurring, including the extent to which the observed results can be accurately attributed to teacher actions." This relative dearth of data can be blamed, at least in part, on what Burstein et al. (1995) point out as the inherent difficulty in measuring instructional practice.

Despite this inherent difficulty, we argue that the merits of these data outweigh the obstacles in collecting them. As an example of the importance of these data, here it is argued that we cannot know which instructional strategies lead to positive learning outcomes unless we know which instructional practices are being used, and we cannot know which are being used without somehow looking directly at educational practices. In other words, achievement data may tell us a lot, but those data cannot tell us what should be done differently inside the
classroom. We argue that for test data to be most informative, classroom processes need to be examined. If change in student learning outcomes is observed in the tests, we still need to know whether the change is due to something going on in the classroom or something independent of that.

In this paper, we make the assumption that classroom process data, especially when collected in conjunction with student achievement data, can play a critical role in efforts to improve education. We further assume, however, that such data will not necessarily improve education and that it is therefore extremely important to have an explicit idea of exactly how data will be used to improve education and by whom. In particular, we argue that researchers, policy makers, and teachers need different kinds of data and will use data in different ways to improve the quality of teaching and learning in classrooms.

Five questions guide this paper: (1) What is the nature of classroom instruction, and what implications does this have for developing indicators of instructional quality? (2) What kinds of data can be collected, and what are the advantages and disadvantages of each? (3) What kinds of data ought to be collected, and how will the data be used to improve the quality of instruction? (4) What are the costs of collecting data of various kinds? (5) How can new kinds of data collection be integrated into the existing National Assessment of Educational Progress (NAEP) program?

Given these issues and questions, the goal of this paper is to consider what sorts of data can be collected on classroom processes. With this goal in mind, we examine the kinds of data that are currently collected on classroom processes and evaluate what can and cannot be learned from these data. We then look beyond current research practices and make suggestions for future data collection on classroom processes.

STUDYING CLASSROOM PROCESSES

Nature of the Classroom

Having established a broad interest in collecting data on classroom processes, we consider what kind of data might be collected. Before launching into a discussion of specific data collection techniques, we need to ponder the nature of classroom instruction. The data collected and measures constructed are only indicators. To assess the validity of these indicators, we must first think through the nature of what it is they are intended to be indicators of. Indeed, a framework for thinking about the constructs that define classroom instruction provides a necessary theoretical context in which indicators can be interpreted.

Classroom instruction, first and foremost, is a complex, dynamic, goal-directed system. One goal of the system is student learning, although there certainly are other goals as well. For purposes of this paper we will assume that achievement, as measured by the NAEP, is an important overall goal of the
system we describe. The system consists of several important elements, including a teacher, students, curriculum, and materials. These elements interact with each other in complex ways. Teachers orchestrate the sequence of activities that comprise the classroom lesson. These activities represent organized behavioral interactions between students, teachers, and curriculum/materials. In addition, these lesson elements interact with key contextual factors that impinge on the classroom.

To say that the classroom is a system implies that it is more than the sum of individual features or independent dimensions. Although features might be measured to indicate indirectly the functioning of the system, it is difficult to imagine features of instruction that are always good or dimensions on which lessons should be uniformly high. For example, although in general it might be true that lessons in which students are cognitively challenged are better than lessons in which they are not, there are many instances in which repeated practice with less challenging tasks is appropriate and necessary for students' learning. This presents the researcher with a significant challenge. To define quality of instruction, one must do more than define a set of features; one must evaluate features of a specific lesson with reference to how they function in the context of a goal-directed system. Indeed, one must describe the system itself to understand the meaning of indicators.

An example will serve to illustrate the practical implications of this point. In the process-product research of the 1970s and 1980s, it was demonstrated, across many studies, that student learning of mathematics was significantly associated with rapid coverage of a large number of problems during the lesson: the more problems the teacher led students through, and the faster the pace, the more students learned as measured by achievement tests (Leighton, 1994; Leinhardt and Putnam, 1987). As often as this effect was found, however, it turned out not to hold up in cross-cultural comparisons. Japanese students achieve in mathematics at far higher levels than U.S. students, yet Japanese teachers often are found to cover only one or two problems in a single lesson, compared with 30 to 40 in an American lesson (Stigler and Perry, 1988). Clearly, the indicator of how many problems are covered has different meanings in the context of different instructional systems. U.S. teachers were using problems for repeated practice, and clearly there is something to be gained by such practice. Japanese teachers, in contrast, were using problems as the focus of students' deep thinking and reflection. Simply knowing how many problems were covered was not enough to characterize the kind of instruction students experienced.

Another truth about classroom teaching is that it is a cultural activity (Gallimore, 1996; Stigler and Hiebert, 1997). What this means is that teaching, like other cultural activities, is constructed largely out of widely shared routines that are learned implicitly and are highly resistant to change. Although in our culture we perceive variability across teachers in their approach to teaching, cross-cultural comparison reveals that such variability may be relatively insignificant
compared with the large differences across cultures in the ways that teachers teach. U.S. teachers, for example, have varied ways of providing feedback to students who are working on math problems during seatwork. But these variations pale in size when we realize that virtually all U.S. teachers tell students how to solve the problem before they ask the students to solve it, whereas most times Japanese teachers do not. We tend not to notice those aspects of cultural activities that are shared, focusing instead on features that vary. But it may well be that the aspects of teaching that are widely shared in a culture are the ones that have the most impact on student learning.

One important implication of this fact about teaching is that it shifts our focus somewhat from the study of teachers to the study of teaching. Because the literature on classroom indicators has been largely an American one, it has tended to focus on aspects of teaching that vary in our culture. But we need to focus as well on identifying the shared cultural scripts that underlie most or all of what we see inside American classrooms. The improvement of teaching over time may be much greater if we focus on changing widely shared scripts than if we focus on understanding variations in the competence with which teachers use the scripts.

Research Questions

Viewing classroom instruction as a complex system and as a cultural activity leads us to identify several important research questions to guide our inquiry into instructional quality.

What kinds of instructional systems can we identify? How can we describe these systems? This will involve, minimally, identifying the key elements of the classroom lesson and describing the ways in which these elements interact.

What kinds of quantitative indicators can we develop to assess the functioning of different types of instructional systems? What are the processes that affect these indicators? We must quantify the descriptions developed in response to the first research question if we are going to validate them across large numbers of classrooms.

What is the role of the student in different instructional systems? What are the processes by which students learn from classroom instruction, and what characteristics of different instructional systems affect how much students learn? These are key questions, as our interest in instruction rests on the assumption that student learning is affected by instruction.

What is the role of the teacher in different instructional systems? How can teaching be improved? Again, we assume that teachers play a critical role in shaping the nature and quality of instruction in the classroom.

Each of these general research questions can be approached through various analytic frames. For example, classroom lessons can be described on a more
macrolevel in terms of activity structures (e.g., classwork or seatwork) or from a more microanalytic level (e.g., detailed analysis of discourse patterns as they unfold throughout the lesson).

Units and Methods of Analysis

Starting with the assumption that classroom instruction is a complex cultural system, we have proposed a broad set of research questions. The complexity of instruction also has implications for the units and methods of analysis we choose. Classrooms must be studied using units that make sense and that preserve the crucial aspects of the system. These units might be relatively large (e.g., units, grade levels), but they are probably not smaller than the classroom lesson. Classroom lessons have ecological validity from the teacher's point of view. Teachers plan their days in terms of lessons: "First we'll do math, then social studies." Lessons are goal directed and orchestrated by the teacher. The explicit goal of the lesson might be a student learning goal, or it may simply be the completion of some series of activities. Regardless of the goal, the lesson itself can only be understood in relation to the goal. Although we can study the lesson through different lenses (e.g., we can study the nature of classroom discourse or the patterning of teacher-student interactions), we will need to collect information about the context in which the processes operate.

It is also important to note at the outset that both qualitative and quantitative analyses will be required in our efforts to understand and improve classroom learning. The first research question we listed is one that must be answered through qualitative analysis. Identifying parts of lessons and figuring out how the parts interact to produce student learning require a qualitative analysis of the instructional process. Once the process has been described, however, it is useful to develop indicators that can be used to validate and refine the descriptive model of instruction. Not only do we need both qualitative and quantitative data, we also need a way to link the two kinds of data together. As we will see, this has been a problem with more traditional approaches to the study of classroom processes.

TRADITIONAL METHODS: SURVEYS AND NARRATIVE DESCRIPTIONS

Most commonly we have relied on surveys to collect data on classrooms. Additionally, narrative descriptions have been used as a method of collecting classroom data. In this section, we review those methods. In particular, first we provide descriptions and overviews of the data forms. Next, we examine what we typically learn from data collected with each of these methods. Finally, we offer an evaluation of each of these methods, with some attention to both the limitations
that each method has in terms of producing data on classroom processes and the potential of providing new insights about teaching and learning.

Survey Methods

Descriptions and Overviews

Surveys represent relatively straightforward ways to collect data on a host of issues related to classroom processes; however, surveys can take several different forms. For example, even if we are just surveying teachers, teachers can be surveyed about their recollections or their opinions with questionnaires (whose answers can take the form of a rating scale, forced multiple-choice responses, or open-ended answers), interviews, or diaries. In this section, we also include observational checklists, which in some ways resemble the other data forms in this section but in other ways resemble narrative observational records. In the remainder of the section, we provide a general description of the various types of survey methods.

Questionnaires and rating scales

Questionnaires and rating scales are often used to tap classroom processes. Questionnaires and rating scales used for these purposes typically request information from teachers about the activities taking place in their classrooms. Others, including classroom observers and students, also may participate in completing questionnaires about classroom processes. This data source can provide information about what is taught, how the teaching takes place, and how much time is spent on various topics and activities. As an example, Burstein et al. (1995:xiii) asked teachers to judge the percentage of class time spent instructing with various strategies (e.g., whole-class instruction, administering tests, performing administrative tasks). One of their major findings was that, "although the picture of teaching that can be drawn from survey data is quite general, it is probably valid, because . . . data clearly show that there is little variation in teachers' instructional strategies. The majority of teachers use a few instructional approaches and use them often." With these methods we can obtain data from a large number of informants who have direct access to the information we find of interest.

Diaries

We use the term diary to represent teachers' records of their lessons, including lesson plans, outcomes, and the like. Diaries have been used, relatively successfully, to measure curriculum content. Given that we are concerned with classroom processes, one might wonder why we specified that diaries are used to measure curriculum content. The reason is that curriculum content has often served as a proxy for classroom practices, although it is not itself a direct measure of classroom practices. Barr and Dreeben (1983:107) defined content coverage (also commonly referred to as instructional pace) as the amount of curricular
material that is covered over a period of time. They argued that although other indexes of productivity designed for judging the effectiveness of instruction are possible, "we have selected this one because, when treated at the level of individual children, it represents an instructional condition integrally connected with learning." Another reason for focusing on diaries to measure content when we are concerned about the relationship between teaching and learning, according to Brophy and Good (1986:360), is that "the most consistently replicated findings link achievement to the quantity and pacing of instruction."

As an example, Perry (1988) surveyed nine fourth-grade teachers' mathematics lesson plan books over the course of one year and recorded which problems were assigned. She then coded each problem as belonging to one of several mathematical topics. She also measured the students' mathematics problem-solving performance, both at the beginning and the end of the school year. Problems that most children solved incorrectly at the beginning of the year were designated as representing difficult topics, and problems that most children solved correctly were designated as representing easy topics. Generally, Perry found that problem assignment was related to student learning; more specifically, she found that spending a great deal of time on a few difficult problems led to better student achievement than covering many problems, especially problems that most students could solve before receiving instruction. In this study, a diary of what instruction consisted of was used to make inferences about teaching practices that were related to learning outcomes.

Interviews

Interviews, conducted face to face or by telephone, allow us to get teachers' and/or the students' views of classroom processes. We can ask what happened, and we can ask for evaluations about what was reported to have happened. Interview techniques are especially useful, compared to paper-and-pencil methods (such as questionnaires and rating scales), when the potential responses have not been determined in advance. Interviews, especially those conducted by well-trained interviewers who know what sorts of issues are of interest and which deserve lengthy commentary, are desirable when we expect complex responses because interviewers can ask respondents different questions, depending on previous answers. If the potential responses are already known, less expensive methods may be more desirable.

Checklists

Checklists often have been used to document classroom processes. When using checklists, all of the behaviors of interest must be defined in advance. Additionally, observers (i.e., the ones responsible for checking off observed behaviors on a checklist) need to agree about what constitutes the observed behavior. Thus, categories must not only be defined in advance, but must also be specified as clearly as possible so that the observers check the appropriate entry. Typically, checklists are completed by outside observers, which makes this
method different from those already discussed. In this way, checklists resemble the narrative descriptions of classroom observations, which we discuss later. However, this data form resembles the other forms of survey data in that the questions to be examined generally are already known before the data are collected.

To lay out more clearly the data that can be obtained with observational checklists, we provide a brief description of two well-known investigations that have relied on this method. As a first example, Brophy and Evertson (1976) had observers note each time a specified behavior occurred, such as teacher praise for a student's good response. From their observations and analyses, they concluded that teachers whose students had the highest achievement treated their students in a businesslike and task-oriented manner. As a second example, Stigler et al. (1987) had observers in three countries check when certain classroom behaviors and certain features of classroom organization were present. Their conclusions centered around the idea that whole-class instruction meant that every student received some instruction, and teachers who relied heavily on individualized instruction had some students who, basically, were never taught. Both of these examples illustrate that checklists can provide a general snapshot of classroom life.

Uses of and Outcomes from These Methods

Survey methods are used to assess many variables related to instruction and life in classrooms. One reason these methods are used so frequently is that they are easy to use. With these methods it is easy to measure curriculum content. For example, researchers can read through teacher plan books or diaries kept for the purpose of noting what topics were covered and easily judge what was and was not taught. It is also easy to measure the amount and pace of instruction. For example, researchers can ask teachers in an interview which pages in the text were covered and can use a questionnaire to ask how much time was spent in instruction. It is also easy to measure the format of instruction. For example, researchers can ask teachers to check each form that was used on each day of instruction (lecture, small-group work, etc.).

More significantly, given the concerns motivating the present paper, these methods can even be used to measure classroom processes. For example, we can ask teachers in a questionnaire whether the questions they asked their students required short answers or reflection and abstraction; we can ask whether the students responded only to the teachers' requests or whether the students provided substantive contributions without teacher prompts. In short, researchers have used these methods successfully to document a wide array of classroom features. These methods typically have been used and analyzed in the process-product approach to classroom investigation (e.g., Brophy and Good, 1986). In general, the process-product approach assesses classroom processes or their proxies and relates these to student outcomes.
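
To make the mechanics of a checklist concrete, here is a minimal sketch of how interval-based checklist tallies might be turned into simple classroom indicators of the kind a process-product analysis would relate to achievement. The categories and lesson records below are our own invented illustration, not an instrument from Brophy and Evertson (1976) or Stigler et al. (1987).

```python
# Hypothetical checklist records: for each observation interval, an observer
# checks which predefined behaviors occurred. Categories and data are invented.
from collections import Counter

CATEGORIES = {"whole_class", "seatwork", "teacher_praise", "student_question"}

# One lesson = a list of per-interval checklists (illustrative data only).
lesson = [
    {"whole_class", "teacher_praise"},
    {"whole_class"},
    {"seatwork", "student_question"},
    {"seatwork"},
    {"whole_class", "student_question"},
]

def interval_rates(intervals):
    """Fraction of observed intervals in which each category was checked."""
    counts = Counter()
    for checked in intervals:
        counts.update(checked & CATEGORIES)  # ignore stray categories
    n = len(intervals)
    return {cat: counts[cat] / n for cat in sorted(CATEGORIES)}

print(interval_rates(lesson))
# e.g. whole_class appears in 3 of 5 intervals -> 0.6; rates like these are
# what a process-product analysis would correlate with achievement scores.
```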

In addition, we note that these methods are typically used to test theories. Because survey methods must generate categories and items before the data are collected, the categories and items necessarily reflect a theoretical bias. The data collected in surveys can, for example, support or call into question a relationship that a theory would predict. In this way survey data can tell us when a theory cannot be supported and thus when a new theory is called for.

Evaluation

These methods of collecting data are used frequently, in part because they can be used on a wide scale: they are easy to administer and easy to analyze relative to other methods. The ease associated with collecting survey data makes these methods the most widely used for gathering data on classrooms. The difficulty and costliness of other methods have sometimes made them prohibitive altogether or at least have limited the number of classrooms that could be included for study (we document these more fully as these other methods are discussed). Burstein et al. (1995:35) say that "there is still much that survey data can tell us about instructional strategy. Survey data can describe the major dimensions of classroom processes and how they vary across course levels and types of schools. National survey data, collected periodically, can document trends in teachers' use of generic instructional strategies. Such information is important for determining whether or not teaching is changing in ways consistent with the expectations of curriculum reformers and policymakers." For these reasons we imagine that the NAEP could collect and productively use these sorts of data.

Of course, with any method there are drawbacks. We see three major drawbacks to the methods just described: (1) these methods leave open many threats to validity; (2) most significant among these threats is a lack of shared language; and (3) these methods rarely contribute to the generation of new ideas and thereby do not prominently contribute to national discussion. We discuss each of these in turn.

Problems of Validity

Probably the most serious problem with survey methods is that responses often are not accurate, thereby making them not valid. In many instances, typical paper-and-pencil survey instruments are not to be trusted because teachers are fallible human beings and may easily forget what they have done or unwittingly skew their responses based on their individual biases. We do not mean to say that teachers are not to be trusted. What we mean is that it is sometimes difficult to produce accurate responses.

In particular, it is difficult to be precise about certain behaviors. This problem was made clear by some careful work (Mayer, 1999) on the reliability of these methods. Mayer (1999:43) writes: "We cannot rely on the individual survey questions to assess the amount of time . . . teachers use specific practices . . . because the teachers do not report their practices in a consistent manner.
Thus, the portrait of specific practices conveyed by the survey is unreliable and therefore invalid." It is much more reasonable to ask teachers what they believe than exactly what they do or how they have impacted their students with what they have done. For example, imagine how hard it would be to be precise about whether you had conveyed the concept of equivalent fractions primarily with questions, explanations, or examples. Imagine the further difficulty of knowing which of these three methods of instructional practice had the greatest positive influence on students' understanding of equivalent fractions. Mayer (1999:43) investigated this directly by comparing teachers' responses on surveys to classroom observations of these teachers. He found that "low reliability existed for most of the practice items [i.e., items intended to measure teachers' practices] examined in this study." In short, surveys probably could never give us reliable and detailed data about classroom practice. And without reliability we cannot claim to have validly measured their behaviors.

A cousin to this problem is that those who respond to surveys are often tempted to answer questions as they imagine the researchers would like them to be answered, rather than with accuracy and honesty (e.g., Burstein et al., 1995; Cohen, 1990), thus making these methods susceptible to problems of social desirability. For example, with the recent implementation of reform-based standards, teachers are increasingly aware that their practice should reflect these standards. However, their practice may lag behind their knowledge of these standards, and so they honestly respond about what they know about the standards, even though their knowledge may not be reflected in their practice, thus making their responses on surveys inaccurate (i.e., not valid).

Although reliability is clearly a problematic aspect of relying on survey methods for documenting classroom processes, the reliability of constructs measured by surveys increases when multiple, rather than single, items are used to measure constructs (e.g., Light et al., 1990; Mayer, 1998; Shavelson et al., 1986). As Mayer (1999:43) writes: "Individual indicators of limited reliability can be grouped into a highly reliable indicator." The point here is that if we can get at a potentially important behavior with multiple approaches (e.g., use observational checklists to determine which instructional strategies were used and follow them up with interviews to learn more about how often they are used and under what conditions) or multiple items on the same measure, we are more likely to avoid problems with reliability and validity than if we rely on a single item or a single measure. Thus, we would recommend that if the NAEP were to include survey measures of teacher behavior, multiple measures should be used.
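
The gain from pooling items can be made concrete with a standard internal-consistency statistic. The following sketch is our own illustration using simulated responses, not data from Mayer's study: four noisy items about the same underlying practice are combined, and Cronbach's alpha rises as more items enter the composite.

```python
# Illustration with simulated data: pooling several unreliable survey items
# about one construct yields a more reliable composite indicator.
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents x k_items) score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)        # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)    # variance of the summed scale
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

rng = np.random.default_rng(0)
n_teachers = 500
latent = rng.normal(size=n_teachers)             # hypothetical "true practice"
# Four items, each = latent signal plus independent reporting noise
items = latent[:, None] + rng.normal(size=(n_teachers, 4))

print(round(cronbach_alpha(items[:, :2]), 2))    # roughly 0.67 with two items
print(round(cronbach_alpha(items), 2))           # roughly 0.80 with all four
```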

Lack of Shared Language

Related to the problem of not obtaining a valid picture of classroom practices with typical paper-and-pencil survey instruments is that these instruments require an evaluation of whether teachers understand the items in the way they were intended. However, for this we need a common language that we really do not have. As Burstein et al. (1995:35) put it: "Surveys typically cannot capture the subtle differences in how teachers define and use different techniques." For example, what one teacher means when she agrees with the item "we had a discussion" may be very different from what another teacher means when he agrees with the same item. Even something as specific as "We folded paper to demonstrate equivalent fractions" is open to multiple, potentially inconsistent interpretations (Was the paper a square or a rectangular shape to begin with? How many folds were used?), thus rendering responses invalid, even to specific descriptions.

This notion is corroborated by Palincsar and her colleagues (1998), who argue that teachers' professional development should be constructed as a "community of practice." They argue that this model deals head on with two pervasive problems in the culture of American schoolteachers: "(a) the lack of consensus regarding the goals and means of education . . . and (b) the private, personal, and individualistic nature of teaching . . . which deprives teachers of collegial and intellectual support (Little, 1992)." In other words, Palincsar et al. believe that if examples are collected and used for discussion, a common language can be developed for teaching. Besides the inherent problems associated with not having a common language when teachers respond to survey items, we note that having a common language is the first critical step toward improvement and change. In this case, a common language would enable teachers to share ideas; teachers cannot be expected to implement and evaluate new practices until this takes place.

Failure to Contribute to New Ideas

Third, and perhaps most importantly, these sorts of data rarely if ever contribute to the discussion of improving practice and outcomes. Why not? Because to improve practice, concrete new ideas about classroom practice are needed. Without these, we cannot expect the dialogue about classroom practice to move forward productively. And, of course, all of the methods we have discussed thus far have the questions, issues, and items defined before any data are collected, thus limiting or excluding altogether the possibility of producing new, heretofore unimagined ideas about classroom practice. In this way, survey data are much better suited to supporting or questioning existing theory than to developing new theory. However, this must be qualified: when theories are not supported by data, researchers are placed in a position to refine, revise, or generate new theory. In this way, survey data have the potential to contribute to theory.

Currently, most data on classroom practice can only tell us whether what we want to see in teachers' practice is there or not, because people (researchers, policy makers, administrators, etc.) have predefined what should happen. Thus, these data can tell us what is not working but cannot help generate new ideas for improvement. To generate new ideas for improvement, we would need to obtain data that permit the development of a shared language to refer to concrete

[. . .]

examples, a dialogue could be built about which of these lessons were good and why. In this scenario we would not have to worry as much as we do with other data sources that our language about these lessons is not understood by others: if we all watch the same lesson, for example, using folded paper to show equivalent fractions, we will know exactly how the paper is folded and how it is marked. In sum, video data can provide a shared set of examples for building language and theories for analyzing classroom practices.

Data Needed to Test and Validate Theoretical Models

The data most useful to policy makers are probably those that say whether or not teachers have implemented the stated policy and, if so, what the impact of the implementation has been on student achievement. This can then be related to student achievement data: if students perform well, the policy should remain; if students perform poorly, the policy should be revised. Thus, the first concern for policy makers is to know whether policy is being implemented. If the stated policy is indeed being implemented, it is also important to know how it was implemented.

Here is an example of this issue: the National Council of Teachers of Mathematics recommends that students participate in mathematical discussions. Among the many reasons for making this suggestion is that research has told us that students learn better when they participate actively than when they are passively taking in what the teacher tells them. To see whether insisting on discussions is indeed a good policy to be recommended to all teachers, we would want to know how frequently and how well teachers engaged their students in mathematical discussions, especially in relation to the amount of time teachers expect their students to be more passive (e.g., when the teacher stands at the front of the room and explains to the students what she wants them to know). When we know the absolute and relative amounts of time spent in mathematical discussions versus just listening to the teacher, we can relate these to student outcomes.

How do we get these data? We can imagine several scenarios, but for this sort of question we suggest that none involve teacher self-reports, because teachers cannot possibly teach and note when they are using different instructional techniques and also report how much time they spent in these episodes. Thus, we recommend videotaped observations because they permit a careful and relatively accurate measure of what teachers do and do not do in their classrooms.

We also acknowledge that different types of data may be necessary to test theoretical models of teaching and learning than the types of data used to develop the models. For example, we can use videotaped records of classroom instruction to develop ideas about what might facilitate learning and then test these ideas using experimental methods. As an example, Flevares and Perry (2000) discovered that teachers vary their presentations of nonverbal information to accompany the verbal content and activities in a lesson. From this discovery, they
hypothesized that the naturally occurring nonverbal information may be crucial to learning the lesson content. At this point, Flevares and Perry (1999) are systematically presenting the same lesson content in verbal form but varying the nonverbal forms and then measuring learning outcomes. Eventually, they expect to understand which nonverbal forms aid learning of different concepts.

We also wish to make the point that even when we have what we believe is a good policy, video data can clarify the policy. This point is important because policy, such as that reflected in standards, is typically vague. When policy is vague, it leaves plenty of room for interpreting and misinterpreting. As Cohen (1990:313) puts it, "The [California] framework's mathematical exhortations were general; it offered few specifics about how teachers might respond, and left room for many different [implied: some bad] responses." Thus, we suggest that clear examples, especially those derived from videotaped observations, not only allow the development of a shared language about what practices actually reflect policy and which do not but can hone and clarify the policy.

In sum, a wide array of data forms may be necessary to test models of the effects of policy and to test theories of teaching and learning.

Data Needed as Basis for Communicating to the Public

Finally, we raise the point that data are also needed to communicate what has been learned to the public. What sorts of data are these? Of course, the answer depends on the type of data that best illustrate what we have learned. Here is a simple example: if we have learned that teachers who spend a great deal of time learning about a new curriculum do a better job of teaching it than teachers who spend little time learning about the new curriculum, we simply need to present the average number of hours spent in training of the teachers whose students learned the material well compared to the teachers whose students did not.

Let's turn to a more complex example. If we learn that stating the goal of a lesson in a clear fashion at the beginning of a lesson facilitates students' understanding of the lesson's content, we may need demonstrations of different teachers stating the goals of their lessons. Data of this sort would allow the public to get a sense of how powerful these opening goal statements can be, especially when these are compared to other teachers' opening statements, which do not include goal statements. The general point we wish to make is that the data we share with the public need to be accessible, and the data need to communicate or demonstrate clearly what can be learned.

Recommendations

Classroom process data relevant to the needs of researchers and policy makers are scant. In general we need more data of all kinds that can feed information from the classroom back into the research and policy process. Specifically,
however, we stress the need to expand our data collection efforts beyond traditional surveys. We recommend three new initiatives.

First, we desperately need to collect more data on how policies are implemented and their effectiveness inside classrooms. We need to know whether policies are implemented or not, and we need to understand the conditions under which they succeed or fail. Student outcome data must be linked into this effort, but outcome data alone will not be enough to understand how policies work. In particular, we propose that video surveys be used, in conjunction with more traditional surveys, to study classroom processes. Through questionnaires we can find out, for example, about teachers' opportunities to learn about new policies or new curricula. Through video surveys we can see what the new policy or curriculum looks like as it is implemented in classrooms. Clearly, both kinds of information are needed if we want to understand the mechanisms by which policy affects teaching and learning.

Second, apart from policy, we should conduct video studies to aid in the development of theories of teaching and to validate survey instruments. Video data are especially useful for theory generation. Recall the example we presented earlier in which we discussed "describe/explain" questions. Japanese teachers asked their students to describe complete problem solutions, whereas U.S. teachers asked students to present and justify single steps in a solution. Given that Japanese students outperform their U.S. peers, we could use this information to advance our theories of learning. In particular, we could hypothesize that it is not enough to retell one portion of a problem's solution and have others tell about other portions. Instead, for deep learning to take place, students may need to put their explanations in the context of whole-problem solutions. This hypothesis, generated from video data, could be tested experimentally. Video records also allow for validation of other instruments (see, e.g., Mayer, 1998).

Aside from general surveys, we can think of two kinds of data collection efforts that would be especially valuable. One would be the establishment of a national sample of "indicator" districts or schools that could serve as a testbed for developing theories of teaching and new survey instruments. We would propose to collect all sorts of data in these schools, including, but not limited to, achievement data, survey data (from teachers, students, parents, and administrators), and videotaped observations of lessons. In these settings, quantitative data could be linked with rich contextual data to yield important insights. Moreover, with the availability of multiple indicators and videotaped records, new theoretical ideas could be explored.

Another important use of video would be to study special classrooms: either those in which students have been shown to learn a great deal or those in which new or experimental teaching techniques are being used. Such data would not only advance our understanding of what works in classrooms but also provide guidance to teachers about what the process of changing teaching can look like. Examples of teachers who are in the process of changing allow other teachers to
see what it is like to have mixed (i.e., new and old) practices (e.g., Cohen, 1990) and can provide teachers with direct knowledge of what may be problematic in adopting something new. In addition, examples of teachers who have accomplished a successful change can provide a model, replete with explicit tactics for instructional success. Our point is simply that special cases may well be more useful than random samples in advancing our knowledge of teaching and how to improve it.

Our third recommendation is to conduct international studies in order to increase our exposure to novel variations in teaching practices. New ideas are essential if we are to improve teaching. Systems, and individuals, have a difficult time learning without a steady diet of variability (Siegler, 1996). Innovations, alternative images, different ways of doing things, and new information are all needed to create new experiences from which the system can learn (Stigler and Hiebert, 1999). Looking across cultures can be an especially useful source of new ideas about what is possible in classrooms, but only if we use research methods that can spot what is new. Questionnaires are not well suited to this goal because on them teachers can only answer the questions the researchers were clever enough to ask. Video data, especially those that are collected outside our own country, can serve this function of generating new ideas and new hypotheses about teaching.

DATA FOR CLASSROOM PRACTITIONERS

We have described the role that data can play in helping researchers and policy makers understand the chain of influence that relates policy to classroom practice to student learning. But what about classroom teachers? What role can data play, if any, in teachers' efforts to improve their own practice?

The traditional view is that teachers can use the findings from research, and the recommendations of policy makers, to improve their teaching. So, for example, teachers are assumed to read documents such as the NCTM Professional Standards for Teaching Mathematics and be able to use the recommendations therein as a guide for improvement. Recent data and a lot of experience suggest, however, that teaching is not easily changed by having teachers read such documents (e.g., Stigler and Hiebert, 1997). The reason, we believe, is that general research findings, because they are general, are not situated in the complexities of classroom life. As we pointed out earlier, there are few features of instruction that are always desirable or always undesirable; it depends on the lesson context.

We propose an alternative to the traditional view. Because teaching is so complex, general research findings will have limited applicability to the improvement of practice. Such findings can serve as a guide, but they will not be sufficient. Teachers need a different kind of knowledge as well, knowledge we might refer to as localized theories grounded in practice. Teachers themselves will be the ones to develop this kind of knowledge.

What Teachers Need to Know to Improve Practice

Much has been written about what teachers need to know to perform their craft (e.g., Shulman, 1986). We will not review that literature here except to point out that there is a marked difference between the kind of knowledge teachers use, as indicated by post hoc analysis, and the kind of knowledge teachers have available in their quest to become better teachers. Most attempts to improve teaching through workshops, courses, and so forth, provide knowledge that is of limited relevance in the classroom. On the one hand, teachers are exposed to theories, generated by researchers, that are decontextualized and difficult to link to classroom practice. On the other hand, teachers are given models or examples of what they "should do" in their classrooms and asked to copy them. But in these cases the examples are not grounded in theory and thus are not easily adaptable in local classroom contexts.

Our view is that teachers, to improve their practice, need a kind of knowledge that has been in short supply to this point: theories linked with examples. This is what we mean by localized theories of teaching. To be useful, such knowledge needs to be organized around curricular goals and needs to be packaged in units that are shareable across teachers and classrooms. Currently we have no means of generating this kind of knowledge, no means of accumulating and storing this knowledge, and no mechanism for sharing this knowledge across teachers. A major goal of data collection about teaching, therefore, should be to produce data that can contribute to producing theories of teaching linked with examples, and that can help in the accumulation and sharing of this knowledge.

Role of Data for Improving Teaching

We believe that teachers must play a central role in the generation of localized theories of teaching and learning in classrooms. Teachers are the ones with the best access to relevant information about classrooms, and they are in the best position to evaluate the validity of localized theories. In addition, there are many more teachers in the country than there are educational researchers. Unless teachers are involved in a central way in this process, progress will be exceedingly slow. Of course, it will take more than data to engage teachers in this process, but data can play a central role.

Generating localized theories of teaching will require prolonged reflection and discussion of examples of classroom practice. Video can play a central role in these discussions because it allows what is normally a complex and transitory phenomenon to be slowed down and replayed for study. The theoretical descriptions of teaching that can result from analysis of classroom videos will naturally be linked to actual examples of classroom practice. Thus, what teachers learn from joint analysis of such examples will be easier to situate in terms of their own classrooms. The collaboration is important, too, for it means that teachers will be
developing a shared language for describing the events and activities they see on video. This shared language is critical, as it becomes the foundation on which localized theories of teaching can be stored, accessed, and communicated about with other teachers.

In the process we envision by which teachers could use classroom videos, it is interesting to ponder what kinds of examples ought to be collected. Some might think that the most important videos to analyze would be those that teachers collect in their own classrooms (see, e.g., Lampert and Ball, 1998). Although there certainly is a place for such examples in the teacher development process, they are by no means the only or even the most important examples. Because teaching is a cultural activity, and because variation in teaching methods might therefore be limited in a single culture, it is probably most important that teachers gain exposure to genuine alternatives, examples that depart significantly from what they are accustomed to seeing. Even risking possible misinterpretation, videos of lessons from other cultures, and videos of lessons in which serious efforts to reform are evident, would be a high priority for teachers because these present clear alternatives to typical and/or culture-bound lessons.

For teachers, contextual data about the lessons taped are even more critical than for researchers and policy makers. Teachers need to know what happened yesterday and what the students knew and understood before the lesson started. Test data and interview data from students both before and after a lesson would be highly relevant to teachers' analyses. Interviews with the teacher on the video would also be important, especially questions that elicit from the teacher explanations of what she or he was intending to accomplish with each part of the lesson. For teachers, the key is not sampling: lessons need not be representative, and the number of lessons need not be large. What is important is that the cases be selected to expand and inform teachers' developing understandings of teaching and learning in classrooms.

Finally, there is one more function that can be served by access to video examples. As noted by Cohen and Hill (1998), analysis of the possibilities exemplified by other teachers can provide a powerful incentive for teachers to improve their own teaching. We are reminded of the beginning Japanese teacher described by Lewis and Tsuchida (1997) who broke down in tears after watching one of her senior colleagues teach a science lesson. She explained that she thought the other teacher was so skilled that she felt badly for her own students, who, through the luck of the draw, ended up in her class. The result was a strong feeling of wanting to improve, coupled with concrete images of what improved teaching might look like.

Recommendations

Teachers can videotape themselves at the local level, but the federal government can play an important role in collecting, and then giving teachers access to,
variant examples of teaching in different cultures, different subject areas, and so forth. The federal government also can document and collect examples from teachers who are unique, either through some special talent or through participation in systematic programs of reform.

The National Center for Education Statistics also should consider accumulating examples into a national database of video cases that could be accessed by teachers over the Internet. If rules were established to control quality, it would be possible to build and maintain a database to which classroom teachers could add their own examples. Nothing would do as much as such a database to facilitate the development and sharing of curriculum-based localized theories of teaching.

VIDEO AND THE EXISTING NAEP

Having discussed new methods of studying classroom processes and having thought through how data on classroom processes might be used by different audiences to improve teaching, we return to the question of the NAEP. In particular, we wish to address the issue of how new methods, particularly video, might be used in conjunction with the existing NAEP.

The primary focus of NAEP has been on student achievement. For more than a quarter of a century, NAEP has documented national trends in what students know and are able to do in various academic subject areas. Yet there has also been a growing interest in documenting changes in the context of achievement at a national level. Student and teacher questionnaires are now included in the NAEP as a means of measuring everything from student demographics to teacher preparation, instructional practices, school policies, and out-of-school activities.

We believe that video surveys can be integrated into the NAEP framework and that they can contribute greatly to the study of instructional practices over time. Of course, it is not feasible to videotape in every classroom included in NAEP, but collecting video records of lessons in a substantial subsample of NAEP classrooms is both practical and useful. Using techniques similar to those in the TIMSS video study, videotaping in national samples of classrooms can provide the first reliable means of tracking changes in instructional practices over time.

Meanwhile, before data can be accumulated on instructional trends, video surveys can provide a means of studying the classroom mediators of such variables as race and social class. For example, NAEP already provides a means of tracking racial gaps in achievement over time. But are such gaps correlated with gaps in teaching quality and instructional practices? Video records would clearly be the best means of asking such a question, especially over time.

One way to implement such an effort would be to send videographers around the country, much as was done in TIMSS. But another possibility is even more intriguing: just as the Nielsen ratings measure television viewing by placing continuous monitoring devices in a sample of homes, NAEP could place video
cameras in a sample of classrooms and conduct continuous monitoring of classroom processes. This idea is not as farfetched as it sounds. Cameras are cheap, and the technology for connecting them to the Internet also is cheap. It would not be necessary to record all of the camera images. Instead, sampling plans could be devised to get valid and reliable pictures of what goes on inside classrooms. If NAEP assessments could be administered more frequently in this subsample of classrooms (for example, three times a year), we would have the best data ever available for studying the relation of instruction and learning inside real classrooms. This idea is feasible and should be considered seriously.

Another use of video surveys in NAEP should be to aid in the development and validation of better traditional measures of classroom practices such as questionnaires. A well-designed sample of video data could serve both immediate research purposes and instrument development purposes, provided the two are integrated in their conception and design. It may be that some aspects of classroom practice are well measured by questionnaires, but validity studies to document this possibility are scant. Over time, using video in the development of questionnaires will increase the power of both methods of studying classroom practice.

One way to approach this goal is to fund the development of a thesaurus of teaching practices. The problem of developing a shared language for indexing complex materials is a common one in library and information science. Library scientists have resolved the problem by relying on thesauruses, the meanings of which are painstakingly developed over time. Using similar techniques, we propose a project in which researchers, subject-matter specialists, teachers, and the public contribute to constructing a thesaurus of teaching practices linked with video examples. We believe that such a thesaurus could provide a foundation for developing new measures of instructional processes.

Yet another use of videos collected as part of NAEP would be in the communication of study results to the public. Although testing of student achievement is a complex and difficult task, the public nevertheless has some intuitive sense of what achievement tests measure. Moreover, achievement measures themselves have been validated over many years. The study of instructional practices is different on both counts. There is little agreement as to what the basic constructs are, and, as noted earlier, we lack a public vocabulary for describing teaching practices. Not only do teachers need to develop such a vocabulary if questionnaires are ever to be a useful means of studying classroom practice, but the public must do so as well if it wants to understand the information collected about classroom practices.

In terms of cost, we reiterate the fact that the cost of video data primarily resides in the analysis phase, not in the collection. For this reason we encourage the collection of larger quantities of video data, even if funds are insufficient to support in-depth analyses. Our reasoning is that an archive of nationally representative videos will become more and more valuable over time. Imagine if we had video data of instructional practices over the past 100 years. It would not be

Yet another use of videos collected as part of NAEP would be in the communication of study results to the public. Although testing of student achievement is a complex and difficult task, the public nevertheless has some intuitive sense of what achievement tests measure. Moreover, achievement measures themselves have been validated over many years. The study of instructional practices is different on both counts. There is little agreement as to what the basic constructs are, and, as noted earlier, we lack a public vocabulary for describing teaching practices. Not only do teachers need to develop such a vocabulary if questionnaires are ever to be a useful means of studying classroom practice, but the public must do so as well if it wants to understand the information collected about classroom practices.

In terms of cost, we reiterate that the cost of video data resides primarily in the analysis phase, not in the collection. For this reason we encourage the collection of larger quantities of video data, even if funds are insufficient to support in-depth analyses. Our reasoning is that an archive of nationally representative videos will become more and more valuable over time. Imagine if we had video data of instructional practices over the past 100 years. It would not be the analyses of 100 years ago that would interest us but the opportunity for analysis now. Education is a field in which many "facts" are never really established as such, most especially those that pertain to the way things "used to be." Solid data from classrooms can play a key role in mediating and dampening the polarization that characterizes most educational debate in this country.
CONCLUSION

Data on classroom processes are critical if we are to improve education, whether through policy channels, research, or teacher professional development. All attempts to improve education must, if they are to work, pass through the final common pathway that is the classroom. If we fail to collect information on what is happening in classrooms, we risk missing the key processes that could effect change.

But simply collecting data is not enough. We must, before we collect any data at all, develop an understanding of how the data will be used, and by whom, to improve education. We have ruminated on how classroom process data might be used by policy makers, researchers, and classroom practitioners, but this is only the beginning. The way data are used is a subject of study in and of itself. We need more empirical studies of this process. We also need to realize that there are multiple models of data use, and so we must be flexible in collecting the data we need for different purposes.

REFERENCES

Barr, R., and R. Dreeben
1983 How Schools Work. Chicago: University of Chicago Press.
Brophy, J., and C. Evertson
1976 Learning from Teaching: A Developmental Perspective. Boston: Allyn and Bacon.
Brophy, J., and T.L. Good
1986 Teacher behavior and student achievement. In Handbook of Research on Teaching, M.C. Wittrock, ed. New York: Macmillan.
Burstein, L., L.M. McDonnell, J. Van Winkle, T. Ormseth, J. Mirocha, and G. Guitton
1995 Validating National Curriculum Indicators. Santa Monica: RAND Corp.
California State Department of Education
1985 Mathematics Framework for California Public Schools: Kindergarten Through Grade 12. Sacramento: California State Department of Education.
Cohen, D.K.
1990 A revolution in one classroom: The case of Mrs. Oublier. Educational Evaluation and Policy Analysis 12:311-329.
1995 What is the system in systemic reform? Educational Researcher 24(9):11-17, 31.
Cohen, D.K., and H.C. Hill
1998 Instructional Policy and Classroom Performance: The Mathematics Reform in California. Paper presented at the NCTM Research Presession, April, Washington, D.C.
Fernandez, C.
1994 Students' Comprehension Processes During Mathematics Instruction. Unpublished doctoral dissertation, University of Chicago.

Flevares, L.M., and M. Perry
1999 Seeing what place value means: Building students' understanding through nonverbal representations. Poster presented at the biennial meeting of the Society for Research in Child Development, April, Albuquerque.
2000 How many do you see? The use of nonspoken representations in first-grade mathematics lessons. Manuscript under review for publication.
Gallimore, R.
1996 Classrooms are just another cultural activity. Pp. 229-250 in Research on Classroom Ecologies: Implications for Inclusion of Children with Learning Disabilities, D.L. Speece and B.K. Keogh, eds. Mahwah, N.J.: Lawrence Erlbaum Associates.
Lampert, M.L., and D.L. Ball
1998 Teaching, Multimedia, and Mathematics: Investigations of Real Practice. New York: Teachers College Press.
Leighton, M.S.
1994 Measuring Instruction: The Status of Recent Work. Unpublished manuscript, Policy Studies Associates, Inc., Washington, D.C.
Leinhardt, G., and R.T. Putnam
1987 The skill of learning from classroom lessons. American Educational Research Journal 24:372-387.
Lewis, C., and I. Tsuchida
1997 Planned educational change in Japan: The shift to student-centered elementary science. Journal of Education Policy 12(5):313-331.
Light, R.J., J.D. Singer, and J.B. Willett
1990 By Design: Planning Research on Higher Education. Cambridge, MA: Harvard University Press.
Little, J.W.
1992 Opening the black box of professional community. Pp. 157-178 in The Changing Contexts of Teaching, A. Lieberman, ed. Chicago: University of Chicago Press.
Mandel, D.R.
1996 Teacher education, training, and staff development: Implications for national surveys. Pp. 3-29 to 3-42 in From Data to Information: New Directions for the National Center for Education Statistics, G. Hoachlander, J.E. Griffith, and J.H. Ralph, eds. Washington, D.C.: U.S. Department of Education.
Mayer, D.P.
1999 Measuring instructional practice: Can policy makers trust survey data? Educational Evaluation and Policy Analysis 21:29-45.
Palincsar, A.S., S.J. Magnusson, N. Marano, D. Ford, and N. Brown
1998 Designing a community of practice: Principles and practices of the GIsML community. Teaching and Teacher Education 14(1):5-19.
Perry, M.
1988 Problem assignment and learning outcomes in nine fourth-grade mathematics classes. Elementary School Journal 88:413-426.
Rosenshine, B., and N. Furst
1973 The use of direct observation to study teaching. In Second Handbook of Research on Teaching, R.M.W. Travers, ed. Chicago: Rand McNally.
Shavelson, R.J., N.M. Webb, and L. Burstein
1986 Measurement of teaching. Pp. 50-91 in Handbook of Research on Teaching, Third Edition, M.C. Wittrock, ed. New York: Macmillan.
Shulman, L.S.
1986 Paradigms and research programs in the study of teaching: A contemporary perspective. Pp. 3-36 in Handbook of Research on Teaching, Third Edition, M.C. Wittrock, ed. New York: Macmillan.

Siegler, R.S.
1996 Emerging Minds. New York: Oxford University Press.
Stigler, J.W.
1996 Large-scale video surveys for the study of classroom processes. Pp. 7.1 to 7.29 in From Data to Information: New Directions for the National Center for Education Statistics, G. Hoachlander, J.E. Griffith, and J.H. Ralph, eds. Washington, D.C.: U.S. Department of Education.
Stigler, J.W., and C. Fernandez
1995 Learning mathematics from classroom instruction: Cross-cultural and experimental perspectives. Pp. 103-130 in Basic and Applied Perspectives on Learning, Cognition, and Development, C.A. Nelson, ed. Mahwah, N.J.: Lawrence Erlbaum Associates.
Stigler, J.W., and J. Hiebert
1997 Understanding and improving classroom mathematics instruction: An overview of the TIMSS video study. Phi Delta Kappan 79(1):14-21.
1999 The Teaching Gap: What Teachers Can Learn from the World's Best Educators. New York: Free Press.
Stigler, J.W., S.Y. Lee, and H.W. Stevenson
1987 Mathematics classrooms in Japan, Taiwan, and the United States. Child Development 58:1272-1285.
Stigler, J.W., and M. Perry
1988 Mathematics learning in Japanese, Chinese, and American classrooms. Pp. 27-54 in Children's Mathematics, New Directions for Child Development, G.B. Saxe and M. Gearhart, eds. San Francisco: Jossey-Bass.