The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.



Scientific Research in Education

5
Designs for the Conduct of Scientific Research in Education

The salient features of education delineated in Chapter 4 and the guiding principles of scientific research laid out in Chapter 3 set boundaries for the design and conduct of scientific education research. Thus, the design of a study (e.g., randomized experiment, ethnography, multiwave survey) does not itself make it scientific. However, if the design directly addresses a question that can be addressed empirically, is linked to prior research and relevant theory, is competently implemented in context, logically links the findings to interpretation ruling out counterinterpretations, and is made accessible to scientific scrutiny, it could then be considered scientific. That is: Is there a clear set of questions underlying the design? Are the methods appropriate to answer the questions and rule out competing answers? Does the study take previous research into account? Is there a conceptual basis? Are data collected in light of local conditions and analyzed systematically? Is the study clearly described and made available for criticism? The more closely aligned it is with these principles, the higher the quality of the scientific study. And the particular features of education require that the research process be explicitly designed to anticipate the implications of these features and to model and plan accordingly.

RESEARCH DESIGN

Our scientific principles include research design—the subject of this chapter—as but one aspect of a larger process of rigorous inquiry. However, research design (and corresponding scientific methods) is a crucial aspect of science. It is also the subject of much debate in many fields, including education. In this chapter, we describe some of the most frequently used and trusted designs for scientifically addressing broad classes of research questions in education. In doing so, we develop three related themes.

First, as we posit earlier, a variety of legitimate scientific approaches exist in education research. Therefore, the description of methods discussed in this chapter is illustrative of a range of trusted approaches; it should not be taken as an authoritative list of tools to the exclusion of any others.1 As we stress in earlier chapters, the history of science has shown that research designs evolve, as do the questions they address, the theories they inform, and the overall state of knowledge.

Second, we extend the argument we make in Chapter 3 that designs and methods must be carefully selected and implemented to best address the question at hand. Some methods are better than others for particular purposes, and scientific inferences are constrained by the type of design employed. Methods that may be appropriate for estimating the effect of an educational intervention, for example, would rarely be appropriate for use in estimating dropout rates. While researchers—in education or any other field—may overstate the conclusions from an inquiry, the strength of scientific inference must be judged in terms of the design used to address the question under investigation. A comprehensive explication of a hierarchy of appropriate designs and analytic approaches under various conditions would require a depth of treatment found in research methods textbooks. This is not our objective. Rather, our goal is to illustrate that among available techniques, certain designs are better suited to address particular kinds of questions under particular conditions than others.
Third, in order to generate a rich source of scientific knowledge in education that is refined and revised over time, different types of inquiries and methods are required. At any time, the types of questions and methods depend in large part on an accurate assessment of the overall state of knowledge and professional judgment about how a particular line of inquiry could advance understanding. In areas with little prior knowledge, for example, research will generally need to involve careful description to formulate initial ideas. In such situations, descriptive studies might be undertaken to help bring education problems or trends into sharper relief or to generate plausible theories about the underlying structure of behavior or learning. If the effects of education programs that have been implemented on a large scale are to be understood, however, investigations must be designed to test a set of causal hypotheses. Thus, while we treat the topic of design in this chapter as applying to individual studies, research design has a broader quality as it relates to lines of inquiry that develop over time.

While a full development of these notions goes considerably beyond our charge, we offer this brief overview to place the discussion of methods that follows into perspective. Also, in the concluding section of this chapter, we make a few targeted suggestions for the kinds of work we believe are most needed in education research to make further progress toward robust knowledge.

1 Numerous textbooks and treatments map the domain of design (e.g., Kelly and Lesh, 2000) for the various types of inquiries in education. We refer to several of the seminal works on research methodology throughout the chapter.

TYPES OF RESEARCH QUESTIONS

In discussing design, we have to be true to our admonition that the research question drives the design, not vice versa. To simplify matters, the committee recognized that a great number of education research questions fall into three (interrelated) types: description—What is happening? cause—Is there a systematic effect? and process or mechanism—Why or how is it happening?

The first question—What is happening?—invites description of various kinds, so as to properly characterize a population of students, understand the scope and severity of a problem, develop a theory or conjecture, or identify changes over time among different educational indicators—for example, achievement, spending, or teacher qualifications.
Description also can include associations among variables, such as the characteristics of schools (e.g., size, location, economic base) that are related to (say) the provision of music and art instruction. The second question is focused on establishing causal effects: Does x cause y? The search for cause, for example, can include seeking to understand the effect of teaching strategies on student learning or state policy changes on district resource decisions. The third question confronts the need to understand the mechanism or process by which x causes y. Studies that seek to model how various parts of a complex system—like U.S. education—fit together help explain the conditions that facilitate or impede change in teaching, learning, and schooling. Within each type of question, we separate the discussion into subsections that show the use of different methods given more fine-grained goals and conditions of an inquiry.

Although for ease of discussion we treat these types of questions separately, in practice they are closely related. As our examples show, within particular studies, several kinds of queries can be addressed. Furthermore, various genres of scientific education research often address more than one of these types of questions. Evaluation research—the rigorous and systematic evaluation of an education program or policy—exemplifies the use of multiple questions and corresponding designs. As applied in education, this type of scientific research is distinguished from other scientific research by its purpose: to contribute to program improvement (Weiss, 1998a). Evaluation often entails an assessment of whether the program caused improvements in the outcome or outcomes of interest (Is there a systematic effect?). It also can involve detailed descriptions of the way the program is implemented in practice and in what contexts (What is happening?) and the ways that program services influence outcomes (How is it happening?).

Throughout the discussion, we provide several examples of scientific education research, connecting them to scientific principles (Chapter 3) and the features of education (Chapter 4). We have chosen these studies because they align closely with several of the scientific principles.
These examples include studies that generate hypotheses or conjectures as well as those that test them. Both tasks are essential to science, but as a general rule they cannot be accomplished simultaneously.

Moreover, just as we argue that the design of a study does not itself make it scientific, an investigation that seeks to address one of these questions is not necessarily scientific either. For example, many descriptive studies—however useful they may be—bear little resemblance to careful scientific study. They might record observations without any clear conceptual viewpoint, without reproducible protocols for recording data, and so forth. Again, studies may be considered scientific by assessing the rigor with which they meet scientific principles and are designed to account for the context of the study.

Finally, we have tended to speak of research in terms of a simple dichotomy—scientific or not scientific—but the reality is more complicated. Individual research projects may adhere to each of the principles in varying degrees, and the extent to which they meet these goals goes a long way toward defining the scientific quality of a study. For example, while all scientific studies must pose clear questions that can be investigated empirically and be grounded in existing knowledge, more rigorous studies will begin with more precise statements of the underlying theory driving the inquiry and will generally have a well-specified hypothesis before the data collection and testing phase is begun. Studies that do not start with clear conceptual frameworks and hypotheses may still be scientific, although they are obviously at a more rudimentary level and will generally require follow-on study to contribute significantly to scientific knowledge.

Similarly, lines of research encompassing collections of studies may be more or less productive and useful in advancing knowledge. An area of research that, for example, does not advance beyond the descriptive phase toward more precise scientific investigation of causal effects and mechanisms for a long period of time is clearly not contributing as much to knowledge as one that builds on prior work and moves toward more complete understanding of the causal structure. This is not to say that descriptive work cannot generate important breakthroughs. However, the rate of progress should—as we discuss at the end of this chapter—enter into consideration of the support for advanced lines of inquiry.
The three classes of questions we discuss in the remainder of this chapter are ordered in a way that reflects the sequence that research studies tend to follow as well as their interconnected nature.

WHAT IS HAPPENING?

Answers to “What is happening?” questions can be found by following Yogi Berra’s counsel in a systematic way: if you want to know what’s going on, you have to go out and look at what is going on. Such inquiries are descriptive. They are intended to provide a range of information from documenting trends and issues in a range of geopolitical jurisdictions, populations, and institutions to rich descriptions of the complexities of educational practice in a particular locality, to relationships among such elements as socioeconomic status, teacher qualifications, and achievement.

Estimates of Population Characteristics

Descriptive scientific research in education can make generalizable statements about the national scope of a problem, student achievement levels across the states, or the demographics of children, teachers, or schools. Methods that enable the collection of data from a randomly selected sample of the population provide the best way of addressing such questions. Questionnaires and telephone interviews are common survey instruments developed to gather information from a representative sample of some population of interest. Policy makers at the national, state, and sometimes district levels depend on this method to paint a picture of the educational landscape. Aggregate estimates of the academic achievement level of children at the national level (e.g., National Center for Education Statistics [NCES], National Assessment of Educational Progress [NAEP]), the supply, demand, and turnover of teachers (e.g., NCES Schools and Staffing Survey), the nation’s dropout rates (e.g., NCES Common Core of Data), how U.S. children fare on tests of mathematics and science achievement relative to children in other nations (e.g., Third International Mathematics and Science Study) and the distribution of doctorate degrees across the nation (e.g., National Science Foundation’s Science and Engineering Indicators) are all based on surveys from populations of school children, teachers, and schools.

To yield credible results, such data collection usually depends on a random sample (alternatively called a probability sample) of the target population.
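As a rough illustration of estimating a population characteristic from a probability sample, the sketch below draws a simple random sample from a made-up population of test scores and reports the sample mean with a standard error. The population values, the function names, and the seed are assumptions for illustration only; real surveys such as NAEP use far more complex sampling designs and weights.

```python
import math
import random
from statistics import mean, stdev


def draw_simple_random_sample(population, n, seed=0):
    """Draw n units without replacement; every unit has the same known chance n/N."""
    rng = random.Random(seed)  # fixed seed so the draw can be repeated
    return rng.sample(population, n)


def estimate_mean(sample, population_size):
    """Return (sample mean, standard error) for a simple random sample."""
    n = len(sample)
    # Finite population correction: sampling without replacement from N units.
    fpc = math.sqrt(1 - n / population_size)
    standard_error = fpc * stdev(sample) / math.sqrt(n)
    return mean(sample), standard_error


# Hypothetical population of 5,000 scores (not real assessment data).
population = [200 + (37 * i) % 150 for i in range(5000)]
sample = draw_simple_random_sample(population, n=400)
estimate, standard_error = estimate_mean(sample, len(population))
print(f"estimated mean: {estimate:.1f} (standard error {standard_error:.2f})")
```

The standard error shrinks as the sample grows, which is one way to see why a known selection chance lets researchers attach a margin of uncertainty to an estimate.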
If every observation (e.g., person, school) has a known chance of being selected into the study, researchers can make estimates of the larger population of interest based on statistical technology and theory. The validity of inferences about population characteristics based on sample data depends heavily on response rates, that is, the percentage of those randomly selected for whom data are collected. The measures used must have known reliability—that is, the extent to which they reproduce results. Finally, the value of a data collection instrument hinges not only on the sampling method, participation rate, and reliability, but also on their validity: that the questionnaire or survey items measure what they are supposed to measure.

The NAEP survey tracks national trends in student achievement across several subject domains and collects a range of data on school, student, and teacher characteristics (see Box 5-1). This rich source of information enables several kinds of descriptive work. For example, researchers can estimate the average score of eighth graders on the mathematics assessment (i.e., measures of central tendency) and compare that performance to prior years. Part of the study we feature (see below) about college women’s career choices featured a similar estimation of population characteristics. In that study, the researchers developed a survey to collect data from a representative sample of women at the two universities to aid them in assessing the generalizability of their findings from the in-depth studies of the 23 women.

Simple Relationships

The NAEP survey also illustrates how researchers can describe patterns of relationships between variables. For example, NCES reports that in 2000, eighth graders whose teachers majored in mathematics or mathematics education scored higher, on average, than did students whose teachers did not major in these fields (U.S. Department of Education, 2000). This finding is the result of descriptive work that explores the correlation between variables: in this case, the relationship between student mathematics performance and their teachers’ undergraduate major. Such associations cannot be used to infer cause. However, there is a common tendency to make unsubstantiated jumps from establishing a relationship to concluding cause.
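The kind of descriptive comparison reported above can be sketched with a small units-by-variables table, where each row is one student and each key is one variable. The scores and field names below are invented for illustration and are not NAEP data; the result describes an association in this toy table and says nothing about cause.

```python
from statistics import mean

# Hypothetical units-by-variables table: one row per student (the unit),
# one key per variable. Values are made up for illustration.
students = [
    {"student_id": 1, "teacher_math_major": True, "math_score": 281},
    {"student_id": 2, "teacher_math_major": True, "math_score": 275},
    {"student_id": 3, "teacher_math_major": True, "math_score": 290},
    {"student_id": 4, "teacher_math_major": False, "math_score": 270},
    {"student_id": 5, "teacher_math_major": False, "math_score": 268},
    {"student_id": 6, "teacher_math_major": False, "math_score": 272},
]


def mean_outcome_by_group(rows, group_variable, outcome_variable):
    """Average the outcome separately for each value of the grouping variable."""
    groups = {}
    for row in rows:
        groups.setdefault(row[group_variable], []).append(row[outcome_variable])
    return {value: mean(scores) for value, scores in groups.items()}


averages = mean_outcome_by_group(students, "teacher_math_major", "math_score")
print("teacher majored in mathematics:", averages[True])
print("teacher did not major in mathematics:", averages[False])
```

A difference between the two group averages is a description of the table, not evidence that a teacher's major produced the difference.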
As committee member Paul Holland quipped during the committee’s deliberations, “Casual comparisons inevitably invite careless causal conclusions.” To illustrate the problem with drawing causal inferences from simple correlations, we use an example from work that compares Catholic schools to public schools. We feature this study later in the chapter as one that competently examines causal mechanisms. Before addressing questions of mechanism, foundational work involved simple correlational results that compared the performance of Catholic high school students on standardized mathematics tests with their counterparts in public schools. These simple correlations revealed that average mathematics achievement was considerably higher for Catholic school students than for public school students (Bryk, Lee, and Holland, 1993). However, the researchers were careful not to conclude from this analysis that attending a Catholic school causes better student outcomes, because there are a host of potential explanations (other than attending a Catholic school) for this relationship between school type and achievement. For example, since Catholic schools can screen children for aptitude, they may have a more able student population than public schools at the outset. (This is an example of the classic selectivity bias that commonly threatens the validity of causal claims in nonrandomized studies; we return to this issue in the next section.) In short, there are other hypotheses that could explain the observed differences in achievement between students in different sectors that must be considered systematically in assessing the potential causal relationship between Catholic schooling and student outcomes.

BOX 5-1 National Assessment of Educational Progress

Simply collecting data is not in and of itself scientific. It is the rigorous organization and analysis of data to answer clearly specified questions that form the basis of scientific description, not the data themselves. Quantitative data appear in many ways in education research; their most common form of organization is as a “units-by-variables” array. The National Assessment of Educational Progress (NAEP) is an instructive example. This large survey (implemented and maintained by the National Center for Education Statistics) of 4th, 8th, and 12th graders in the United States collects information on a variety of academic subject areas, including mathematics and literacy, from samples drawn from these grades on a regular schedule. There are several types of units*, for example, students and teachers. Information is systematically collected from both students and teachers in areas that are appropriate to each type of unit. For students, NAEP collects data on academic performance as well as background information. Teachers are surveyed about their training and experience and their methods of instruction. The units-by-variables organization of data is important because each row corresponds to all the data for each unit and the columns correspond to the information represented by a single variable across all the units in the study. Modern psychometric methods are available to summarize this complex set of information into reports on student achievement and its relation to other factors. This combination of rigorous data collection, analysis, and reporting is what distinguishes scientific description from casual observation.

* “Unit” is strictly a technical term that refers to the class or type of phenomena being studied, such as student, teacher, or state.

Descriptions of Localized Educational Settings

In some cases, scientists are interested in the fine details (rather than the distribution or central tendency) of what is happening in a particular organization, group of people, or setting. This type of work is especially important when good information about the group or setting is non-existent or scant. In this type of research, then, it is important to obtain first-hand, in-depth information from the particular focal group or site.
For such purposes, selecting a random sample from the population of interest may not be the proper method of choice; rather, samples may be purposively selected to illuminate phenomena in depth.2 For example, to better understand a high-achieving school in an urban setting with children of predominantly low socioeconomic status, a researcher might conduct a detailed case study or an ethnographic study (a case study with a focus on culture) of such a school (Yin and White, 1986; Miles and Huberman, 1994). This type of scientific description can provide rich depictions of the policies, procedures, and contexts in which the school operates and generate plausible hypotheses about what might account for its success. Researchers often spend long periods of time in the setting or group in order to understand what decisions are made, what beliefs and attitudes are formed, what relationships are developed, and what forms of success are celebrated. These descriptions, when used in conjunction with causal methods, are often critical to understand such educational outcomes as student achievement because they illuminate key contextual factors.

2 This is not to say that probability sampling is always irrelevant with respect to case studies. A collection of case studies selected randomly from a population may be developed.

Box 5-2 provides an example of a study that described in detail (and also modeled several possible mechanisms; see later discussion) a small group of women, half who began their college careers in science and half in what were considered more traditional majors for women. This descriptive part of the inquiry involved an ethnographic study of the lives of 23 first-year women enrolled in two large universities. Scientific description of this type can generate systematic observations about the focal group or site, and patterns in results may be generalizable to other similar groups or sites or for the future.

As with any other method, a scientifically rigorous case study has to be designed to address the research question it addresses. That is, the investigator has to choose sites, occasions, respondents, and times with a clear research purpose in mind and be sensitive to his or her own expectations and biases (Maxwell, 1996; Silverman, 1993). Data should typically be collected from varied sources, by varied methods, and corroborated by other investigators. Furthermore, the account of the case needs to draw on original evidence and provide enough detail so that the reader can make judgments about the validity of the conclusions (Yin, 2000).
Results may also be used as the basis for new theoretical developments, new experiments, or improved measures on surveys that indicate the extent of generalizability. In the work done by Holland and Eisenhart (1990), for example (see Box 5-2), a number of theoretical models were developed and tested to explain how women decide to pursue or abandon nontraditional careers in the fields they had studied in college. Their finding that commitment to college life—not fear of competing with men or other hypotheses that had previously been set forth—best explained these decisions was new knowledge. It has been shown in subsequent studies to

BOX 5-2 College Women’s Career Choices

In the late 1970s cultural anthropologists Dorothy Holland and Margaret Eisenhart set out to learn more about why so few women who began their college careers in nontraditional majors (e.g., science, mathematics, computer science) ended up working in those fields. At the time, several different explanations were being proposed: Women were not well prepared before coming to college; women were discriminated against in college; women did not want to compete with men for jobs.

Holland and Eisenhart (1990) first designed ethnographic case studies of a small group of freshman women at two public, residential universities—one historically black, one historically white. From volunteers on each campus, matched groups were selected—based on a survey of their high school grades, college majors, college activities, and college peers. All of the 23 women who participated had at least a B+ average in high school. Half from each campus were planning traditional majors for women; half were planning nontraditional majors.

Based on analysis of the ethnographic data obtained from a year of participant observation and open-ended interviews with the women, models were developed to describe how the 23 women participated in college life. The models depicted three different kinds of commitment to school work in college. Each model included: (1) the women’s views about the value of schoolwork; (2) their reasons for doing schoolwork; and (3) the perceived costs (both financial and social) of doing schoolwork. Extrapolating from the models, the researchers predicted what each woman would do after college—continue in school, get a job in her field, get a job outside of her field, get married, etc.

At the end of 4 years and again after 3 more years, the researchers followed up with telephone interviews with each woman. In all 23 cases, the predictions based on the models of commitment to schoolwork were confirmed. Also, in all cases, the models of commitment were better predictors of the future than precollege preparation (grades, courses taken), discrimination against women, or feelings about competing with men.

BOX 5-3 Teacher Salaries and Student Outcomes

In several comprehensive reviews of research on the effects of educational expenditures on student outcomes, Hanushek (1986, 1997) found that student outcomes were not consistently related either to per-pupil outlays or to teacher salaries. Grogger (1996), Betts (1995), and Altonji (1988), using national longitudinal data sets, produced similar results. However, Loeb and Page (2000) noted a discrepancy between these findings and studies that found school and non-salary teacher effects (e.g., Altonji, 1988; Ehrenberg and Brewer, 1994; Ferguson, 1991). Indeed, Hanushek, Kain, and Rivkin (1998) found a reliable relationship between teacher quality and students’ achievement. For Loeb and Page, these findings add a new dimension to the puzzle. “If teacher quality affects student achievement, then why do studies that predict student outcomes from teacher wages produce weak results?” (2000, p. 393).

Loeb and Page pointed out that the previous education expenditure studies failed to account for nonmonetary job characteristics and opportunities that might be open to would-be teachers in the local job market (“opportunity costs”). Both might affect a qualified teacher’s decision to teach. Consequently, they tested two competing models, the commonly used “production function” model, which predicted outcomes from expenditures and had formed the theoretical basis of most prior work on the topic, and a modified production-function model that incorporated opportunity costs. They replicated prior findings using traditional production-function procedures from previous studies. However, once they statistically adjusted for opportunity costs, they found that raising teacher wages by 10 percent reduced high school dropout rates by 3-4 percent. They suggested that previous research on the effects of teacher wages on student outcomes failed to show effects because it lacked adequate controls for nonwage aspects of teaching and market differences in alternative occupational opportunities.

Scientific Research in Education children to schools that pay teachers more, Loeb and Page found that raising teacher wages by 10 percent reduced high school dropout rates by 3 to 4 percent. WHY OR HOW IS IT HAPPENING? In many situations, finding that a causal agent (x) leads to the outcome (y) is not sufficient. Important questions remain about how x causes y. Questions about how things work demand attention to the processes and mechanisms by which the causes produce their effects. However, scientific research can also legitimately proceed in the opposite direction: that is, the search for mechanism can come before an effect has been established. For example, if the process by which an intervention influences student outcomes is established, researchers can often predict its effectiveness with known probability. In either case, the processes and mechanisms should be linked to theories so as to form an explanation for the phenomena of interest. The search for causal mechanisms, especially once a causal effect has garnered strong empirical support, can use all of the designs we have discussed. In Chapter 2, we trace a sequence of investigations in molecular biology that investigated how genes are turned on and off. Very different techniques, but ones that share the same basic intellectual approach to casual analysis reflected in these genetic studies, have yielded understandings in education. Consider, for example, the Tennessee class-size experiment (see discussion in Chapter 3). In addition to examining whether reduced class size produced achievement benefits, especially for minority students, a research team and others in the field asked (see, e.g., Grissmer, 1999) what might explain the Tennessee and other class-size effects. That is, what was the causal mechanism through which reduced class size affected achievement? To this end, researchers (Bohrnstedt and Stecher, 1999) used classroom observations and interviews to compare teaching in different class sizes. 
They conducted ethnographic studies in search of mechanism. They correlated measures of teaching behavior with student achievement scores. These questions are important because they enhance understanding of the foundational processes at work when class size is reduced and thus improve the capacity to implement these reforms effectively in different times, places, and contexts.

Exploring Mechanism When Theory Is Fairly Well Established

A well-known study of Catholic schools provides another example of a rigorous attempt to understand mechanism (see Box 5-4). Previous and highly controversial work on Catholic schools (e.g., Coleman, Hoffer, and Kilgore, 1982) had examined the relative benefits to students of Catholic and public schools. Drawing on these studies, as well as a fairly substantial literature related to effective schools, Bryk and his colleagues (Bryk, Lee, and Holland, 1993) focused on the mechanism by which Catholic schools seemed to achieve success relative to public schools. A series of models was developed (sector effects only, compositional effects, and school effects) and tested to explain the mechanism by which Catholic schools successfully achieve an equitable social distribution of academic achievement. The researchers’ analyses suggested that aspects of school life that enhance a sense of community within Catholic schools most effectively explained the differences in student outcomes between Catholic and public schools.

BOX 5-4
Effective Schooling: A Comparison of Catholic Schools and Public Schools

In the early 1980s two influential books (Coleman, Hoffer, and Kilgore, 1982; Greeley, 1982) set off years of controversy and debate in academic and policy circles about the relative effectiveness of Catholic schools and public schools. In a synthesis of several lines of inquiry over a 10-year period, Bryk and colleagues (Bryk, Lee, and Holland, 1993) focused attention on how Catholic schools functioned to better understand this prior work and to offer insights about improving schools more generally.

This longitudinal study is an excellent example of the use of multiple methods, both quantitative and qualitative, to generate converging evidence about such a complex topic. It featured in-depth case studies of seven particularly successful Catholic schools, descriptive profiles of Catholic schools nationally, and sophisticated statistical modeling techniques to assess causal mechanism.

One line of inquiry within this multilayered study featured a quasi-experiment that compared the mathematics achievement of Catholic high school students and public high school students. Using simple correlational techniques, the researchers showed that the social distribution of academic achievement was more equalized in Catholic than non-Catholic schools: for example, the achievement gap between minority and non-minority students was smaller in Catholic schools than in public schools.

To better understand the possible causes behind these “sector” differences, Bryk and his colleagues used data from a rich, longitudinal data set to test whether certain features of school organization explained these differences and predicted success. Because students in this data set were not randomly assigned to attend Catholic or public schools, the researchers attempted to ensure fair comparisons by statistically holding constant other variables (such as student background) that could also explain the finding about the social distribution of achievement.

Three potential explanatory models were developed and tested with respect to explaining the relative effectiveness of Catholic schools: sector effects only (the private and spiritual nature of Catholic schools); compositional effects (the composition of the student body in Catholic schools); and school effects (various features of school operations that contribute to school life). In combination, analyzing data with respect to these three potential theoretical mechanisms suggested that it is the coherence of school life in Catholic schools that most clearly accounts for their relative success in this area.

Nonetheless, controversy still exists about the circumstances when Catholic schools are superior, about how to control for family differences in the choice of schools, and about the policy implications of these findings.

Exploring Mechanism When Theory Is Weak

When the theoretical basis for addressing questions related to mechanism is weak, contested, or poorly understood, other types of methods may be more appropriate. These queries often have strong descriptive components and derive their strength from in-depth study that can illuminate unforeseen relationships and generate new insights. We provide two examples in this section of such approaches: the first is the ethnographic study of college women (see Box 5-2) and the second is a “design study” that resulted in a theoretical model for how young children learn the mathematical concepts of ratio and proportion.

After generating a rich description of women’s lives in their universities based on extensive analysis of ethnographic and survey data, the researchers turned to the question of why women who majored in nontraditional majors typically did not pursue those fields as careers (see Box 5-2). Was it because women were not well prepared before college? Were they discriminated against? Did they not want to compete with men? To address these questions, the researchers developed several theoretical models depicting commitment to schoolwork to describe how the women participated in college life. Extrapolating from the models, the researchers predicted what each woman would do after completing college, and in all cases, the models’ predictions were confirmed.

A second example highlights another analytic approach for examining mechanism that begins with theoretical ideas that are tested through the design, implementation, and systematic study of educational tools (curriculum, teaching methods, computer applets) that embody the initial conjectured mechanism.
The studies go by different names; perhaps the two most popular names are “design studies” (Brown, 1992) and “teaching experiments” (Lesh and Kelly, 2000; Schoenfeld, in press). Box 5-5 illustrates a design study whose aim was to develop and elaborate the theoretical mechanism by which ratio reasoning develops in young children and to build and modify appropriate tasks and assessments that incorporate the models of learning developed through observation and interaction in the classroom. The work was linked to substantial existing literature in the field about the theoretical nature of ratio and proportion as mathematical ideas and teaching approaches to convey them (e.g., Behr, Lesh, Post, and Silver, 1983; Harel and Confrey, 1994; Mack, 1990, 1995). The initial model was tested and refined as careful distinctions and extensions were noted, explained, and considered as alternative explanations as the work progressed over a 3-year period, studying one classroom intensively. The design experiment methodology was selected because, unlike laboratory or other highly controlled approaches, it involved research within the complex interactions of teachers and students and allowed the everyday demands and opportunities of schooling to affect the investigation.

BOX 5-5
Elementary School Students and Ratio and Proportion

In a project on student reasoning on ratio and proportion, Confrey and Lachance (2000) and colleagues examined a group of 20 students over a 3-year period in one classroom. Beginning with a conjecture about the relative independence of rational number structures (multiplication, division, ratio and proportion) from additive structures (addition and subtraction), the investigators sought the roots of ratio reasoning in a meaning of equivalence unfamiliar to the children. Consider how a 9-year-old might come to understand that 4:6 is equivalent to 6:9.

Using a series of projects, tasks, and challenges (such as designing a wheelchair access ramp or a tourist guide to a foreign currency), researchers documented how students moved from believing that equivalence can be preserved through doubling (4:6 = 8:12) and halving (4:6 = 2:3), to the identification of a ratio unit (the smallest ratio to describe the equivalence in a set of proportions), to the ability to add and subtract ratio units (8:12 = (8+2):(12+3)), to the ability to solve any ratio and proportion challenge in the familiar form a:b :: c:x.

This operational description of the mechanism behind ratio reasoning was used to develop instructional tasks, such as calculating the slopes of the handicapped access ramps they had designed, and to observe students engaged in them. Classroom videotaping permitted researchers to review, both during the experiment and after its completion, the actual words, actions, and representations of students and teachers to build and elaborate the underlying conjectures about ratio reasoning. At the same time, students’ performance on mathematics assessments was compared with that of students in other classes and schools and to large-scale measures of performance on items designed to measure common misconceptions in ratio and proportion reasoning.

The primary scientific product of the study was a theoretical model for ratio and proportion learning refined and enriched by years of in-depth study.

As in many such design studies, there were two main products of this work. First, through a theory-driven process of designing, and a data-driven process of refining, instructional strategies for teaching ratio and proportion, researchers produced an elaborated explanatory model of how young children come to understand these core mathematical concepts. Second, the instructional strategies developed in the course of the work itself hold promise because they were crafted based on a number of relevant research literatures. Through comparisons of achievement outcomes between children who received the new instruction and students in other classrooms and schools, the researchers provided preliminary evidence that the intervention designed to embody this theoretical mechanism is effective. The intervention would require further development, testing, and comparisons of the kind we describe in the previous section before it could be reasonably scaled up for widespread curriculum use.
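The progression of ratio reasoning summarized in Box 5-5 (doubling and halving, finding a ratio unit, adding ratio units, and solving a:b :: c:x) can be made concrete with a little arithmetic. The following is only an illustrative sketch, not part of the Confrey and Lachance study; the function names `ratio_unit`, `are_equivalent`, `add_ratio_unit`, and `solve_proportion` are hypothetical and were chosen here for illustration.

```python
from fractions import Fraction
from math import gcd


def ratio_unit(a, b):
    """Return the smallest whole-number ratio equivalent to a:b (hypothetical helper)."""
    g = gcd(a, b)
    return (a // g, b // g)


def are_equivalent(r1, r2):
    """Treat two ratios as equivalent when they reduce to the same ratio unit."""
    return ratio_unit(*r1) == ratio_unit(*r2)


def add_ratio_unit(a, b):
    """Add one ratio unit to a:b, as in 8:12 becoming (8+2):(12+3)."""
    unit_a, unit_b = ratio_unit(a, b)
    return (a + unit_a, b + unit_b)


def solve_proportion(a, b, c):
    """Solve a:b :: c:x for x; the result is exact and may be fractional."""
    return Fraction(b * c, a)
```

Under these definitions, 4:6 and 6:9 count as equivalent because both reduce to the ratio unit 2:3, which is the step that doubling and halving alone cannot reach.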
Steffe and Thompson (2000) are careful to point out that design studies and teaching experiments must be conducted scientifically. In their words:

We use experiment in “teaching experiment” in a scientific sense…. What is important is that the teaching experiments are done to test hypotheses as well as to generate them. One does not embark on the intensive work of a teaching experiment without having major research hypotheses to test (p. 277).

This genre of method and approach is a relative newcomer to the field of education research and is not nearly as accepted as many of the other methods described in this chapter. We highlight it here as an illustrative example of the creative development of new methods to embed the complex instructional settings that typify U.S. education in the research process. We echo Steffe and Thompson’s (2000) call to ensure a careful application of the scientific principles we describe in this report in the conduct of such research.9

9 We are aware of several efforts, funded by both federal agencies and foundations, aimed at further development of this approach to ensure its standard and rigorous practice.

CONCLUDING COMMENTS

This chapter, building on the scientific principles outlined in Chapter 3 and the features of education that influence their application in education presented in Chapter 4, illustrates that a wide range of methods can legitimately be employed in scientific education research and that some methods are better than others for particular purposes. As John Dewey put it:

We know that some methods of inquiry are better than others in just the same way in which we know that some methods of surgery, farming, road-making, navigating, or what-not are better than others. It does not follow in any of these cases that the “better” methods are ideally perfect… We ascertain how and why certain means and agencies have provided warrantably assertible conclusions, while others have not and cannot do so (Dewey, 1938, p. 104, italics in original).

The chapter also makes clear that knowledge is generated through a sequence of interrelated descriptive and causal studies, through a constant process of refining theory and knowledge. These lines of inquiry typically require a range of methods and approaches to subject theories and conjectures to scrutiny from several perspectives.

We conclude this chapter with several observations and suggestions about the current state of education research that we believe warrant attention if scientific understanding is to advance beyond its current state. We do not provide a comprehensive agenda for the nation. Rather, we wish to offer constructive guidance by pointing to issues we have identified throughout our deliberations as key to future improvements.

First, there are a number of areas in education practice and policy in which basic theoretical understanding is weak. For example, very little is known about how young children learn ratio and proportion, mathematical concepts that play a key role in developing mathematical proficiency. The study we highlight in this chapter generated an initial theoretical model that must undergo sustained development and testing. In such areas, we believe priority should be given to descriptive and theory-building studies of the sort we highlight in this chapter. Scientific description is an essential part of any scientific endeavor, and education is no different. These studies are often extremely valuable in themselves, and they also provide the critical theoretical grounding needed to conduct causal studies.

We believe that attention to the development and systematic testing of theories and conjectures across multiple studies and using multiple methods (a key scientific principle that threads throughout all of the questions and designs we have discussed) is currently undervalued in education relative to other scientific fields. The physical sciences have made progress by continuously developing and testing theories; nothing comparable has been done systematically in education. And while it is not clear that grand, unifying theories exist in the social world, conceptual understanding forms the foundation for scientific understanding and progresses, as we showed in Chapter 2, through the systematic assessment and refinement of theory.
Second, while large-scale education policies and programs are constantly undertaken, we reiterate our belief that they are typically launched without an adequate evidentiary base to inform their development, implementation, or refinement over time (Campbell, 1969; President’s Committee of Advisors on Science and Technology, 1997). The “demand” for education research in general, and education program evaluation in particular, is very difficult to quantify, but we believe it tends to be low from educators, policy makers, and the public. There are encouraging signs that public attitudes toward the use of objective evidence to guide decisions are improving (e.g., statutory requirements to set aside a percentage of annual appropriations to conduct evaluations of federal programs, the Government Performance and Results Act, and common rhetoric about “evidence-based” and “research-based” policy and practice). However, we believe stronger scientific knowledge is needed about educational interventions to promote its use in decision making.

In order to generate a rich store of scientific evidence that could enhance effective decision making about education programs, it will be necessary to strengthen a few related strands of work. First, systematic study is needed about the ways that programs are implemented in diverse educational settings. We view implementation research (the genre of research that examines the ways that the structural elements of school settings interact with efforts to improve instruction) as a critical, underfunded, and underappreciated form of education research. We also believe that understanding how to “scale up” (Elmore, 1996) educational interventions that have promise in a small number of cases will depend critically on a deep understanding of how policies and practices are adopted and sustained (Rogers, 1995) in the complex U.S. education system.10

In all of this work, more knowledge is needed about causal relationships. In estimating the effects of programs, we urge the expanded use of random assignment. Randomized experiments are not perfect. Indeed, the merits of their use in education have been seriously questioned (Cronbach et al., 1980; Cronbach, 1982; Guba and Lincoln, 1981). For instance, they typically cannot test complex causal hypotheses, they may lack generalizability to other settings, and they can be expensive. However, we believe that these and other issues do not generate a compelling rationale against their use in education research and that issues related to ethical concerns, political obstacles, and other potential barriers often can be resolved. We believe that the credible objections to their use that have been raised have clarified the purposes, strengths, limitations, and uses of randomized experiments as well as other research methods in education.
Establishing cause is often exceedingly important (for example, in the large-scale deployment of interventions), and the ambiguity of correlational studies or quasi-experiments can be undesirable for practical purposes.

In keeping with our arguments throughout this report, we also urge that randomized field trials be supplemented with other methods, including in-depth qualitative approaches that can illuminate important nuances, identify potential counterhypotheses, and provide additional sources of evidence for supporting causal claims in complex educational settings.

In sum, theory building and rigorous studies of implementations and interventions are two broad-based areas that we believe deserve attention. Within the framework of a comprehensive research agenda, targeting these aspects of research will build on the successes of the enterprise we highlight throughout this report.

10 The federal Interagency Education Research Initiative was developed to tackle this thorny issue.