Appendix: Scientific Evidence
THE SCIENTIFIC METHODS
METHODS USED BY SOCIAL SCIENTISTS to gain knowledge are very diverse. Especially in the field of education, a field that calls on several social sciences in order to constitute its knowledge base, a variety of methods are relevant and useful. Each method has its strengths and is better suited to some sets of questions than to others. Whatever the relative merits of the various social science methods, their application in the service of research requires that several basic standards be met if the answers they yield are to be considered valid and valuable.
This note on evidence makes explicit those standards of the scientific communities to which we are accountable in conducting research on young children, and to which we have held researchers accountable in conducting our review. We identify and briefly describe those standards, highlighting areas in which the consensus is not as strong, as well as areas in which important advances have been made in recent decades.
Empiricism: Theory Building
Scientists pose hypotheses based on their observations in the world and in the laboratory. In order to test their hypotheses and refine theories, they design research studies that entail the collection of data in some form. Those data are then analyzed, results or findings are arrived at, and interpretations of those results are made. These interpretations can then be used to frame future research and guide policy making and program design and implementation.
Replicability and Falsifiability
All theories must be falsifiable. In other words, any theory derived from a study must be sufficiently elaborated so that other scientists can replicate the study and collect additional empirical data to either corroborate or contradict the original theory. It is this willingness to abandon or modify a theory in the face of new evidence that is one of the most central defining features of the scientific method.
The extent to which one particular theory can be viewed as uniquely supported by a particular study depends on the extent to which alternative explanations have been ruled out. A particular research result is never equally relevant to all competing theoretical explanations. A given experiment may be a very strong test of one or two alternative theories but a weak test of others.
Validity and Generalizability
Validity is defined as the extent to which an instrument actually measures what the researcher intends it to measure. Internal validity concerns the extent to which a study's design warrants the causal conclusions drawn from it; external validity concerns the generalizability of the conclusions to the larger population and setting of interest. Internal and external validity are often traded off across different methodologies. This trade-off presents some interesting questions. In what sense can a biased estimate (one that is inaccurate for the whole population) be said to be generalizable? What we mean is that we are willing to risk a small amount of bias in exchange for a large increase in confidence that the estimate generalizes to a much larger set of children and programs. Willingness to take that risk requires some confidence that the bias introduced by the lack of experimental control is small relative to the bias introduced by applying an unbiased estimate obtained from a narrow set of children and programs to a broader set of programs and children.
Scientists and those who apply scientific knowledge must often make a judgment about where the preponderance of evidence points. When this is the case, the principle of converging evidence is an important tool, both for evaluating the state of the research evidence and also for deciding how future research should be designed.
Research is highly convergent when a series of studies consistently supports a given theory while collectively eliminating the most important competing explanations. Although no single study can rule out all alternative explanations, taken collectively, a series of partially diagnostic studies can lead to a strong conclusion if the data converge. This aspect of the convergence principle implies that we should expect to see many different methods employed in all areas of educational research. A relative balance among the methodologies used to arrive at a given conclusion is desirable because the various classes of research techniques have different strengths and weaknesses. The results from many different types of investigation are usually weighed to derive a general conclusion, and the basis for the conclusion rests on the convergence observed from the variety of methods used. This is particularly true in the domains of classroom and curriculum research.
Types and Uses of Empirical Methods
There are several ways to categorize the empirical methods used in research on early childhood development and education. They may be classified according to:
the purpose of the study (e.g., evaluation of a program, open-ended inquiry for hypothesis or theory building, hypothesis testing, comparison of groups or of individuals),
the design aspects of the study (e.g., the number of times data are collected: longitudinal, cross-sectional; the type of data that are collected: quantifiable, qualitative), and
the data analysis aspects and the unit of analysis used when the data are analyzed (e.g., univariate, bivariate, and multivariate analyses, qualitative analyses).
Across these groups of studies, methodological rigor can be defined and ensured through attention to the standards outlined above (replicability, generalizability, convergence).
Purposes of Research
Open-ended Inquiry: Qualitative, Ethnographic Research
In order to record and collect data in a naturalistic setting, social scientists conduct various types of ethnographic or qualitative research. These include case studies of individual learners or teachers, classroom ethnographic observations, open-ended and introspective interviews, and combinations of these methods. Qualitative research is most useful for in-depth descriptions of complex processes, such as teaching and learning. It may be important, for example, to assess the beliefs and attitudes of the adults involved in an educational intervention in order to evaluate their role in its implementation.
The strengths of qualitative inquiry include a focus on depth, attention to the meaning of phenomena to the people being studied, and a quality of openness that enables new questions and perspectives to be uncovered throughout the research process. In most cases, however, qualitative studies sacrifice breadth for depth, and it is difficult to judge if the results are applicable or generalizable to a different population.
Identifying Causal Relationships: Experimental and Quasi-Experimental Design
If the purpose of the research is to identify cause-and-effect relationships between variables, then experimental and quasi-experimental studies are useful. In an experimental study, the researchers randomly assign participants to a control group and a treatment group, administer an intervention (such as an educational program) to the treatment group, and then compare the results by measuring before-and-after treatment variables in both groups. A true experiment is one in which all extraneous variables are controlled and only the single variable of interest is allowed to vary, so that the effect of that variable on the outcome can be clearly measured. This pure experimental design is the strongest inferential tool for statistical analysis.
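The logic of random assignment can be illustrated with a small simulation. This is a hypothetical sketch: the sample size, score scale, and built-in effect size are all invented for illustration, not drawn from any actual study.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical pre-test scores for a sample of children.
n = 2000
pretest = rng.normal(loc=50, scale=10, size=n)

# Random assignment: each child has an equal chance of receiving the
# treatment, so the two groups are comparable on average.
treated = rng.permutation(n) < n // 2

# Build in a known treatment effect of 5 points, plus measurement noise.
true_effect = 5.0
posttest = pretest + rng.normal(scale=5, size=n) + true_effect * treated

# Because assignment was random, the simple difference in group means
# recovers the built-in effect up to sampling error.
effect_hat = posttest[treated].mean() - posttest[~treated].mean()
print(f"estimated treatment effect: {effect_hat:.1f} points")
```

Without randomization (e.g., if more motivated families selected into the treatment), the same difference in means would confound the program's effect with pre-existing group differences.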
In social science research, and especially in the field of education, it is often difficult, and sometimes unethical, to ensure that the control group remains a true control throughout the duration of the intervention. Several circumstances can make this the case, and they justify quasi-experimental or other types of research. First, there are logistical difficulties associated with carrying out classroom and curriculum research that may preclude true experimental designs. For example, members of control groups may engage in an alternative program, not that of the treatment group, but one that nevertheless has some effect on those members. In some cases, ensuring a true control group would be unethical, because it would require withholding treatment from children even though the purpose of the research may be to gain knowledge that will help those same children in the future. Also, variables such as birth order, sex, and age cannot be manipulated, and therefore the relationships among them can only be correlational. By collecting observational and interview data from all participants, and by using statistical controls to neutralize the effects of the alternative programs on control group members, researchers can mitigate these limitations.
Researchers can also plan the study so as to minimize such problems. For example, the research plan may require providing a treatment that is of much higher quality and intensity than ordinary child care or even public preschool education and Head Start where these are provided. When service availability varies geographically, study locations might be chosen based on the lack of close substitutes for the treatment.
In any case, it is vital that researchers document all of the potentially significant educational activities that both the treatment and the control groups experience. The Abecedarian study provides a good example of a successful experiment in which much of the control group attended other early childhood programs. In this study, the difference in quality and intensity was so large that program effects were apparent. Moreover, the study produced an estimate of how much the control group's experiences diminished the differences between groups. However, it may well be that the critical public policy issue is the effect of a program without taking into account the child care and preschool education experiences of the control group. If the research is to investigate the impact of providing a particular program, as it is currently implemented, given what is already available, then what the control group receives is irrelevant (assuming appropriate sampling procedures), and experimental studies produce good answers. Thus, whether a true experiment is useful depends on (a) the expected difference between the treatment and what occurs naturally and (b) the precise question being asked.
Quasi-experimental studies often suffer some of the same problems in assessing treatment effects as experimental studies, for example, when a comparison group is not examined carefully enough to determine which interventions its members have received. Some correlational studies of Head Start and other preschool programs have failed to take into account children's attendance at child care centers, even though these centers may not differ much from the “treatment” in terms of the child's educational experiences, and children tend to spend longer hours in child care.
Evidence can be combined across studies looking at different parts of causal chains that might not be completely encompassed by very many studies. For example, studies that link smoking and cancer need not follow subjects all the way to premature death, when there are many studies linking the kinds of cancer caused by smoking to premature death.
Identifying Relationships and Patterns: Correlational Studies
Although experimental studies represent a most powerful design for drawing causal inferences, their limitations must be recognized. A not uncommon misconception is that correlational (i.e., nonexperimental) studies cannot contribute to knowledge. This is false for a number of reasons.
First, many scientific hypotheses are stated in terms of correlation or lack of correlation, so that such studies are directly relevant to these hypotheses. Second, although correlation does not imply causation, causation does imply correlation. That is, although a correlational study cannot definitively prove a causal hypothesis, it may rule one out. Third, correlational studies are more useful than they once were due to more recently developed correlational designs. For example, the technique of partial correlation, widely used in studies cited in this report, makes possible a test of whether a particular third variable is accounting for a relationship.
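The idea behind partial correlation can be sketched briefly: regress each of the two variables of interest on the candidate third variable, then correlate the residuals. In this hypothetical illustration, the variables and their relationships are invented; a third variable z drives both x and y, producing a sizable raw correlation that vanishes once z is partialed out.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: z (e.g., a family background factor) influences
# both x and y, so x and y correlate without either causing the other.
n = 5000
z = rng.normal(size=n)
x = 0.8 * z + rng.normal(scale=0.6, size=n)
y = 0.8 * z + rng.normal(scale=0.6, size=n)

def corr(a, b):
    return float(np.corrcoef(a, b)[0, 1])

def partial_corr(a, b, c):
    """Correlation of a and b after regressing each on c (partial correlation)."""
    resid_a = a - np.polyval(np.polyfit(c, a, 1), c)  # residuals of a on c
    resid_b = b - np.polyval(np.polyfit(c, b, 1), c)  # residuals of b on c
    return corr(resid_a, resid_b)

print(f"raw correlation r(x, y):        {corr(x, y):.2f}")
print(f"partial correlation r(x, y | z): {partial_corr(x, y, z):.2f}")
```

Here the partial correlation falls to near zero, indicating that z accounts for the x-y relationship; a partial correlation that remained large would instead suggest that z is not the explanation.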
RESEARCH IN EARLY CHILDHOOD EDUCATION
Researchers in early childhood education study a vast number of questions. For example: What are the processes through which knowledge is transmitted to young children? What are the effects of educational experiences and of different types of programs on young children? How do factors such as gender, social class, culture, and ethnicity affect the development and education of young children? Given this wide scope of investigation and the inherent complexity of studying young children’s development, the design of precise and accurate measurements is a challenging task. Below we elaborate on a number of questions that should be addressed both in designing research studies and in evaluating the quality of research results.
Precision of the Questions Being Asked in the Research
Did the researcher have a defined purpose for comparing results? Are the questions being asked too broad, and are inappropriate measures being used? Are the multiple variables that might underlie the expected change clearly specified? Are the relevant dimensions, which may or may not be factors within the child, well defined? How do measurement indexes relate to the goals of programs?
The Variability of Young Children’s Performance
Variability in any sample of living organisms should first be examined in terms of the phenomenon or phenomena under study before it is attributed to measurement error (Farran, 2000). Ignoring within-group variability, or neglecting the potential significance of outliers in an aggregated database, is a missed opportunity. By focusing narrowly on the “normal,” a great deal of potentially useful information is overlooked, and understanding of the phenomena under study is thereby greatly handicapped. Within-group variability is not necessarily a random, inconsequential event, and an inadequate understanding of the sources of variation should not automatically lead to an interpretation of random error. The argument for randomization rests on the assumption that random sampling gives the phenomenon under study the same chance of appearing in the sample as in the entire population. The samples employed in many studies are too small, however, to uphold the validity of this assumption.
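A toy example (with hypothetical numbers chosen purely for illustration) shows how treating within-group spread as mere noise can hide real structure: a sample that is actually a mixture of two distinct subgroups has a pooled mean and standard deviation that describe neither subgroup.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical scores from two subgroups with clearly different means.
group_a = rng.normal(loc=40, scale=5, size=100)
group_b = rng.normal(loc=60, scale=5, size=100)

# An aggregated analysis sees only the pooled sample.
pooled = np.concatenate([group_a, group_b])

# The pooled spread is much larger than either subgroup's spread:
# attributing it to "measurement error" would miss the subgroup structure.
print(f"pooled mean {pooled.mean():.0f}, pooled sd {pooled.std():.0f}")
print(f"subgroup means {group_a.mean():.0f} vs {group_b.mean():.0f}, "
      f"subgroup sds about {group_a.std():.0f}")
```

Examining the sources of the variability, rather than averaging over them, is what reveals that two populations, not one noisy population, are present.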
Use of Common Measures Versus Trying Innovative Measures
Using measures that are commonly used by others in similar studies allows communication and comparison among different research groups and studies. A persistent use of measures known to have serious limitations, however, may allow these measures to gain acceptance and “incremental validity” simply by the fact that everyone uses these measures to answer a particular set of research questions. In other words, measures often become institutionalized, or part of a research culture.
Designing innovative measures, however, also has potentially negative consequences. If these measures are entirely new and therefore still under question within the scientific community, it may be difficult to interpret the results that they yield to the satisfaction of all. In addition, new measures present difficulties when it comes to training those who will administer them.
We suggest that the solution to this dilemma lies in the use of multiple measures. For example, measuring verbal intelligence among young children would include administration of a commonly used measure, such as the Peabody Picture Vocabulary Test, in combination with clinical interviews of at least a subsample of children.
Triadic Nature of Early Childhood Education
In a recent comparative study of preschool programs, it was found that children in classrooms in which teachers strongly believed in the curriculum model they were implementing did better on standardized measures of development than children whose teachers were torn between conflicting models. This finding is supported in the literature showing how belief systems create environments in which particular beliefs are resistant to change even when the data support alternative points of view. The work of Shepard and Smith from the University of Colorado is an example of such research. Evaluations of the effects of preschool education on children should therefore take account of the mode of implementation of the programs being evaluated. In other words, the unit of analysis in such assessments is not only the program, or the child, but rather a triad composed of teacher, child or group of children, and program in the context of classroom.
The factors that constrain or facilitate the interactions among these three components include the social characteristics of the children and of the teacher and the target and/or goal of the program relative to the transactions in the classroom. The social characteristics of the child are for the most part characteristics of the household: race and/or ethnicity, income and access to other economic resources, and the level of parental education. These characteristics are important for a number of reasons. First, they shape the processes that occur in the home. Second, they shape the interactions between parents and children and their environments (including access to and choice of nonparental care and education arrangements). Finally, they shape the perceptions (or interpretations) of the experiences of the child and parents in all of their environments.
Conceptual Orientation of the Investigator
In addition to taking full account of this triad, the orientation of the investigator must also be considered when examining evaluation or other types of early education studies. Research scientists approach their studies from a particular perspective, with particular assumptions and understandings that guide their investigations. In evaluating their research, we feel it is important to ask such questions as: What is the ideological or conceptual orientation of the investigator? Is he or she studying children in context, or in isolation from the natural social environment? Is the perspective dominated by a search for universals, or rather by a search for differences between groups and/or cultures? Is the researcher interested in describing a dynamic model of the processes involved, or instead in capturing a more static picture? Is the researcher more interested in endogenous or exogenous variables? Finally, does the researcher hold an individualist orientation, focusing on the child as the center of the model, or a more interactionist perspective, in which systems including the child, his or her family, and the school interact to shape development?