4

Making Causal Connections

Studies of child development encompass an enormously varied universe of research strategies drawn from disciplines as diverse as economics and anthropology. These strategies include moment-by-moment ratings of interactions between adults and children and among peers, administration of psychological tests and questionnaires, ethnographic field work, laboratory research using standardized protocols, and clinical observations. Researchers select these strategies to address different goals. They may be most interested in elucidating associations among different facets of development, identifying emerging capacities of children as they develop, or describing the contexts in which children grow up, to name several objectives that studies are designed to address. In this chapter, we focus on studies that seek to identify causal connections between a specific influence (e.g., mothers' talk to children, an intervention program) and child development (e.g., the child's vocabulary, scores on a test of school readiness).

Studies that attempt to establish causal connections are often critical in testing theories about the role of early experience in child development, and they absorb much of the interest of policy makers and practitioners. They can, however, be exceedingly difficult to implement in practice and sometimes involve ethical problems. Currently, a great deal of controversy surrounds the role of experimental studies in understanding the effects of early interventions, in part as a result of the high-stakes policy decisions regarding program funding that are often involved. In this context, our discussion aims to clarify the logic and contributions of studies of causality to the understanding of developmental processes and to interventions aimed at affecting these processes.



Something close to a consensus has emerged within statistical science on the logic of causal inference: its definition, the conditions required for valid causal inference, and the generalization of causal inferences. Appendix B discusses the statistical issues involved in defining and estimating causal effects. In the committee's view, this consensus has implications for all studies making causal comparisons, basic and applied, experimental and nonexperimental. Here we sketch the essential ideas in this emerging consensus and consider how these ideas can be applied to improving early childhood research.

This focus is not intended to minimize the importance of other research strategies and goals. Research is most appropriately viewed as a sequential process, properly starting with exploratory observation, moving through correlational work aimed at tracing associations among variables of interest, and arriving at more rigorous designs permitting causal inference. Indeed, the richness of developmental science derives from the field's reliance on multiple methods of inquiry, and its greatest insights often emerge at the convergence of diverse strands of evidence.

We begin by considering causal inference in basic and applied developmental research. Basic research attempts to uncover fundamental processes of development and change, while applied research aims to help policy makers and practitioners evaluate practical efforts to improve children's experiences and outcomes. We emphasize the importance of integrating basic and applied research in building a strong science of early childhood development. Insights from basic science are crucial in the design of practical programs, while the evaluation of programs can provide new evidence about causal connections that is essential to basic science.

We then discuss the problem of generalizing from intervention studies to the populations of children, to the settings and personnel, and to the historical times and social contexts that might ultimately characterize a new program if its adoption became more widespread. Well-designed studies can answer important questions about the generalizability of a study result. Nevertheless, because strong generalizations typically emerge only from a stream of related studies, we also discuss the importance of synthesizing evidence across multiple studies. Finally, we consider the particularly thorny issue of causal inference as it applies to growing children.

CAUSAL INFERENCE IN BASIC RESEARCH

The theory and evidence contained in this report are connected by chains of causal reasoning. We consider how prenatal and neonatal environments affect early brain development and behavior and how these early effects, together with the child's early relationships, affect self-regulation, social competence, language development, and reasoning. The consequences of early experiences for later behavioral functioning, including the ability to initiate and sustain relationships and to succeed in school and at the workplace, are of central interest to theory and policy. This report and developmental science more generally integrate empirical findings regarding such causal propositions and evaluate alternative theoretical explanations that tie these propositions together.

Despite their importance, however, causal connections are difficult to nail down. Suppose, for example, that we are interested in how high-quality relations between caregivers and infants affect later cognitive or social functioning. For simplicity, we refer to the quality of such relations as “quality of care.” Let us assume that we have taken great pains to define and validly measure quality of care as well as the outcome of interest: a specific aspect of cognitive or social functioning. When quality of care is found to be associated with an enhanced outcome, we may be inclined to think that the quality of care is the cause of this outcome. But children who enjoy high-quality care are likely to have other advantages that may also shape such outcomes. For example, they may benefit from favorable genetic endowment, highly involved parents, or ample family incomes, all of which may contribute to the cognitive and social outcomes of interest.

These other causal factors are called “confounding variables” or “confounders” for short. A confounding variable in the context of this example is a child characteristic or feature of the child's environment that (a) predicts who will receive high-quality care and (b) also predicts the outcome of interest. The failure to control for confounders leads to an error called “selection bias.” Scientists try hard to devise research strategies that reduce or eliminate selection bias. That is, they try to separate the effects of the main variable of interest—in this case, quality of care—from the effects of confounders.

The surest way to eliminate selection bias is to conduct an experiment. In it, one would randomly assign children to either high-quality or low-quality care, carefully provide the kind of care assigned, and then, at some later point, assess the outcome of interest. Random assignment would eliminate all confounding variables. To be quite specific, random assignment would ensure that the probability of assignment to high-quality or low-quality care is utterly unaffected by any preexisting characteristic of the child.1

1   Some have argued that random assignment eliminates selection bias only in large samples, but this is not true. By ensuring that previous variables are unrelated to the probability of assignment to each treatment condition, random assignment ensures that tests of statistical significance accurately quantify the uncertainty about the causal question. Thus, when treatment groups are compared on the outcome of interest, significance tests yield p-values that convey the probability of obtaining a sample result of the type actually obtained if there were no causal effect. In small samples, these p-values will, appropriately, tend to be larger than in large samples, but in either case, the p-value and related confidence intervals are fair indicators of uncertainty about the causal effect of interest.
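
The logic of selection bias and random assignment can be made concrete with a small simulation. The sketch below is illustrative only: the variable names, effect sizes, and strength of selection are invented rather than drawn from any study discussed here. It contrasts a confounded, self-selected comparison of children with a randomized one.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 5_000

    # A confounder: "family advantage" predicts both selection into
    # high-quality care and the developmental outcome itself.
    advantage = rng.normal(size=n)
    true_effect = 0.30  # assumed causal effect of high-quality care

    def outcome(care):
        return true_effect * care + 0.8 * advantage + rng.normal(size=n)

    # Self-selection: advantaged children are more likely to receive
    # high-quality care, so the naive contrast is confounded.
    care_obs = rng.binomial(1, 1 / (1 + np.exp(-1.5 * advantage)))
    y_obs = outcome(care_obs)
    naive = y_obs[care_obs == 1].mean() - y_obs[care_obs == 0].mean()

    # Random assignment: care is independent of every preexisting
    # characteristic, so the simple mean difference is unbiased.
    care_rct = rng.binomial(1, 0.5, size=n)
    y_rct = outcome(care_rct)
    randomized = y_rct[care_rct == 1].mean() - y_rct[care_rct == 0].mean()

    print(f"true effect:            {true_effect:.2f}")
    print(f"self-selected estimate: {naive:.2f}")       # inflated by confounding
    print(f"randomized estimate:    {randomized:.2f}")  # close to the truth

Because family advantage raises both the chance of receiving high-quality care and the outcome itself, the self-selected contrast overstates the effect, while the randomized contrast recovers it up to sampling error.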

It is, of course, unethical to assign children to receive low-quality care. Thus, many of the causal factors that are most important to theory are not amenable to experimentation for ethical reasons. Prenatal substance use, poor nutrition, and lack of early affection are three examples of potentially important causal factors whose effects on humans cannot be assessed experimentally. Moreover, even when it might be ethically defensible to experiment, it may be practically impossible. For example, a controversial hypothesis is that keeping a 3-year-old at home with his or her mother is better for the child than sending him or her to child care (even if the quality of care is good). Since it is not known which kind of experience is superior, one might ethically conduct an experiment. Yet it is usually impossible to assign children at random to stay at home or attend child care; parents will not allow it.

When experiments on humans are impossible for ethical or logistical reasons, scientists use a variety of alternative strategies to eliminate selection bias. One is to create special circumstances in which human experimentation becomes both ethical and feasible, for example, by using wait-list controls.2 Another possibility is to conduct experiments on animals. A great deal has been learned, for example, about the effects of highly stressful rearing circumstances on infant development using randomized experiments on monkeys (see Chapter 5). The key problem, of course, is that findings from such experiments may not generalize to human populations. In another example, pregnant women who smoke could be randomly assigned to a smoking-cessation program to evaluate the effects of prenatal smoking on child outcomes. Again, however, generalization may be tenuous, as the special circumstances may not represent the contexts of greatest interest scientifically. Those who volunteer to participate in the evaluation and are assigned to either the experimental or control group may be different from other mothers who smoke but do not volunteer to participate. The results of the experiment may not generalize to those other mothers. Moreover, not all participants will “comply”: some assigned to quit smoking will smoke anyway, and some not assigned to the program will quit, leading to a biased estimate of the effect of smoking.

2   A possible scenario for experimentation arises when a large number of parents seek child care and only a small number of places in child care centers are available. Then children can be randomly assigned to receive child care or to be placed on a waiting list. Constructing a wait-list control in this way can be a very effective research strategy, but any conclusions must be restricted to the set of parents who are actively seeking care. Such parents may provide a different kind of home care than would parents who are not interested in child care.

The strategy that is perhaps most common for coping with selection bias, however, abandons experimentation entirely. The goal instead is to identify and control for the most plausible confounding variables, preferably by design or, alternatively, by careful statistical analysis, such as propensity score analysis or other statistical techniques that adjust estimates of treatment impact for influences related to the outcome. To return to the quality of care example, researchers would ideally take great care to obtain information on many aspects of a child's experience, including the prenatal behavior of the mother, the child's birthweight, early nutrition, the parents' cognitive skill, parenting behavior, education, occupation, and income level. They would also assess the child's previous status on the outcome variables of interest at least once (see the discussion of time-series designs below). They would then make a concerted effort to construct comparison groups of treatment and control children or families that are as similar as possible on these pretreatment variables. In addition, they may attempt to adjust statistically for such confounding variables after the fact when assessing the effects of quality of care. Studies using these nonexperimental designs and analytic strategies are extremely numerous and have yielded a wealth of evidence about the predictors of key childhood outcomes.

In these nonexperimental approaches, statistical adjustments after the fact can seldom make up for failures to design as strong a quasi-experiment as possible, particularly if the groups being compared are highly disparate prior to program participation. As others have noted, “no matter how precise your measurement or how sophisticated your analyses, you risk failure if your research is not well planned. You can't fix by analysis what you have bungled by design” (Light et al., 1990). Unfortunately, even with a strong quasi-experimental design, one can never be sure whether key confounders have been overlooked or whether the method of adjustment has effectively removed potential selection biases.
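
As a rough illustration of one such adjustment strategy, the sketch below estimates each child's propensity to receive the “treatment” from pretreatment covariates with a logistic regression and then matches each treated child to the control child with the most similar propensity. All data, covariates, and effect sizes are simulated and invented; a real propensity-score analysis would involve many more covariates and careful balance diagnostics.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(1)
    n = 2_000

    # Hypothetical pretreatment covariates (say, parent education and income).
    X = rng.normal(size=(n, 2))
    # Selection: the covariates raise the chance of receiving the treatment.
    treated = rng.binomial(1, 1 / (1 + np.exp(-(X[:, 0] + X[:, 1]))))
    # Outcome: a true treatment effect of 0.5 plus covariate effects and noise.
    y = 0.5 * treated + X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)

    # 1. Estimate each child's propensity to receive the treatment.
    ps = LogisticRegression().fit(X, treated).predict_proba(X)[:, 1]

    # 2. Match each treated child to the control with the closest propensity
    #    (nearest neighbor, with replacement).
    t_idx = np.flatnonzero(treated == 1)
    c_idx = np.flatnonzero(treated == 0)
    gaps = np.abs(ps[c_idx][None, :] - ps[t_idx][:, None])
    matches = c_idx[gaps.argmin(axis=1)]

    # 3. The mean within-pair difference estimates the effect on the treated.
    naive = y[treated == 1].mean() - y[treated == 0].mean()
    matched = (y[t_idx] - y[matches]).mean()
    print(f"naive difference: {naive:.2f}")    # inflated by selection bias
    print(f"matched estimate: {matched:.2f}")  # much closer to the true 0.5

The matching succeeds here because the simulation measures every confounder; as the text cautions, no such guarantee exists in a real nonexperimental study.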

Selection bias is not the only threat to valid causal inference. Another is called “simultaneity bias.” Consider a study in which the quality of a child's relationships with his or her parents as well as the child's behavior are repeatedly assessed. Suppose one finds that changes in the quality of parenting predict changes in the child's behavior. Selection bias is not an issue because comparisons are made within the same children. That is, the child's behavior when parents are providing the best care is compared with the same child's behavior when the parents are not doing such a good job. The problem, however, is that the causal variable—parental care—will to some extent be caused by previous child behavior (see Bell, 1968; Bell and Chapman, 1986; Lytton, 1990; Rutter et al., 1997). Thus, parents will have learned to tailor their care to the past behavior of their child. It then becomes very difficult to ascertain the extent to which parental care is truly a cause of future child behavior, rather than a result of past child behavior. This is often called the problem of simultaneous causation, and ignoring it can lead to simultaneity bias. As a result, researchers have come to appreciate the crucial importance of testing for the direction of causal influence.

Simultaneity bias can lead to absurd findings. For example, one might infer that talking baby talk to a 3-year-old slows down expressive vocabulary when, in fact, a child's failure to speak has driven a parent to use baby talk in a frantic attempt to elicit speech. As in the case of selection bias, researchers have devised a variety of clever strategies for controlling simultaneity bias (Duncan et al., 2000). The most satisfactory is the randomized experiment, but again, such experiments on humans may be impossible for ethical or practical reasons. Careful, repeated assessments of both the causal factor and the outcome, combined with sophisticated statistical analyses, can be very helpful, although, once again, undetected sources of simultaneity may always exist (Robins and Greenland, 1992).

In sum, detecting causal connections is basic to developmental science, yet threats to valid causal inference, including selection bias and simultaneity bias, are often substantial. A variety of strategies can cope with these biases, including experimentation on animals, experimentation on humans under special circumstances, and nonexperimental studies that are designed to address threats to causal inference and may also rely on statistical adjustments. Studies using these strategies have different strengths and weaknesses. For this reason, strong causal inferences are rarely justified by a single study. Rather, evidence in favor of a causal connection becomes convincing when the findings from a variety of studies having varied strengths and weaknesses converge, especially when the evidence is consistent with the best available theory. The connection between prenatal substance use and infant outcomes is a good example: although experimentation on humans is difficult, convergent evidence from a variety of animal and human studies supports quite strong conclusions about effects.

CAUSAL INFERENCE IN APPLIED RESEARCH

The evaluation of interventions designed to improve children's early experiences and outcomes is an important component of early childhood research. Intervention studies can provide an especially strong means for testing theories about the developmental significance of early experiences. Government-subsidized early childhood intervention programs, nutritional supplements, home visitation programs, and parent training programs are but a few examples. Program evaluations enable policy makers to assess how program funds are being spent, whether and to what extent programs are being implemented as planned, and, ultimately, whether program participation is having positive effects on those served and, if so, why. Knowing that an intervention met its goals is a step in the right direction, but the real need is to move from this general conclusion to specific conclusions about which aspects of the intervention have which effects, to what degree, and under which circumstances (Rutter et al., in press).

Assessing program impact on those served is once again a causal question. As in more basic research, threats to valid causal inference arise. Selection bias occurs when characteristics that predict program participation are also associated with outcomes. Simultaneity bias can also occur, especially when program activities and outcomes are studied over time.

There is, however, an important distinction between the kinds of causal questions that arise in basic research and those that arise in program evaluations. Basic developmental science typically assesses causal connections between events that unfold naturally over time. Opportunities for experimentation are limited. In contrast, program evaluations assess the effects of deliberately planned interventions—activities that would not occur in the absence of a new policy. For this reason, it is often more plausible, both ethically and logistically, to conduct experiments in program evaluation than in basic developmental science. Such experimentation not only provides strong causal inferences about the impact of the program, but it can also provide new insights of great relevance to basic research. Experiments such as the Infant Health and Development Program (Gross et al., 1997), the High/Scope Project study of the long-term effects of high-quality child care (Schweinhart et al., 1993; see Chapter 13), the Abecedarian Program (Campbell and Ramey, 1994, 1995), and the Nurse Home Visitation Program (Olds et al., 1986, 1999) provide a wealth of knowledge about how early environmental enrichment affects short- and long-term cognitive and social development, knowledge that would otherwise be unavailable. Experimental evaluations have also shown that some promising ideas do not appear to translate into better childhood outcomes, a result that requires deeper reflection on the validity of the theory behind the program, as well as on program implementation. This interplay between basic and applied research is essential to the vitality of the field.

This discussion may seem to imply that all program evaluations should be randomized experiments. Although we strongly suspect that randomized experiments are underutilized in program evaluation research, they are not the right tool for addressing all questions about interventions, and special conditions must hold before a randomized experiment is feasible or desirable. Moreover, well-planned experiments can unravel. Randomized experiments are of use when a clearly stated causal question is on the table. Many program evaluations are designed to answer other kinds of questions.

Early in the life of a new program, the key questions may be whether the program can be implemented as planned, whether the participants for whom the program is designed actually participate, and how much the program costs. A test of the impact of the program generally makes sense only when the program is based on sound theory regarding modifiable mechanisms that are associated with the outcomes of interest (e.g., reducing maternal smoking will enhance newborn outcomes), when one is confident that the program can be faithfully implemented, and when there is reasonable assurance that the children and families of interest will participate as planned. A premature and expensive test of impact can be a waste of money and can demoralize those who are trying to invent promising new programs. The results from such an evaluation are difficult to interpret, creating confusion rather than clarification about policy and theory. Indeed, premature causal evaluation can undermine potentially promising programs.

Logistical and political considerations are also extremely important. Suppose that a new program is ready for a test of its impact. A randomized experiment often becomes an attractive option, yet the decision about how to design a causal-comparative study must be made on a case-by-case basis. Randomized experiments are often feasible and ethically defensible. For example, when funds become available for a promising new intervention, there will often be considerable interest among parents in participating but insufficient resources to accommodate all interested families. In this setting, a lottery can be used to select who will participate—in effect, a randomized selection. If the results of the randomized experiment are promising, resources may then become available to accommodate more families. In other cases, randomized experiments may not be feasible or desirable for logistical or political reasons. In still other cases, it may already be known from previous experimentation that a program works under the special conditions of the experiment. The question then may be whether the program produces significant effects in a routine (nonexperimental) setting. Nonexperimental methods are then required to cope with selection and simultaneity biases.

It is also important to recognize that an initially randomized experiment can deteriorate under the impact of noncompliance, becoming a nonrandomized experiment, also called a “quasi-experiment.” An intervention often calls for a degree of investment on the part of participants that some, or many, find difficult to manage; they may drop out quickly, attend training meetings sporadically, or “forget” to be at home when the home visitor is scheduled to arrive. The same processes are not at work in the comparable set of control families not receiving the program. Thus, in longitudinal evaluations, “differential attrition” arises and selection bias remains a problem despite efforts to conduct a randomized experiment.

Even in these cases, however, it is important to keep in mind that the resulting quasi-experiment is likely to be much less biased than if no random assignment is attempted and parents are free to enroll children in the program or not. Moreover, few experimental evaluations are now implemented without ongoing monitoring of attrition and other forms of treatment attenuation. This means that attrition can be detected early, efforts can be undertaken to reduce it, and these efforts can be used to improve the quality of subsequent implementation (see Shadish et al., in preparation). Finally, statistical techniques are now available for obtaining relatively unbiased estimates of program effects despite noncompliance (see Little and Yau, 1998). In brief, they involve assessing the effects of the “intent to treat” and examining the effects of the program on participants who received different “dosages” or amounts of the program. These advances have gone a long way toward addressing some of the problems to which randomized experiments can succumb.
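
In rough outline, such an analysis might look like the sketch below, which computes the intent-to-treat contrast between the groups as randomized and then scales it by the difference in participation rates. The compliance rates and effect size are invented, and the published methods (e.g., Little and Yau, 1998) are considerably more elaborate than this simple version.

    import numpy as np

    rng = np.random.default_rng(2)
    n = 4_000

    # Random assignment to the program (the "intent to treat").
    assigned = rng.binomial(1, 0.5, size=n)

    # Noncompliance (invented rates): only 70% of assigned families actually
    # participate, and 10% of control families find a similar program anyway.
    participates = np.where(assigned == 1,
                            rng.binomial(1, 0.7, size=n),
                            rng.binomial(1, 0.1, size=n))

    # Outcome: participation itself (not mere assignment) raises it by 1.0.
    y = 1.0 * participates + rng.normal(size=n)

    # Intent-to-treat effect: compare the groups exactly as randomized,
    # ignoring who actually participated.
    itt = y[assigned == 1].mean() - y[assigned == 0].mean()
    uptake = (participates[assigned == 1].mean()
              - participates[assigned == 0].mean())

    print(f"intent-to-treat effect: {itt:.2f}")  # diluted to about 0.6
    print(f"effect per unit of participation: {itt / uptake:.2f}")  # about 1.0

The intent-to-treat contrast preserves the protection of random assignment but estimates the effect of offering the program; dividing it by the difference in participation rates recovers, under additional assumptions, the effect of actually receiving it.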

Even when a randomized experiment is impossible or inadvisable, however, “experimental thinking” is central to the success of causal-comparative studies. Nonexperimental evaluations of program impacts can be viewed as more or less accurate approximations of “the experiment we wish we could conduct but cannot.” The more accurate the approximation, the stronger the confidence that the evaluation has produced a valid causal inference. To understand why this is so requires an understanding of current thinking in statistical science about the nature of causation and the logical conditions for valid causal inference (see Appendix B). In short, this involves thinking hard about the randomized study one would conduct if it were feasible and ethical.

First, we must be able to imagine an experiment in which each participant is randomly assigned to the treatment groups. In studying the effects of divorce on children, for example, we cannot randomly assign parents to obtain or not obtain a divorce. But we can imagine such an experiment and thus conceive of a child's potential outcome under both conditions (the child's outcome if the parents were or were not divorced). In a randomized experiment, the propensity for one's parents to divorce would be independent of the potential outcomes—the outcomes that would be observed if divorce did or did not occur. Although such an experiment is impossible, it can be approximated by studying the propensity of couples to get divorced. For example, it might be possible to find, for each child of a divorced family, a child in a nondivorced family whose parents had the same predicted propensity to be divorced. These matched pairs might then be compared on outcomes. In imagining this study of divorce, it becomes clear that there will be many children of nondivorced parents with very low propensities to be divorced. In an analysis using propensity-score matching, such cases are likely to be discarded, because there may be no good match—no child whose parents were divorced but who had a low propensity to be divorced.

One might ask whether discarding those cases with low risk of divorce is sensible. Although statisticians generally dislike throwing away data, the argument here is that discarding those cases is indeed sensible. The causal question really has meaning only for those families for which divorce is a plausible course of action. It makes little sense to compare children of parents with strong marriages to children of parents who divorce as a strategy for understanding the effects of divorce. A vivid demonstration of this point is available in Cherlin et al. (1991).

This last point is crucial for causal inference. There are often causal questions that are of interest only for a subset of the population. Whether to have heart bypass surgery is not a relevant question for persons with good cardiovascular health, and no one would conduct a randomized experiment in which persons with such good health, along with others, were randomly assigned to heart bypass surgery. This is not only an ethical concern: the impact of heart bypass surgery on persons with good cardiovascular health is not an interesting question for policy. Yet it is quite common to find researchers using survey data, for example, to examine the effects of divorce, of low birthweight, or of infant care in an analysis using all participants, with statistical procedures to control for extraneous variables. Participants with no chance or a very small chance of experiencing divorced parents, low birthweight, or infant child care contribute little or no useful information about the causal question of interest, and their inclusion may distort findings.

Thinking about the populations for whom the “treatment” is relevant is intimately connected to the problem of generalizing from an experimental study to a different population, with a different variant of the intervention or treatment, with a different kind of outcome measure, in a new setting, and sometime in the future. We now turn to these issues of causal generalization.

CAUSAL GENERALIZATION

Studies of causal connections in early childhood research involve explicit generalizations from the sampled domains of people, settings, times and contexts, causes, and effects to other situations in which the results might be applied (see Cook, 1990, 1993). Generalizability involves the inference that a research result arising in a given setting at a given time for a given set of participants, under a specific version of a treatment and for specific outcomes, is likely to hold in some other setting at a later time with some other participants, under somewhat different versions of the treatment, and for somewhat different ways of assessing the outcomes.

When applying research results from an intervention study to a natural setting, one is assuming that: (a) the “treatment” would be implemented in the natural setting similarly, but not necessarily identically, to how it was implemented in the study; (b) the participants in the natural setting would respond similarly to how the participants in the study responded; (c) the effects of the treatment would be assessed similarly; and (d) events have not transpired over time to change the broader context in which the treatment is being implemented and assessed.

In research on early childhood interventions, the effectiveness of the intervention often depends on the knowledge and skill of the practitioners—those who provide home visits, child care, or parental counseling—and on other services available in the community (see Olds et al., 1998b). It is essential that researchers vividly describe the characteristics of those implementing the treatments; the training, skill, and supervision required to implement them effectively; and the resources available within the community where the program is being replicated. Such descriptions will help determine whether the conditions that may facilitate the success of the program are present in the natural setting.

Defining the target population—that is, the families and children intended to benefit from a program (or assumed to be affected by the risk factor of interest)—is also essential, and a well-crafted statistical analysis can provide useful information on how children of varying backgrounds respond to a causal variable. Ideally, study participants would be a probability sample randomly selected from a well-defined target population—that is, the universe of families and children for whom the causal question is relevant. A probability sample is obtained when every element (e.g., family or child) in the target population has a known, nonzero probability of being included in the study. Such a sample makes it possible to compute estimates of effects that are unbiased for the target population. Randomized experiments, however, rarely involve probability samples from well-defined populations. Some effort is generally required to convince child care providers or pediatricians or parents or children to participate in a randomized experiment, generally making random selection from a population impossible. And cost concerns and logistics generally require that randomized experiments be carried out in local settings. A nationally or regionally representative sample is generally far too dispersed to be used in an experiment.

The trade-off between causal credibility and representativeness is known in the methodological literature as the trade-off between internal and external validity (Cook, 1993; Cook and Campbell, 1979). “Internal validity” is the validity of a causal inference, the kind of credibility obtained from a well-run randomized experiment. “External validity” is the validity of generalizations made from the study to the practical setting in which a new treatment or program might be implemented, the kind of credibility that might come from a survey based on a probability sample.

The trade-off arises in part because it is usually impossible to conduct experiments on participants who have been sampled with known probability from the population of interest. It also arises because the special circumstances required to construct a randomized experiment often create a research scenario (settings, implementers, and participants) that is quite different from the settings of practical interest.

Yet even having a probability sample of settings, implementers, and participants from the population of interest would not, in itself, guarantee a high level of external validity—what is called “generalizability.” To illustrate this point, suppose that a home visitation program works very well for families of type A but very badly for all other families (families of type B). Also suppose that the researcher is unaware of this fact and has the luxury of conducting a true experiment on a random sample of families. The researcher might then report that “the average treatment effect is near zero.” While such a statement may be true, it would disguise the reality that the treatment had a very good or a very bad effect, depending on the type of family. Thus the conclusion, even though based on a seemingly perfect design, would be misleading: the generalization would apply to no one, neither to families of type A nor to families of type B.

In this situation, it is essential to consider the concept of a “moderator,” a preexisting characteristic of families or children on which the impact of the treatment and the magnitude of the treatment effect depend. It is possible, for example, that seriously depressed women are less responsive to home visiting interventions. In this case, maternal depression moderates the treatment effect, and it would be advisable to assess effects separately for depressed and nondepressed mothers. Using the concept of a moderator, one can assess the generalizability of a treatment effect in some detail within a single-site study, across sites of a multisite study, and across studies in a research synthesis. This kind of investigation is far more manageable when randomization is feasible and ethical in each study, because nonrandomized studies involve confounders as well as moderators. Moderators can, however, be overused. To avoid random hunts for moderating influences in the absence of evidence of program effectiveness, it is critical to choose a priori, on the basis of previous research, the theoretically most plausible moderators for subsequent exploration.
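
The type A/type B scenario can be sketched in a few lines. The simulation below simply invents the situation described above: the program helps one kind of family and harms the other, so the average effect is near zero even though the subgroup effects are substantial.

    import numpy as np

    rng = np.random.default_rng(3)
    n = 2_000

    treat = rng.binomial(1, 0.5, size=n).astype(bool)   # randomized program
    type_a = rng.binomial(1, 0.5, size=n).astype(bool)  # hypothetical moderator

    # Invented effects: +0.6 for type-A families, -0.6 for type-B families.
    y = np.where(type_a, 0.6, -0.6) * treat + rng.normal(size=n)

    def effect(mask):
        # Treatment-control difference in means within a subgroup.
        return y[treat & mask].mean() - y[~treat & mask].mean()

    print(f"average effect:  {effect(np.ones(n, bool)):+.2f}")  # near zero
    print(f"type-A families: {effect(type_a):+.2f}")            # near +0.6
    print(f"type-B families: {effect(~type_a):+.2f}")           # near -0.6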

One of the most common procedures for studying the generalizability of findings from multiple studies is “meta-analysis” (see the comprehensive review by Cooper and Hedges, 1994). A meta-analysis can be thought of as an unplanned multisite study. If it had been jointly planned by all of the investigators, care would have been taken to ensure that similar outcome variables were used in every study, that treatment conditions were standardized, and that key dimensions of site-level variation were incorporated into the design. As a retrospective form of inquiry, meta-analyses do not have the luxury of capitalizing on such planning. Nevertheless, a stream of inquiry on a common set of hypotheses in developmental research usually includes interesting variation in participants, implementers, sites, and treatment conceptions, as well as interesting variation in methodological approaches. Using meta-analysis, it is possible to exploit this variation to study intensively the degree of generalizability of a treatment effect and the specific sources of variation in the treatment effect.
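
As a schematic example, the sketch below pools five invented study results using inverse-variance weights and a standard (DerSimonian-Laird) estimate of the between-study variance. The point is that a synthesis can quantify not only an average effect but also how much the effect genuinely varies from study to study.

    import numpy as np

    # Invented summary data from five hypothetical studies:
    # effect-size estimates and their squared standard errors.
    effects = np.array([0.42, 0.15, 0.30, -0.05, 0.25])
    variances = np.array([0.020, 0.010, 0.030, 0.020, 0.015])

    # Fixed-effect (inverse-variance) pooled estimate.
    w = 1 / variances
    fixed = (w * effects).sum() / w.sum()

    # DerSimonian-Laird estimate of the between-study variance tau^2,
    # which captures real variation in the effect across studies.
    q = (w * (effects - fixed) ** 2).sum()
    tau2 = max(0.0, (q - (len(effects) - 1)) /
               (w.sum() - (w ** 2).sum() / w.sum()))

    # Random-effects pooling adds tau^2 to each study's variance, so no
    # single study dominates when effects truly differ across studies.
    w_re = 1 / (variances + tau2)
    pooled = (w_re * effects).sum() / w_re.sum()

    print(f"fixed-effect estimate:   {fixed:.3f}")
    print(f"between-study variance:  {tau2:.3f}")
    print(f"random-effects estimate: {pooled:.3f} "
          f"(SE {np.sqrt(1 / w_re.sum()):.3f})")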

CAUSAL INFERENCE AND GROWING CHILDREN

Growth and change are pervasive and typically rapid during early childhood. For this reason, studies using repeated measures on each outcome are common. There are good reasons to do longitudinal studies: cross-sectional differences in height, weight, vocabulary, quantitative reasoning, and motor control may be of little interest compared with understanding children's growth trajectories on each of these outcomes. An intervention to enhance growth in any of these areas can affect cross-sectional status only by deflecting the growth trajectory. Over time, a shifted trajectory—what we refer to throughout this report as shifting the odds—will produce a substantial shift in expected status, but the shift in growth rate is the leading indicator. Studies of causal effects on growth may be significantly more powerful than studies of status, particularly when the number of participants is strongly constrained by cost or logistical issues.

In principle, all of the ideas we have discussed apply to studies of growth as well as developmental status, if we simply reconceive the outcome as some interesting aspect of growth, such as an average rate of change or an acceleration rate rather than a cross-sectional outcome. However, assessing growth poses special problems of measurement, design, and analysis. Measuring growth is challenging for such psychological outcomes as vocabulary or quantitative reasoning, less so for physical characteristics such as height and weight. New design challenges arise in experimental studies because repeated measurements pose a risk of attrition and because subtle forms of confounding can arise that are not present in cross-sectional research. Methods of analysis are typically more challenging as well.

Growth curve analysis has a long history in biology and medicine. Models for growth in stature during childhood, for example, have been developed and refined over many years. In measuring human height (or weight or lung capacity, for example), there is little disagreement about the meaning of the construct being measured or about the units of measurement (e.g., centimeters, grams, cubic centimeters). Unreliability of measurement is not a large problem.

Measuring growth in psychological domains (e.g., vocabulary, quantitative reasoning, verbal memory, hand-eye coordination, self-regulation) is more problematic. Disagreement is more likely to arise about the definition of the construct to be assessed. This occurs, in part, because there are often no natural units of measurement (i.e., nothing comparable to the use of inches when measuring height). As a result, units of measurement must be created and defended, and errors of measurement are likely to be quite large. This becomes especially problematic when the outcome of interest changes as children mature—as is the case with achievement outcomes—or when transitions are involved, as with the development of literacy. For example, once a child acquires visual recognition memory (between 3 and 9 months), it becomes more appropriate to assess the number of words the child knows and, later, to assess prereading skills. To compound this problem, it can be hard to reach agreement about the appropriate age range for which a particular psychological construct is relevant. Nevertheless, growth in these psychological domains is of great interest. Many important interventions are designed to enhance psychological growth, and theories of development depend on hypothesized causal chains that explain human variation in the rates of such growth.

Another obstacle to studies of change is the cross-sectional orientation of psychometrics. When social scientists speak of reliability of measurement, they are almost invariably describing cross-sectional reliability: that is, the reliability with which one can distinguish among individuals at any given time. The study of cross-sectional individual differences, especially differences in cognitive functioning, has had a powerful and enduring influence on the theory and practice of measurement in psychology. Only recently have researchers begun to take seriously the reliability with which one can distinguish among individuals in rates of change, acceleration, or other aspects of developmental trajectories (see Willett, 1988, for a review).

An example may prove instructive. Consider Figure 4-1, which displays expressive vocabulary as a function of age for three children, based on the work of Huttenlocher et al. (1991). The researchers took pains to estimate the total number of words in a child's expressive vocabulary on each of multiple occasions during the second year of life, a period during which vocabulary rapidly grows from a starting point near zero at age 12 months. Note that a curved line with positive acceleration neatly fits the repeated measures for each child. What is distinctive about each child's growth record is not the starting point (vocabulary is near zero at age 12 months for all three children) nor the standing of the child at any time point, but rather the rate of acceleration for each child. This rate of acceleration is increasingly well measured as time points are added. Subsequent analyses found a strong relationship between maternal speech and vocabulary acceleration. The statistical power of such analyses was strengthened by the fact that they effectively incorporated all the data in a single analysis. That is, every occasion of measurement contributed to understanding a single crucial aspect of growth (acceleration), enabling the researchers to discover relations that had proved elusive in studies of change in relative status.
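
A bare-bones version of such a growth analysis appears below. The data are simulated to mimic the qualitative pattern just described, with vocabulary near zero at 12 months and a child-specific acceleration related to maternal speech; all numbers are invented, and the model is far simpler than those used in the actual studies.

    import numpy as np

    rng = np.random.default_rng(4)
    n_children = 50
    ages = np.array([14.0, 16.0, 18.0, 20.0, 22.0, 24.0])  # months
    t = ages - 12.0  # time since the (near-zero) starting point

    # Invented data: each child's vocabulary accelerates at a rate that
    # depends partly on maternal speech.
    maternal_speech = rng.normal(size=n_children)
    accel = np.clip(0.8 + 0.3 * maternal_speech
                    + 0.1 * rng.normal(size=n_children), 0.1, None)
    vocab = accel[:, None] * t**2 + rng.normal(scale=10.0,
                                               size=(n_children, len(ages)))

    # Fit each child's curve, vocab = a * (age - 12)^2, by least squares;
    # the coefficient a is that child's acceleration, the quantity of interest.
    a_hat = (vocab @ t**2) / (t**2 @ t**2)

    # Every occasion of measurement sharpens a_hat; the child-level analysis
    # then relates acceleration to the predictor.
    r = np.corrcoef(a_hat, maternal_speech)[0, 1]
    print(f"correlation of estimated acceleration with maternal speech: {r:.2f}")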

The following appear to be the key ingredients in studies of quantitative growth:

- A clear definition of the outcome variable or construct on which children are believed to be growing.
- A measurement unit or scale that has constant meaning over the age range of interest (e.g., height in inches or the number of words in a child's expressive vocabulary).
- An outcome that can be measured on a common scale across ages, such that the alternative, age-appropriate forms of the assessment can be equated, that is, put onto the same meaningful scale.3
- A statistical model for individual change over time. During the second year of life, for example, the appropriate model for vocabulary is a positively accelerating curve, as depicted in Figure 4-1.
- A longitudinal study that is optimally designed to ensure a given level of statistical precision for the question at hand. Trade-offs among the length of the study, the frequency of observation, and the sample size are invariably involved. These design choices strongly affect the reliability and validity of individual measures of change. These choices can also affect the internal validity of quasi-experimental studies.

Experts on developmental change have emphasized the value of interrupted time-series designs when children are growing, especially when randomized experiments are not feasible (Bryk and Weisberg, 1977; Campbell and Erlebacher, 1970; Glass et al., 1972; Porter, 1967; see also Blumberg and Porter, 1983). In these designs, multiple pretreatment observations are taken in order to establish a pretreatment trajectory for the children. Figure 4-2 illustrates the value of this approach.

3   Standardization within age, as is common in IQ tests, eliminates the possibility of a meaningful scale with respect to the construct of interest (e.g., cognitive ability) and therefore distorts the study of growth on that construct. Such standardized scales can exhibit shifts in the relative standing of persons, but they cannot reveal rates of growth with respect to the behavioral domain. One typical result is that individual differences in estimates of change become substantially less reliable after standardization, undermining the capacity of intervention studies to discover effects.

[FIGURE 4-1 A sample of individual vocabulary growth trajectories. SOURCE: Huttenlocher et al., 1991. NOTE: •, Δ, and ❑ represent actual word counts for three individual children.]

[FIGURE 4-2 Distinguishing treatment effects from growth in time-series designs.]

Designs that include only one pretest before a treatment or intervention, followed by a posttest after the treatment (see the shaded portion of Figure 4-2), cannot distinguish whether the apparent gains made by the participants (thick line) compared with the controls (thin line) are attributable to the intervention. If an additional pretest had been given, however, it would be possible to tell whether the treatment actually accelerated the growth of the treated children relative to the controls (see the thick line from the pre-pretest to the pretest for the participants) or whether the children were already showing different rates of growth prior to the treatment (see the dashed line). In this latter case, the treatment actually had no effect; these patterns of growth would have been predicted without the intervention.
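
The simulation below illustrates this point schematically; all numbers are invented. The true treatment effect is set to zero, yet the treated children grow faster for reasons that predate the intervention. A single-pretest analysis mistakes the preexisting trajectory for a program effect, while adding a pre-pretest allows that trajectory to be subtracted out.

    import numpy as np

    rng = np.random.default_rng(5)
    n = 500

    # Invented scenario: the "treated" children were already growing faster
    # before the intervention, and the true treatment effect is zero.
    treated = rng.binomial(1, 0.5, size=n).astype(bool)
    slope = np.where(treated, 2.0, 1.0)  # preexisting growth per period

    def noise():
        return rng.normal(scale=0.3, size=n)

    baseline = rng.normal(size=n)
    pre2 = baseline + noise()              # pre-pretest (time 0)
    pre1 = baseline + slope + noise()      # pretest     (time 1)
    post = baseline + 2 * slope + noise()  # posttest    (time 2); no effect

    # One-pretest design: the pre-post gain difference looks like an effect.
    gain = post - pre1
    print(f"single-pretest 'effect': "
          f"{gain[treated].mean() - gain[~treated].mean():+.2f}")  # about +1.0

    # Pre-pretest design: subtract each group's preexisting growth rate.
    adjusted = gain - (pre1 - pre2)
    print(f"trajectory-adjusted effect: "
          f"{adjusted[treated].mean() - adjusted[~treated].mean():+.2f}")  # near 0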

In sum, it is often essential in studies of early childhood development to recognize that children are rapidly growing. Causal inference on aspects of child growth poses important issues that extend beyond efforts to make causal connections between an intervention and a set of child outcomes at a given age. When the growth of interest is psychological, it is challenging to define clearly the dimensions on which children are growing, to devise assessments that are sensitive to growth, and to evaluate the capacity of alternative designs to reliably gauge individual differences in growth. Formulating and criticizing statistical models is essential to defining causal effects and considering threats to valid inference. Explicit models are especially important when participants are rapidly growing, because the meaning of growth and of causal effects on growth must be made explicit if progress is to be made in assessing the quality of the assessments or the utility of alternative designs for capturing these causal effects.

CONCLUSIONS

At the beginning of this chapter, we emphasized the importance of combining insights from basic and applied research to gain a fuller understanding of early development and the influences that guide and affect it. Basic research is designed to provide detailed observations of development and to test theories about causal mechanisms. It is often difficult, however, to meet the conditions that lead to strong causal inferences. In contrast, applied research avails itself of interventions and natural experiments that can often provide better evidence of causation and, when studies are designed appropriately, can help to specify the mechanisms involved. The challenge to researchers is twofold. The first task is to design studies and evaluations that successfully capture causal information. The second is to integrate the evidence from basic and applied research to evaluate alternative explanations for development and discern their implications for policies aimed at improving children's life chances.

In the final analysis, knowledge is advanced not through a single, decisive study, but by integrating evidence generated by different strategies, with different strengths and weaknesses. The research that generates this knowledge is, under the best of circumstances, a cumulative process that starts with rich descriptive data about the phenomena of interest, moves to understanding connections between outcomes and important influences on them, and finally seeks to identify causal relations and mechanisms. This chapter has focused on the final stage of this sequence, given its importance to both theoretical and political debates about the role of early experience in child development. Its purpose has not been to assert the superiority of causal studies, but rather, when causal questions are being addressed by research, to illustrate the key issues that arise and the critical importance of being tough-minded about ensuring that the conditions for making valid causal inferences are met. Only when the limits of current knowledge and the best thinking about improved designs are clear can we plan research that will contribute significantly to knowledge in the future.
