MAKING CAUSAL CONNECTIONS

text, our discussion aims to clarify the logic and contributions of studies of causality to the understanding of developmental processes and to interventions aimed at affecting these processes. Something close to a consensus has emerged within statistical science on the logic of causal inference: its definition, the conditions required for valid causal inference, and generalization of causal inferences. Appendix B discusses the statistical issues involved in defining and estimating causal effects. In the committee's view, this consensus has implications for all studies making causal comparisons, basic and applied, experimental and nonexperimental. Here we sketch the essential ideas in this emerging consensus and consider how these ideas can be applied to improving early childhood research. This focus is not intended to minimize the importance of other research strategies and goals. Research is most appropriately viewed as a sequential process, properly starting with exploratory observation, moving through correlational work aimed at tracing associations among variables of interest, to more rigorous designs permitting causal inference. Indeed, the richness of developmental science derives from the field's reliance on multiple methods of inquiry, and its greatest insights often emerge at the convergence of diverse strands of evidence.

We begin by considering causal inference in basic and applied developmental research. Basic research attempts to uncover fundamental processes of development and change, while applied research aims to help policy makers and practitioners evaluate practical efforts to improve children's experiences and outcomes. We emphasize the importance of integrating basic and applied research in building a strong science of early childhood development.
Insights from basic science are crucial in the design of practical programs, while the evaluation of programs can provide new evidence essential to basic science about causal connections. We then discuss the problem of generalizing from intervention studies to the populations of children, to the settings and personnel, and to the historical times and social contexts that might ultimately characterize a new program if its adoption became more widespread. Well-designed studies can answer important questions about the generalizability of a study result. Nevertheless, because strong generalizations typically can emerge only from a stream of related studies, we also discuss the importance of synthesizing evidence across multiple studies. Finally, we consider the particularly thorny issue of causal inference as it applies to growing children.

CAUSAL INFERENCE IN BASIC RESEARCH

The theory and evidence contained in this report are connected by chains of causal reasoning. We consider how prenatal and neonatal environments affect early brain development and behavior and how these early
effects, together with the child's early relationships, affect self-regulation, social competence, language development, and reasoning. The consequences of early experiences for later behavioral functioning, including the ability to initiate and sustain relationships and to succeed in school and at the workplace, are of central interest to theory and policy. This report and developmental science more generally integrate empirical findings regarding such causal propositions and evaluate alternative theoretical explanations that tie these propositions together.

Despite their importance, however, causal connections are difficult to nail down. Suppose, for example, that we are interested in how high-quality relations between caregivers and infants affect later cognitive or social functioning. For simplicity, we refer to the quality of such relations as "quality of care." Let us assume that we have taken great pains to define and validly measure quality of care as well as the outcome of interest: a specific aspect of cognitive or social functioning. When quality of care is found to be associated with an enhanced outcome, we may be inclined to think that the quality of care is the cause of this outcome. But children who enjoy high-quality care are likely to have other advantages that may also shape such outcomes. For example, they may benefit from favorable genetic endowment, highly involved parents, or ample family incomes, all of which may contribute to the cognitive and social outcomes of interest. These other causal factors are called "confounding variables" or "confounders" for short. A confounding variable in the context of this example is a child characteristic or feature of the child's environment that (a) predicts who will receive high-quality care and (b) also predicts the outcome of interest.
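The logic of confounding and random assignment can be made concrete with a small simulation. The sketch below is illustrative only: the variable names (a single "family resources" score standing in for all confounders) and the effect sizes are invented, and the true effect of care is deliberately set to zero, so any treated-control gap under self-selection reflects pure selection bias.

```python
import random

random.seed(0)

def simulate_child(randomize):
    # Hypothetical confounder: family resources predict both selection
    # into high-quality care and the developmental outcome itself.
    resources = random.gauss(0, 1)
    if randomize:
        in_care = random.random() < 0.5                 # coin-flip assignment
    else:
        in_care = resources + random.gauss(0, 1) > 0    # self-selection
    # The true effect of care is set to ZERO here: only resources
    # (plus noise) drive the outcome.
    outcome = 2.0 * resources + random.gauss(0, 1)
    return in_care, outcome

def care_gap(randomize, n=20000):
    kids = [simulate_child(randomize) for _ in range(n)]
    t = [y for c, y in kids if c]
    u = [y for c, y in kids if not c]
    return sum(t) / len(t) - sum(u) / len(u)

naive_gap = care_gap(randomize=False)      # large, despite a true effect of zero
randomized_gap = care_gap(randomize=True)  # near zero, as it should be
```

Under self-selection the comparison shows a large "effect" of care that is entirely selection bias; under coin-flip assignment the confounder is balanced across groups and the estimated gap collapses toward the true value of zero.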
The failure to control for confounders leads to an error called "selection bias." Scientists try hard to devise research strategies that reduce or eliminate selection bias. That is, they try to separate the effects of the main variable of interest (in this case, quality of care) from effects of confounders. The surest way to eliminate selection bias is to conduct an experiment. In it, one would randomly assign children to either high-quality or low-quality care, carefully provide the kind of care assigned, and then, at some later point, assess the outcome of interest. Random assignment would eliminate all confounding variables. To be quite specific, random assignment would ensure that the probability of assignment to high-quality or low-quality care is utterly unaffected by any preexisting characteristic of the child.1

It is, of course, unethical to assign children to receive low-quality care. Thus, many of the causal factors that are most important to theory are not amenable to experimentation for ethical reasons. Prenatal substance use, poor nutrition, and lack of early affection are three such examples of potentially important causal factors whose effects on humans cannot be assessed experimentally. Moreover, even when it might be ethically defensible to experiment, it may be practically impossible. For example, a controversial hypothesis is that keeping a 3-year-old at home with his or her mother is better for the child than sending him or her to child care (even if the quality of care is good). Since it is not known which kind of experience is superior, one might ethically conduct an experiment. Yet it is usually impossible to assign children at random to stay at home or attend child care; parents will not allow it.

When experiments on humans are impossible for ethical or logistical reasons, scientists use a variety of alternative strategies to eliminate selection bias. One is to create special circumstances in which human experimentation becomes both ethical and feasible, for example, by using wait-list controls.2 Another possibility is to conduct experiments on animals. A great deal has been learned, for example, about the effects of highly stressful rearing circumstances on infant development using randomized experiments on monkeys (see Chapter 5). The key problem, of course, is that findings from such experiments may not generalize to human populations. In another example, pregnant women who smoke could be randomly assigned to a smoking-cessation program to evaluate the effects of prenatal smoking on child outcomes. Again, however, generalization may be tenuous, as the special circumstances may not represent the contexts of greatest interest scientifically. Those who volunteer to participate in the evaluation and are assigned to either the experimental or control group may be different from other mothers who smoke but do not volunteer to participate. The results of the experiment may not generalize to those other mothers. Moreover, not all participants will "comply": some assigned to quit smoking will smoke anyway, and some not assigned to the program will quit, leading to a biased estimate of the effect of smoking.

1Some have argued that random assignment eliminates selection bias only in large samples, but this is not true. By ensuring that previous variables are unrelated to the probability of assignment to each treatment condition, random assignment ensures that tests of statistical significance accurately quantify the uncertainty about the causal question. Thus, when treatment groups are compared on the outcome of interest, significance tests yield p-values that convey the probability of obtaining a sample result of the type actually obtained if there were no causal effect. In small samples, these p-values will, appropriately, tend to be larger than in large samples, but in either case, the p-value and related confidence intervals are fair indicators of uncertainty about the causal effect of interest.

2A possible scenario for experimentation arises when a large number of parents seek child care and only a small number of places in child care centers are available. Then children can be randomly assigned to receive child care or to be placed on a waiting list. Constructing a wait-list control in this way can be a very effective research strategy, but any conclusions must be restricted to the set of parents who are actively seeking care. Such parents may provide a different kind of home care than would parents who are not interested in child care.
The strategy that is perhaps most common for coping with selection bias, however, abandons experimentation entirely. Now the goal is to identify and control for the most plausible confounding variables, preferably by design or, alternatively, by clever statistical analysis, such as propensity score analysis or reliance on other statistical techniques that adjust estimates of treatment impact for other influences related to the outcome. To return to the quality of care example, researchers would ideally take great care to obtain information on many aspects of a child's experience, including the prenatal behavior of the mother, the child's birthweight, early nutrition, the parents' cognitive skill, parenting behavior, education, occupation, and income level. They would also assess the child's previous status on the outcome variables of interest at least once (see discussion of time-series designs below). They would then make a concerted effort to construct comparison groups of treatment and control children or families that are as similar as possible on these pretreatment variables. In addition, they may attempt to adjust for such confounding variables a posteriori, using statistical adjustments when assessing the effects of quality of care. Studies using these nonexperimental designs and analytic strategies are extremely numerous and have yielded a wealth of evidence about the predictors of key childhood outcomes. In these nonexperimental approaches, statistical adjustments after the fact can seldom make up for failures to design as strong a quasi-experiment as possible, particularly if the groups being compared are highly disparate prior to program participation. As others have noted, "no matter how precise your measurement or how sophisticated your analyses, you risk failure if your research is not well planned. You can't fix by analysis what you have bungled by design" (Light et al., 1990).
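The limits of after-the-fact adjustment can be illustrated with a hedged sketch. Here adjustment is done by simple stratification rather than any particular propensity-score software, and the two binary confounders and all effect sizes are invented. The point is only that adjusting for the observed confounder leaves intact the bias contributed by the unobserved one, while adjusting for both recovers the true effect.

```python
import random

random.seed(1)

TRUE_EFFECT = 0.5   # invented true treatment effect

def simulate(n=40000):
    rows = []
    for _ in range(n):
        x1 = random.random() < 0.5   # observed confounder
        x2 = random.random() < 0.5   # unobserved confounder
        p_treat = 0.2 + 0.3 * x1 + 0.3 * x2
        t = random.random() < p_treat
        y = TRUE_EFFECT * t + 1.0 * x1 + 1.0 * x2 + random.gauss(0, 1)
        rows.append((x1, x2, t, y))
    return rows

def stratified_effect(rows, key):
    """Average treated-control gap within strata defined by `key`."""
    strata = {}
    for row in rows:
        strata.setdefault(key(row), []).append(row)
    total, gap_sum = 0, 0.0
    for members in strata.values():
        treated = [y for (_, _, t, y) in members if t]
        control = [y for (_, _, t, y) in members if not t]
        if treated and control:
            gap = sum(treated) / len(treated) - sum(control) / len(control)
            gap_sum += gap * len(members)
            total += len(members)
    return gap_sum / total

rows = simulate()
partial = stratified_effect(rows, key=lambda r: r[0])       # adjust for x1 only: biased
full = stratified_effect(rows, key=lambda r: (r[0], r[1]))  # adjust for both: unbiased
```

With these invented numbers, `full` lands near the true 0.5 while `partial` remains inflated by roughly 0.3, mimicking an overlooked confounder in a real quasi-experiment.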
Unfortunately, even with a strong quasi-experimental design, one can never be sure whether key confounders have been overlooked or whether the method of adjustment has effectively removed potential selection biases.

Selection bias is not the only threat to valid causal inference. Another is called "simultaneity bias." Consider a study in which the quality of a child's relationships with his or her parents as well as the child's behavior are repeatedly assessed. Suppose one finds that changes in the quality of parenting predict changes in the child's behavior. Selection bias is not an issue because comparisons are made within the same children. That is, the child's behavior when parents are providing the best care is compared with the same child's behavior when the parents are not doing such a good job. The problem, however, is that the causal variable, parental care, will to some extent be caused by previous child behavior (see Bell, 1968; Bell and Chapman, 1986; Lytton, 1990; Rutter et al., 1997). Thus, parents will have learned to tailor their care to the past behavior of their child. It then becomes very difficult to ascertain the extent to which parental care is truly
a cause of future child behavior, rather than a result of past child behavior. This is often called the problem of simultaneous causation, and ignoring it can lead to simultaneity bias. As a result, researchers have come to appreciate the crucial importance of testing for the direction of causal influence. Simultaneity bias can lead to absurd findings. For example, one might infer that talking baby talk to a 3-year-old slows down expressive vocabulary when, in fact, a child's failure to speak has driven a parent to use baby talk in a frantic attempt to elicit speech.

As in the case of selection bias, researchers have devised a variety of clever strategies for controlling simultaneity bias (Duncan et al., 2000). The most satisfactory is the randomized experiment, but again, such experiments on humans may be impossible for ethical or practical reasons. Careful, repeated assessments of both the causal factor and the outcome, combined with sophisticated statistical analyses, can be very helpful, although, once again, undetected sources of simultaneity may always exist (Robins and Greenland, 1992).

In sum, detecting causal connections is basic to developmental science, yet threats to valid causal inference, including selection bias and simultaneity bias, are often substantial. A variety of strategies can cope with these biases, including experimentation on animals, experimentation on humans under special circumstances, and nonexperimental studies that are designed to address threats to causal inference and may also rely on statistical adjustments. Studies using these strategies have different strengths and weaknesses. For this reason, strong causal inferences are rarely justified by a single study.
Rather, evidence in favor of a causal connection becomes convincing when the findings from a variety of studies having varied strengths and weaknesses converge, especially when the evidence is consistent with the best available theory. The connection between prenatal substance use and infant outcomes is a good example: although experimentation on humans is difficult, convergent evidence from a variety of animal and human studies supports quite strong conclusions about effects.

CAUSAL INFERENCE IN APPLIED RESEARCH

The evaluation of interventions designed to improve children's early experiences and outcomes is an important component of early childhood research. Intervention studies can provide an especially strong means for testing theories about the developmental significance of early experiences. Government-subsidized early childhood intervention programs, nutritional supplements, home visitation programs, and parent training programs are but a few examples. Program evaluations enable policy makers to assess how program funds are being spent, whether and to what extent programs are being implemented as planned, and, ultimately, whether program participation is having positive effects on those served and, if so, why. Knowing that an intervention met its goals is a step in the right direction, but the real need is to move from this general conclusion to specific conclusions about which aspects of the intervention have which effects, to what degree, and under which circumstances (Rutter et al., in press). Assessing program impact on those served is once again a causal question. As in more basic research, threats to valid causal inference arise. Selection bias occurs when characteristics that predict program participation are also associated with outcomes. Simultaneity bias can also occur, especially when program activities and outcomes are studied over time.

There is, however, an important distinction between the kinds of causal questions that arise in basic research and those that arise in program evaluations. Basic developmental science typically assesses causal connections between events that unfold naturally over time. Opportunities for experimentation are limited. In contrast, program evaluations assess the effects of deliberately planned interventions, activities that would not occur in the absence of a new policy. For this reason, it is often more plausible, both ethically and logistically, to conduct experiments in program evaluation than in basic developmental science. Such experimentation not only provides strong causal inferences about the impact of the program, but it can also provide new insights of great relevance to basic research.
Experiments such as the Infant Health and Development Program (Gross et al., 1997), the High/Scope Project (Schweinhart et al., 1993; see Chapter 13) study of the long-term effects of high-quality child care, the Abecedarian Program (Campbell and Ramey, 1994, 1995), and the Nurse Home Visitation Program (Olds et al., 1986, 1999) provide a wealth of knowledge about how early environmental enrichment affects short- and long-term cognitive and social development, knowledge that would otherwise be unavailable. Experimental evaluations have also shown that some promising ideas do not appear to translate into better childhood outcomes, a result that requires a deeper reflection on the validity of the theory behind the program, as well as on program implementation. This interplay between basic and applied research is essential to the vitality of the field.

This discussion may seem to imply that all program evaluations should be randomized experiments. Although we strongly suspect that randomized experiments are underutilized in program evaluation research, they are not the right tool for addressing all questions about interventions, and special conditions must hold before a randomized experiment is feasible or desirable. Moreover, well-planned experiments can unravel.

Randomized experiments are of use when a clearly stated causal question is on the table. Many program evaluations are designed to answer other kinds of questions. Early in the life of a new program, the key
question may be whether the program can be implemented as planned, whether the participants for whom the program is designed actually participate, and how much the program costs. A test of the impact of the program generally makes sense only when the program is based on sound theory regarding modifiable mechanisms that are associated with the outcomes of interest (e.g., reducing maternal smoking will enhance newborn outcomes), when one is confident that the program can be faithfully implemented, and when there is reasonable assurance that the children and families of interest will participate as planned. A premature and expensive test of impact can be a waste of money and can demoralize those who are trying to invent promising new programs. The results from such an evaluation are difficult to interpret, creating confusion rather than clarification about policy and theory. Indeed, premature causal evaluation can undermine potentially promising programs.

Logistical and political considerations are also extremely important. Suppose that a new program is ready for a test of its impact. A randomized experiment often becomes an attractive option, yet the decision about how to design a causal-comparative study must be made on a case-by-case basis. Randomized experiments are often feasible and ethically defensible. For example, when funds become available for a promising new intervention, there will often be considerable interest among parents in participating but insufficient resources to accommodate all interested families. In this setting, a lottery can be used to select who will participate: in effect, a randomized selection. If the results of the randomized experiment are promising, resources may then become available to accommodate more families. However, in other cases, randomized experiments may not be feasible or desirable for logistical or political reasons.
In still other cases, it may already be known from previous experimentation that a program works under the special conditions of the experiment. The question then may be whether the program produces significant effects in a routine (nonexperimental) setting. Nonexperimental methods are then required to cope with selection and simultaneity biases.

It is also important to recognize that an initially randomized experiment can deteriorate under the impact of noncompliance, becoming a nonrandomized experiment, also called a "quasi-experiment." An intervention often calls for a degree of investment on the part of participants that some, or many, find difficult to manage; they may drop out quickly, or attend training meetings sporadically, or "forget" to be at home when the home visitor is scheduled to arrive. The same processes are not at work with the comparable set of control families not receiving the program. Thus, in longitudinal evaluations, "differential attrition" arises and selection bias remains a problem despite efforts to conduct a randomized experiment. Even in these cases, however, it is important to keep in mind that the
resulting quasi-experiment is likely to be much less biased than if no random assignment is attempted and parents are free to enroll children in the program or not. Moreover, few experimental evaluations are now implemented without ongoing monitoring of attrition and other forms of treatment attenuation. This means that attrition can be detected early, efforts can be undertaken to reduce it, and these efforts can be used to improve the quality of subsequent implementation (see Shadish et al., in preparation). Finally, statistical techniques are now available for obtaining relatively unbiased estimates of program effects despite noncompliance (see Little and Yau, 1998). In brief, they involve assessing effects of "the intent to treat" and examining the effects of the program on participants who received different "dosages" or amounts of the program. These advances have gone a long way toward addressing some of the problems to which randomized experiments can succumb.

Even when a randomized experiment is impossible or inadvisable, however, "experimental thinking" is central to the success of causal or comparative studies. Nonexperimental evaluations of program impacts can be viewed as more or less accurate approximations of "the experiment we wish we could conduct but cannot." The more accurate the approximation, the stronger the confidence that the evaluation has produced a valid causal inference. To understand why this is so requires an understanding of current thinking in statistical science about the nature of causation and the logical conditions for valid causal inference (see Appendix B). In short, this involves thinking hard about the randomized study one would conduct if it were feasible and ethical. First, we must be able to imagine an experiment in which each participant is randomly assigned to the treatment groups.
In studying the effects of divorce on children, for example, we cannot randomly assign parents to obtain or not obtain a divorce. But we can imagine such an experiment and thus conceive of a child's potential outcome under both conditions (the child's outcome if the parents were or were not divorced). In a randomized experiment, the propensity for one's parents to divorce would be independent of the potential outcomes, that is, the outcomes that would be observed if divorce did or did not occur. Although such an experiment is impossible, it can be approximated by studying the propensity of couples to get divorced. For example, it might be possible to find, for each child of a divorced family, a child in a nondivorced family whose parents had the same predicted propensity to be divorced. These matched pairs might then be compared on outcomes. In imagining this study of divorce, it becomes clear that there will be many children of nondivorced parents with very low propensities to be divorced. In an analysis using propensity-score matching, such cases are likely to be discarded, because there may be no good match: no child whose parents were divorced but who had a low propensity to be divorced.
One might ask whether discarding those cases with low risk of divorce is sensible. Although statisticians generally dislike throwing away data, the argument here is that discarding those cases is indeed sensible. The causal question really has meaning only for those families for which divorce is a plausible course of action. It makes little sense to compare children of parents with strong marriages to children of parents who divorce as a strategy for understanding the effects of divorce. A vivid demonstration of this point is available in Cherlin et al. (1991).

This last paragraph raises a crucial point about causal inference. There are often causal questions that are of interest only for a subset of the population. Whether to have heart bypass surgery is not a relevant question for persons with good cardiovascular health. And no one would conduct a randomized experiment in which persons with such good health, along with others, were randomly assigned to heart bypass surgery. This is not only an ethical concern. The impact of heart bypass surgery on persons with good cardiovascular health is not an interesting question for policy. Yet it is quite common to find researchers using survey data, for example, to examine "the effects of divorce" or the effect of low birthweight or the effects of infant care in an analysis using all participants. Such an analysis would use statistical procedures to control for extraneous variables. Yet participants with no chance or a very small chance of experiencing divorced parents or low birthweight or infant child care really contribute little or no useful information about the causal question of interest. Thus, the inclusion of such cases may distort findings.
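A minimal sketch of the matching logic described above, under invented assumptions: a single marital-conflict score stands in for the estimated propensity to divorce, and matching is approximated by comparing children within propensity strata that contain both divorced and nondivorced families. The stratum in which divorce never occurs falls outside the region of common support and is discarded, just as in the text.

```python
import random

random.seed(2)

TRUE_EFFECT = -0.8   # invented effect of divorce on the child outcome

def simulate(n=30000):
    rows = []
    for _ in range(n):
        conflict = random.choice([0, 1, 2, 3])       # hypothetical confounder
        p_divorce = [0.0, 0.1, 0.4, 0.7][conflict]   # propensity to divorce
        divorced = random.random() < p_divorce
        y = -1.0 * conflict + TRUE_EFFECT * divorced + random.gauss(0, 1)
        rows.append((conflict, divorced, y))
    return rows

rows = simulate()

# Naive comparison ignores the confounder and overstates the harm.
div = [y for c, d, y in rows if d]
nodiv = [y for c, d, y in rows if not d]
naive = sum(div) / len(div) - sum(nodiv) / len(nodiv)

# Propensity-stratified comparison: use only strata containing both
# divorced and nondivorced families (common support). The conflict == 0
# stratum, where divorce never occurs, is discarded automatically.
gaps, weights = [], []
for level in [0, 1, 2, 3]:
    t = [y for c, d, y in rows if c == level and d]
    u = [y for c, d, y in rows if c == level and not d]
    if t and u:
        gaps.append(sum(t) / len(t) - sum(u) / len(u))
        weights.append(len(t))   # weight strata by divorced cases
adjusted = sum(g * w for g, w in zip(gaps, weights)) / sum(weights)
```

With these invented numbers, the naive gap is several times the true effect, while the within-support, stratified estimate lands near -0.8.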
Thinking about the populations for whom the "treatment" is relevant is intimately connected to the problem of generalizing from an experimental study to a different population, with a different variant of the intervention or treatment, with a different kind of outcome measure, in a new setting, and sometime in the future. We now turn to these issues of causal generalization.

CAUSAL GENERALIZATION

Studies of causal connections in early childhood research involve explicit generalizations from the sampled domains of people, settings, times and contexts, causes, and effects to other situations in which the results might be applied (see Cook, 1990, 1993). Generalizability involves the inference that a research result arising in a given setting at a given time for a given set of participants, under a specific version of a treatment and for specific outcomes is likely to hold in some other setting at a later time with some other participants, under somewhat different versions of the treatment, and for somewhat different ways of assessing the outcomes. When applying research results from an intervention study to a natural setting, one is assuming that: (a) the "treatment" would be implemented in the
natural setting similarly, but not necessarily identically, to how it was implemented in the study, (b) the participants in the natural setting would respond similarly to how the participants in the study responded, (c) the effects of the treatment would be assessed similarly, and (d) events have not transpired over time to change the broader context in which the treatment is being implemented and assessed. In research on early childhood interventions, the effectiveness of the intervention often depends on the knowledge and skill of the practitioners (those who provide home visits, child care, or parental counseling) and on other services available in the community (see Olds et al., 1998). It is essential that researchers vividly describe the characteristics of those implementing the treatments; the training, skill, and supervision required to implement them effectively; and the resources available within the community where the program is being replicated. Such descriptions will help determine whether the conditions that may facilitate the success of the program are present in the natural setting.

Defining the target population, that is, the families and children intended to benefit from a program (or assumed to be affected by the risk factor of interest), is also essential, and a well-crafted statistical analysis can provide useful information on how children of varying backgrounds respond to a causal variable. Ideally, study participants would be a probability sample randomly selected from a well-defined target population: the universe of families and children for whom the causal question is relevant. A probability sample is obtained when every element (e.g., family or child) in the target population has a known, nonzero probability of being included in the study. Such a sample makes it possible to compute estimates of effects that are unbiased for the target population.
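The value of known, nonzero inclusion probabilities can be sketched as follows. The population, the two family types, and the inclusion probabilities are all invented; the weighting step is the standard survey-sampling idea (often associated with Horvitz and Thompson) of dividing each observation by its probability of inclusion.

```python
import random

random.seed(3)

# Hypothetical target population: the outcome differs across two family types.
population = ([("A", random.gauss(10, 1)) for _ in range(8000)]
              + [("B", random.gauss(20, 1)) for _ in range(2000)])
pop_mean = sum(y for _, y in population) / len(population)   # about 12

# Known, nonzero inclusion probabilities that differ by type.
P_INCLUDE = {"A": 0.05, "B": 0.40}

sample = [(g, y) for g, y in population if random.random() < P_INCLUDE[g]]

# The raw sample mean over-represents type B families and is biased upward.
raw_mean = sum(y for _, y in sample) / len(sample)

# Weighting each case by 1 / (its inclusion probability) removes the bias.
num = sum(y / P_INCLUDE[g] for g, y in sample)
den = sum(1 / P_INCLUDE[g] for g, _ in sample)
weighted_mean = num / den
```

Because every element's inclusion probability is known and nonzero, the weighted estimate is centered on the population mean even though the raw sample badly misrepresents the mix of family types.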
Randomized experiments, however, rarely involve probability samples from well-defined populations. Some effort is generally required to convince child care providers or pediatricians or parents or children to participate in a randomized experiment, generally making random selection from a population impossible. And cost concerns and logistics generally require that randomized experiments be carried out in local settings. A nationally or regionally representative sample is generally far too dispersed to be used in an experiment.

The trade-off between causal credibility and representativeness is known in the methodological literature as the trade-off between internal and external validity (Cook, 1993; Cook and Campbell, 1979). "Internal validity" is the validity of a causal inference, the kind of credibility obtained from a well-run randomized experiment. "External validity" is the validity of generalizations made from the study to the practical setting in which a new treatment or program might be implemented, the kind of credibility that might come from a survey based on a probability sample. The trade-off arises in part because it is usually impossible to conduct experiments on
participants who have been sampled with known probability from the population of interest. It also arises because the special circumstances required to construct a randomized experiment often create a research scenario (settings, implementers, and participants) that is quite different from the settings of practical interest.

Yet even having a probability sample of settings, implementers, and participants from the population of interest would not, in itself, guarantee a high level of external validity, what is called "generalizability." To illustrate this point, suppose that a home visitation program works very well for families of type A but very badly for all other families (families of type B). Also suppose that the researcher is unaware of this fact and has the luxury of conducting a true experiment on a random sample of families. The researcher might then report that "the average treatment effect is near zero." While such a statement may be true, it would disguise the reality that the treatment had a very good or a very bad effect, depending on the type of family. Thus, the conclusion, even though based on a seemingly perfect design, would be misleading. The generalization would apply to no one, neither to families of type A nor of type B.

In this situation, it is essential to consider the concept of a "moderator," a preexisting characteristic of families or children on which the impact of the treatment and magnitude of the treatment effect depend. It is possible, for example, that seriously depressed women are less responsive to home visiting interventions. In this case, maternal depression moderates the treatment effect and it would be advisable to assess effects separately for depressed and nondepressed mothers. Using the concept of a moderator, one can assess the generalizability of a treatment effect in some detail within a single-site study, across sites of a multisite study, and across studies in a research synthesis.
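The type A versus type B scenario above can be reproduced in a few lines. The effect sizes (+1.0 for type A families, -1.0 for type B) are invented so that the average effect is near zero while each subgroup effect is substantial; the moderator analysis is simply a treated-control comparison computed separately within each family type.

```python
import random

random.seed(4)

def simulate(n=20000):
    rows = []
    for _ in range(n):
        family_type = random.choice(["A", "B"])   # hypothetical moderator
        treated = random.random() < 0.5           # randomized program assignment
        effect = 1.0 if family_type == "A" else -1.0
        y = effect * treated + random.gauss(0, 1)
        rows.append((family_type, treated, y))
    return rows

def gap(rows):
    t = [y for _, tr, y in rows if tr]
    u = [y for _, tr, y in rows if not tr]
    return sum(t) / len(t) - sum(u) / len(u)

rows = simulate()
overall = gap(rows)                                # near zero, and misleading
effect_a = gap([r for r in rows if r[0] == "A"])   # near +1.0
effect_b = gap([r for r in rows if r[0] == "B"])   # near -1.0
```

The overall estimate, though computed from a perfectly randomized design, describes no actual family; the subgroup estimates recover the opposing effects that the average conceals.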
This kind of investigation is far more manageable when randomization is feasible and ethical in each study, because nonrandomized studies involve confounders as well as moderators. Moderators can, however, be overused. To avoid random hunts for moderating influences in the absence of evidence of program effectiveness, it is critical to choose a priori the moderators, suggested by previous research, that are most theoretically plausible for subsequent exploration. One of the most common procedures for studying the generalizability of findings from multiple studies is "meta-analysis" (see the comprehensive review of Cooper and Hedges, 1994). A meta-analysis can be thought of as an unplanned multisite study. If it had been jointly planned by all of the investigators, care would have been taken to ensure that similar outcome variables were used in every study; that treatment conditions were standardized; and that key dimensions of site-level variation were incorporated into the design. As a retrospective form of inquiry, meta-analyses do not have the luxury of capitalizing on such planning. Nevertheless, a stream of
inquiry on a common set of hypotheses in developmental research usually includes interesting variation in participants, implementers, sites, and treatment conceptions, as well as interesting variation in methodological approaches. Using meta-analysis, it is possible to exploit this variation to study intensively the degree of generalizability of a treatment effect and the specific sources of variation in the treatment effect.

CAUSAL INFERENCE AND GROWING CHILDREN

Growth and change are pervasive and typically rapid during early childhood. For this reason, studies using repeated measures on each outcome are common. There are good reasons to do longitudinal studies: cross-sectional differences in height, weight, vocabulary, quantitative reasoning, and motor control may be of little interest compared with understanding children's growth trajectories on each of these outcomes. An intervention to enhance growth in any of these areas can affect cross-sectional status only by deflecting the growth trajectory. Over time, a shifted trajectory (what we refer to throughout this report as shifting the odds) will produce a substantial shift in expected status, but the shift in growth rate is the leading indicator. Studies of causal effects on growth may be significantly more powerful than studies of status, particularly when the number of participants is strongly constrained by cost or logistical issues. In principle, all of the ideas we have discussed apply to studies of growth as well as developmental status, if we simply reconceive the outcome as some interesting aspect of growth, such as an average rate of change or an acceleration rate rather than a cross-sectional outcome. However, assessing growth poses special problems of measurement, design, and analysis. Measuring growth is challenging for such psychological outcomes as vocabulary or quantitative reasoning, less so for physical characteristics such as height and weight.
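Treating "an aspect of growth" as the outcome can be made concrete with a minimal sketch. The word counts below are made up, and the model is the simple assumption the chapter describes for the second year of life: vocabulary starts near zero at 12 months and follows a positively accelerating (here, quadratic) curve, so each child's estimated acceleration parameter, rather than any single score, becomes the outcome of interest:

```python
# Hypothetical word counts for one child at seven occasions, ages 12-24 months.
ages = [12, 14, 16, 18, 20, 22, 24]
vocab = [0, 15, 60, 140, 250, 400, 580]

# Assumed model: vocab ≈ a * (age - 12)^2, i.e. roughly zero words at
# 12 months and constant acceleration thereafter. With this one-parameter
# model the least-squares estimate of a has a simple closed form.
x = [(t - 12) ** 2 for t in ages]
a = sum(xi * yi for xi, yi in zip(x, vocab)) / sum(xi * xi for xi in x)

print(round(a, 2))        # estimated acceleration parameter (≈ 4.0 here)
print(round(a * 12 ** 2)) # model-predicted vocabulary at 24 months
```

Fitting the same curve to each child's repeated measures yields one acceleration estimate per child, and every occasion of measurement contributes to that single estimate.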
New design challenges arise in experimental studies because repeated measurements pose a risk of attrition, and because subtle forms of confounding can arise that are not present in cross-sectional research. Methods of analysis are typically more challenging as well. Growth curve analysis has a long history in biology and medicine. Models for growth in stature during childhood, for example, have been developed and refined over many years. In measuring human height (or weight or lung capacity, for example), there is little disagreement about the meaning of the construct being measured or about the units of measurement (e.g., centimeters, grams, cubic centimeters). Unreliability of measurement is not a large problem. Measuring growth in psychological domains (e.g., vocabulary, quantitative reasoning, verbal memory, hand-eye coordination, self-regulation) is more problematic. Disagreement is more likely to arise about the definition
of the construct to be assessed. This occurs, in part, because there are often no natural units of measurement (i.e., nothing comparable to the use of inches when measuring height). As a result, units of measurement must be created and defended, and errors of measurement are likely to be quite large. This becomes especially problematic when the outcome of interest changes as children mature, as is the case with achievement outcomes, or when transitions are involved, as with the development of literacy. For example, once a child acquires visual recognition memory (between 3 and 9 months), it becomes more appropriate to assess the number of words the child knows and, later, to assess prereading skills. To compound this problem, it can be hard to reach agreement about the appropriate age range for which a particular psychological construct is relevant. Nevertheless, growth in these psychological domains is of great interest. Many important interventions are designed to enhance psychological growth, and theories of development depend on hypothesized causal chains that explain human variation in the rates of such growth. Another obstacle to studies of change is the cross-sectional orientation of psychometrics. When social scientists speak of reliability of measurement, they are almost invariably describing cross-sectional reliability: that is, the reliability with which one can distinguish among individuals at any given time. The study of cross-sectional individual differences, especially differences in cognitive functioning, has had a powerful and enduring influence on the theory and practice of measurement in psychology. Only recently have researchers begun to take seriously the reliability with which one can distinguish among individuals in rates of change, acceleration, or other aspects of developmental trajectories (see Willett, 1988, for a review). An example may prove instructive.
Consider Figure 4-1, which displays expressive vocabulary as a function of age for three children, based on the work of Huttenlocher et al. (1991). The researchers took pains to estimate the total number of words in a child's expressive vocabulary on each of multiple occasions during the second year of life, a period during which vocabulary rapidly grows from a starting point near zero at age 12 months. Note that a curved line with positive acceleration neatly fits the repeated measures for each child. What is distinctive about each child's growth record is not the starting point (vocabulary is near zero at age 12 months for all three children) nor the standing of the child at any time point, but rather the rate of acceleration for each child. This rate of acceleration is increasingly well measured as time points are added. Subsequent analyses found a strong relationship between maternal speech and vocabulary acceleration. The statistical power of such analyses was strengthened by the fact that they effectively incorporated all the data in a single analysis. That is, every occasion of measurement contributed to understanding a single crucial aspect of growth (acceleration), enabling the
researchers to discover relations that had proved elusive in studies of change in relative status. The following appear to be the key ingredients in studies of quantitative growth:

• A clear definition of the outcome variable or construct on which children are believed to be growing.
• A measurement unit or scale that has constant meaning over the age range of interest (e.g., height in inches or the number of words in a child's expressive vocabulary).
• An outcome that can be measured on a common scale across ages, such that the alternative, age-appropriate forms of the assessment can be equated, that is, put onto the same meaningful scale.3
• A statistical model for individual change over time. During the second year of life, for example, the appropriate model for vocabulary is a positively accelerating curve, as depicted in Figure 4-1.
• A longitudinal study that is optimally designed to ensure a given level of statistical precision for the question at hand. Trade-offs among the length of the study, the frequency of observation, and the sample size are invariably involved.

These design choices strongly affect the reliability and validity of individual measures of change. These choices can also affect the internal validity of quasi-experimental studies. Experts on developmental change have emphasized the value of interrupted time-series designs when children are growing, especially when randomized experiments are not feasible (Bryk and Weisberg, 1977; Campbell and Erlebacher, 1970; Glass et al., 1972; Porter, 1967; see also Blumberg and Porter, 1983). In these designs, multiple pretreatment observations are taken in order to establish a pretreatment trajectory for the children. Figure 4-2 illustrates the value of this approach.
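The logic of adding a pre-pretest can be sketched with made-up group means (all numbers below are hypothetical). A pretest-posttest contrast alone suggests a treatment effect; the extra pretreatment wave reveals that the groups were already diverging at the same rate before treatment, so the net effect is zero:

```python
# Hypothetical mean scores at three waves for each group.
participants = {"pre_pretest": 10, "pretest": 20, "posttest": 30}
controls     = {"pre_pretest": 10, "pretest": 15, "posttest": 20}

def change(group, start, end):
    """Change in the group mean between two waves."""
    return group[end] - group[start]

# Pretest-posttest comparison alone suggests a treatment "effect":
apparent_gain = (change(participants, "pretest", "posttest")
                 - change(controls, "pretest", "posttest"))

# The pre-pretest shows the groups were already growing at different rates:
pre_existing_gap = (change(participants, "pre_pretest", "pretest")
                    - change(controls, "pre_pretest", "pretest"))

# Apparent effect net of the pre-existing difference in growth rates:
net_effect = apparent_gain - pre_existing_gap
print(apparent_gain, pre_existing_gap, net_effect)  # 5 5 0
```

Here the apparent 5-point gain is exactly the pre-existing 5-point difference in growth rates, so the treatment contributed nothing, which is precisely the pattern a single pretest cannot detect.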
Designs that include only one pretest before a treatment or intervention, followed by a posttest after the treatment (see shaded portion), cannot distinguish whether the apparent gains made by the participants (thick line) compared with the

3Standardization within age, as is common in IQ tests, eliminates the possibility of a meaningful scale with respect to the construct of interest (e.g., cognitive ability) and therefore distorts the study of growth on that construct. Such standardized scales can exhibit shifts in the relative standing of persons, but they cannot reveal rates of growth with respect to the behavioral domain. One typical result is that individual differences in estimates of change become substantially less reliable after standardization, undermining the capacity of intervention studies to discover effects.
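The footnote's point about within-age standardization can be illustrated with a small simulation (all scores hypothetical): in raw units every child grows, but after z-scoring within each age the mean "growth" is zero by construction, leaving only shifts in relative standing:

```python
# Hypothetical raw scores for three children at two ages: (age 1, age 2).
raw = {"child_1": (40, 70), "child_2": (50, 75), "child_3": (60, 95)}

def zscores(values):
    """Standardize a set of same-age scores to mean 0, SD 1."""
    mean = sum(values) / len(values)
    sd = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - mean) / sd for v in values]

# Group the scores by age, then standardize within each age (as IQ-style
# scales do), mimicking standardization within age.
by_age = list(zip(*raw.values()))           # ((40, 50, 60), (70, 75, 95))
z_by_age = [zscores(list(scores)) for scores in by_age]

raw_growth = [t2 - t1 for t1, t2 in raw.values()]   # real growth, raw units
z_growth = [z2 - z1 for z1, z2 in zip(*z_by_age)]   # "growth" in z-units

print(raw_growth)               # every child gains many raw points
print(round(sum(z_growth), 6))  # ~0: standardization removes mean growth
```

The raw metric shows substantial growth for all three children; the standardized metric can only reorder them, which is why standardized-within-age scales cannot support growth-rate analyses.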
[Figure 4-1 plots vocabulary size (number of words, 0 to 800) against age (12 to 26 months) for three children.]
FIGURE 4-1 A sample of individual vocabulary growth trajectories. SOURCE: Huttenlocher et al., 1991. NOTE: Distinct plotting symbols mark the actual word counts for the three individual children.

[Figure 4-2 plots scores for participants and controls across three waves: pre-pretest, pretest, and posttest.]
FIGURE 4-2 Distinguishing treatment effects from growth in time-series designs.
controls (thin line) are attributable to the intervention. If an additional pretest had been given, however, it would be possible to tell if the treatment actually accelerated the growth of the treated children relative to the controls (see thick line from the pre-pretest to the pretest for the participants) or if the children were already showing different rates of growth prior to the treatment (see dashed line). In this latter case, the treatment actually had no effect; these patterns of growth would have been predicted without the intervention. In sum, it is often essential in studies of early childhood development to recognize that children are rapidly growing. Causal inference on aspects of child growth poses important issues that extend beyond efforts to make causal connections between an intervention and a set of child outcomes at a given age. When the growth of interest is psychological, it is challenging to define clearly the dimensions on which children are growing, to devise assessments that are sensitive to growth, and to evaluate the capacity of alternative designs to reliably gauge individual differences in growth. Formulating and criticizing statistical models is essential to defining causal effects and considering threats to valid inference. Explicit models are especially important in cases in which participants are rapidly growing, because the meaning of growth and of causal effects on growth must be made explicit if progress is to be made in assessing the quality of the assessments or the utility of alternative designs for capturing these causal effects.

CONCLUSIONS

At the beginning of this chapter, we emphasized the importance of combining insights from basic and applied research to gain a fuller understanding of early development and the influences that guide and affect it. Basic research is designed to provide detailed observations of development and to test theories about causal mechanisms.
It is often difficult, however, to meet the conditions that lead to strong causal inferences. In contrast, applied research avails itself of interventions and natural experiments that can often provide better evidence of causation and, when studies are designed appropriately, can help to specify the mechanisms involved. The challenge to researchers is twofold. The first involves designing studies and evaluations that successfully capture causal information. The second involves integrating the evidence from basic and applied research to evaluate alternative explanations for development and discern their implications for policies aimed at improving children's life chances. In the final analysis, knowledge is advanced not through a single, decisive study, but by integrating evidence generated by different strategies, with different strengths and weaknesses. The research that generates this knowledge is, under the best of circumstances, a cumulative process that
starts with rich descriptive data about the phenomena of interest, moves to understanding connections between outcomes and important influences on them, and finally seeks to identify causal relations and mechanisms. This chapter has focused on the final stage of this sequence, given its importance to both theoretical and political debates about the role of early experience in child development. Its purpose has not been to assert the superiority of causal studies, but rather, when causal questions are being addressed by research, to illustrate the key issues that arise and the critical importance of being tough-minded about ensuring that the conditions for making valid causal inferences are met. Only when the limits of current knowledge and the best thinking about improved designs are clear can we plan research that will contribute significantly to knowledge in the future.
II

The Nature and Tasks of Early Development

Between the first day of life and the first day of kindergarten, development proceeds at a lightning pace like no other. Consider just a few of the transformations that occur during this 5-year period:

• The newborn's avid interest in staring at other babies turns into the capacity for cooperation, empathy, and friendship.
• The 1-year-old's tentative first steps become the four-year-old's pirouettes and slam dunks.
• The completely unself-conscious baby becomes a preschooler who not only can describe herself in great detail but also whose behavior is partially motivated by how she wants others to view and judge her.
• The first adamant "no!" turns into the capacity for elaborate arguments about why the parent is wrong and the preschooler is right.
• The infant, who has no conception that his blanket came off because he kicked his feet, becomes the 4-year-old who can explain the elaborate (if messy) causal sequence by which he can turn flour, water, salt, and food coloring into play dough.

It is no surprise that the early childhood years are portrayed as formative. The supporting structures of virtually every system of the human organism, from the tiniest cell to the capacity for intimate relationships, are constructed during this age period. The fundamental capabilities that en-