
546 APPENDIX B

treatment C.[1] These are called potential outcomes. Children are viewed, then, as having potential outcomes, only some of which will ever be realized. Several conclusions follow from this definition.

First, the causal effect is defined uniquely for each child. The impact of the treatment can thus vary from child to child. Modern thinking about cause thus rejects the conventional assumption that a new treatment adds a constant effect for every child. This assumption, never realistic to scientists or practitioners, was historically made to simplify statistical analysis.

Second, the causal effect cannot be observed. If a given child is assigned to E, we will observe the outcome under E but not the outcome under C for that child. But if the child is assigned to C, we will observe the outcome under C but not the outcome under E. Holland (1986) refers to the fact that only one of two potential outcomes can be observed as the fundamental problem of causal inference.

Third, although a given child will ultimately receive only one treatment, say, treatment E, it must be reasonable at least to imagine a scenario in which that child could have received C. And similarly, even though another child received C, it must be reasonable to imagine a scenario in which that child had received E. If it is not possible to conceive of each child's response under each treatment, then it is not possible to define a causal effect. There must, then, be a road not taken that could have been taken, for each child. Thus, both the outcome under E and the outcome under C must exist in principle even if both cannot be observed in practice. Therefore, in current thinking about cause in statistical science, a fixed attribute of a child (say, sex or ethnic background) cannot typically be a cause. We cannot realistically imagine how a girl would have responded if she had been a boy or how a black child would have responded if that child had been white.
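The definition can be made concrete in a few lines of code. This is a purely illustrative sketch with invented numbers and names: each child carries two potential outcomes, the causal effect is their difference, and only one of the two is ever realized.

```python
# Hypothetical potential outcomes for one child (illustrative numbers).
child = {"Y_E": 12.0, "Y_C": 9.5}

# The causal effect is defined for each child as the difference
# between the two potential outcomes.
causal_effect = child["Y_E"] - child["Y_C"]   # 2.5 for this child

# The fundamental problem of causal inference: once the child is
# assigned to a treatment, only one potential outcome is realized.
assignment = "E"
observed = child["Y_E"] if assignment == "E" else child["Y_C"]
# Y_C for this child is now a counterfactual; it can never be observed.
```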
Epidemiologists referred to such attributes as fixed markers (Kraemer et al., 1997): unchangeable attributes that are statistically related to an outcome but do not cause the outcome.

This theory of causation provides new insights into why randomized experiments are valuable. It also provides a framework for how to think about the problem of causal inference when randomized experiments are not possible. According to the RRH theory, the problem of causal inference is a problem of missing data. If both potential outcomes were observable, the causal effect could be directly calculated for each participant. But one of the potential outcomes is inevitably missing. If the data were missing completely at random, we could compute an unbiased estimate of the average causal effect for any subgroup. A randomized experiment ensures just that: the missing datum is missing completely at random, ensuring unbiased estimation of the average treatment effect.

Suppose, by contrast, that E or C could be selected by each child's parents. Suppose further that more-advantaged parents tended to choose E while less-advantaged parents tended to choose C. Then the potential outcomes would be nonrandomly missing. The outcome under E would come to be observed more often for advantaged than for disadvantaged children. Selection bias is thus a problem of nonrandomly missing data.

Even more insidiously, suppose that some parents had previous knowledge about how well their child is likely to fare under the new day care program. For example, one parent might know that, without the new program, her child will be cared for by the paternal grandmother, who is known to be a master teacher of young children. This parent decides not to participate in the new day care program, knowing that the child will probably do better without it. Other parents, who know their families do not include talented teachers with time to care for their child, choose the new program. Such information is rarely available to researchers, yet it produces nonrandomly missing data.

We view the probability of assignment to E as the propensity to receive the experimental treatment, or simply "the propensity score" (Rosenbaum and Rubin, 1983). Under random assignment to treatments, the propensity score is independent of the potential outcomes. In the hypothetical case above, by contrast, family advantage is related both to the propensity score and to the potential outcomes. This creates a correlation between the propensity and the potential outcomes.

[1] The causal effect could also be defined as the ratio Y_i(E)/Y_i(C), depending on the scale of Y, but we limit this discussion to causal effects as differences for simplicity.
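A small simulation makes the contrast concrete. All numbers, effect sizes, and variable names below are invented for illustration: under random assignment the unobserved outcome is missing completely at random and the difference of observed means is close to the true effect, while parental selection correlated with family advantage biases the same naive comparison.

```python
import random

random.seed(0)

# Each child has two potential outcomes; advantaged children (x = True)
# do better under either condition, and the treatment adds 2 points.
def make_child():
    x = random.random() < 0.5                    # family advantage
    y_c = 5 + 3 * x + random.gauss(0, 1)         # outcome under C
    return {"x": x, "y_c": y_c, "y_e": y_c + 2}  # outcome under E

children = [make_child() for _ in range(20_000)]

def estimate(children, choose_e):
    """Assign each child once, observe one potential outcome, and
    return the naive difference of observed group means."""
    e_vals, c_vals = [], []
    for c in children:
        if choose_e(c):
            e_vals.append(c["y_e"])   # outcome under E is observed
        else:
            c_vals.append(c["y_c"])   # outcome under C is observed
    return sum(e_vals) / len(e_vals) - sum(c_vals) / len(c_vals)

# Random assignment: unbiased, so the estimate is close to 2.
random_est = estimate(children, lambda c: random.random() < 0.5)

# Parental selection: advantaged parents tend to choose E, so the
# missingness is nonrandom and the naive estimate is biased upward.
biased_est = estimate(children,
                      lambda c: random.random() < (0.8 if c["x"] else 0.2))
```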
Now suppose that it is impossible to conduct a randomized experiment but it is possible to determine exactly how family circumstances translate into propensity, that is, how families get selected into the treatment. We could then implement a statistical procedure:

• For every possible participant, predict the propensity of being in the experimental group.
• Divide all sample members into subgroups having the same propensity.
• Within each subgroup, compute the mean difference between those in E and C as the average treatment effect for that group.[2]

[2] In a variant of this procedure devised by Robins, Greenland, and Hu (1999), sample weights are computed that are inversely proportional to the propensity of receiving the treatment actually received. Experimental and control groups are then compared with respect to their weighted means. This procedure minimizes the influence of persons with the strongest propensity to receive the treatment they received and eliminates bias in estimating treatment effects when the propensity is accurately predicted. The method has especially useful applications when the treatments are time-varying.

• Average these treatment effects across all subgroups to estimate the overall average treatment effect.

The resulting estimate will be an unbiased estimate of the average treatment effect. Every comparison between those in E and those in C involves subsets of children having identical propensities to experience E. Therefore, the potential outcomes of the children compared cannot be associated with their propensities, and the estimates of the treatment effect will be unbiased. This procedure also makes it easy to estimate separate treatment effects for each subgroup.

When children are matched on propensity scores, the validity of the causal estimate depends strongly on the investigator's knowledge of the factors that affect the propensity to experience E versus C. More specifically, if some unknown characteristic of the child predicts the propensity to be in E versus C, and if that characteristic is also associated with the potential outcomes, then the estimate of the treatment effect based on propensity score matching will be biased. The assumption that no such confounding variable exists is a strong one. It is the responsibility of the investigator to collect the relevant background data and to provide sound arguments, based on theory and data analysis, that the relevant predictors of propensity have been controlled. Even then, doubts will remain in the minds of some readers. In contrast, a randomized experiment controls all possible predictors of propensity, including those that would have escaped the attention of the most thoughtful investigator. Rosenbaum (1995) describes procedures for examining the sensitivity of causal inferences to lack of knowledge about propensity when randomization is impossible.

Perhaps the most common strategy for approximating unbiased causal inference in nonexperimental settings is the use of statistical adjustments.
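The four-step stratification procedure, and the inverse-propensity-weighting variant described in footnote 2, can be sketched as follows. This is a hedged illustration, not the authors' code: the propensities are assumed to be predicted exactly, and all variable names and numbers are invented (the true treatment effect is 2 in every stratum).

```python
import random

random.seed(1)

# Children fall into three family circumstances with known propensities
# of being in E; the outcome improves with advantage, and E adds 2 points.
data = []
for _ in range(30_000):
    x = random.choice([0, 1, 2])                 # family circumstance
    p = (0.2, 0.5, 0.8)[x]                       # propensity of being in E
    z = random.random() < p                      # treatment actually received
    y = 1 + x + 2 * z + random.gauss(0, 1)       # observed outcome
    data.append((p, z, y))

# Steps 1-3: group children by propensity and take the within-group
# E-minus-C mean difference; step 4: average across subgroups,
# weighting each subgroup by its size.
strata = {}
for p, z, y in data:
    strata.setdefault(p, ([], []))[z].append(y)  # (control list, treated list)

total = len(data)
stratified_ate = sum(
    (sum(e) / len(e) - sum(c) / len(c)) * (len(e) + len(c)) / total
    for c, e in strata.values()
)

# Footnote-2 variant (Robins, Greenland, and Hu, 1999): weight each child
# inversely to the propensity of the treatment actually received, then
# compare weighted means.
def wmean(pairs):
    return sum(y * w for y, w in pairs) / sum(w for _, w in pairs)

ipw_ate = (wmean([(y, 1 / p) for p, z, y in data if z])
           - wmean([(y, 1 / (1 - p)) for p, z, y in data if not z]))
```

Both estimates recover a value close to the true effect of 2 precisely because, within a stratum (or after weighting), treatment receipt is unrelated to the potential outcomes.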
In early childhood research, it is very common to use linear models (regression, analysis of variance, structural equation models) to adjust estimates of treatment impact for covariates related to the outcome. These covariates must be pretreatment characteristics of the child or the setting, and the aim is to include all confounders in the set of covariates controlled. By statistically "holding constant" the confounders in assessing treatment impact, one aims to approximate a randomized experiment. Under some assumptions, this strategy will work. In particular, if the propensity score (the probability of receiving treatment E) is a linear function of the covariates used in the model, then this adjustment strategy will provide an unbiased estimate of the treatment effect. Aside from the possible fragility of this assumption, the strategy is limited in that only a relatively small set of covariates may be included in the model. In the propensity score matching procedure mentioned earlier, it is possible, and advisable, to use as many covariates as one can obtain in the analysis that predicts propensity.
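Covariate adjustment with a linear model can be sketched as below. This is a minimal illustration under invented assumptions: the outcome is truly linear in the treatment indicator and a single pretreatment covariate that also drives selection, and ordinary least squares is solved by hand via the normal equations (in practice one would use a statistics library).

```python
import math
import random

random.seed(2)

def ols(X, y):
    """Least squares via the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination."""
    k = len(X[0])
    A = [[sum(row[r] * row[c] for row in X) for c in range(k)]
         for r in range(k)]
    b = [sum(row[r] * yi for row, yi in zip(X, y)) for r in range(k)]
    for col in range(k):
        piv = A[col][col]
        A[col] = [v / piv for v in A[col]]
        b[col] /= piv
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * v for a, v in zip(A[r], A[col])]
                b[r] -= f * b[col]
    return b

# Simulated children: the covariate x predicts both selection into E
# and the outcome, so it is a confounder; the true treatment effect is 2.
X, y = [], []
for _ in range(20_000):
    x = random.gauss(0, 1)                       # pretreatment covariate
    p = 1 / (1 + math.exp(-x))                   # propensity rises with x
    z = 1.0 if random.random() < p else 0.0      # treatment received
    X.append([1.0, z, x])                        # intercept, treatment, covariate
    y.append(1 + 2 * z + 1.5 * x + random.gauss(0, 1))

beta = ols(X, y)   # beta[1] is the covariate-adjusted treatment effect
```

Because the outcome model is correctly specified and the confounder x is included, the fitted treatment coefficient lands near the true value of 2; omitting x from the regressor list would leave the estimate biased upward, mirroring the selection story in the text.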

C
Technologies for Studying the Developing Human Brain

NEUROPSYCHOLOGICAL TOOLS

The strategy behind the use of neuropsychological tools is to generate a hypothesis about which area of the brain is involved in a particular behavior and then employ a behavioral test (or tests) to evaluate this hypothesis. Ideally, one is able to dissociate one behavior from another (e.g., explicit from implicit memory) using a cluster of tasks or by applying such tasks to both normative and clinical populations.

In terms of elucidating brain-behavior relations in normative samples, researchers frequently adopt neuropsychological tools that have first been used in animal models or in clinical populations of humans. For example, if one is interested in the type of memory subserved by the medial temporal lobe (i.e., episodic memory), one might employ tasks that have been demonstrated, in monkeys or in humans in whom the hippocampus has been lesioned through surgery or injury, to result in memory impairments.

The use of neuropsychological tools has received extensive study in the developing human. For example, Diamond has employed the Piagetian A-not-B task and its animal analogue, the delayed response task, to study the development of certain functions subserved by the prefrontal cortex (e.g., spatial working memory; see Diamond, 1990; Diamond and Doar, 1989; Diamond and Goldman-Rakic, 1989; Diamond et al., 1989). And Bachevalier (with respect to the monkey) and Nelson (with respect to the human) have utilized a set of tools (e.g., visual paired comparison; the