Preschool Education for Disadvantaged Children

David P. Weikart

A continuing problem in American education is how to curb the widespread failure in school of children from disadvantaged backgrounds. Many programs have been developed in response to this problem, a large number at the preschool level. Although it seems fairly certain that preschool intervention may facilitate a child's adjustment to and progress in school, participation in these programs does not ensure either outcome. This paper discusses some aspects of the history of early childhood education, describes some exemplary programs, describes methods used to evaluate their effectiveness, and presents some alternative methods of evaluation.

The social pressures for general reform in society and especially in education produced one of the most enduring Great Society programs, Head Start, in the summer of 1965. Based on a few adventurous programs established in the early 1960s, this eight-week effort was to accelerate disadvantaged children and allow them to enter school at an intellectual and academic level equal to that of their middle-class advantaged peers. The fate of these expectations is well known. The Westinghouse-Ohio University study (1969) of longitudinal findings on Head Start recorded the lack of any long-term intellectual or academic impact from Head Start participation. These findings all but eliminated Head Start from a political point of view. In 1970 the program itself was saved only by the direct lobbying efforts of parents of Head Start children, who had learned their skills in local Community Action Project battles, and by the Office of Child Development (now the Administration for Children, Youth, and Families). The program's rationale was converted to the delivery of social and health services. As such, Head Start limped along with level funding for almost a decade, written off
by news media as well as politicians, carefully nursed by staff at the regional and national levels, dedicated professionals in early education, and by parents who could see in their own families the benefits that Head Start provided.

EARLY EDUCATION PROGRAMS

While these changes occurred in the nature of the Head Start program, a quiet revolution was under way regarding the effectiveness of early education programs in general. Information on the effects of preschool, which had been accumulating from a range of studies initiated in the 1960s, was becoming available to the public and to policy makers. Before discussing assessment issues, it may be useful to summarize the state of those data.

One source of information is a collection of articles reviewing the problems, issues, processes, and successes of Head Start over the years (Zigler and Valentine, 1979). One of the major sources of information is the Consortium for Longitudinal Studies (1981). The consortium represents an effort by 14 early childhood education researchers to pool data from the early 1960s with more recent follow-up information to evaluate the impact of early education experiences on disadvantaged children. Although the studies differ greatly in terms of sample, rigor of research methodology, geographic locale, instrumentation used, etc., they represent a major body of information on the effectiveness of early childhood education. This paper draws extensively on several of these studies, conducted by the High/Scope Foundation, because of their pivotal role in the design and collection of family-based data, cost data, and postschool records used by other studies.

The Ypsilanti Perry Preschool Project: Preschool Years and Longitudinal Results Through Fourth Grade (Weikart et al., 1978a) is a study of the long-term effects of preschool education on a group of "high-risk" disadvantaged children as they progressed through the early elementary grades.
Grounded in a rigorous methodological framework, the study provides evidence that preschool made a difference in these children's lives. The impact of the preschool experience on their school achievement and grade placement, compared with the control group, has been positive and sustained. (See Schweinhart and Weikart, 1980, for a follow-up of these children through ninth grade.)
The Ypsilanti Preschool Curriculum Demonstration Project: Preschool Years and Longitudinal Results (Weikart et al., 1978b) presents and analyzes data from an experiment designed to compare the impact of three programs, which represent the dominant approaches to preschool education during the late 1960s. The principal findings were that (1) the programs were equally effective both during and after preschool and (2) the children's cognitive gains were still being maintained five years after they entered elementary school.

An Economic Analysis of the Ypsilanti Perry Preschool Project (Weber et al., 1978) is a study of the social rate of return (the return to society) of public investment in the Ypsilanti Perry Preschool Project. The benefits and costs for the experimental group were compared with those for the control group using the human capital approach of economics. In the analysis the economic benefits of the preschool program were quantified; then, by comparing the costs of the educational program with these economic benefits, the rate of return on the investment was calculated. Although these results are primarily illustrative, because they are based on a small sample and because the computations required some broad assumptions about the applicability of census data to the studied cohorts, the results appear to show that the costs of the program were more than compensated by benefits to society. The economic benefits were derived from (1) less costly education (i.e., less special education and institutionalized care) for the experimental group, (2) higher projected lifetime earnings for this group, and (3) time released from child care responsibilities for the mothers of this group.

It is important to examine the methods used to determine outcomes of education programs. Standardized tests, indeed, any measurement of immediate or intermediate outcomes, are merely approximations of real-world goals that education purports to reach.
Educators in particular and the public in general have long been enamored of tests of short-term outcomes as though they stood for something real. Early grade achievement correlates with twelfth-grade achievement "somewhere between .75 and .95" (Bloom, 1964:97), but what such correlations mean in terms of actual adult performance is unclear. The functioning of adults includes such factors as job performance capacity, ability to relate to peers, willingness to learn from experience, interest in being a contributing member of a group, capacity to earn one's
own way in the world, and ability to manage as an effective family member. These general goals are little predicted by the type of short-term tests available to educators at this time. Yet these are the goals that make a difference to both the individual and society at large.

MEASUREMENT OF OUTCOMES

Measurement of outcomes in early childhood education programs occurs at three points. First, assessment during the program itself guides the staff as to the development of the participating child and the effectiveness of the program. Second, at the end of the program summative measures are used to assess immediate program outcomes. Third, assessment after completion of the program is used to study its long-term validity.

Formative Program Evaluation

Assessments made during the program use several methods. Typical procedures include staff observation and ratings of a child's progress, focusing on the child's development and facilitating interaction. Not only can the progress of the child be rated along the theoretical dimensions demanded by the curriculum, but the classroom system of organization and management can also be appraised. Central features of program space and operational needs are arranged in checklist format so that each can be studied for presence or absence in the program being evaluated. Such evaluations can be done by trainers or by the staff itself.

In addition to various checklists for teachers (and parents) to use in evaluating the progress of program development and the path of child growth within the daily experience, there are other, more systematic methods. Observation scales have been carefully developed to give a time sampling of the actual behavior of teacher and child in the classroom. These can be genuine outcome measures when the goal is to document how children spend their time in learning-teaching situations and how the life of the child in one curriculum compares with the life of a child in another.
Perhaps the best known approach is that of Jane Stallings's study (1975) of classrooms in Follow Through.
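To make the idea of time sampling concrete, the following is a minimal sketch of how interval-by-interval observation codes might be tallied into the share of time a child spends in each kind of activity. The activity codes and the sample record are invented for illustration and are not drawn from any published observation scale, which would define its own categories and sampling schedule.

```python
from collections import Counter

# Hypothetical activity codes; a real observation scale defines its own categories.
CODES = ("child_initiated", "teacher_directed", "transition", "unoccupied")


def time_sample_proportions(observations):
    """Return the share of sampled intervals that fall under each activity code."""
    if not observations:
        raise ValueError("no observations recorded")
    counts = Counter(observations)
    unknown = set(counts) - set(CODES)
    if unknown:
        raise ValueError(f"unrecognized codes: {sorted(unknown)}")
    total = len(observations)
    # Codes that never occur are reported as 0.0 so curricula can be compared.
    return {code: counts[code] / total for code in CODES}


# Invented record: one code per fixed-length observation interval for one child.
sample = (
    ["child_initiated"] * 12
    + ["teacher_directed"] * 5
    + ["transition"] * 2
    + ["unoccupied"] * 1
)
proportions = time_sample_proportions(sample)

for code, share in proportions.items():
    print(f"{code:17s} {share:.2f}")
```

Profiles of this kind, computed for samples of children in different classrooms, are one way the "life of the child" in one curriculum could be compared with that in another.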
The problems involved in using observation schedules are sufficient to daunt even the most enthusiastic supporter. Constant supervision is necessary to obtain reliability in observations. This problem of reliability is usually solved through rigorous training, vigorous onsite supervision, and careful development of the final instrument to be used in the field. Thus, almost all observational schemes are tailored to specific programs. In addition, most observation instruments must reflect the theoretical nature of the program observed. Innovative preschool educational programs differ greatly, and procedures to capture the basic goals of a particular program do not necessarily generalize to other situations. A final issue is the cost of training, observing, scoring, and reporting the findings from observation procedures. (Generally such costs are prohibitive, except for well-financed research projects.) Some cost control can be achieved by keeping the use of the method to a minimum and by carefully selecting the youngsters to be observed through small-sample, random selection procedures; systematic observations then serve program validation rather than individual child diagnosis.

Other methods exist for evaluating a program during its actual operation. Practitioners skilled in the curriculum used in a classroom can be employed to give a professional assessment (see Miller and Dyer, 1975). Weikart et al. (1978b) used a system of professional consultants to summarize opinions of classroom operation based on direct observation. Parent committees, operational standards, licensing officials, etc. all offer some means of gaining information on immediate operations. The more general the method, however, the less valuable the outcomes.
In short, immediate information from daily operation is possible through the use of checklists and rating forms, direct time sampling of ongoing classroom operations, and general opinions of those who have contact with the classroom. Such information is most useful to those responsible for the daily operation of the program and the quality of opportunity provided to the children. In addition, information can be gained on the "quality of life" the children experience, and such information may be the primary basis of recommending one curriculum or another for specific children. Research has not yet related these different experiences to performance as children progress through school or to adult performance.
Summative Program Evaluation

When programs are complete, a summative evaluation is often undertaken, although the emphasis historically given to this type of evaluation in early education projects has been questioned recently. Several issues are involved. Should early childhood programs have to defend their contribution to the child's development through careful evaluation if first grade or third grade, for example, have never been so evaluated? The need for summative and longitudinal data for validation of preschool has been raised only in connection with disadvantaged children. Middle-class parents seek experiences for their children and judge their effectiveness on their own impression of their child's progress and happiness; disadvantaged groups, some feel, should have the same prerogative. From another viewpoint, others have stated that long-term outcomes are what are important and end-of-project information is irrelevant (Smilansky, 1979). Although the timing of the evaluation is an issue, instrumentation raises the most questions.

Assessments of preschool effectiveness have used two major types of instruments: standardized, individually administered intelligence tests, typically the Stanford-Binet (S-B), and standardized achievement tests, such as the Metropolitan Achievement Test or the California Achievement Test. These instruments have been used because of their power to predict performance in the elementary grades and their reliability.

The use of these two types of instruments has generated considerable political and social debate. Whether these instruments measure the "true" abilities of disadvantaged children in general and disadvantaged minority children in particular has been at issue essentially because of the failure of disadvantaged children to "do well" on these instruments upon completion of intervention programs.
Many thoughtful commentators have seen the problem as one of bias in the instruments and have questioned their cultural relevance. Legal proceedings in California have proscribed the use of individualized intelligence tests as the basis for placing youngsters in special education programs. Some viewers have seen the problem as a lack of congruence between the program goals and the specific content of the measurement instrument. For example, experience-based approaches to reading do not employ or teach the standard vocabulary list that forms the basis of the reading sections of most achievement
tests. The book by Jensen (1979) on mental testing is likely to accelerate this debate.

Figure 1 illustrates the classic pattern of a successful preschool intervention with a nontreatment control in terms of standardized IQ testing. The data are from the Ypsilanti Perry Preschool Project (Weikart et al., 1978a). While the youngsters start with nearly identical S-B scores, in the spring of their second preschool year (S2Y) the average score of the experimental group reflected a gain of 15.3 points from the fall entry year (FEY), 10.3 points more than the control group (Figure 1). One year later, in the spring of their kindergarten year, the experimental group reflected a gain of 11.7 points from FEY, only 4 points more than the control group, who had gained additional points upon school entry. Although differences between the two groups remained significant through the first grade, the performance of the experimental group gradually declined once they entered elementary school.

The pattern of performance in the control group merits consideration in its own right. Since the children in the sample were selected specifically because of their low socioeconomic status (SES) and low S-B scores, it was anticipated that their S-B scores would increase somewhat--"regressing toward the population mean"--upon second testing, regardless of treatment. The change in IQ of the control group from initial to second testing at the end of the first project year was +4.8 points. This gain is the best estimate of the regression toward the mean in S-B IQ for children in this sample. It seems unlikely that testing procedures or acclimation to the test situation accounts for this gain since procedures were unchanging and closely resembled Zigler and Butterfield's (1968) "optimizing" conditions. Although a practice effect might be confounded with regression toward the mean, this too seems improbable given the nature of the test and the length of time between test administrations.
Assuming that the regression effect was of the same magnitude in the experimental group, then perhaps only 10.5 points of the experimental group's 15.3-point gain in S-B IQ over two years of preschool represents the impact of treatment. This estimate of "true" gain is approximately equal to the actual difference in mean IQ between experimental and control groups measured at the end of preschool.
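The arithmetic behind this estimate can be retraced in a few lines. The sketch below uses only the group means reported in Figure 1; the variable names are illustrative, and the subtraction rests on the same assumption the text makes, namely that regression toward the mean was of equal size in both groups.

```python
# Mean Stanford-Binet scores as reported in Figure 1.
exp_fall_entry = 79.6      # experimental group, fall of entry year (FEY)
exp_spring_year2 = 94.9    # experimental group, spring of second preschool year (S2Y)
con_fall_entry = 78.5      # control group, fall of entry year (FEY)
con_second_test = 83.3     # control group, second testing (end of first project year)
con_spring_year2 = 83.5    # control group, spring of second preschool year (S2Y)

# Raw two-year gain of the experimental group.
raw_gain = round(exp_spring_year2 - exp_fall_entry, 1)

# Control-group change from first to second testing, taken in the text
# as the best estimate of regression toward the mean.
regression_estimate = round(con_second_test - con_fall_entry, 1)

# Gain attributed to treatment, ASSUMING the regression effect was the
# same size in the experimental group.
adjusted_gain = round(raw_gain - regression_estimate, 1)

# Observed difference between group means at the end of preschool.
end_of_preschool_gap = round(exp_spring_year2 - con_spring_year2, 1)

print(f"raw gain:             {raw_gain}")
print(f"regression estimate:  {regression_estimate}")
print(f"adjusted gain:        {adjusted_gain}")
print(f"end-of-preschool gap: {end_of_preschool_gap}")
```

The adjusted gain of 10.5 points and the end-of-preschool difference of 11.4 points are close, which is the sense in which the text calls them approximately equal.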
[Figure 1: line graph of mean Stanford-Binet IQ (vertical axis, 75 to 105) by time of data collection (FEY through S4G) for the experimental (EXP) and control (CON) groups, with the treatment (preschool) period marked off from the no-treatment (elementary school) period.]

Arithmetic Means, Standard Deviations, Number of Subjects, and Significance Levels of F Tests on Group Comparisons at Each Testpoint

Time of Data Collection    FEY     SEY     S2Y     SKG     S1G     S2G     S3G     S4G
EXP   Mean                 79.6    95.5    94.9    91.3    91.7    88.1    87.7    85.0
      (S.D.)               (5.9)  (11.5)  (13.0)  (12.2)  (11.7)  (13.1)  (10.9)  (11.3)
      N                    58      58      44      56      58      55      56      57
CON   Mean                 78.5    83.3    83.5    86.3    87.1    86.9    86.8    84.6
      (S.D.)               (6.9)  (10.0)  (10.2)   (9.9)  (10.2)  (10.7)  (12.5)  (11.2)
      N                    65      65      49      64      61      62      61      57
Significance               N.S.    <.01    <.01    <.05    <.05    N.S.    N.S.    N.S.

F tests presented here were obtained in three-way analyses of variance (Group x Sex x Wave) reported in the Statistical Supplement, Part A, Tables 1a-1c.

FIGURE 1 Average Stanford-Binet intelligence scale scores for experimental and control groups. (Source: Based on Weikart et al., 1978a.)
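The pattern in Figure 1 is easier to see when the group means are differenced. The following sketch, which uses only the means tabulated in Figure 1 and descriptive labels of my own for the eight test points, computes the experimental-minus-control difference at each test point and the control group's change from one test point to the next.

```python
# Descriptive labels (an assumption for readability) for the eight test points.
TEST_POINTS = [
    "fall entry", "spring year 1", "spring year 2", "kindergarten",
    "grade 1", "grade 2", "grade 3", "grade 4",
]

# Mean Stanford-Binet scores from Figure 1.
EXP_MEANS = [79.6, 95.5, 94.9, 91.3, 91.7, 88.1, 87.7, 85.0]
CON_MEANS = [78.5, 83.3, 83.5, 86.3, 87.1, 86.9, 86.8, 84.6]

# Experimental-minus-control difference at each test point.
group_difference = {
    point: round(exp - con, 1)
    for point, exp, con in zip(TEST_POINTS, EXP_MEANS, CON_MEANS)
}

# Change in the control-group mean since the previous test point.
control_change = {
    TEST_POINTS[i]: round(CON_MEANS[i] - CON_MEANS[i - 1], 1)
    for i in range(1, len(TEST_POINTS))
}

for point in TEST_POINTS:
    change = control_change.get(point, "n/a")
    print(f"{point:14s} difference={group_difference[point]:5.1f}  control change={change}")
```

The difference peaks during the preschool years and shrinks to less than half a point by fourth grade, while the control group's largest later gain coincides with kindergarten entry.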
The upward inflection in the control group's performance curve upon enrollment in kindergarten deserves comment. On the average, children in the control group gained 2.8 points in IQ during kindergarten and another point during first grade. It seems likely that gains of this magnitude might be expected for any group of disadvantaged children confronting a new and challenging educational experience. Bloom (1964) uses the term "freshman effect" to describe this impact of new environment and new demands on individual intellectual performance. By the end of the fourth grade, however, this school-related effect was no longer evident, and the control group's performance had dropped to the level attained at the second testing.

Preliminary analyses of the Wechsler Intelligence Scale for Children (WISC) full-scale IQ scores obtained on eighth-grade children confirm the finding of no difference in measured aptitude obtained at the fourth-grade level. By this point the performance of both experimental and control children was indistinguishable from entry-level performance on the S-B.

The gradual attenuation of intelligence test gains following preschool intervention in the Perry project parallels the findings of most other compensatory preschool studies. The erosion of preschool effects once children enter regular public school is now a familiar pattern in educational evaluation. Explanations of this loss include the shift in the content of the test items to include more verbal and abstract concepts and the understimulation of children as a result of the increasing isolation from ideal learning environments.

Two apparent exceptions to this pattern of vanishing IQ gains are reported in the literature and should be mentioned. Karnes (1973) reports on three programs that maintained some small part (about 6 points) of their initial IQ gains through the third grade. Weikart et al.
(1978b) report on three programs that maintained about 15 points of their initial IQ gains through the fourth grade; children in the programs continued gains in IQ through the eighth grade, a decade after intervention. The findings of studies using data from the Consortium for Longitudinal Studies on achievement tests tend to be positive. Several projects report either continuous achievement gains for experimental groups over control groups or a gradually evolving significance of the experimental group scores over those of the control group. This latter phenomenon is often termed a
"sleeper" effect. However, it might more accurately be called weak program impact, as the stronger programs show initial and continuing gains in achievement. In the Perry project, these gains become stronger each year, including the last test point at age 19, when a test of general competency was given.

ALTERNATIVE ASSESSMENT PROCEDURES

While it appears difficult to avoid the use of standardized assessment procedures for summative testing, two alternatives seem feasible. The first is the development of instruments that measure factors outside the confines of standardized tests. Efforts to create tests of emotional development, cognitive style, self-concept, etc. have had a history of failure in early childhood assessment; the examples of Follow Through and National Planned Variation Head Start are well known. There appears to be little possibility that psychometrically sound instruments could be developed, even with a massive infusion of funds. Other testing procedures have shown potential in programs such as the Educational Development Corporation's Project Torque to assess the development of mathematical concepts and in High/Scope's efforts to assess the development of language competency through generative testing procedures. (In a generative test, students provide both questions and answers or have full control over the sophistication of their responses.)

The High/Scope Cognitively Oriented Curriculum is based on the idea that the child generates his or her own learning within a structure designed and supported by teachers. The dynamic learning situation is drawn from developmental theory, in part Piagetian, and includes materials for the child, encouragement by the teacher to use these materials, and questions by the teacher to extend the child's thinking or highlight underlying errors and contradictions in reasoning.
The questions and activities initiated by the teacher are not meant to provide the "right way" but to allow the child to reason at the limits of his or her developmental level. Given this orientation toward education, criteria for evaluating the program must reflect the experience of the child in the classroom, for to educate one way and assess another is hardly appropriate. The evaluation procedure should reflect important variables for adult success, yet it should be perceived in a broader way than simply as
measurement of outcome variables. It should reflect the conditions under which the outcomes were developed. While classroom observations can be summative in nature when defined as necessary conditions for a curriculum or for specific operational goals, usually they are conceived as formative or process assessment. Basically, observation of the climate of learning is essential to determining the "cost" of whatever is learned.

In designing a "generative" testing situation, several additional criteria would have to be met. The instrument would have to allow the child to express what he or she knows in a functional way. The child should be able to construct answers so that they reflect his or her capacity to think and express concepts. The situation must be supportive of whatever the student produces so that the answers are neither right nor wrong but simply an expression of his or her best ability. The situation should have supportive elements in it--friends or others with whom the student can work, familiar materials, opportunities to express the strengths of his or her educational career to date. This format does not call for a sampling of the universe of possible test questions, but rather for a situation in which the student can express strengths and weaknesses by generating original material. Generative assessment has the student convey his or her knowledge and abilities by constructing a response that indicates his or her level of development.

The High/Scope Productive Language Assessment Tasks (PLAT) is one example of a generative approach to curriculum assessment. Developed over the last seven years and used at the High/Scope Follow Through sites, it measures the capacity of the child to use language as an expression of conceptual ability. One form of the PLAT battery incorporates two tasks, reporting and narrating. In the reporting task, children are given identical sets of unstructured materials and asked to make anything they want to make.
After 20 minutes they are asked to write about how they made whatever they made and are allowed 30 minutes to complete their stories. The children are permitted to interact with one another during all phases of the task. In the narrating task, each child is given a set of relatively unstructured materials to "help you make up a story." After about 25 minutes of free (and usually dramatic) play on a carpeted floor, the children are asked to write a make-believe or pretend story. As in the reporting task, the children are permitted to interact with one another as they play and write.
While not a complete instrument, PLAT does represent the type of assessment procedure that is being developed by the sponsors of Follow Through, who represent child-centered and open-framework types of curricula. Such an instrument could be widely used to tap the abilities of children not assessed by regular batteries, abilities that in many respects reflect the highest goals of most educational programs. Instruments that respect the individual in the context of the culture offer a promising area for further development.

A second alternative to standardized tests is to employ direct measures of success. These are more meaningful measurement methods than either IQ scores or achievement test results, which represent success only indirectly. Such "hard" measures as placement in special education classes or other special service programs and grade retention are important because they reflect actual decisions by schools to manage youngsters and have very real cost consequences. Each year of school that a child repeats increases the costs of total education by at least 8 percent. Placement in a special education program often quadruples the cost of education each year that the youngster remains in such a program. Once assigned to such programs most youngsters remain in them until leaving school.[1] These costs are the direct costs of education and not some delayed future expenditure.

Using the High/Scope economic cost study as a model (Weber et al., 1978), the Consortium for Longitudinal Studies pooled the information from several of the older and more complete studies of special education programs (see Figure 2). These findings demonstrate the ability of early education to affect public expenditures; they present a powerful assessment procedure to judge early education effectiveness.

On the whole, summative measures generally depend on intellectual and achievement test results to assess the outcomes of early education programs. While the appropriateness of these results for either the assessment of children or the program may be questioned, they are widely employed as a means of judging a specific program--against other programs or against its own goals. More effective criteria begin to be available as a longitudinal study continues. When children are beyond the third grade, broadly conceived economic measures, which produce data that are meaningful to both the educator and the taxpayer, can be used as a very effective means of judging long-term outcomes. Indeed, cost-benefit findings are sufficiently powerful to directly affect public policy regarding early childhood education. Their power exceeds that of either IQ scores or achievement records in the final analysis.

[1] While the Education for All Handicapped Children Act of 1975 (P.L. 94-142) increases the likelihood of service for youngsters who qualify, the pressures on schools to be responsive to disadvantaged children with learning difficulties without resorting to special education placement mean that new--and no doubt costly--alternatives must be provided.

[Figure 2: bar chart of the percent of program and control children in special education (vertical axis, 0 to 40 percent), by study; legend distinguishes program and control groups.]

Program totals
Study               Gordon   Gray   Weikart   Levenstein   Miller
Program children    64       36     58        69           93
Control children    20       17     65        23           16
Significance        .052     .017   .096      .004         .689

Pooled significance level: .0002 (two-tailed)

FIGURE 2 Percent of program and control children in special education. (Source: Consortium for Longitudinal Studies. Lasting Effects After Preschool. Final report. HEW Grant 90C-1311 to the Education Commission of the States. 1978.)

LONG-TERM SUMMATIVE MEASURES

Long-term summative assessment of early education effectiveness is only now taking place as the passage of time makes such studies possible. Measurements made 10
and 15 years after an early education experience focus almost entirely on the actual performance of the subjects. Job performance, college attendance, receipt of welfare, crime and delinquency records, family formation, relationship with family and friends, supervisor ratings, earnings, etc. all form a basis for evaluation. The longitudinal follow-up has now left the general field of child development and moved into a dozen specialized disciplines. All assessment procedures are characterized by concrete performance indices. Gone is the need to assess academic achievement or intellectual ability. These are only signs on the way to real-world performance.

There are special assessment problems at this level. One is, of course, identifying effective indicators of "quality of life." Another problem is income. Earnings indicators must differentiate among participants who receive welfare, those with legitimate jobs, and those "on the cash economy." Another assessment issue is the categorization of the manner in which young adults approach economic decision making. Benefits paid to workers such as sick leave, emergency leave, unemployment compensation, etc., reflect an ethic of assistance. Young adults today make financial decisions to maximize income and personal purpose. How are young adults to be "scored" who work the economic system to maximize personal gain, taking sick leave when not ill, etc.? Thus the breakthrough to real-world measures does not simplify the assessment problem. Complex issues remain to be resolved.

On the whole, long-term longitudinal assessment must move from academic "place marker" variables into the world of hard performance and economic measurement. High priority should be given to establishing baseline data for the economic performance of adults from nonmainstream backgrounds and to closer monitoring of the later performance of children who experience various interventions in early childhood.
This requires the involvement of disciplines outside educational psychology.
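As one small illustration of what "hard" economic measurement can look like, the direct-cost figures cited earlier (an increase of at least 8 percent for each repeated year, and roughly four times the regular cost for each year in special education) can be combined into a relative-cost calculation. The sketch below is not the method of the High/Scope economic analysis; the function name, the 12-year baseline, and the flat per-year cost are simplifying assumptions made here for illustration.

```python
def relative_schooling_cost(years_retained=0, years_special_ed=0,
                            baseline_years=12, special_ed_multiplier=4.0):
    """Total schooling cost relative to an uninterrupted regular career (= 1.0).

    Assumptions (not from the source): every regular year costs one unit,
    the baseline career lasts `baseline_years`, and each special education
    year costs `special_ed_multiplier` units.
    """
    if years_retained < 0 or years_special_ed < 0:
        raise ValueError("year counts must be non-negative")
    total_years = baseline_years + years_retained
    if years_special_ed > total_years:
        raise ValueError("special education years exceed total years in school")
    regular_years = total_years - years_special_ed
    cost_units = regular_years + special_ed_multiplier * years_special_ed
    return cost_units / baseline_years


# One repeated grade: 13 years of cost over a 12-year baseline.
one_retention = relative_schooling_cost(years_retained=1)

# Six of twelve years spent in special education at four times the regular cost.
six_special_ed = relative_schooling_cost(years_special_ed=6)

print(f"one repeated year:       {one_retention:.3f} x baseline")
print(f"six special ed. years:   {six_special_ed:.3f} x baseline")
```

Under these assumptions a single repeated year raises total cost by about 8.3 percent, consistent with the "at least 8 percent" figure, and six years of special education placement raise it to two and a half times the baseline.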
REFERENCES

Bloom, B. (1964) Stability and Change in Human Characteristics. New York: John Wiley & Sons, Inc.

Consortium for Longitudinal Studies (1981) Lasting effects of early education. Monographs of the Society for Research in Child Development.

Jensen, A. (1979) Bias in Mental Testing. New York: The Free Press.

Karnes, M. B. (1973) Evaluation and implications of research with young handicapped and low-income children. In J. C. Stanley, ed., Compensatory Education for Children, Ages 2 to 8. Baltimore, Md.: The Johns Hopkins Press.

Miller, B., and Dyer, J. (1975) Four preschool programs: their dimensions and effects. Monographs of the Society for Research in Child Development 40:5-6.

Schweinhart, L. J., and Weikart, D. P. (1980) Young children grow up: the effects of the Perry Preschool Program on youths through age 15. Monographs of the High/Scope Educational Research Foundation (Series No. 7).

Smilansky, M. (1979) Priorities in Education: Preschool: Evidence and Conclusions. World Bank Paper No. 323.

Stallings, J. (1975) Implementation and child effects of teaching practices in Follow Through classrooms. Monographs of the Society for Research in Child Development 40 (7-8, Serial No. 163).

Weber, C. U., Foster, P. S., and Weikart, D. P. (1978) An economic analysis of the Ypsilanti Perry Preschool Project. Monographs of the High/Scope Educational Research Foundation (Series No. 5).

Weikart, D. P., Bond, J. T., and McNeil, J. (1978a) Ypsilanti Perry Preschool Project: preschool years and longitudinal results through fourth grade. Monographs of the High/Scope Educational Research Foundation (Series No. 3).
Weikart, D. P., Epstein, A., Schweinhart, L., and Bond, J. T. (1978b) Ypsilanti Preschool Curriculum Demonstration Project: preschool years and longitudinal results. Monographs of the High/Scope Educational Research Foundation (Series No. 4).

Zigler, E., and Butterfield, E. (1968) Motivational aspects of changes in IQ test performance of culturally deprived nursery school children. Child Development 39:1-14.

Zigler, E., and Valentine, J., eds. (1979) Project Head Start: A Legacy of the War on Poverty. New York: The Free Press.