2 Purposeful Assessment

Assessment, defined as gathering information in order to make informed instructional decisions, is an integral part of most early childhood programs. By the mid-elementary level, children in some school systems may spend several weeks every year completing district and state assessments, and those in troubled schools probably spend even more time in more formal test preparation activities designed to ensure that their high-stakes assessment outcomes are acceptable. Since assessment is such a fact of educational life, it is important to step back and ask: Why is this assessment being done? What purpose does it have? Is this particular assessment optimal for meeting that purpose?

For younger children, thinking about purpose is equally central. Done well, ongoing assessment can provide invaluable information to parents and educators about how children grow and develop. Developmentally appropriate assessment systems can provide information to highlight what children know and are able to do. However, inappropriate testing of young children runs the risk of generating insufficient information for the tester and discomfort (or just wasted time) for the testee; such risks are unacceptable and can be avoided only if it is very clear why people are engaging in the activity and what benefit will accrue from it. Furthermore, specifying the purpose of an assessment activity should guide all the decisions that we write about in this volume:
what domains to assess, what assessment procedures to adopt, and how to interpret and use the information derived from the assessments.

We make the case throughout this report that the selection and use of assessments, in early childhood as elsewhere, should be part of a larger system that specifies the infrastructure for distributing and delivering medical or educational services, maintaining quality, supporting professional development, distributing information, and guiding further planning and decision making. Thus, while in this chapter we focus on the purposes for which one might choose and use an assessment tool, we return to the theme of purpose in thinking about designing the systems for assessment in Part IV.

A wide range of tools can be used to collect information about children, classrooms, homes, or programs, and thinking about mode of assessment along with purpose is crucial. Assessment modes include medical procedures, observation of natural behavior, participant reports using checklists or surveys, performance in structured versions of natural tasks, and performance on standardized tests. Given the challenges of direct assessment with very young children, it is worth first considering less intrusive modes of assessment if they also meet the purposes formulated.

In the following sections we discuss many purposes for which assessment of children's learning and development is employed, beginning with several purposes associated with determining the level of functioning of individual children, and progressing to the purpose of guiding instruction, and then measuring program or societal performance.
After briefly mentioning research uses (employing assessment to learn more about child development), we present guidance to be kept in mind when assessing for individual child-focused or accountability purposes, drawing on the wisdom of many previous reports from organizations interested in promoting the education and welfare of young children.

DETERMINING AN INDIVIDUAL CHILD'S LEVEL OF FUNCTIONING

Individual-Focused Screening

Many assessments, particularly in the infancy and toddler period, are designed to screen children for medical risks. For example, within a few days of birth, infants in the United States are screened for phenylketonuria (PKU), a genetic disorder characterized by an inability of the body to use the essential amino acid phenylalanine, and in the first year of life infants are screened for vision and hearing deficits. These screening assessments are typically carried out in pediatric settings. Because their purpose is to ensure delivery of care or appropriate services to all children with an identified problem or risk, the screening is designed to minimize false negatives. False positives are less harmful; they may alarm a parent or generate a costly follow-up, but such mistakes are less severe in consequence than missing a child who could benefit from early intervention or medical treatment. It is important to ensure that individual children who fail the screen are followed up with further assessment, both to confirm the identification and in many cases to specify the source of the difficulty. In Part II we document many of the domains for which screening instruments are available and widely used.

Community-Focused Screening

Although community-focused screening may use the same tools and procedures as individual-focused screening, its purpose is not individual, but rather to give a picture of risk at the community level. Thus, for example, if screening for toxic levels of lead is done in an individual-focused way, the response would be to counsel parents about ways to protect children from lead exposure, as well as to treat them directly. If done in a community-focused way, the goal might be to identify neighborhoods with a high risk of lead toxicity, in order to guide the distribution of services or to plan the provision of compensatory education in those locations, or perhaps even to influence public policy; this could co-occur with the individual-focused screening goal of informing parents about their children's health.

Note: Screening, assessment, and other terms are defined in Appendix A.

Diagnostic Testing

If screening assessments indicate a child's performance is outside the expected range, then often further diagnostic assessment is needed to better describe the problem, to locate a cause, or both. Sometimes the screening and diagnostic instruments are the same; for example, high blood levels of lead strongly suggest a diagnosis of lead poisoning. But sometimes the screening is uninformative about a diagnosis. For example, a child who is identified by a language screening assessment as possibly having delayed language development needs further assessment to determine whether an actual delay exists, whether there are other, related delays (e.g., intellectual functioning, cognitive processing), and whether there are obvious causes (e.g., hearing loss).

A particular purpose for which individual diagnostic assessment is increasingly being used is to determine "response to intervention," in other words, to test whether interventions are successful in moderating developmental problems by using diagnostic probes.

Establishing Readiness

A widely used purpose of individual assessment has been to establish the readiness of individual children to participate in particular educational programs. The concept of readiness in early childhood is complicated, as are the consequences of a finding that a child is "not ready" (Graue, 2006). Readiness tests (a form of achievement test) have often been used prior to kindergarten entrance to ascertain children's likelihood of success in kindergarten and as a basis on which to make recommendations to parents about whether to enroll their children in the regular program or in some form of extra-year program or to postpone kindergarten entry.
Using tests for this purpose supersedes the legal establishment of kindergarten eligibility in state law based on age (Education Commission of the States, 2005). To the extent that readiness assessments focus on readiness to benefit from reading instruction,
they have also been criticized as embodying a discredited model of literacy development (National Research Council, 1998).

Most of the instruments used to establish readiness have been found to be wanting, leading to incorrect recommendations about half the time (Meisels, 1987; Shepard, 1997). Using readiness tests to make recommendations about children's access to kindergarten is especially troublesome because many of the children recommended for delayed entry are the ones who would most benefit from participation in an educational program. Researchers and advocates have consistently recommended against the use of readiness tests for this purpose (National Association of Early Childhood Specialists in State Departments of Education, 2000; Shepard and Smith, 1986).

More recently, readiness has become a construct of interest to policy makers as they consider the needs of children with regard to access to prekindergarten education and as a measure of their status at the time of entry to kindergarten (Brown et al., 2007). A number of states now measure the readiness of children once they have entered kindergarten. It is important to distinguish this useful application of readiness assessment from that of testing for eligibility.

GUIDING INTERVENTION AND INSTRUCTION

Using ongoing assessment information to guide instructional decisions is a primary purpose of early childhood assessment and should be a component of a high-quality early childhood program (National Association for the Education of Young Children and National Association of Early Childhood Specialists in State Departments of Education, 2003). Similarly, the instructional and therapy services provided to children receiving early intervention and early childhood special education should be based on the results of initial assessment information and regularly revised using subsequently collected information on the child's progress (Neisworth and Bagnato, 2005).

A case study in the value of reliance on assessment in planning and differentiating instruction is offered by the Reading First classrooms. Providing primary grade teachers with tools that are relatively easy to administer and to interpret, as a basis
for grouping children and selecting instructional activities, has massively changed the nature of early literacy instruction in U.S. schools (Center on Education Policy, 2007). A similar shift to an "assessment culture" in preschool classrooms will enable teachers to identify the learning needs of their students, to provide activities optimally designed to promote their development across the crucial domains (described in Part II), and to allocate time optimally to the various domains, improving children's progress and promoting their engagement. For example, data from Head Start about children's proficiency at the beginning of the year in the domains of emergent literacy, numeracy, and oral language skills would help teachers decide how much time should be spent in teaching letter recognition and counting versus promoting vocabulary and sharing books.

In addition to using assessment information to establish a descriptive picture of children's strengths and needs and to plan for instruction at program entry, teachers and others working with young children need to collect ongoing assessment information to track their learning over time. In addition, assessment information on how children are progressing in each area of the curriculum or with regard to individualized goals can be aggregated across children to see whether the program as implemented is, for the children as a group, meeting the needs identified and the goals defined.
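
The two uses of the same records described above, an individual view for planning and an aggregated view for judging whether the program is meeting the needs of the group, can be sketched in a few lines of Python. This is only an illustration: the records, domain names, and score scale are invented and do not correspond to any instrument named in this chapter.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (child_id, domain, fall_score, spring_score).
# Domain names and the score scale are illustrative assumptions only.
RECORDS = [
    ("c1", "emergent_literacy", 3, 6),
    ("c2", "emergent_literacy", 5, 7),
    ("c3", "emergent_literacy", 2, 3),
    ("c1", "numeracy", 4, 4),
    ("c2", "numeracy", 6, 8),
    ("c3", "numeracy", 3, 5),
]


def child_gains(records):
    """Individual view: fall-to-spring gain per child and domain."""
    gains = defaultdict(dict)
    for child, domain, fall, spring in records:
        gains[child][domain] = spring - fall
    return dict(gains)


def group_summary(records):
    """Group view: mean fall score, mean spring score, and mean gain per domain."""
    by_domain = defaultdict(list)
    for _child, domain, fall, spring in records:
        by_domain[domain].append((fall, spring))
    return {
        domain: {
            "mean_fall": mean(f for f, _ in pairs),
            "mean_spring": mean(s for _, s in pairs),
            "mean_gain": mean(s - f for f, s in pairs),
        }
        for domain, pairs in by_domain.items()
    }


if __name__ == "__main__":
    print(child_gains(RECORDS))
    print(group_summary(RECORDS))
```

The per-child output might prompt a teacher to revisit numeracy activities for a child who shows no gain, while the per-domain means speak only to how the group as a whole is doing.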

Using Assessments for Planning and Monitoring Children's Progress

Assessment data collected individually about all children in a program or classroom, for planning activities and tracking learning, can be used at the individual child level (e.g., to identify a child's strengths and areas of need) or aggregated across children and used at the classroom level (e.g., to check the appropriateness and effectiveness of the educational program; to identify strengths and weaknesses of the group as a whole) and at the center or school level.

Teachers and parents are the primary audiences for assessment information collected to guide instruction. For the potential value of assessment to improve children's learning to be realized, teachers also need adequate time to review assessment information and reflect on its implications for practice. It is now widely recognized that those working in early childhood classrooms and programs should be purposeful in their educational planning and thus need to use assessments for planning and monitoring what children are learning.

Criterion-referenced or curriculum-based measures are used to plan instructional activities and monitor what children are learning. Assessment data can be collected through observation, collection of children's work, and talking to them (Dodge et al., 2004). The National Association for the Education of Young Children (NAEYC) and the Division for Early Childhood (DEC) have formulated recommendations about assessments for use in educational planning and progress monitoring. Examples of tools for this purpose include the Creative Curriculum's Developmental Continuum, the High/Scope Child Observation Record (COR), and the Work Sampling System. Teachers and other staff must receive training and follow-up on the use of any assessment tool to be able to obtain valid and reliable information about children's performance.

Response to Intervention: A New Application of Assessment for Instruction and Intervention

Response to intervention (RTI) is an approach for identifying and providing systematic intervention for school-age children who are not making satisfactory progress (Fuchs and Fuchs, 2006). RTI models vary somewhat, but common components include the use of multiple tiers of increasingly intense interventions, a problem-solving approach to identifying and evaluating instructional strategies, and an integrated data collection and assessment system to monitor student progress and guide decisions at every level (Coleman, Buysse, and Neitzel, 2006). The tiers refer to the levels of support a child needs to succeed in the classroom.
The base tier addresses the needs of children who make adequate progress in a general program, the next tier refers to supports provided to children who need additional general assistance, and the third tier refers to more specialized assistance for children not succeeding in the previous tiers. Universal screening with a tool
designed for this purpose is implemented in the base tier to identify children who are not meeting established educational benchmarks in a high-quality instructional program. Those identified as not making progress are provided with additional empirically supported interventions or instructional strategies and their progress is monitored on a regular basis to determine the effectiveness of the intervention, with additional intervention provided to those who continue to show limited progress.

Although there is considerable interest in applying tiered models to preschool, how the principles would be applied has not been thoroughly developed, and there has been very little research to date on the application to early education (Coleman, Buysse, and Neitzel, 2006; VanDerHayden and Snyder, 2006). An example of an RTI application for children under age 5 is a model called Recognition and Response; it is under development as an approach to early identification and intervention for children with learning disabilities (Coleman, 2006). The developmental and experiential variation in young children presents challenges for the strict application of RTI's prescribed universal screening, identification of low-performing children, and tiered intervention. One concern is whether the early and frequent use of assessment to single some children out as requiring additional assistance is necessary, or even potentially harmful, before the children have had the opportunity to benefit from a high-quality preschool experience. Much more research is needed on how to apply the assessment and intervention practices of multitiered models in a way that is consistent with what is known about young children's development.

EVALUATING THE PERFORMANCE OF A PROGRAM OR SOCIETY

Perhaps the most talked-about of the many purposes for which assessment can be used, especially since the passage of the No Child Left Behind Act (NCLB) in 2001, is accountability.
It is important to note that the term "accountability" encompasses a number of distinct purposes, which we attempt to distinguish here.

Program Effectiveness

If a government or an agency is investing money in a program, it makes sense to ask the questions "Is this program effective? Is it meeting our goals?" Assessment designed to evaluate program effectiveness against a set of externally defined goals is one form of accountability assessment. This may look a lot like progress monitoring assessment, and indeed the selection of tools for the two purposes might be identical. But evaluation differs from progress monitoring in two key ways. First, progress monitoring assessment is meant to be useful to those inside the program who are responsible for day-to-day decisions about curriculum and pedagogy, whereas evaluation of program effectiveness is useful to those making decisions about funding, extending, or terminating programs. Second, progress monitoring requires data on all relevant domains from all children in a program, whereas in many cases it is possible to evaluate a program's effectiveness by sampling children rather than testing them all, or by using a matrix design to sample different abilities in different children.

Using assessments for accountability purposes may seem simple, but in fact interpreting test data as reflecting the value of a program can be risky. There are many challenges to the conclusion that a program in which children perform poorly at the end of the year should be terminated. What if they were extremely low scorers at program entry and made notable progress, just not enough to reach the norm or criterion? What if the program is basically sound but disruptions to financing or staffing led to poor implementation in this particular year? What if the program is potentially good but investments in needed professional development or curricular materials were denied? What if the alternative program in which the children would end up if this one is terminated is even worse?
Challenges like these have been widely discussed in the context of accountability consequences for school-age children under NCLB, and they are equally applicable to programs for preschoolers.

In other words, establishment of program-level accountability is a legitimate and important purpose for assessment, but not one that can be sensibly met by sole reliance on child-focused assessment data. Accountability is part of a larger system and cannot be
derived from outcome data alone, or even from pre- and posttest data, on a set of child assessments. We say more about the importance of the larger system in Chapter 10.

Program Impacts

A more specific purpose for assessing children participating in a particular program is to evaluate the impact of that program, ideally in comparison to another well-defined treatment (which might be no program at all), and ideally in the context of random assignment of individuals or classrooms to the two conditions. Under these circumstances, it is possible to evaluate the impact of the program on children's performance on the assessments used. Under these (relatively rarely encountered) ideal experimental circumstances, it is appropriate to sample children in programs rather than testing them all, and it is possible, if one is willing to limit claims about program effectiveness to subsets of children, to exclude groups of children (English language learners, for example, or children with disabilities) from the assessment regimen.

Social Benchmarking

Another purpose for early childhood assessment that relates to accountability at a societal level is social benchmarking: answering questions like "Are 3-year-olds healthier than they were 20 years ago?" or "How do American 4-year-olds perform compared with Australian 4-year-olds on emergent literacy tasks?" Social benchmarking efforts include projects like those launched by the National Center for Education Statistics (the Birth Cohort Study, the Early Childhood Longitudinal Study-Kindergarten) and individual states (California's Desired Results Developmental Profile). These efforts provide profiles of "expectable development" that can be used for comparisons with smaller groups in particular studies and also as a baseline for comparison with data collected at a later time.
Furthermore, these studies provide policy makers and the public with a view of what the society is doing well and not so well at. The movement to develop early learning guidelines can be seen as a contribution to the social benchmarking effort;
early learning guidelines represent a set of aspirations about what children should be able to do, and the social benchmarking assessments provide information about the reality.

ADVANCING KNOWLEDGE OF CHILD DEVELOPMENT

Finally, a major purpose of assessment (and a major source of the assessments widely used for the purposes discussed in this chapter) is for research to advance knowledge of child development. It goes far beyond our charge to discuss in any detail the use of assessments for research purposes. Furthermore, there exist robust mechanisms (peer review of journal articles, peer review of grant proposals, institutional review boards for the use of human subjects) for providing guidance to researchers in selecting, administering, and interpreting the results of assessments of young children. Nonetheless, because researchers of child development have indeed innovated and in many cases refined the tools adopted for use by education practitioners and policy makers, it seems churlish not to acknowledge this important and generative line of work.

Guidelines for Administering and Using Child Assessments Appropriately for Various Purposes

Organizations concerned with early childhood development and learning have recognized the potential good that can come of child assessment as well as the harm that incorrect uses or interpretations of such assessments can cause. Several of them have developed position statements or guidelines for the use of assessments with young children, with the intention of maximizing the benefits and preventing harm. Some of these documents are listed in Box 2-1. The more recent of them incorporate and expand on earlier ones to a large extent. Thus, the entire set represents a relatively coherent set of guidelines for selection, use, and interpretation of early childhood assessments. Several of these documents agree, for example, on the following important guidelines for individual assessment:

• Assessments should benefit children: National Education Goals Panel (NEGP), NAEYC, DEC.
• Assessments should meet professional, legal, ethical standards: NAEYC, DEC.
• Assessments should be designed for a specific purpose and be shown to be psychometrically sound for that purpose: NEGP, NAEYC, DEC.
• Assessments should be age-appropriate or developmentally/individually appropriate: NEGP, NAEYC, DEC.
• Parents/family should be involved in assessment when possible: NEGP, NAEYC, DEC.
• Assessments should be linguistically and culturally appropriate/responsive: NEGP, NAEYC, DEC.
• Assessments should assess developmentally/educationally significant content: NEGP (in narrative), NAEYC, DEC.
• Assessment information should be gathered from familiar contexts (NEGP), realistic settings and situations (NAEYC), or be "authentic" (DEC).
• Information should be gathered from multiple sources: NEGP, NAEYC, DEC.
• Assessment results should be used to improve instruction and learning: NAEYC, DEC, NEGP.
• Screening should be linked to follow-up assessment: NEGP, NAEYC.

BOX 2-1
Guidelines of Documents Promulgated by Major Early Childhood Professional Groups

• Principles and Recommendations for Early Childhood Assessments (Shepard, Kagan, and Wurtz, 1998). Goal 1 Early Childhood Assessments Resource Group document.
• Early Childhood Curriculum, Assessment, and Program Evaluation (and an accompanying extension for English language learners), a position statement promulgated by the National Association for the Education of Young Children and the National Association of Early Childhood Specialists in State Departments of Education (2003).
• Promoting Positive Outcomes for Children with Disabilities: Recommendations for Curriculum, Assessment, and Program Evaluation from the Division for Early Childhood (2007).
• Council of Chief State School Officers set of documents on Building an Assessment System to Support Successful Early Learners (undated, but circa 2003a, 2003b).

Special Considerations When Using Child Assessments for Accountability

Particular care is needed in moving from child-focused to accountability-focused purposes for assessment. Data collected for accountability purposes are never meant as a basis for drawing conclusions or informing program personnel about individual children. Instead, they are meant to be useful to funders, state and federal policy makers, and others responsible for making decisions about a program or policy, and for this purpose it is completely appropriate to use sampling. However, in many cases, states are attempting to use the same data for accountability and for progress monitoring purposes. The wisdom of this approach is questionable, although the apparent efficiencies are understandably seductive. Progress monitoring, however, requires data at the individual child level from all children.

Decisions about accountability should never rest solely on findings from child-directed assessments. Information about the conditions under which the program is operating and about the characteristics of the families and children it is serving are crucial to making valid inferences from child performance to program quality. (Many other safeguards must also be in place, which are discussed in Part III.) Considerable guidance about accountability assessment is available from the documents listed in Box 2-1, as well as from a recent Pew Foundation report (National Early Childhood Accountability Task Force, 2007).
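
To make concrete the point that accountability data can come from a sample while progress monitoring needs every child, the following Python sketch builds a simple matrix-sampling plan: a random subset of children is assessed, and each sampled child completes only some of the domains. It is a minimal illustration under stated assumptions, not a procedure drawn from any of the documents above; the function names, the domain labels, and the rotation rule are all hypothetical.

```python
import random
from collections import Counter


def matrix_sample(child_ids, domains, sample_size, domains_per_child, seed=0):
    """Pick a random sample of children and give each a rotating subset of domains.

    Hypothetical helper: the rotation rule is one simple way to spread
    domains evenly across the sampled children, not a standard procedure.
    """
    if sample_size > len(child_ids):
        raise ValueError("sample_size cannot exceed the number of children")
    if domains_per_child > len(domains):
        raise ValueError("domains_per_child cannot exceed the number of domains")
    rng = random.Random(seed)  # fixed seed so the plan is reproducible
    sampled = rng.sample(child_ids, sample_size)
    plan = {}
    for i, child in enumerate(sampled):
        start = (i * domains_per_child) % len(domains)
        plan[child] = [
            domains[(start + j) % len(domains)] for j in range(domains_per_child)
        ]
    return plan


def coverage(plan):
    """Count how many sampled children are assessed in each domain."""
    return Counter(d for assigned in plan.values() for d in assigned)


# Illustrative program: 40 children, four made-up domain labels.
children = [f"child_{n:02d}" for n in range(40)]
domains = ["literacy", "numeracy", "language", "social_emotional"]

plan = matrix_sample(children, domains, sample_size=10, domains_per_child=2)
total = sum(len(assigned) for assigned in plan.values())
print(total, "assessments instead of", len(children) * len(domains))
print(coverage(plan))
```

With these illustrative numbers, 10 children assessed on two domains each require 20 assessments rather than the 160 needed for full coverage, and each domain is still measured on five children. Estimates from such a plan describe the group only; they cannot support conclusions about any individual child, which is why the same data cannot double as progress monitoring.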

The tools used for various accountability purposes are often adaptations of tools developed for other purposes. The large-scale, large-sample assessment sweeps needed for accountability
purposes impose a particular set of requirements: relatively brief assessments that can be administered and interpreted in standardized and straightforward ways. These requirements are particularly difficult to meet when assessing young children. Standardization of administration conflicts with establishing a trusting relationship with a child, for example, and standardization of interpretation conflicts with using all the information available. The reliability of standardized tests is threatened when they are shortened for use with large groups, and brief forms may generate information too sparse to be interpretable, in particular for children from language and cultural minorities and children with disabilities. Thus such abbreviation or adaptation requires careful evaluation of the psychometric properties of the adapted or abbreviated instruments. Nonetheless, tools developed for other purposes (e.g., Peabody Picture Vocabulary Test, Dunn and Dunn, 2007; Bayley Scales of Infant and Toddler Development, Bayley, 2005; MacArthur-Bates Communicative Development Inventories, Fenson et al., 1993) are often adapted for use in large-scale evaluations and social benchmarking efforts.

As noted above, the validity of conclusions about accountability, evaluation, and social benchmarking extends only to groups that are represented in sufficient numbers among those on whom the instruments were normed and among those assessed. Language and cultural-minority children and children with disabilities must typically be either oversampled or excluded from consideration; neither solution is entirely without problems. Conclusions about the status or development of children in these groups are also of concern in large-scale assessments because they are highly standardized and often norm-referenced. Some children with disabilities may not be included because they need accommodations or because the floor of the assessment is too high.
English language learners may not be included because the assessment is given or exists only in English. Any conclusion about program accountability requires data about initial as well as final performance.

Another key issue in accountability-related assessment is the selection of the assessment tools to be used. This step should be as purposeful as the other decisions (when to assess, whom to assess, how to assess) involved in establishing accountability.
Too often these decisions are made by committees or with input from multiple stakeholders; even with the best intentions, multiple parties may end up compromising on poor tests. We hope this report provides some guidance to groups making decisions about instruments to choose for any of the purposes they may be addressing.