Early Childhood Assessment: Why, What, and How

8
Assessing All Children

All children deserve to be served equitably by early care and educational services and, if needed, by intervention services. This requires that there be fair and effective tools to assess their learning and development and identify their needs. In this chapter we address the challenges to assessment posed by groups of children who differ from the majority population in various ways. For all of the groups discussed here, assessment has been problematic.

This chapter has three major sections. In the first section, we review issues around the assessment of young children who are members of ethnic and racial minority groups in the United States and the research that has been done on them, chiefly on black children. The next section deals with assessment of young children whose home language is not English, to whom we refer as English language learners. The final section treats the assessment of young children with disabilities.

MINORITY CHILDREN

Conducting assessments for all children has both benefits and challenges, but when it comes to assessing young children from a cultural, ethnic, or racial minority group, unique concerns apply related to issues of bias. There is a long history of concern
related to the potential for, and continued perpetuation of, unfair discriminatory practices and outcomes for minority children. The topic has struck political, legal, and emotional chords, with many in the minority population holding deep-seated skepticism about the benefits of assessing their children (Green, 1980; Reynolds, 1983). Some of the features that distinguish minority children in the United States include racial/ethnic background, socioeconomic status (SES), cultural values, dialect/linguistic differences, historical and current discrimination, current geographic isolation, and other characteristics that marginalize a population relative to the majority society. In this section we provide a brief overview of the concerns about assessment of young minority children and examine the available empirical evidence on potential bias in assessing young children from birth to age 5.

Fairness

The primary concerns about the assessment of this population are fairness and equality across groups. That is, there is concern that assessment tools, by their inherent properties, could contribute to the over- or underidentification of children differently across different minority population groups. Since the first assessment tools were developed, there has been long-standing concern that test scores may not reflect differences in ability or developmental milestones among children and the populations they represent, but rather demonstrate problems in the construction, design, administration, and interpretation of the assessments that lead them to be unfair and untrustworthy (Brown, Reynolds, and Whitaker, 1999; Garcia and Pearson, 1994; Gipps, 1999; National Association of Test Directors, 2004; Skiba, Knesting, and Bush, 2002).

Most of what is known about potential bias in assessing minority children is based on school-age children and youth. Less is known about children younger than age 5, including assessment score differences between whites and blacks (Brooks-Gunn et al., 2003). Children ages 5-14 are the most extensively examined for cultural bias, mostly in intelligence testing, with most of the empirical focus on ages 7-11 (Valencia and Suzuki, 2001). It is important for us to clarify the many definitions of "unfair"
and "untrustworthy" assessment problems that are typically termed "bias," because they are often confused by researchers and the public alike (Reynolds, Lowe, and Saenz, 1999). There is bias as in being unfair, that is, "partiality toward a point of view or prejudice," and there is bias defined as a statistical term: "systematic error in measurement of a psychological attribute as a function of membership in one or another cultural or racial subgroup" (Reynolds, Lowe, and Saenz, 1999, p. 550). Many of the statistical definitions of bias are tied to psychometric validity and reliability theory (discussed in Chapter 7); however, they are often confounded with philosophical definitions of bias related to fairness and views of prejudice (Brown, Reynolds, and Whitaker, 1999).

Types of Biases

Several categories of biases are particularly relevant for minority populations (Reynolds, 1982; Reynolds, Lowe, and Saenz, 1999).

Inappropriate Content and Measuring Different Constructs

Bias may arise when the content of the test is unfamiliar to or inappropriate for minority children; test content is inappropriate for a population as a result of contextual differences (Neisworth and Bagnato, 2004). The assumption is that since tests are designed around the cultural values and practices of middle-class white children, minority children will be at a disadvantage and more likely to perform poorly because of a lack of exposure to, and a mismatch with, content included in the testing situation. A lack of success in an assessment may be due to the fact that the assessment instrument does not reflect the local and cultural experiences of the children taking the test, resulting in flawed examinations and misrepresentation of minority children's true ability and performance (Hagie, Gallipo, and Svien, 2003).
For example, differences in culture between racial minority and white majority groups in communication patterns, child-rearing practices, daily activities, identities, frames of reference, histories, and environmental niches may influence child development and how development is assessed (Gallimore, Goldenberg, and Weisner, 1993; Hiner, 1989; Ogbu, 1981, 2004; Slaughter-Defoe, 1995; Weisner, 1984, 1998). Hilliard (1976, 2004) has provided several conceptual arguments about the role of contextual factors that differ among racial/ethnic groups, such as reasoning styles, conceptions of time and space, and dependence on and use of nonverbal communication (Castenell and Castenell, 1988). Dominant, majority group members may stigmatize the food, clothing, music, values, behaviors, and language or dialect of minorities as inferior to their own or inappropriate, creating a collective group of "minorities" as a separate segment of society that is "not like" the majority (Ogbu, 2004).

Variations in ecological circumstances suggest that assessments may be culturally loaded because they reflect the (typically white, majority) developers' experiences, knowledge, values, and conceptualizations of the developmental domains being examined (intelligence, aggressive behavior, etc.). This can lead to a mismatch between the cultural content of the test and the cultural background of the person being assessed, so test items do not accurately reflect the developmental experiences of the minority population. The idea that all children have been exposed to the same constructs that the assessment tries to measure, regardless of different socialization practices, early literacy experiences, and other influences, is a fallacy (Garcia and Pearson, 1994; Green, 1980; Laing and Kamhi, 2003; Valencia and Suzuki, 2001). So, for example, bias may arise on the Peabody Picture Vocabulary Test-III (PPVT-III) because of a lack of familiarity with pointing at pictures to communicate, unfamiliarity with English vocabulary, or a combination of these (Laing and Kamhi, 2003).
Not all children are exposed to the unspoken expectations for communication and behavior in school settings, such as the early oral and written linguistic experiences of the mainstream. Children from cultures with strong oral traditions for learning (e.g., American Indians, Haitian Creoles) may therefore be at risk for biased assessments (Notari-Syverson, Losardo, and Lim, 2003). Evidence has long suggested that children from many minority racial groups do not, as a group, perform as well as children from the majority white group on school achievement and formal,
standardized tests, even controlling for socioeconomic background and proficiency in standard American English (Garcia and Pearson, 1994; Rock and Stenner, 2005). The list of theories related to such disparities is long; however, one reason relevant to this report is that differences in test scores (e.g., between black and white children) may be due to striking disparities in ecological conditions and to instruments that are not designed to be sensitive to those cultural variations. Such contextual variations, if not considered in the design of the assessment instrument, can lead to systematic biases (Brooks-Gunn et al., 2003). Such a bias may actually perpetuate or increase social inequalities, because a test whose content and measures reflect the values, culture, and experiences of the majority legitimates those inequalities (Gipps, 1999).

Inappropriate Standardization Sample and Methods

Hall (1997) argues that Western psychology tends to operate from an ethnocentric perspective in which research and theories based on the majority, white, population are assumed to be applicable to all groups. These paradigms are seen as templates to be used on all groups to derive parallel conclusions. As such, the standardization samples of tests are often drawn primarily from white populations, and minorities are often included in numbers insufficient to have a significant impact on item selection or to prevent bias. For example, there is a great deal of concern about accurate identification of language disorders among black children using standardized, norm-referenced instruments, because many literacy tests are developed based on mainstream American English and do not recognize dialect differences. The tests have been normed on children from white, middle-class backgrounds (Fagundes et al., 1998; Qi et al., 2003; Washington and Craig, 1992).
Often validity and sampling tests do not include representative samples of nonmainstream English speakers, so the statistical ability to find items that are biased is limited (Green, 1980; Seymour et al., 2003). The poor scores of a large proportion of minority children on some standardized language assessment tools may have more to do with the fact that the tests have been normed
on children from primarily white, middle-class language backgrounds than with true differences in children's language abilities (Qi et al., 2003). Minority groups may be underrepresented in standardization samples relative to their proportions in the overall population, or their absolute number may be too small to prevent bias. Standardized tests based on white middle-class normative data have an inevitable bias against children from minority and lower SES groups, providing information on their status only in comparison to mainstream children. They do not take into account cultural differences in values, beliefs, and attitudes; cultural influences on assessment content; contextual influences on measuring behavior; or alternative pathways in development (Notari-Syverson et al., 2003, p. 40).

In addition, the fact that a minority group is included in a normative sample does not mean the assessment tool is unbiased and appropriate to use with that group (Stockman, 2000). It is a common misconception that, because a test is "normed," it is unbiased toward minorities. The norming process, by its nature, leans toward the mainstream culture (Garcia and Pearson, 1994). When test companies draw strict probability samples of the nation, very small numbers of particular minorities are likely to be included, increasing the likelihood that minority group samples will be unrepresentative. Even if a test is criterion-referenced instead of norm-referenced, the performance standards (cutoff scores) by which children's performance is evaluated are likely to be based on professional judgments about what typical (that is, mainstream) children know and can do at a particular developmental level (Garcia and Pearson, 1994).

Inappropriate Testing Situation and Examiner Bias

Rarely examined is the assessor's influence on child assessments and whether assessor familiarity or unfamiliarity exerts a bias against different population groups.
For example, situational factors such as familiarity with the testing situation, the speed of the test, question-answer communication style, and the assessor's personal characteristics may systematically enhance or depress the performance of certain groups differently (Green, 1980, p. 244). Assessor and language bias is particularly likely if the
assessor speaks only standard English, which may be unfamiliar, intimidating, or confusing to minority children (Graziano, Varca, and Levy, 1982; Sharma, 1986; Skiba, Knesting, and Bush, 2002). For example, a meta-analysis by Fuchs and Fuchs (1986) of 22 empirical studies of assessor effects on intelligence tests for children ages 4-16 suggested that children scored higher when tested by familiar assessors. SES was a vital variable: children from low SES backgrounds performed much better with a familiar assessor, whereas high SES children performed similarly across assessor conditions (Fuchs and Fuchs, 1986).

Some researchers have suggested that assessment format and test-taking style can be threatening to some minority populations because of an unusual or foreign format and procedure, leading to direction bias (directions for the test misinterpreted by the child) (Castenell and Castenell, 1988; Fagundes et al., 1998). These characteristics may not be equally present in all test-taking populations. Also, the test-taking style dictated by standardized procedures may influence the performance of children from diverse cultural backgrounds, such that their performance may not represent their true ability because they lack familiarity with the test-taking situation (Qi et al., 2003).

Inequitable Social Consequences

Use of assessments that are not free from bias may result in minority groups being over- or underrepresented in services or educational tracks. Most often the conversation is focused on inappropriate overrepresentation in services (e.g., special education) or on minorities being relegated to inferior programs or services because of test performance (Hilliard, 1991). Historically, test scores have been used to keep black and Hispanic children in segregated schools (Chachkin, 1989).
More recently, excessive reliance on test scores for placement purposes has sent disproportionate numbers of minority children into special education programs and low tracks in middle and high school (Chachkin, 1989; Garcia et al., 1989; Rebell, 1989; all cited in Garcia and Pearson, 1994). The opposite is also possible: some children (e.g., Asians) may be overrepresented in advanced programs and high tracks. As Gopaul-McNicol and Armour-Thomas (2002) write: "The challenge for equity in assessment is to ensure that the judgments made about behavior of individuals and groups are accurate and that the decisions made do not intentionally or unintentionally favor some cultural group over another" (p. 10).

Differential Predictive Validity

Ensuring the absence of bias requires that errors in prediction be independent of group membership and that tests predict important outcomes or future behaviors for minority children. Claims have been made that tests do not accurately predict relevant criteria for minorities and that the criteria against which tests are typically correlated, being from the majority culture, are themselves biased against minority group members (Brown, Reynolds, and Whitaker, 1999; Reynolds, Lowe, and Saenz, 1999). The psychometric methods described in Chapter 7 are among those that may be used to detect such bias in existing instruments and to avoid it when developing and norming new instruments.

EMPIRICAL EVIDENCE ABOUT POTENTIAL BIAS

In 1983 Reynolds laid out the types of assessment test bias that may occur with minority populations and the need for empirical testing of assessment instruments. Twenty-five years later, this call for empirical research about bias has largely gone unanswered. Empirical evidence does not provide a consistent answer about the potential bias of assessments of minority populations. In addition, most of the work examining test bias has been focused on school-age and adult populations (e.g., intelligence testing, entrance exams, employment tests; Reynolds, 1983). As Reynolds quipped (1983, p. 257), "For only in God may we trust; all others must have data." What empirical evidence is available about the potential bias of assessments for minority children from birth to age 5? The quick answer: very little.
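The differential predictive validity idea discussed above can be made concrete. In the Cleary regression framework, a standard psychometric approach rather than anything specific to this report, a test is considered predictively unbiased if, given the test score, group membership adds nothing to the prediction of the criterion. The sketch below is illustrative only: the data are simulated and every variable name is our own invention.

```python
# Illustrative sketch of the Cleary regression approach to differential
# prediction; simulated data, hypothetical variable names throughout.
import numpy as np

def differential_prediction(scores, outcomes, group):
    """Fit outcome ~ score + group + score*group by ordinary least squares.

    Returns (intercept gap, slope gap) for group == 1 relative to
    group == 0; values near zero are consistent with the absence of
    differential prediction given the test score.
    """
    X = np.column_stack([
        np.ones_like(scores),  # common intercept
        scores,                # common slope
        group,                 # intercept shift for the focal group
        scores * group,        # slope shift for the focal group
    ])
    beta, *_ = np.linalg.lstsq(X, outcomes, rcond=None)
    return beta[2], beta[3]

# Simulated norming data in which both groups share the same
# score-to-outcome relation, so both gaps should be close to zero.
rng = np.random.default_rng(0)
n = 2000
group = rng.integers(0, 2, n).astype(float)
scores = rng.normal(100.0, 15.0, n)
outcomes = 0.5 * scores + rng.normal(0.0, 5.0, n)

intercept_gap, slope_gap = differential_prediction(scores, outcomes, group)
print(round(float(intercept_gap), 2), round(float(slope_gap), 2))
```

If the intercepts or slopes differ reliably across groups, the same test score predicts different criterion values depending on group membership, which is the statistical signature of differential prediction.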
A Search for Evidence

Despite a wealth of conceptual and theoretical arguments and the need to be cautious in using assessments with minority populations (e.g., Hilliard, 1979, 1994, 2004), the published empirical evidence testing potential bias for minority populations, particularly in assessment tools used for children between birth and age 5, is sparse. In our search, we developed a list of commonly used early childhood measures from several comprehensive sources (Child Trends, 2004; National Child Care Information Center, 2005). We used the EBSCO search engine (also called Academic Search Premier) to find empirical studies that examined bias and fairness of assessments for minority children. Search results were filtered on the basis of four criteria: (1) an empirical design, (2) examination of an individually administered assessment tool, (3) testing of minority participants, and (4) a focus on children from birth to age 5. Only studies published in refereed scholarly journals were examined. All studies were assessed by reading the title and abstract. If the abstract did not provide enough information to judge the article's match to the established criteria, the full article was reviewed.

Table 8-1 lists the number of empirical articles found on test bias with minority populations by core developmental domain. A total of 64 assessment tools were searched across a number of developmental domains for empirical evidence about potential bias or fairness of the tool with English-speaking, minority populations. In all, 30 empirical articles were found that met the committee's criteria.

In addition to searching for empirical evidence, the committee reviewed several test manuals of child assessment tools, looking at the empirical approaches test developers reported using to consider the potential for bias for different ethnic and minority populations. Some findings: (1) There was little reported evidence that the performance of minority children was examined separately from the larger standardization group.
(2) Detailed data from the normative sample of the current version of an assessment tool are sometimes not available. (3) Standardization samples of minority children are small. (4) Race and class may be confounded in the normative sample.

TABLE 8-1 Peer-Reviewed Articles Found on Test Bias with Minority Populations Across Major Developmental Domains

Cognitive (11 tools searched; 16 bias-testing articles found):
  Kaufman Assessment Battery for Children (K-ABC) (n = 5)
  Peabody Individual Achievement Test-Revised (PIAT-R) (n = 2)
  Stanford-Binet Intelligence Scales, Fourth ed. (SB-IV) (n = 3)
  Wechsler Preschool and Primary Scale of Intelligence, Third ed. (WPPSI-III) (n = 3)
  Woodcock-Johnson III (WJ-III) (n = 3)

Language (15 tools searched; 9 bias-testing articles found):
  Expressive Vocabulary Test (n = 3)
  Peabody Picture Vocabulary Test III (n = 5)
  Preschool Language Scale (n = 1)

Socioemotional (21 tools searched; 5 bias-testing articles found):
  Behavioral Assessment System for Children (n = 1)
  Bayley Scales of Infant Development (n = 1)
  Child Behavior Checklist 1½-5 (n = 1)
  Attachment Q-Set (n = 1)
  Penn Interactive Peer Play Scale (n = 1)

Approaches to learning (4 tools searched; 0 bias-testing articles found)

Methodological Issues

In our review of the 30 empirical studies, several key methodological issues emerged that may help explain why there is no unified conclusion about the role of bias in assessment tests for children.

The lack of agreement on the definition of bias. Often it is not clearly specified what type of bias and validity is being tested for, and, when it is, only one type of bias may be addressed. Most of the attention is focused on construct validity and testing for biases related to inappropriate content, followed by biases related to an improper normative sample. Cultural groups may have conceptions or meanings of constructs that are not aligned with what is represented in the assessment (Gopaul-McNicol and Armour-Thomas, 2002). Nor is there a commonly agreed-on use of the term "bias" from a multicultural testing perspective, or agreement on how to measure it (Stockman, 2000, p. 351). Psychometric tests alone cannot address all potential construct threats—problems with the validity of the constructs themselves, not just whether they are being assessed equivalently. These include contextual nonequivalence, conceptual nonequivalence, and linguistic nonequivalence.

A related issue is mono-operation of bias and measures of bias; that is, many studies use only a single variable or a single technique to examine bias effects (Cook and Campbell, 1979). Methods used to empirically test for bias vary widely: from simple comparisons of means and standard deviations with the normative sample, to partial correlations between subgroup membership and item scores, to t-tests, multiple regression, and methodological approaches that control for potential confounding variables. Depending on what type of bias is being examined, the simple presence or absence of differences in mean scores between two different groups does not directly say anything about the fairness of the test (Qi et al., 2003; Reynolds, Lowe, and Saenz, 1999).

Lack of consistent use of psychometric research and theory in testing for bias.
Empirical evidence for potential bias with minority groups may be a result of the type of psychometric
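The item-level methods surveyed in this discussion can be made concrete with a small example. The Mantel-Haenszel procedure is one widely used screen for differential item functioning (DIF): it compares the odds of a correct response for two groups of examinees matched on overall test performance. The sketch below is illustrative only, runs on simulated responses, and is not a procedure drawn from the studies reviewed here.

```python
# Illustrative Mantel-Haenszel DIF screen on simulated item responses;
# all names and numbers are hypothetical.
import numpy as np

def mantel_haenszel_odds_ratio(item, matching_score, group):
    """Common odds ratio for one item across strata of a matching score.

    group: 0 = reference, 1 = focal. A ratio near 1.0 means that
    examinees matched on overall performance have similar odds of
    answering the item correctly in both groups (no DIF signal).
    """
    num = den = 0.0
    for t in np.unique(matching_score):
        s = matching_score == t
        a = np.sum(s & (group == 0) & (item == 1))  # reference, correct
        b = np.sum(s & (group == 0) & (item == 0))  # reference, incorrect
        c = np.sum(s & (group == 1) & (item == 1))  # focal, correct
        d = np.sum(s & (group == 1) & (item == 0))  # focal, incorrect
        n = a + b + c + d
        if n > 0:
            num += a * d / n
            den += b * c / n
    return num / den

# Simulate 10 items that function identically in both groups: the
# studied item should then show an odds ratio close to 1.
rng = np.random.default_rng(1)
n_people, n_items = 4000, 10
group = rng.integers(0, 2, n_people)
ability = rng.normal(0.0, 1.0, n_people)        # same distribution per group
item_difficulty = rng.normal(0.0, 1.0, n_items)
p_correct = 1.0 / (1.0 + np.exp(-(ability[:, None] - item_difficulty)))
responses = (rng.random((n_people, n_items)) < p_correct).astype(int)

studied_item = responses[:, 0]
rest_score = responses[:, 1:].sum(axis=1)       # match on the other items
odds_ratio = mantel_haenszel_odds_ratio(studied_item, rest_score, group)
print(round(float(odds_ratio), 2))
```

Because the groups are matched on the rest of the test, an odds ratio far from 1 flags the item itself, not a group difference in overall ability, which is exactly the distinction a simple comparison of mean scores cannot make.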
Assessment is individualized and appropriate for the child and family.
Assessment provides useful information for intervention.
Professionals share information in respectful and useful ways.
Professionals meet legal and procedural requirements and meet recommended practice guidelines.

Assessment Challenges

Children with special needs are assessed in large numbers and by a varied array of practitioners, yet little information about actual assessment practices is available. It would be useful to know what tools are being used, how child behaviors are being judged, how eligibility decisions are being reached, to what extent children with special needs are included in accountability assessments, and so on.

The use of norm-referenced standardized assessments for children with special needs creates particular challenges. Standardized assessments require that items be administered the same way to all children, requiring them to show competence on demand, possibly in an unfamiliar setting and at the request of a stranger. The structure and requirements of traditional norm-referenced measures present numerous problems for the assessment of young children in general, but especially for young children with special needs (Bagnato, 2007; Macy, Bricker, and Squires, 2005; McLean, 2004; Meisels and Atkins-Burnett, 2000; Neisworth and Bagnato, 2004). In fact, Bagnato concluded that "conventional testing has no valid or justifiable role in early care and education" (Bagnato and Yeh-Ho, 2006, p. 618). A discussion of some of these problems follows.

One problem stems from the extent and number of response demands that the testing situation makes on the child. Standardized testing often requires verbal fluency, expressive communication, and fully functioning sensory systems, as well as comprehension of the assessment cues, including the verbal and visual cues given by the examiners (Bagnato, 2007; Division for Early Childhood, 2007; Meisels and Atkins-Burnett, 2000). Many young children with special needs are not capable of complying with all of the demands of the testing situation.
A national study of the eligibility practices of over 250 preschool psychologists with over 7,000 children found that nearly 60 percent of the children would have been untestable if the psychologists had followed standardized procedures. Children could not respond as expected because of lack of language, poor motor skills, poor social skills, and lack of attention and other self-control behaviors (Bagnato and Neisworth, 1995).

One of the basic principles of good assessment is that an assessment must have demonstrated validity for the purposes for which it is used (American Educational Research Association, American Psychological Association, and National Council on Measurement in Education, 1999). Norm-referenced measures are often used with young children to determine eligibility for IDEA services. As explained previously, state definitions of eligibility for early intervention services employ criteria (e.g., percent delay) that necessitate the use of norm-referenced measures. In 1987, a landmark paper examined the test manuals of 27 aptitude and achievement tests and found that publishers provided very little information on the use of the tests with children with disabilities (Fuchs et al., 1987). More recently, Bagnato and colleagues (Bagnato, McKeating-Esterle, and Bartolomasi, 2007; Bagnato et al., 2007a, 2007b) published syntheses of both published and unpublished research on testing and assessment methods for early intervention, with funding from the U.S. Department of Education, Office of Special Education Programs. They concluded that no research has been conducted to support the use of conventional tests for early intervention eligibility. Only three studies have been conducted to support the use of authentic assessment methods and clinical judgment methods for this purpose.
Bailey (2004) suggests that the factor structure used to develop age levels for developmental assessments may not be appropriate for children with developmental delays. He cites a study that found only three factors for children with severe developmental disabilities rather than the five factors reported in the manual (Snyder et al., 1993). Weak or imprecise measurement during eligibility determinations may lead to denial of access to services. One possible way to mitigate some of the limitations of using norm-referenced assessments for eligibility determinations is
through the use of clinical judgment, which, in those states that allow it, can be used instead of or in conjunction with formal assessments. Dunst and Hamby (2004) compared the percentage of children served in the 28 states and the District of Columbia that allow the use of informed clinical opinion with the percentage served in those that do not and found no difference, suggesting that professionals in the states that allow informed clinical opinion may not be taking advantage of this eligibility determination practice.

Another practice problem associated with standardized norm-referenced assessments is that they do not provide information that is relevant for program planning, because the items are chosen for their ability to discriminate among children. In other words, ideal items on norm-referenced tests are passed by half the children and failed by half the children in the norming group. Because norm-referenced tests lack treatment or instructional validity (Bailey, 2004; Botts, Losardo, and Notari-Syverson, 2007; Neisworth and Bagnato, 2004), service providers need to give additional assessments to develop intervention plans. One study, which represents a possible new direction for eligibility assessment, examined the use of a curriculum-based measure for eligibility as an alternative to norm-referenced assessment (Macy, Bricker, and Squires, 2005). It found support for the potential of alternative forms of assessment for making eligibility decisions.

All of the problems with using norm-referenced assessment notwithstanding, professionals administering traditional tools to young children for diagnostic purposes at least have the option to select a particular instrument on the basis of the characteristics of the individual child to be tested, and they should be augmenting that information with information from other sources.
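The point that ideal norm-referenced items are passed by about half of the norming group can be illustrated with a short classical-test-theory sketch. Nothing below comes from the instruments discussed in this chapter: the data are simulated and the function name is our own.

```python
# Classical item-analysis sketch: difficulty (proportion passing) and
# point-biserial discrimination on simulated norming data. Illustrative
# only; the items and numbers here are invented.
import numpy as np

def item_statistics(responses):
    """responses: examinees x items matrix of 0/1 scores.

    Returns per-item difficulty (proportion correct) and the correlation
    of each item with the total score on the remaining items.
    """
    difficulty = responses.mean(axis=0)
    n_items = responses.shape[1]
    discrimination = np.empty(n_items)
    for j in range(n_items):
        rest = np.delete(responses, j, axis=1).sum(axis=1)
        discrimination[j] = np.corrcoef(responses[:, j], rest)[0, 1]
    return difficulty, discrimination

# Three simulated items: easy, middling, and hard. The middling item,
# passed by about half the children, spreads them out the most, which is
# why norm-referenced item selection favors items near 50 percent.
rng = np.random.default_rng(2)
ability = rng.normal(0.0, 1.0, 5000)
cut_points = np.array([-2.0, 0.0, 2.0])          # easy, middling, hard
p_pass = 1.0 / (1.0 + np.exp(-(ability[:, None] - cut_points)))
responses = (rng.random((5000, 3)) < p_pass).astype(int)

difficulty, discrimination = item_statistics(responses)
print(np.round(difficulty, 2))
```

An item everyone passes (or everyone fails) carries no information about rank order, which is precisely why items selected for discrimination are poorly suited to describing what an individual child can do for program planning.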
The examiner also can modify the assessment procedures to accommodate fatigue or lack of interest. Although such changes in administration violate the standard administration procedures, they may be the only way to get usable information from the assessment (Bagnato and Neisworth, 1995). Often no such option for individualization exists when children with disabilities are assessed for research, evaluation, or accountability purposes—the other reasons why children with special needs would be administered standardized assessments.
For the aggregated data to be meaningful, all children must be administered the same assessment according to the same guidelines. The issue of aggregating data is somewhat less problematic for researchers or program evaluators studying a homogeneous subpopulation of children with special needs, such as young children with blindness, because the study designers may have the option to select a measure that has been developed and validated with the subpopulation of interest (assuming such measures exist). For large data collections encompassing the entire range of young children with disabilities, the challenges related to instrument selection and administration are substantial, as are the challenges of recruiting assessment administrators and interpreters with the full range of relevant knowledge and experience.

Designers of large-scale data collections may respond to the assessment challenges posed by the diversity of children with special needs by excluding them from either the entire study sample or from one or more of the assessments. Another approach is to include only those children with special needs deemed capable of participating in the general assessments and either exclude the rest or administer an alternate assessment to those who cannot take part in the regular assessment. The Early Childhood Longitudinal Study-Kindergarten Cohort, for example, included all children with special needs, provided a set of accommodations for those who needed them, and included an alternate assessment for children who could not participate in the regular assessment (Hebbeler and Spiker, 2003). Given that the data in large-scale studies will be aggregated across children and possibly disaggregated by subgroups, it is imperative that accurate conclusions be drawn about the performance of children with special needs.
Even though there are no data on the validity of using standardized norm-referenced assessments with children with special needs for this purpose, national and statewide evaluation efforts, including Head Start's National Reporting System, have nonetheless used such measures with this population. Currently, an assessment system developed by the state of California contains the only assessment tools that have been developed explicitly for large-scale data collection with young
children, including those with special needs. These observation-based tools are unique because they were designed from the beginning to ensure that young children with disabilities could be included in the data collection (see http://www.draccess.org for more information). In addition to these general problems, we describe below several challenges of special relevance to the assessment of children with disabilities.

Construct-Irrelevant Skills and the Interrelatedness of Developmental Domains

For a young child to demonstrate competency on even a single item on an assessment requires a combination of skills, yet some of them may not be relevant to the construct being assessed. To the extent that items on an assessment require skills other than the construct being assessed (e.g., problem solving), construct-irrelevant variance exists in the scores. Some examples of this in assessments of young children with special needs are obvious. A child who cannot hear or who has no use of her arms will not be able to point to a picture of a cat when asked. The item requires hearing and pointing as well as knowledge of a cat, even though these are not the skills being tested. The child who cannot point will fail the item, regardless of what she knows about cats. Other occurrences of construct-irrelevant variance may not be so obvious. All assessments that require children to follow and respond to the examiner's directions require some degree of language processing. Even though test developers attempt to address this by keeping instructions simple, all young children are imperfect language processors because they are still learning language. Many young children with special needs have impairments related to communication, meaning their capacity to process language is even less than the already limited capacity of a typical peer.
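The logic of the cat item described above can be sketched as a simple conjunction: the child must clear every construct-irrelevant hurdle as well as demonstrate the target knowledge. The following is a minimal, hypothetical illustration; the function and its arguments are invented for this sketch and are not part of any real instrument.

```python
# Hypothetical sketch: an item nominally measuring knowledge of "cat"
# also demands hearing the prompt and pointing at a picture. A failure
# on any construct-irrelevant demand looks identical to a failure on
# the target construct itself.

def passes_item(knows_cat, can_hear, can_point):
    # The observed score conflates the target construct (knows_cat)
    # with construct-irrelevant skills (can_hear, can_point).
    return knows_cat and can_hear and can_point

# A child who knows cats but cannot point still fails the item:
print(passes_item(knows_cat=True, can_hear=True, can_point=False))  # False
```

The point of the sketch is that the item's observed outcome cannot distinguish a child who lacks the target knowledge from a child who lacks one of the construct-irrelevant skills.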
Unlike deafness, blindness, or a motor impairment, a language processing problem may leave no visible sign of its impact on the assessment process. Construct-irrelevant variance is a major problem for the assessment of young children because many assessments are organized and scored around domains of development. Domains
are a construct created to describe areas of development. They do not exist independently in the child, and therefore measurement tools that assume independence of domains will have some degree of construct-irrelevant variance due to overlap across domains. Ironically, the impact of construct-irrelevant skills is greater for children with disabilities, because their development across domains may be less connected than it is for typically developing children. For example, completing a two-piece puzzle requires both cognitive and motor skills, skills that develop in tandem in typically developing children. The puzzle is challenging for a same-age child with limited motor skills, even though that child may have a very solid understanding of how the pieces fit together.

Functional Outcomes and Domain-Based Assessments

For many years the emphasis in working with young children with special needs has been on identifying and improving functional, rather than domain-based, outcomes. The concept of an appropriate outcome of intervention for a young child with disabilities has evolved over time. One approach used previously by service providers was to write outcomes drawn from domain-based developmental milestones (Bailey and Wolery, 1984). Two examples of milestones as outcomes are "Places round piece in a form board" and "Nests two then three cans." Although lists of milestones can identify useful skills, milestones do not make good instructional targets, for several reasons. They are not derived from a theory of development. Many were originally included because of their ability to differentiate the performance of children of different ages on standardized tests. And the sequence of development for typically developing children may not represent the best sequence for children with disabilities.
A contrasting approach to outcome identification, which is now considered recommended practice, is to develop outcomes that are functional (McWilliam, 2004). Functional outcomes (a) are immediately useful, (b) enable a child to be more independent, (c) allow a child to learn new, more complex skills, (d) allow a child to function in a less restrictive environment, and (e) enable a child to be cared for more easily by the family and others (Wolery,
1989). An example of a functional outcome is "Natalie will be able to sit in her high chair, finger feed herself, and enjoy dinner with her family." Outcomes like this are important because they allow a child to participate more fully in a variety of community settings (Carta and Kong, 2007). Unlike a set of developmental milestones that may have limited utility to a child on a day-to-day basis, functional skills are usable across a variety of settings and situations, with a variety of people and materials that are part of the child's daily environment (Bricker, Pretti-Frontczak, and McComas, 1998). Functional outcomes are at odds with domain-based assessments because they recognize the natural interrelatedness across domains as essential to children's being able to accomplish meaningful tasks in their daily lives. A functional outcomes approach does not try to deconstruct children's knowledge and skills into the types of items reflected in many domain-based assessment frameworks; the units of interest are the more complex behaviors that children must master to function successfully in a variety of settings and situations. The International Classification of Functioning, Disability and Health—Children and Youth Version (ICF-CY) (World Health Organization, 2007) reflects an emerging international consensus that characterization of individuals' health and ability or disability should be grounded in functions, activities, and participation, and it provides methods for characterizing these in children. The emphasis in many assessment tools on discrete skills organized into domains can operate as a barrier to recommended practice for practitioners, who are expected to use assessment results in partnership with families to identify the child's areas of need and to plan interventions that address meaningful functioning.
Universal Design and Accommodations

Universal design is a relatively new concept with direct application to assessment design for all children, especially young children with special needs. Ideally, all assessments should be designed in accord with principles of universal design, thereby minimizing the need for accommodations. Universal design has its origins in architectural efforts to design physical environments
to be accessible to all. According to the Center for Universal Design (1997), universal design is "the design of products and environments to be usable by all people, to the greatest extent possible, without the need for adaptation or specialized design." Universal design is reflected in the community in sidewalks with curb cuts, which allow people who use wheelchairs to cross streets. The goal in applying principles of universal design to assessments is to develop assessments that allow for the widest range of participation and permit valid inferences about performance (Thompson and Thurlow, 2002). Applying the principles of universal design to the development of accountability assessments for elementary and secondary school-age children, Thompson and Thurlow identified seven elements of universally designed assessments (Table 8-2). Some of the principles, such as maximum readability and maximum legibility, apply primarily to assessments in which the child reads passages of text, but most of these principles can be applied to early childhood assessment design. A principle of special relevance for young children is the need for precisely defined constructs. Just as physical environments are to be designed to remove all types of barriers to access and use, assessments are to be designed so that cognitive, sensory, emotional, and physical barriers that are not related to the construct being tested are removed (Thompson, Johnstone, and Thurlow, 2002), which relates to the previous discussion of construct-irrelevant skills. Application of universal design principles is intended to minimize construct-irrelevant variance. Universal design principles are especially relevant for standardized assessments but also apply to criterion-based assessments. For example, objectives for children can be described in terms of "communication" rather than spoken language and "mobility" rather than walking.
Many of the assessment tools in use today with young children predate the concept of universal design and thus were not developed to reflect these principles (California’s Desired Results System being a notable exception). Even with the application of universal design principles, the need may remain to develop accommodations to allow some children with special needs to be assessed with a particular instrument and for their scores to accurately reflect their capabilities.
TABLE 8-2 Elements of Universally Designed Assessments

Inclusive assessment population: Tests designed for state, district, or school accountability must include every student except those in the alternate assessment, and this is reflected in assessment design and field testing procedures.

Precisely defined constructs: The specific constructs tested must be clearly defined so that all construct-irrelevant cognitive, sensory, emotional, and physical barriers can be removed.

Accessible, nonbiased items: Accessibility is built into items from the beginning, and bias review procedures ensure that quality is retained in all items.

Amenable to accommodations: The test design facilitates the use of needed accommodations (e.g., all items can be Brailled).

Simple, clear, and intuitive instructions and procedures: All instructions and procedures are simple, clear, and presented in understandable language.

Maximum readability and comprehensibility: A variety of readability and plain language guidelines are followed (e.g., sentence length and the number of difficult words are kept to a minimum) to produce readable and comprehensible text.

Maximum legibility: Characteristics that ensure easy decipherability are applied to text; to tables, figures, and illustrations; and to response formats.

SOURCE: Thompson and Thurlow (2002).

An accommodation is never intended to modify the construct being tested. Accommodations can include modifications in presentation, in response format, in timing, and in setting. They are generally associated with standardized testing, with its stringent administration requirements. Criterion-based measures, which tend to be more observation-based, provide children with many and varied ways to demonstrate competence as part of the assessment procedures, an approach that reduces but may not eliminate the need for accommodations.
An extensive body of literature has developed in the last 20 years on the use of accommodations of various kinds with various subgroups of school-age children with disabilities, as
states moved to include children with disabilities in statewide accountability testing programs (see http://www2.cehd.umn.edu/NCEO/accommodations). There is no corresponding literature for young children, probably because the process of building systems of ongoing large-scale assessment of young children for accountability is only beginning in many states (National Early Childhood Accountability Task Force, 2007), and it is the implementation of large-scale data collection that precipitates the need for accommodations.

Other Assessment Characteristics

Individual assessment tools differ with regard to other features that have implications for their appropriateness for some children with special needs. A tool must have a floor low enough to capture the functioning of children who are performing far below their age peers. A norm-referenced or curriculum-referenced measure that lacks sufficiently low items can be a problem for children with severe disabilities. Similarly, the assessment must have sufficient sensitivity to capture small increments of growth for children who will make progress at far slower rates than their peers (Meisels and Atkins-Burnett, 2000). Identifying a tool that has a sufficiently low floor, provides adequate sensitivity, and covers the target age range will be challenging for any large-scale assessment that includes young children with special needs. An assessment developed for 3- through 5-year-olds that includes only items appropriate to that age span will not adequately capture the growth of a 3-year-old who begins the year with the skills of a 2-year-old and finishes with those of a 3-year-old. One last consideration in assessing young children with special needs is the extent to which the test's assumptions about how learning and development occur in young children are congruent with how development occurs in the child being assessed.
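The floor problem described above can be illustrated with a small numerical sketch. All names, score ranges, and values here are hypothetical, invented purely to show how a floor can mask real growth; they are not drawn from any actual instrument.

```python
# Hypothetical sketch of a floor effect. The reporting range (70-130)
# and the raw scores below are invented for illustration only.

def reported_score(raw, floor=70, ceiling=130):
    """Clamp a score to the instrument's reporting range."""
    return max(floor, min(ceiling, raw))

# A child functioning far below age peers at both time points:
fall_raw, spring_raw = 40, 55          # 15 points of genuine growth
fall = reported_score(fall_raw)        # clamped to the floor: 70
spring = reported_score(spring_raw)    # still at the floor: 70

print(spring - fall)  # measured gain is 0 despite real progress
```

Because both observations fall below the floor, the instrument reports no change; a measure with a lower floor, or more items at the low end of the scale, would register the 15-point gain.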
Caution is needed when children with special needs are assessed with instruments that were developed for a typically developing population and that did not include children with special needs in the design work or the norming sample (Bailey, 2004).
Conclusion

The nearly 1 million young children with special needs are regularly assessed around the country for a variety of purposes. Although a variety of assessment tools are being used for these purposes, many have not been validated for use with these children. Much more information is needed about assessment and children with special needs, such as what tools are being used, by what kinds of professionals, to make what kinds of decisions. Assessment for eligibility determines whether a young child will have access to services provided under the IDEA. It is unknown to what extent these critical decisions are being made consistent with recommended assessment practices and whether poor assessment practices are leading to inappropriate denial of services. The increasing call for accountability for programs serving young children, including those with special needs, means that even more assessment will occur in the future. Yet the available assessment tools are often insufficiently vetted for use as accountability instruments, are difficult to use in standardized ways with children who have special needs, and focus inappropriately on discrete skills rather than on functional capacity in daily life. Until more information about assessment use is available and better measures are developed, extreme caution is needed in reaching conclusions about the status and progress of young children with special needs. The potential negative consequences of poor measurement in the newest area of assessment, accountability, are especially serious. Concluding that programs serving young children with special needs are not effective on the basis of flawed assessment data could lead to denying the next generation of children and families the interventions they need.
Conversely, good assessment practices can be the key to improving the full range of services for young children with special needs: screening, identification, intervention services, and instruction. Good assessment practices will require investing in new assessment tools and creating systems that ensure practitioners are using the tools in accordance with the well-articulated set of professional standards and recommendations that already exist.