used for accountability purposes (group-administered paper-and-pencil tests), which may be inappropriate for young children (Graue, 1999; Meisels, 1996). Such tests often fail to capture children's learning over time or to predict their growth trajectories accurately, and they often reflect an outmoded view of learning. In contrast to older children, young children tend to learn in episodic and idiosyncratic ways; a task that frustrates a child on Tuesday may be easily accomplished a week later. In addition, children younger than 8 have little experience taking tests and may not be able to demonstrate their knowledge and skills well on such instruments. A paper-and-pencil test may therefore not provide an accurate representation of what a young child knows and can do.
However, the types of assessments useful for instructional improvement, identification of special needs, and program evaluation may not be appropriate for use in providing accountability data. Instructional improvement and identification rely on measures such as direct observations of student activities or portfolios of student work, which raise issues of reliability and validity if used for accountability (Graue, 1999). Program evaluations include a wide range of measures—including measures of student physical well-being and motor skills, social development, and approaches to learning, as well as cognitive and language development—which may be prohibitively expensive to collect for all students.
Even so, it is possible to obtain large-scale information about what students have learned and what teachers have taught by using instructional assessments. By aggregating this information, district and state policy makers can use instructional assessment data to chart the progress of children in the first years of schooling without encountering the problems associated with early childhood assessment noted above. To ensure accuracy, a state or district can “audit” the results of these assessments by having highly trained educators independently verify a representative sample of teacher-scored assessments. Researchers at the Educational Testing Service have found that such an approach can produce valid and reliable information about literacy performance (Bridgeman et al., 1995).
A second strategy might be to assess representative samples of young children on the full range of abilities they are expected to develop, and to hold schools accountable for their progress in enabling children to develop those abilities. To ensure the validity of inferences from such assessments, the samples should represent all students in a school; sample sizes can be sufficiently large to indicate the performance of groups of children, particularly the disadvantaged students who are the intended beneficiaries of Title I. Individual scores would not be reported. Researchers are exploring methodologies to describe levels or patterns of growth, ability, or development.