

The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.




OCR for page 230
Testing in Educational Placement: Issues and Evidence

JEFFREY R. TRAVERS

I would like to thank the panel members and the outside reviewers who commented on drafts of this paper. Among the panel members, special thanks go to Donald Bersoff, Asa Hilliard, Jane Mercer, and Samuel Messick. Outside reviewers were Lee Cronbach, Robert Linn, Richard Snow, and Mark Yudoff. Their thoughtful comments helped me to strengthen my arguments and correct various errors. For errors that remain, as well as for judgments with which a few reviewers disagreed, I alone am responsible.

To write about testing in relation to the issues facing the Panel on Selection and Placement of Children in Programs for the Mentally Retarded is somewhat like testifying as a ballistics expert at a shooting trial: The topic invites discussion in almost limitless technical detail, but the details are significant only insofar as they help illuminate whether someone has injured someone else and by what means. Therefore, this paper focuses less on psychometric issues than on their interplay with the legal, political, and moral issues raised by testing in the context of educational placement.

The paper, in providing background and support for portions of the panel's report, attempts to accomplish two distinct but related tasks. First, given the controversy that has surrounded testing in the academic and popular literature as well as in recent court cases, the panel felt a responsibility to survey the scientific evidence bearing on relevant aspects of the controversy. This paper provides such a survey, albeit one that is condensed and selective and that covers material already well known to professionals in testing and related fields. Second, and more importantly, the panel wanted to place the testing controversy in proper perspective. Issues surrounding testing are part of the larger complex of issues raised by the stubborn and tragic fact that large numbers of children, particularly minority children, are not learning in regular classrooms. Consequently, as the paper examines various controversies and the associated scientific evidence, it also examines their wider implications for educational policy and practice.

Several limitations on the scope of the paper should be made clear at the outset. It is not a comprehensive discussion of issues related to ability testing. (For such a discussion, see the report of the Committee on Ability Testing of the National Research Council, Wigdor and Garner, eds., 1982; see also the special issue of American Psychologist, Glaser and Bond, eds., 1981.) This paper focuses specifically on the issues that have figured in the debate over placement in programs for educable mentally retarded (EMR) children. It does not deal with research on mental retardation per se, nor does it make judgments about the validity or utility of the EMR category. It asks instead how tests contribute to classification or misclassification, given current professional and legal definitions of EMR. Finally, this paper does not deal directly with the consequences of classification (the effects of labeling or the educational benefits and costs of placement in EMR classes), although one of its major themes is that the consequences, not just the accuracy, of classification must be taken into account in deciding whether any assessment procedure is appropriate.

This paper focuses primarily on the widely used, individually administered tests that yield IQ scores, notably the Stanford-Binet and the revised Wechsler Intelligence Scale for Children (WISC-R), although other tests are mentioned. Much of the discussion applies to ability tests generally.
Special issues raised by group testing and by various quick and dirty substitutes for the major tests are not discussed. (The facts that the Stanford-Binet and WISC-R are widely used and that IQ scores are important determinants of EMR placement are documented in Chapter 2 of the panel's report and in the paper by Bickel in this volume.) Here, these facts are taken as points of departure; the focus is not on describing how tests are used in educational placement but on elucidating the controversy surrounding their use.

Readers familiar with professionally recommended practices for administering and interpreting tests of mental ability and with the range of such tests currently available may be disturbed by the emphasis throughout this paper on single IQ scores and the occasional use of such words as "IQ test." Leaders in the field of assessment have long recommended the use of multiple tests and careful consideration of performance profiles across subscales within tests, and they have inveighed against the practice of recording only single, summary IQ scores. Unfortunately, data (cited in

Chapter 2 and in the paper by Bickel in this volume) indicate that in many school systems the single IQ score is accorded overwhelming weight in placement decisions. Although the extent of this practice cannot be gauged, it is an important source of the controversy over testing in educational placement. It may also be a source of miscommunication between professionals in testing and related fields, who think in terms of the best practices and proper test use, and some critics of testing, who focus on possible or actual misuse and misinterpretation of tests.

This paper assumes that the reader has at least a rudimentary knowledge of how tests are constructed and interpreted as well as of basic statistical concepts and procedures. The presentation is largely qualitative, however, and some background material is included.

It is useful to begin this inquiry with rough caricatures of the positions taken by proponents and opponents of mental ability testing. Though such caricatures ignore many significant distinctions and nuances within the two camps, they lay out most of the major points of dispute and illustrate the interrelatedness of the various issues from both perspectives. Subsequent sections of the paper will necessarily discuss selected issues seriatim. However, if one thing is clear in all of the debate, with its complex arguments and high emotions, it is that the positions of participants rarely rest on one or a few isolated facts or arguments; data and logic instead lodge within a web of assumptions, beliefs, and values that must be understood if rational analysis is to proceed.

TESTING ON TRIAL: BRIEFS FOR THE DEFENSE AND FOR THE PROSECUTION

Proponents of the use of tests of general ability in educational placement hold that such tests measure global, enduring qualities of cognitive functioning: not necessarily "native intelligence" but some broad ability to learn, reason, and grasp abstract concepts.
Proponents deny that tests are culturally biased; while they recognize that children from certain ethnic and socioeconomic groups on the average score lower than white, middle-class children, they attribute these group differences in test scores to genuine differences in cognitive functioning, caused by heredity, environment, or both. Finally, in justifying the social uses of tests in educational and occupational selection and placement, proponents argue that tests offer individual members of disadvantaged groups, such as minorities and the poor, their best chance of distinguishing themselves and achieving educational and economic success; alternatives to testing, such as qualitative assessments by teachers and supervisors, are, claim the proponents of testing, likely to be more discriminatory than tests.

Critics of standardized tests hold that the tests fail to measure intelligence, aptitude, or global cognitive skill and instead measure specific skills and knowledge acquired through particular experiences or instruction. Moreover, critics charge, experiences leading to the acquisition of these skills are more accessible to white, middle-class children than to children of other ethnic and socioeconomic groups. Some critics also argue that the test situation itself is unfamiliar and threatening to low-income and/or minority children, further depressing their scores. Thus, argue the critics, tests are inherently biased against low-income and/or minority children and systematically underestimate their intellectual ability relative to that of middle-class whites. Finally, critics attack the social uses and social effects of testing. Tests, they allege, perpetuate race and class prejudices because they are widely interpreted as demonstrating the inherent intellectual inferiority of minorities and low-income groups. Similarly, they perpetuate racial and class inequities in income, job status, and other forms of success and achievement, because they channel children from minority and/or low-income families into educational settings that provide little intellectual stimulation, little opportunity to acquire the skills most valued by the society, and little in the way of prestigious credentials and social contacts that can influence occupational and economic success quite apart from ability and effort. The extreme case in point, of course, is placement in classes for mentally retarded students, which, it is alleged, stigmatizes the child unfairly and virtually guarantees a dead-end education leading to a menial job at best.

Even this brief summary, which has barely skimmed the surface of the debate, makes it clear that many profound issues divide the proponents and opponents of testing.
Any list of the primary open questions would include at least the following:

1. What do standardized ability tests measure? To what degree do they measure deep-seated mental abilities as opposed to skills and knowledge that can be readily acquired by almost any child in the right environment?

2. Are tests culturally biased? To what degree do test scores understate or fail to measure the abilities of minority and/or low-income children?

3. What are the causes of observed group differences in test performance? To what degree are the causes genetic? To what degree do such differences arise from group differences in quality of prenatal care, nutrition, and health care? To what degree do they arise from differences in early experience or in the home environment? To what degree do they arise from differences in out-of-home educational environments and opportunities from the preschool years on?

4. What are the social consequences of testing? To what degree do tests provide opportunities for gifted individuals from disadvantaged backgrounds to identify themselves? To what degree do they perpetuate disadvantage and prejudice? In the context of educational placement, do they, on balance, help or hinder the meeting of children's needs? To what degree do they identify children who need special help? To what degree do they lead to inappropriate classification and unfair allocation of educational opportunity?

Answers to these questions vary with particular tests and particular policies regarding their use. The partial answers offered below relate primarily to the use of major "IQ tests" in EMR evaluations during recent years and may not generalize beyond that context.

The first three issues are discussed in separate sections below. The fourth is central to the mission of the panel and crosscuts the others; it is discussed in each substantive section and in the conclusion of this paper.

The possible contribution of testing to the disproportionate representation of boys in EMR classes (another concern of the panel) is not discussed explicitly since the controversy over testing has focused on ethnicity rather than gender. Important issues concerning possible interactions of gender and ethnicity and the reportedly greater vulnerability of boys than girls to environmental variations are likewise beyond the scope of this paper.

WHAT DO "INTELLIGENCE" TESTS MEASURE?

To discuss what such tests as the WISC-R and Stanford-Binet measure, it is first necessary to clear away a popular misconception about what they are supposed to measure. In the view of most professionals in psychology, psychometrics, and related fields, such tests do not and are not intended to measure the global, fixed native capacity that seems to be implied by the term "intelligence."
Indeed, for these professionals the equation of intelligence with native intellectual capacity is entirely misleading and has been the source of much confusion and unnecessary acrimony in debates about testing and its uses. (For an authoritative statement of this position, see Cleary et al., 1975.)

The gap between this view and that of many educators, policy makers, members of the public, and some social scientists is illustrated by federal Judge Robert Peckham's landmark decision in the case of Larry P. v. Riles (1979). In a section entitled "The Impossibility of Measuring Intelligence," the judge writes (Larry P. v. Riles, 1979, Section IVA):

While many think of the IQ as an objective measure of innate, fixed intelligence, the testimony of the experts overwhelmingly demonstrated that this conception of

IQ is erroneous. Defendant's expert witnesses, even those closely affiliated with the companies that devise and distribute the standardized intelligence tests, agreed, with only one exception, that we cannot truly define, much less measure, intelligence. We can measure certain skills, but not native intelligence.

The judge implies that in the common view intelligence is, by definition, a quality both innate and unchanging; and he apparently holds this view himself. (Generations of psychologists, most of them now deceased, advanced the same definition.) However, the judge rejects what he considers to be the popular view that IQ is an accurate measure of native intelligence. He himself was convinced that IQ tests measure something that is not fixed or innate ("certain skills"), and he does not seem to equate these skills with intelligence.

Presumably, however, the "experts" who "devise and distribute" intelligence tests must believe that they measure something that can legitimately be called "intelligence," even if it is ill defined and not fixed or innate. The experts seem to hold the view of those contemporary psychologists who think of intelligence as a kind of global ability to absorb complex information or grasp and manipulate abstract concepts, an ability that is not fixed but that develops continuously through a process of reciprocal interaction with the physical and social world, including, but not limited to, the world of formal education. This very general view is shared by psychologists who differ on many specific theoretical points: Piagetian developmental psychologists, cognitive psychologists oriented toward computer simulation and information processing, even some learning theorists committed to animal behavior models.
For all of these psychologists, it is reasonable to speak of an individual's intelligence at a given point in his or her development, but there is no presumption that individual differences in intelligence are fixed or wholly determined by the genes.

From this perspective the central question is whether IQ is a valid measure of "developed intelligence." Questions about how much genes contribute, how genes and environment interact, and how much IQ can be modified by planned social intervention through education are separate. A few of these questions are discussed in a subsequent section on the causes of variation in IQ; selected aspects of the validity question are discussed here.

Inspection of one of the major intelligence tests, such as the Stanford-Binet or the WISC-R, reveals that items vary widely in content and that many plainly require learning of a very specific sort. Examples include verbal analogies, numerical computations, and questions about practical tasks and social norms (How do you make water boil? What should you do if a smaller child tries to start a fight with you?). Vocabulary items provide some particularly striking examples: At its most advanced adult level, the

Stanford-Binet asks the meanings of such esoteric words as "parterre" and "sudorific." This manifest emphasis on acquired knowledge and diversity of item content naturally raises questions as to how such tests can be said to measure any general mental property (as opposed to specific skills and knowledge) as well as how tests can be said to measure "ability" in any broad sense that goes beyond the ability to answer the specific questions and solve the specific problems presented by the test itself.

The generality of mental test scores has been the subject of a long debate in psychometrics. Early leaders in the field, notably Spearman and Thurstone, took opposed positions. The debate came to focus on the statistical issue of shared variation: What fraction of the variance in individual performance is shared by all items? What fraction is shared within distinct clusters of items but not across clusters (thus pointing to differentiated abilities rather than a single "intelligence")? What fraction is unique to individual items (pointing to "abilities" specific to the items)? Statistical techniques of principal components and factor analysis were developed largely to address these questions.

There is no universal agreement on precise, quantitative answers to these questions. Different analytic techniques yield different estimates of the relative importance of the general factor versus differentiated clusters. There is agreement, however, that a significant fraction of the variation is shared across items. The diverse items on such tests as the WISC-R and Stanford-Binet appear to measure (in part) the same thing or a small number of things; they are not merely a heterogeneous ragbag of skills and bits of knowledge. Item responses correlate with one another, with subscale scores, and with total scores on the test. Items load on a single general factor and on a small number of orthogonal factor scales.
For example, several analyses of WISC-R scores, based on large samples comprising several ethnic groups, have revealed independent "verbal" and "perceptual" factors and, occasionally, a third factor variously labeled "distractibility," "attention," "memory," and "sequential" (Kaufman, 1975; Mercer, in press; Reschly and Reschly, 1979). In addition, most tests of general abilities, even when apparently dissimilar in content, correlate positively and often highly with each other.

Covariance of scores across items and across tests is an established empirical fact. To identify common variance with ability or abilities requires inference and interpretation. The inference rests on an assumption: A child who possesses general perceptual and analytic abilities will make good use of experience and will master a wide range of specific facts, concepts, and principles. Conversely, a child who performs well on a wide variety of items is likely to have well-developed information-processing abilities of a general sort. An alternative interpretation of test and item covariance is that both the tests and the individual items reflect exposure to the mainstream culture, especially to the language, symbols, information, strategies, and tasks that are important in schools. These two interpretations are not necessarily opposed, so long as it is recognized that perceptual and analytic abilities may be developed in part through experience and exposure to appropriate stimulation. (There may of course be other broad perceptual and analytic abilities that are neither captured by existing tests nor fostered by the mainstream culture.)

It is important to recognize that all test performance depends on both general abilities and specific knowledge, both of which are products of learning, at least in part. For example, a test of an advanced, academic subject matter, e.g., one that requires the respondent to solve differential equations, clearly requires specific preparation. Nevertheless, general mathematical ability is likely to play a large role in individual performance. The relative contributions of general ability and specific learning are not fixed characteristics of the test itself but depend as well on the tested population and the circumstances of testing. Pursuing the example just given, if students in a calculus class are all drawn from a narrow, high band of the spectrum of general mathematical ability but vary widely in their previous preparation for calculus, the latter variations will be a relatively important determinant of test performance. If students in the class vary widely in ability but have all been exposed to the same mathematics curriculum in the past, variations in ability will be a more important factor.

Most school psychologists and educators who use IQ tests avoid the interpretive issues discussed above and justify their use of tests on grounds of "predictive validity," a purely empirical phenomenon.
Many studies have shown that IQ scores predict (correlate with) "criterion" measures of scholastic success, such as later school grades or scores on standardized tests of achievement in specific subject areas. For elementary school children, validity coefficients (correlations) of .7 or higher have often been obtained using achievement tests as criteria (see Crano et al., 1972). Correlations with grades are typically somewhat lower. Values around .5 have been reported (Messe et al., 1979). Occasionally, much lower correlations with grades have been reported; however, technical limitations may account in part for these findings.¹

It is not necessary to dwell on the evidence for predictive validity, because some degree of the predictive power of tests is generally conceded. What is sharply debated, however, is the interpretation of validity correlations. They are obviously consistent with the hypothesis that IQ tests measure academic ability, which is later manifested in scholastic performance, and they have been interpreted in this way, implicitly or explicitly, by many of those who use tests in schools. They are also consistent with the hypothesis that IQ tests, teacher-made tests, and standardized achievement tests all sample the same domain of acquired skills.

This ambiguity of interpretation points to an important fact, noted by Messick (1980), among others, that the term "predictive validity" is a misnomer. Prediction is not a kind of validity; prediction does not in itself guarantee that a test measures what it is supposed to measure. (Parental income predicts a child's IQ and school success, but it is surely stretching the term "measure" to call parental income a measure of the child's intelligence.) What is needed is an explicit theory of intelligence that links this construct to its measures and to other constructs and their associated measures. To draw a physical analogy, there is an explicit theory that links temperature to pressure and volume and, thereby, to the height of a column of liquid in a sealed tube. Without such a theory it would be hard to understand why a thermometer measures the entity that causes water to boil or one's hand to hurt when placed on a hot stove. Belief in the validity of the measure gains strength with repeated confirmation of the theory. In psychometric parlance, this process is "construct validation," and, as Messick and others have argued, construct validity is the only kind there is. Prediction is just one of several kinds of evidence that can be used to support claims of construct validity. Unfortunately, where intelligence is concerned, there are multiple, competing theories, few of them very precise; hence, the evidence of prediction is subject to multiple interpretations.

In sum, there are two principal pieces of evidence for the validity of IQ tests as measures of "developed intelligence." One is the convergence of different items and different tests. The other is the association between IQ scores and measures of academic achievement. Both are subject to varying interpretations. The question of interest here is how the evidence bears on the use of tests in educational placement.

¹ Lower and less consistent correlations with grades are to be expected for many reasons. IQ tests are more similar in style and content to standardized achievement tests than to classroom tests and other performance measures used in grading. Grades are likely to be less reliable than standardized achievement tests, and unreliability attenuates correlations. Grades are likely to be influenced by factors other than achievement, such as deportment or perceived effort. Overall grade point averages may include nonacademic subjects, for which little effect of intellectual ability might be expected. Students are likely to be grouped by ability, formally or informally, and graded in comparison to their classmates; such practices imply that the same grade means different things for students in different classes or for students graded by different teachers and also that the restricted range of variation in IQ within classes will reduce the correlation between IQ and grades.
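Two of the statistical effects mentioned above, attenuation of correlations by unreliable measures and reduction of correlations by a restricted range of IQ within classes, can be illustrated with standard psychometric formulas. The following is only a minimal sketch and is not drawn from the paper: the function names and every numeric input (a "true" correlation of .7, reliabilities of .9 and .6, a within-class standard deviation half that of the full population) are hypothetical values chosen for demonstration, and the range-restriction formula assumes direct selection on the predictor with a linear relation and constant residual variance.

```python
import math


def attenuated_correlation(true_r: float, reliability_x: float, reliability_y: float) -> float:
    """Spearman's attenuation formula.

    The correlation observed between two imperfectly reliable measures equals
    the correlation between their true scores times the square root of the
    product of their reliabilities.
    """
    return true_r * math.sqrt(reliability_x * reliability_y)


def range_restricted_correlation(r: float, sd_ratio: float) -> float:
    """Correlation expected after direct selection on the predictor.

    sd_ratio is (restricted SD of the predictor) / (unrestricted SD).
    Assumes a linear relation and constant residual variance.
    """
    return r * sd_ratio / math.sqrt(1.0 - r * r + r * r * sd_ratio * sd_ratio)


if __name__ == "__main__":
    # Hypothetical inputs: IQ reliability .9, grade reliability .6.
    print(round(attenuated_correlation(0.7, 0.9, 0.6), 3))
    # Hypothetical inputs: population correlation .5, within-class SD halved.
    print(round(range_restricted_correlation(0.5, 0.5), 3))
```

Under these made-up inputs, a true correlation of .7 would be observed as roughly .51, and a population correlation of .5 would shrink to roughly .28 within ability-grouped classes. This suggests how the factors listed in the footnote could by themselves produce low IQ-grade correlations.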

Critics of testing have argued vehemently that tests are invalid as measures of a child's potential and are, therefore, unfair devices to use for placement. However, they have not spelled out why they would be fair if they did measure potential nor why they are unfair if they measure only acquired skills or developed abilities. Defenders of testing have not contested the point about measurement of potential but have justified the use of tests on grounds of predictive validity, apparently believing that the use of tests in educational placement is fair even if tests measure skills that are partially or primarily acquired. In my view, neither the critics nor the defenders (exemplified by the plaintiffs and defendants in Larry P. and in Parents in Action on Special Education v. Hannon) have focused their arguments appropriately. Prediction in itself is not sufficient justification for using tests in educational placement. Nor is the critical shortcoming of tests their failure to measure "potential" or "native intelligence." The key issue is whether tests offer guidance in choosing among educational alternatives.

One relevant, if obvious, limitation of prediction has been mentioned in court cases concerned with the use of tests in EMR placement (e.g., U.S. Department of Justice, 1980:A7-A8): Prediction is probabilistic. The fact that a given IQ on the average predicts a specific grade level does not guarantee that any particular child who achieves the given IQ will achieve the predicted grade level. Variation around the predicted level can be quite wide. When the validity coefficient is as high as .6, a child who scores below the 10th percentile (an IQ of roughly 80) would have a 46 percent chance of achieving a grade point average in the bottom fifth of the class, hence a 54 percent chance of doing better. The child would have a 17 percent chance of being in the top half of the class.
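The arithmetic behind probabilities of this kind can be sketched with a simple model. The code below is an illustration under stated assumptions, not the method of the cited sources: it assumes that IQ and grade point average follow a bivariate normal distribution and that the child scores exactly at the 10th percentile, which appears to be the reading that reproduces the quoted figures. The function name and parameters are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def prob_criterion_below(validity: float, predictor_pct: float, criterion_pct: float) -> float:
    """P(criterion falls below criterion_pct | predictor sits at predictor_pct).

    Assumes a bivariate normal model with correlation `validity`
    (strictly between -1 and 1). Percentiles are fractions, e.g. 0.10.
    """
    z_predictor = STD_NORMAL.inv_cdf(predictor_pct)
    z_cutoff = STD_NORMAL.inv_cdf(criterion_pct)
    # Conditional distribution of the criterion given the predictor score.
    cond_mean = validity * z_predictor
    cond_sd = (1.0 - validity ** 2) ** 0.5
    return STD_NORMAL.cdf((z_cutoff - cond_mean) / cond_sd)


if __name__ == "__main__":
    r = 0.6  # validity coefficient
    bottom_fifth = prob_criterion_below(r, 0.10, 0.20)
    top_half = 1.0 - prob_criterion_below(r, 0.10, 0.50)
    print(f"bottom fifth: {bottom_fifth:.2f}, top half: {top_half:.2f}")
```

With a validity coefficient of .6 the sketch gives about .46 for the bottom fifth and about .17 for the top half, in line with the percentages quoted above; rerunning it with a smaller coefficient shows how quickly the prediction approaches chance.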
When the validity coefficient is as low as .2, the child would have only a 28 percent probability of being in the bottom fifth, just 8 percent higher than pure chance. The child would have a 40 percent likelihood of being in the top half of the class (Schrader, 1965). Even if it is conceded that IQ tests are among the best predictors of school success that we have, the margin of error in an individual case is substantial. (In principle, prediction can be improved by the use of other valid indicators in conjunction with IQ scores. In practice, as indicated earlier, this improvement may or may not be achieved, depending on whether additional indicators are in fact collected and used.)

A second limitation (somewhat paradoxical, given the first) is that the predictive information available in the IQ overlaps with that available in the child's grade record or achievement test scores, when the latter are available. Past and current achievement predicts future achievement, typically better than IQ (Crano et al., 1972). Although, as illustrated in the previous paragraph, a substantial portion of the variation in achievement is independent of IQ and vice versa, prediction based on both IQ and achievement is only a little more accurate than prediction based on achievement alone. (The fact that IQ and achievement measures are not entirely redundant does have important implications, however. In current practice, children are usually referred for testing only after experiencing serious and prolonged difficulty in the classroom. When testing reveals that such children have low IQs, it merely confirms expectations. In some individual cases, however, testing can make a distinctive and positive contribution: When children who are performing poorly in class prove to have IQs in the normal range, the discrepancy points to undetected problems that should be diagnosed: sensory malfunctions, emotional difficulties, poor or inappropriate instruction, etc. Obviously, this is not to say that high scores are somehow more valid or meaningful than low scores or that predictive equations are different for high and low scores. The point, rather, is that the functional contribution of testing is likely to lie less in improving prediction than in stimulating diagnosis.)

A more fundamental limitation concerns the underlying logic of using prediction as a basis for educational placement at all. Even if it could be predicted with certainty that a child with a low IQ will get low grades in a regular class, this fact would not in itself dictate or justify removing the child from the class. Judge Peckham recognized this point when he drew a distinction between testing for educational placement and testing for job placement. Courts have held that employers have a legitimate stake in employee performance and thus are justified in selecting employees on the basis of a test that has demonstrated predictive power (Bersoff, 1979). But the stake of educators in the performance of children is not analogous.
Children, not educators, are the beneficiaries of education, and the public schools have an obligation to teach every child as well as possible. The paramount question is not how to select children who will perform well in regular classes but how to select classes or programs that best meet the needs of children.

To justify separate placement on the basis of an IQ score it would be necessary to show that children with low IQs require and profit from a different curriculum or different type of instruction from that available in regular classes. (Alternatively, separate placement might be justified if it could be shown that children with low IQs are not harmed by it, while children in regular classes are harmed when children with low IQs share those regular classes.) Educational researchers call situations in which different educational approaches work best with children of different initial ability "aptitude-treatment interactions" (Cronbach and Snow, 1977). It has been urged that demonstration of aptitude-treatment interactions is the appropriate way to validate tests for use in educational placement,

within the range of scores and ages most relevant to the panel's work. There is at best scattered evidence for bias in aspects of the testing situation external to the test itself; however, this issue merits further study under field conditions. There is little evidence that bias lodges in particular test items, but this does not preclude the possibility of generalized bias across all items. In general there has not been consistent evidence for differential predictive validity of tests across ethnic groups, although such evidence has been found in several influential but controversial studies.

On balance it must be concluded that bias in the technical sense contributes little either to explaining group differences in IQ or to shaping placement policy. No study I have encountered suggests that the magnitude of any bias effect, or even several combined, comes close to explaining all of the differences in IQs between whites and minorities. It is unlikely that elimination of psychometric bias, in the absence of other changes in policy and practice, would have much effect on the IQ scores of minority children or the proportion assigned to EMR classes.

It is important to recognize the limited import of this conclusion. The conclusion relates only to technical bias and says nothing about fairness in test use or about ethnic or racial bias in the interpretation of test scores or bias in the educational system or in society at large. Psychometric investigations of bias do not address many of the larger concerns of educators, policy makers, and the public, most of whom use the term "bias" more broadly than the technical definition allows.
For example, these investigations ignore the problem of bias in the criteria: If school grades and/or achievement test scores underestimate the academic attainment of minority students, as tests allegedly underestimate their abilities, it would be no justification of testing, from a moral or policy standpoint, to find that prediction was perfect. In addition, as we saw at the beginning of this section, many persons outside the field of psychological measurement define bias as any contribution of sociocultural factors that raise or lower the IQ scores of one group relative to another. There is simply no doubt that there is some cultural contribution, as even the firmest believers in genetic determination of IQ would admit. I take up the issue of the relative size of this contribution in the next section, but I also argue that the issue is less important for policy in the area of educational placement than it may seem.

WHAT CAUSES INDIVIDUAL AND GROUP VARIATIONS IN IQ?

No question in psychology has provoked more bitter debate than that surrounding the determinants of variation in IQ scores. In recent years the controversy has centered on the relative contributions of heredity and environment to the 15-point average difference usually found between the IQ scores of blacks and whites. I survey some of the main lines of evidence briefly and then consider the relevance of the entire debate for educational policy and practice.

The hereditarian viewpoint has had a sporadic history in psychology generally and in the field of IQ testing particularly. Alfred Binet, whose work in the Paris schools in the early 20th century initiated modern ability testing, vociferously denied that his test measured innate ability. However, many of the American and British psychologists who translated, modified, and used Binet's instrument took the contrary view. Some expressed their opinions in the public-policy arena and were associated with the eugenics and anti-immigration movements (Kamin, 1974). As we have seen, the assumption that "IQ tests" measure or are supposed to measure innate intelligence is still shared by many outside the measurement field, although most professionals in the field reject it.

Arthur Jensen's article in the Harvard Educational Review (1969) revived the hereditarian viewpoint within the field and provoked a debate that still continues. Jensen's paper attempted to show that IQ tests measure general intellectual ability, that this ability is of great social importance, and that educational intervention has relatively little effect on individual differences in IQ. Examining correlations among IQs of persons in various biological kinship relations, Jensen concluded that the data can be well explained by postulating that intelligence is a polygenic trait and that 80 percent of its phenotypic variation is due to underlying genotypic variation. Others, using similar techniques of "heritability estimation" but with somewhat different models, assumptions, or data, have arrived at lower estimates, in the neighborhood of 0.5 (e.g., Jencks et al., 1972; Plomin and DeFries, 1980).
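To see why estimates built on kinship correlations can diverge so widely, it may help to look at the simplest member of this family of techniques. The sketch below uses Falconer's twin approximation, h² = 2(r_MZ − r_DZ), which is offered here only as an illustration: it is not claimed to be the model used by Jensen or by any other author cited above, it assumes purely additive genetic effects and equally similar environments for both kinds of twins, and the correlations fed into it are invented.

```python
def falconer_heritability(r_mz, r_dz):
    """Falconer's approximation: h^2 = 2 * (r_MZ - r_DZ).

    r_mz: trait correlation between identical (monozygotic) twins
    r_dz: trait correlation between fraternal (dizygotic) twins
    """
    for r in (r_mz, r_dz):
        if not -1.0 <= r <= 1.0:
            raise ValueError("correlations must lie in [-1, 1]")
    return 2.0 * (r_mz - r_dz)


# Hypothetical correlations, chosen only to show the sensitivity of the result.
estimate_a = falconer_heritability(r_mz=0.86, r_dz=0.46)
estimate_b = falconer_heritability(r_mz=0.85, r_dz=0.60)

print(round(estimate_a, 2), round(estimate_b, 2))
```

A shift of about a tenth in one input correlation moves the estimate from roughly 0.8 to roughly 0.5, which is the span separating the figures quoted above. Modest differences in data or in modeling assumptions are therefore enough to produce substantially different heritability coefficients.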
One thorough and dispassionate review (Loehlin et al., 1975) reached a summary estimate only a little lower than Jensen's for the heritability of individual variations in IQ within European and Caucasian populations. The reviewers found that estimates of heritability within the black population were less consistent and often lower than estimates for whites, although they still pointed to a substantial genetic component. However, Loehlin et al. note that there is considerable room for disagreement about the technical details of heritability calculations; existing evidence is hence consistent with a very broad range of within-group heritability coefficients.

A number of factors create difficulties for the statistical techniques, borrowed from population genetics, that are used to estimate heritability. For example, one widely noted problem is the confounding of heredity and

environment: Innately bright parents are likely to provide their children with a lot of intellectual stimulation; innately bright children are likely to elicit stimulation from others and to find or create it in their physical environments (Scarr, 1981). Similarly, patterns of biological relationship are likely to mirror patterns of environmental similarity. For example, cousins share fewer genes than siblings, but they are also likely to grow up in less similar environments. As Loehlin et al. (1975) point out, most techniques for estimating heritability confound the purely genetic contribution with the contribution of the gene-environment correlation. To get a meaningful heritability estimate for a given trait in a given population, it is necessary to sample the relevant ranges of genotypes and environments and to specify correctly the statistical model that describes their separate and joint contribution to the phenotype. Some skeptics (e.g., Layzer, 1972) doubt that techniques of heritability estimation can be legitimately applied to IQ data, given the limitations of existing data, the imprecision of existing definitions and theories of intelligence, and our ignorance about possible environmental influences and gene-environment interactions.

Probably this rather arcane controversy over the proper use of statistics in estimating the heritability of traits would have aroused little public attention had Jensen not gone beyond his discussion of individual differences in IQ to speculate that group differences, specifically black-white differences, are also partly genetic in origin. Jensen wrote (1969:82):

So all we are left with are various lines of evidence, no one of which is definitive alone, but which, viewed all together, make it a not unreasonable hypothesis that genetic factors are strongly implicated in the average Negro-white intelligence difference.
The preponderance of the evidence is, in my opinion, less consistent with a strictly environmental hypothesis than with a genetic hypothesis, which, of course, does not exclude the influence of environment or its interaction with genetic factors.

This conjecture was not based on direct examination of data on the causes of racial differences but rather was an extension of Jensen's main discussion, which, as already noted, dealt with individual differences within ethnic groups. Jensen's critics have stressed that average group differences in a particular trait can be due mostly or entirely to the environment even if the heritability of the trait within groups is very high.

In an attempt to address the issue of between-group variance as directly as possible, Loehlin et al. (1975) reviewed a number of studies relating IQ to various indices of racial mixture. Some of these studies examined correlations between IQ and race-linked characteristics such as skin color and blood-group distributions. Others examined IQ distributions associated

with various patterns of interracial mating. One particularly interesting study traced the genealogies of black children with extremely high IQs (and found no evidence for increased European admixture, compared with the black population at large). While careful to point out that the results of these studies "are consistent with either moderate hereditarian or environmentalist interpretations," Loehlin et al. (1975:238) suggest that the findings are "more easily accommodated in an environmentalist framework." (In an appendix they estimate between-group heritability at .125, though the estimate is cautious and tentative.)

A similar conclusion can be reached regarding other studies, indicating that the size of the IQ gap between blacks and whites is inversely related to the degree of the black child's exposure to white, middle-class culture and schooling. These include classic studies of black families who migrated from the rural South to the urban North (Klineberg, 1935), studies of interracial adoptions (Scarr and Weinberg, 1976), and studies of the effects of sociocultural variations within the black community (Mercer, 1979).

The foregoing cursory glance at a large and complex literature will not satisfy either supporters or critics of the hereditarian position. It merely indicates some of the areas in which scientific controversy exists. The important points for purposes of this discussion are (1) that controversy does exist, and science has not yet provided definitive answers to the nature-nurture question and perhaps never will, and (2) that virtually everyone involved in the controversy agrees that both genetic and experiential factors influence IQ; what is at issue is the degree of influence and the mechanisms involved. The relevant question is whether there are policy decisions or practices having to do with educational placement or instruction that hinge on resolution of the issue.

Courts have held that the issue is indeed central.
In Larry P., for example, Judge Peckham argued that EMR classes are (according to definitions adopted by the California Department of Education)⁷ intended for children who are congenitally unable to learn in regular classes; to be valid for purposes of placing children in such classes, the judge reasoned, tests

⁷California's EMR classes were intended for "pupils whose mental capabilities make it impossible for them to profit from the regular instructional programs" (Larry P. v. Riles, 1979, Sec. IIIC). EMR children were distinguished (in a 1963 law) from "culturally disadvantaged minors," who are "potentially capable of completing a regular educational program" but unable to do so because of "cultural, economic and like disadvantages." EMR children were also distinguished from "educationally handicapped" children, who "cannot benefit from the regular educational program" because of "marked learning or behavioral disorders or both" (Larry P. v. Riles, 1979, Sec. IIIB). Given the historical definitions of the latter two categories, Judge Peckham not unreasonably construed the EMR category as applying to children who are congenitally unable to learn.

must be capable of identifying congenital disability. (See Larry P., Sections IIIC and VB(4), and the analysis of the decision by Smith, 1980.)

The assumption that mental retardation is by definition innate is one that professionals concerned with the problem abandoned long ago. The American Association on Mental Deficiency, for example, cites "significantly subaverage general intellectual functioning" and "deficits in adaptive behavior" as the defining conditions (Grossman, 1977:5). It can, of course, be debated whether this is an appropriate definition or whether IQ is an appropriate measure of intellectual functioning. Nevertheless, given the definition, it is not necessary to show that the deficient intellectual functioning (arguably) signalled by a low IQ is inborn in order to say that a child is "mentally retarded" according to the stated functional criteria. The medical profession has been more explicit in defining mental retardation as a purely functional category that may have many different causes, experiential as well as organic. (For a lucid discussion of contemporary definitions see Goodman, 1977.) It appears that there is a wide gap between the assumptions and definitions embraced by leaders in the field and those embodied in administrative procedures of some school systems. The latter assumptions apparently guided the Larry P. decision.

Professionals have abandoned the organic definition of mental retardation in favor of the functional definition for both scientific and moral reasons that seem compelling. Organic causes can be identified in a small proportion of cases of mild mental retardation. However, there is no evidence that different educational procedures are needed, or work better, for organically disabled children, compared with other children with similar functional abilities but no (known) organic deficit (Goodman, 1977).
There is no evidence that it is any easier to teach the latter group than the former or that their prognosis for future success is any worse. Good teaching can do a great deal to help even children with organic disabilities meet their potential; conversely, poor performance that is socially caused is just as hard to correct as poor performance that is organically caused, at least up to the limits of present scientific knowledge and instructional techniques.

Moreover, different views of the relative contributions of genetic and environmental factors in no way affect the responsibility of schools to provide the best instruction possible. There will always be differences in ability and achievement among students, and schools will always have to deal with these differences, regardless of their causes. To be sure, schools face difficult questions about how to allocate resources among students with different levels of developed academic ability. However, there is apparently no basis in current knowledge for believing that investment in the education of students of low ability with environmentally caused deficiencies

will pay off (in future performance or social contribution) more than investment in the education of those with congenital disorders.

If it is indeed the case that treatment of educational disability is independent of the cause of the problem, it is hard to see why different beliefs about the relative contributions of genes and environment to IQ should have any educational import. Earlier we saw that a wide range of academic performance is consistent with any given IQ score. The job of the educator is to make sure that performance is as good as it can be. Though a teacher, administrator, or policy maker with hereditarian views might be pessimistic about the likelihood of large gains in underlying intellectual ability, this pessimism would be no justification for failing to impart as many skills and as much knowledge as possible. I am not denying that negative expectations can potentially do harm; they probably can, whether they are based on beliefs in genetic or cultural inferiority of minority groups. I am arguing that they should not be allowed to do harm; such beliefs provide no legitimate basis for educational policies or practices that would in any way restrict children's progress. Decisions about curricula and teaching methods to be used with children at different levels of initial performance, as well as decisions about whether to teach these children separately or together, can and should be based on the demonstrated pedagogical effectiveness of the various approaches, not on preconceptions about the causes of initial differences in performance.

Finally, one's position on the nature-nurture question gives little or no guidance as to the degree of racial imbalance in special education placement that one should be willing to tolerate.
As long as there are separate classes or programs for children who are significantly lacking in traditional academic skills, both environmentalists and hereditarians would expect minority children to be overrepresented in such classes, at least for the immediate future.

Critics of IQ testing and EMR classes (e.g., the plaintiffs in Larry P. and PASE) have argued that the nativist connotations of terms such as "intelligence" and "mental retardation" are deeply ingrained. Children are harmed because people misinterpret the meaning of IQ scores and EMR placements, stigmatizing children and denying them educational opportunities. None of the evidence reviewed in this section bears on the truth or falsity of such claims. The arguments in this section of the paper have not dealt with the actual political and educational consequences of hereditarian versus environmentalist views. The arguments have been intended to make one fundamental point: Given current knowledge, there is no logical or necessary connection between the heritability of IQ and educational practice.

CONCLUSIONS

Two kinds of conclusions have been sprinkled liberally throughout this paper and need not be repeated here: judgments about the weight of the scientific evidence on various empirical issues that have been raised and value-based arguments about the implications of these judgments for education policy. In this final section I will draw a few more general lessons and reflect on their implications for the work of the panel.

One general lesson is that there is less articulation between the concerns of the public and the concerns of specialists in psychological measurement than might be expected, given their common agreement on the importance of the issues. Specialists have succeeded in formulating and answering an array of specific questions regarding aspects of test validity, bias, and the like. Other questions, however, remain ill formulated or unanswered; many of the latter questions are important to the nonspecialist and figure in his or her legitimate definition of validity, bias, etc., even if they do not figure in specialists' definitions. By the same token, nonspecialists, including some who are highly knowledgeable about education policy and legal aspects of testing, have often failed to recognize scientifically important distinctions among possible interpretations of connotatively loaded terms, such as intelligence, validity, and bias.

A second lesson is that standardized ability tests, as currently conceived and constructed, will inevitably contribute to disproportionate placement of minority children in classes for mildly mentally retarded students (or classes by any other name that are designed to serve children whose prognosis for success in school is poor). The reasons for this bleak conclusion are deeply rooted in the natures of the tests, of the schools, and of society.
As long as new tests are built on the same logic as old ones, namely the logic of inferring ability from achieved performance across a wide variety of specific "intellectual" tasks, they will continue to tell us what we already know: that children who grow up outside the mainstream are likely to have trouble in school. They will not help us resolve the ambiguities of potential and achievement, of nature and nurture, that plague the existing tests. There are some new, experimental approaches to testing based on Piagetian developmental theory, on direct observation of the child's learning in novel situations, and even on measures of neurological functioning, such as electroencephalograms. It is impossible to say at this juncture, however, how much hope we should pin on them. For the foreseeable future, decisions about public policy and educational practice will have to be based on tests as they are.

Fortunately, many such decisions can be made in the face of a great

deal of ambiguity about the meaning of tests. This is the third and most important lesson to be drawn from this paper. Debates over validity, bias, and the causes of group differences have a hypnotic quality because of the connotations of the word "intelligence" and the specter of genetic predestination. But the debate distracts our attention from what should be our central concern, namely how to improve education, particularly for children who are not doing well in the school system as it currently exists.

It is striking that some scholars who disagree fundamentally about the nature of IQ tests, such as Jane Mercer and Arthur Jensen, are in agreement about many aspects of the proper use of tests in evaluating children for placement in classes for mentally retarded children. Both Mercer and Jensen agree that tests tell us something about a child's level of school functioning and that they deserve a place in an assessment battery. Both agree that full diagnostic assessment should take place only when children have had trouble in the classroom; tests and other assessment procedures should not be used as general screening devices. Both agree that IQ tests alone should not determine placement but should be used in tandem with information about other characteristics of the child, notably the child's capacity to function in nonschool environments and roles, and the presence of any neurological, sensory, or other physical problems. To be sure, Mercer and Jensen would disagree about using information on the child's sociocultural background to interpret or adjust IQ scores, but the areas of agreement are substantial. It seems that serious theoretical disagreements are consistent with surprisingly similar practical recommendations. If so, one can only wonder about the wisdom of dragging the theoretical disagreements into the courts.
One consequence of the current focus of debate is that judges have been forced to deliberate about scientific controversies that they are ill equipped to consider. It is not surprising that their conclusions are sometimes contradictory. But judges (and policy makers) are well equipped to consider other kinds of issues: given the ambiguous meanings of test scores, and given the consequences of placement, is it consistent with established legal standards of fairness to use tests as placement devices? Are some uses fair, while others are not? This way of framing questions puts them squarely in the court of values and legal definitions and precedents.

The consequences of placement will surely play a central role in any such deliberation. Regardless of the intrinsic merits of tests or alternative placement procedures, it is hard to justify the use of any device to sort children or prescribe educational programs, unless there are demonstrated educational benefits attached to the sorting or prescription. In Larry P. Judge Peckham concluded that EMR classes are "educationally dead-end, isolated and stigmatizing." Given the issues raised in the case, it was

necessary for the judge to go on to examine discrimination in placement procedures; had his purpose been to decide whether schools and society were meeting their responsibilities, however, he need not have looked further. If "special" classes (particularly EMR classes) convey no special benefits and involve no remedial instruction, it is hard to justify placing any children in them, regardless of race. If minority children are overrepresented in such classes, they are being disproportionately harmed; the basis for placement doesn't much matter. If, on the other hand, special classes do convey demonstrable benefits, disproportionate placement does not represent disproportionate harm. The benefits of the classes must be weighed against their costs, e.g., the cost of separateness per se.

If we are going to fight about IQ tests (or EMR classes) we should be fighting about what they do or do not contribute to learning. Proponents should try to show that tests give information, not available through other practical means, that can be used to match instruction to children's performance. Opponents should be trying to show that there are better ways to channel children into the most effective instructional situations. If the panel can help refocus public debate in this manner, it will have done a great service.

REFERENCES

Bersoff, D. N. 1979 Regarding psychologists testily: regulation of psychological assessment in the public schools. Maryland Law Review 39(1):27-120.
Cleary, T. A., Humphreys, L. G., Kendrick, S. A., and Wesman, A. 1975 Educational uses of tests with disadvantaged students. American Psychologist 30:15-41.
Cole, N. S. 1981 Bias in testing. American Psychologist 36:1067-1077.
Crano, W. D., Kenny, D. A., and Campbell, D. T. 1972 Does intelligence cause achievement?: a cross-lagged panel analysis. Journal of Educational Psychology 63:258-275.
Cronbach, L. J., and Snow, R. E.
1977 Aptitudes and Instructional Methods: A Handbook for Research on Interactions. New York: Irvington.
Farr, J. L., O'Leary, B. S., Pfeiffer, C. M., Goldstein, I. L., and Bartlett, C. J. 1971 Ethnic Group Membership as a Moderator in the Prediction of Job Performance: An Examination of Some Less Traditional Predictors. AIR technical report no. 2. Washington, D.C.: American Institutes for Research.
Glaser, R., and Bond, L., guest eds. 1981 Testing: concepts, policy, practice and research. American Psychologist (special issue).
Goldman, R. D., and Hartig, L. K. 1976 The WISC may not be a valid predictor of school performance for primary-grade minority children. American Journal of Mental Deficiency 80(6):583-587.

Goodman, J. F. 1977 The diagnostic fallacy: a critique of Jane Mercer's concept of mental retardation. Journal of School Psychology 15:197-205.
Gordon, R. A. 1980 Examining labeling theory: the case of mental retardation. Pp. 111-174 in W. R. Gove, ed., The Labeling of Deviance: Evaluating a Perspective. Beverly Hills, Calif.: Sage Publications.
Green, R. L., and Farquhar, W. W. 1965 Negro academic motivation and scholastic achievement. Journal of Educational Psychology 56:241-243.
Grossman, H. J., ed. 1977 Manual on Terminology and Classification in Mental Retardation. American Association on Mental Deficiency. Baltimore, Md.: Garamond/Pridemark.
Hoffman, B. 1962 The Tyranny of Testing. New York: Crowell-Collier.
Jencks, C., Smith, M., Acland, H., Bane, M. J., Cohen, D., Gintis, H., Heyns, B., and Michelson, S. 1972 Inequality: A Reassessment of the Effect of Family and Schooling in America. New York: Basic Books.
Jensen, A. R. 1969 How much can we boost IQ and scholastic achievement? Harvard Educational Review 39(1):1-123.
Jensen, A. R. 1980 Bias in Mental Testing. New York: Free Press.
Kamin, L. J. 1974 The Science and Politics of IQ. New York: John Wiley & Sons, Inc.
Kaufman, A. 1975 Factor analysis of the WISC-R at 11 age levels between 6½ and 16½ years. Journal of Consulting and Clinical Psychology 43:135-147.
Klineberg, O. 1935 Negro Intelligence and Selective Migration. New York: Columbia University Press.
Larry P. v. Riles 1979 495 F. Supp. 926 (N. D. Cal. 1979) (decision on merits) appeal docketed No. 80.4027 (9th Cir., Jan. 17, 1980).
Layzer, D. 1972 Science or superstition?: a physical scientist looks at the IQ controversy. Cognition 1:265-299.
Linn, R. 1982 Ability testing: individual differences, prediction, and differential prediction. Pp. 335-388 in A. K. Wigdor and W. R. Garner, eds., Ability Testing: Uses, Consequences, and Controversies. Vol. II.
Report of the Committee on Ability Testing, National Research Council. Washington, D.C.: National Academy Press.
Loehlin, J. C., Lindzey, G., and Spuhler, J. N. 1975 Race Differences in Intelligence. San Francisco, Calif.: W. H. Freeman.
Mercer, J. 1979 System of Multicultural Pluralistic Assessment Technical Manual. New York: Psychological Corporation.
Mercer, J. In press What is a racially and culturally nondiscriminatory test? In E. R. Reynolds and R. T. Brown, eds., Perspectives on Bias in Mental Testing. New York: Plenum.
Messe, L. A., Crano, W. D., Messe, S. R., and Rice, W. 1979 Evaluation of the predictive validity of tests of mental ability for classroom performance in elementary grades. Journal of Educational Psychology 71:233-241.

Messick, S. 1980 Test validity and the ethics of assessment. American Psychologist 35:1012-1027.
Parents in Action on Special Education (PASE) v. Hannon 1980 No. 74-C-3586 (N. D. Ill. 1980).
Petersen, N., and Novick, M. 1976 An evaluation of some models for culture-fair selection. Journal of Educational Measurement 13:3-31.
Plomin, R., and DeFries, J. C. 1980 Genetics and intelligence: recent data. Intelligence 4:15-24.
Reschly, D. J. 1981 Psychological testing in educational classification and placement. American Psychologist 36:1094-1102.
Reschly, D. J., and Reschly, J. E. 1979 Validity of WISC-R factor scores in predicting achievement and attention for four sociocultural groups. Journal of School Psychology 17:355-361.
Reschly, D. J., and Sabers, D. L. 1979 Analysis of test bias in four groups with regression definition. Journal of Educational Measurement 16(1):1-9.
Sandoval, J. 1979 The WISC-R and internal evidence of test bias with minority groups. Journal of Consulting and Clinical Psychology 47:919-927.
Sattler, J. M. 1974 Assessment of Children's Intelligence. Philadelphia, Pa.: W. B. Saunders Company.
Scarr, S. 1981 Testing for children: assessment and the many determinants of intellectual competence. American Psychologist 36:1159-1166.
Scarr, S., and Weinberg, R. A. 1976 IQ test performance of black children adopted by white families. American Psychologist 31:726-739.
Schrader, W. B. 1965 A taxonomy of expectancy tables. Journal of Educational Measurement 2:29-35.
Smith, E. 1980 Test validation in the schools. Texas Law Review 58:1123-1159.
U.S. Department of Justice 1980 Post-Trial Memorandum of the United States. Amicus Curiae brief filed in PASE v. Hannon.
Wigdor, A. K., and Garner, W. R., eds. 1982 Ability Testing: Uses, Consequences, and Controversies. Vols. 1 and 2. Report of the Committee on Ability Testing, National Research Council. Washington, D.C.: National Academy Press.