3. The Role of Intellectual Assessment
Pages 69-140

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 69...
... intelligence tests used commonly in the diagnosis of mental retardation; (3) assessment conditions that affect examiners'
From page 70...
... psychometric considerations in the selection and application of intelligence tests for diagnosing mental retardation, including test fairness. HISTORICAL PERSPECTIVE ON THEORY AND PRACTICE History of Development of Tests of Intelligence The use of intelligence tests in the process of diagnosing mental retardation dates back to the turn of the 20th century, when Alfred Binet and Theodore Simon developed an intelligence test for that purpose.
From page 71...
... The current edition of the Stanford-Binet Intelligence Scale includes a Seguin-like form board, which has resulted in the merger of efforts by Binet, Simon, and Seguin, the three pioneer European test developers, in a contemporary American instrument. Nonverbal intelligence testing has a history paralleling that of traditional language-loaded intelligence tests with the publication of many nonverbal scales during the early 1900s.
From page 72...
... Wechsler Scales Since the onset of mental testing with the Stanford-Binet and the application of group intelligence testing procedures, a plethora of individual and group tests have been developed in the United States for assessing overall intelligence and diagnosing subnormal intellectual functioning in infants, children, adolescents, and adults. Most prominent among the post-Binet instruments was a series of intelligence tests developed by David Wechsler (1939, 1949, 1955, 1967, 1974, 1981, 1989, 1991).
From page 73...
... During the past 20 years, a number of intelligence tests have been published as alternatives to the Stanford-Binet and the Wechsler scales. Currently psychologists have an impressive array of instruments differing in their features, theoretical orientations, length and complexity, and technical quality from which to select.
From page 74...
... published two groundbreaking statistical papers, one on basic methods of correlational analysis and the other that laid the foundation for factor analysis. The factor analytic techniques that Spearman (1904a)
From page 75...
... This information undergirds the committee's recommendations regarding the intelligence test scores that can best be used for eligibility decisions. Spearman's Two-Factor Theory Spearman (1904a)
From page 76...
... borrowed from the Industrial Revolution. Arguing that g, or general intelligence, could be likened to or identified with mental energy, Spearman also hypothesized that individual differences in mental energy were largely genetic in origin.
From page 77...
... This approach, discarding tests that led to failure to confirm his theory, was a common one for Spearman, who discarded recalcitrant tests in several analyses reported in his major empirical work on mental abilities (Spearman, 1927). As a result, Spearman's two-factor theory has equivocal support, because any indication of lack of fit was
From page 78...
... Thurstone's Primary Mental Abilities During the 1930s, L.L. Thurstone and his colleagues pursued a program of research designed to identify the basic set of dimensions that span the ability, or intelligence, domain.
From page 79...
... quickly applied oblique rotations to the primary mental abilities and found substantial correlations among the seven dimensions. The correlations among the primary mental abilities were well described by a single second-order factor, which Thurstone and Thurstone argued provided a way to reconcile Spearman's theory with their own.
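The reconciliation described in this passage can be illustrated numerically. The sketch below is a minimal illustration, not Thurstone's procedure: it assumes a small, entirely hypothetical correlation matrix for three abilities (the text describes seven), and it substitutes a principal-axis extraction by power iteration for a full second-order factor analysis. The names `R` and `first_principal_axis` are invented for this example.

```python
# Hypothetical correlations among three primary abilities (illustrative values only).
R = [
    [1.0, 0.5, 0.4],
    [0.5, 1.0, 0.6],
    [0.4, 0.6, 1.0],
]

def first_principal_axis(matrix, iterations=200):
    """Power iteration: return (eigenvalue, unit eigenvector) of the dominant axis."""
    n = len(matrix)
    v = [1.0] * n
    for _ in range(iterations):
        # Multiply the matrix by the current vector, then rescale to unit length.
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Rayleigh quotient v'Rv gives the eigenvalue for the unit vector v.
    eigenvalue = sum(
        v[i] * sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)
    )
    return eigenvalue, v

eigenvalue, vector = first_principal_axis(R)
share = eigenvalue / len(R)                         # proportion of total variance on the first axis
loadings = [x * eigenvalue ** 0.5 for x in vector]  # loading of each ability on that axis
```

With substantially and uniformly positive correlations, the first axis carries most of the shared variance and every ability loads positively on it, which is the pattern a single second-order factor is meant to summarize.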
From page 80...
... Indeed, Spearman (1939) argued that the primary mental abilities were rather trivial and narrow, and that the second-order general factor, or g, should be considered the principal or primary factor, rather than being relegated to second-order importance.
From page 81...
... The Thomson explanation for the hierarchy of mental abilities may lead to a number of reactions. One may become highly suspicious of factor analytic approaches, as one set of empirical results, with a dominant general factor, is consistent with diametrically opposed generating mechanisms: a single entity common to all tests (e.g., Spearman)
From page 82...
... Cattell (1963) proposed a new theory of ability structure, subsequently referred to as the theory of fluid and crystallized intelligence.
From page 83...
... Thus, verbal comprehension, or the ability to extract meaning from text, is a crystallized ability assessed using tests of vocabulary, paragraph comprehension, and the understanding of proverbs, among others. All of these tests require one to extract the meaning from text using stored meanings of words in the lexicon.
From page 84...
... made a further contribution to the understanding of mental abilities by distinguishing between the order and stratum of a factor. The order of a factor is a superficial, methodological aspect of the analysis in which a factor is identified, whereas the stratum a factor occupies is a deeper, theoretical concern regarding the nature and breadth of the factor.
From page 85...
... Gc (crystallized intelligence), which has verbal comprehension, semantic relations, numerical facility, mechanical knowledge, syllogistic reasoning, verbal clo
From page 86...
... , which represents a variety of fluency dimensions, such as delayed retrieval, associational fluency, expressional fluency, ideational fluency, word fluency, and originality; (7) Gs (general speediness or processing speed)
From page 87...
... Another type of evidence is derived from studies of heritability. Gf-Gc theory makes certain predictions regarding heritability, or the degree of genetic variance in ability factors.
From page 88...
... Moreover, the well-replicated structural results are leading the developers of intelligence tests to incorporate measures of Gf and Gc, in addition to an overall IQ, in the scoring of their instruments. Carroll's Three-Stratum Theory In 1993, John B
From page 89...
... R (broad retrieval), with level indicators of originality and creativity and speed indicators of ideational fluency, associational fluency, expressional fluency, word fluency, and figural fluency; (7)
From page 90...
... Carroll also identified the general factor as corresponding to Spearman's g, the mental ability common to all tests of ability, also a position that Cattell might have favored. However, for more than 25 years, Horn has been responsible for the current synthesis of the Horn-Cattell model.
From page 91...
... In addition to these theories based on factor analysis, several additional theories of the structure of mental abilities have been developed. Most of these other theories have been based on a priori theory or summaries of previous research, but have relied much less or not at all on sophisticated measurement techniques such as factor analysis.
From page 92...
... The architectural system is assumed to be genetically, or at least biologically, based and consists of basic operating parameters of cognitive processes, encompassing individual differences in (1) amount of information that can be processed, which is assessed using memory span, (2)
From page 93...
... musical intelligence, involving individual differences in rhythm and pitch and skills in composing music; (3) logical-mathematical intelligence, including logical reasoning and number abilities; (4)
From page 94...
... During the next decade, even greater alignment of intelligence tests and the IQ scores derived from them and the Horn-Cattell and Carroll models is likely. As a result, the future will almost certainly see greater reliance on part scores, such as IQ scores for Gc and Gf, in addition to the traditional composite IQ.
From page 95...
... Also, several brief or unidimensional intelligence tests are currently available for the screening of intellectual functioning (e.g., Kaufman Brief Intelligence Test, KBIT; Kaufman & Kaufman, 1990; Test of Nonverbal Intelligence: Third Edition, TONI-III; Brown et al., 1997). Although these brief tests may have merit for use as cognitive screeners, they are best suited for low-stakes decision making because of their brevity and limited sampling of important theoretical facets of intelligence.
From page 96...
... MENTAL RETARDATION
TABLE 3-1 Comprehensive Tests of Intelligence
Intelligence Test | Age Range(a) | Publication Date | Publisher Level(b)
Bayley Scales of Infant Development-II | Birth to 42 months | 1993 | C
Cognitive Assessment System | 5-0 to 17-11 | 1997 | C
Differential Ability Scales | 6-2 to 17-11 | 1990 | C
Kaufman Assessment Battery for Children | 6-2 to 12-6 | 1983 | C
Kaufman Adolescent and Adult Intelligence Test | 11-0 to 85+ | 1993 | C
Leiter International Performance Scale-Revised(e) | 2-0 to 20-0 | 1997 | C
Mullen Scales of Early Learning(c) | Birth to 68 months | 1995 | C
Stanford-Binet Intelligence Scale: Fourth Edition(b) | 2-0 to 24 | 1986 | C
Universal Nonverbal Intelligence Test(e) | 5-0 to 17-11 | 1998 | C
Wechsler Adult Intelligence Scale-III | 16 to 89 | 1997 | C
Wechsler Intelligence Scale for Children-III | 6-0 to 16-11 | 1991 | C
From page 97...
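... Intelligence Test | Publication Date | Publisher Level(b) | Appropriate for MR | Appropriate Scores
Bayley Scales of Infant Development-II | 1993 | C | Conditional | Mental development index
Cognitive Assessment System | 1997 | C | Yes | Full-scale standard score
Differential Ability Scales | 1990 | C | Yes | Verbal ability, Nonverbal ability, General conceptual ability
Kaufman Assessment Battery for Children | 1983 | C | Yes | Mental processing composite
Kaufman Adolescent and Adult Intelligence Test | 1993 | C | Yes | Fluid scale, Crystallized scale, Composite intelligence scale
Leiter International Performance Scale-Revised | 1997 | C | Yes | Full-scale IQ
Mullen Scales of Early Learning | 1995 | C | Conditional | Early learning composite
Stanford-Binet Intelligence Scale: Fourth Edition | 1986 | C | Yes | Abstract/visual reasoning, Verbal reasoning, SAS composite
Universal Nonverbal Intelligence Test | 1998 | C | Yes | Reasoning, Memory, Full-scale IQ
Wechsler Adult Intelligence Scale-III | 1997 | C | Yes | Verbal scale, Performance scale, Full-scale IQ
Wechsler Intelligence Scale for Children-III | 1991 | C | Yes | Verbal scale, Verbal comprehension index, Performance scale, Perceptual organization index, Full-scale IQ
Continued on next page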
From page 98...
... (b) Test publishers use criteria for purchasing tests, with different levels of tests requiring different levels of training and/or credentials. Most comprehensive intelligence tests are known as Class C tests, which require the highest level of training and credentials to purchase.
From page 99...
... (e) The Leiter-R and UNIT are explicitly designed to assess intelligence in a nonverbal administration format. Such tests are employed when language-loaded intelligence tests may provide a distorted portrayal of the client's current level of intellectual functioning due to limited English proficiency, language-related disabilities (e.g., verbal learning disability, speech disorders)
From page 100...
... Most of the instruments cited in the table assess between three and five cognitive factors, with support for their theoretical underpinnings adequate to warrant their use in the diagnosis of mental retardation. ASSESSMENT CONDITIONS THAT AFFECT INTELLIGENCE TEST SCORES Intelligence test scores have considerable weight in diagnostic determination of mental retardation.
From page 101...
... In instances in which examinees have had an ongoing history of physical illness or mental health problems, the effects of these conditions on the examinee's cognitive functioning must also be considered. Examiners must also ensure that examinees have the requisite skills to perform all intelligence test tasks and activities when selecting instruments or assessment procedures.
From page 102...
... Similarly, individuals with impaired motor skills should not be examined using materials that require fine motor dexterity or processing speed. Examinees with vision, motor, or visual-motor handicapping conditions might better be assessed on verbally loaded measures of intelligence to remove the construct-irrelevant influences of these noncognitive handicapping conditions on their cognitive assessment results.
From page 103...
... Subtests that are spoiled for any reason (e.g., examiner, examinee, environmental) ... loaded intelligence tests, but rather should be assessed with nonverbal tests of intelligence.
From page 104...
... The ethical standards of American professional and scientific psychological associations, like the American Psychological Association and the National Association of School Psychologists, require that psychologists not engage in services for which they lack competence. Whether the examiner is a psychologist or holds other acceptable credentials to provide psychological assessment services, it is essential that examiners provide only those services they are competent to perform.
From page 105...
... Environmental Conditions Intellectual assessments should be conducted in settings that are optimal for eliciting the examinee's best performance. Office furniture should be appropriately sized and safe for clients of all ages; for example, preschool children should be seated in small chairs for safety and comfort.
From page 106...
... The section on test standards in this chapter addresses these relevant psychometric considerations in more detail. USE OF TOTAL TEST SCORES AND PART SCORES Whenever the validity of one or more part scores (subtests, scales)
From page 107...
... presented a sound argument for generally using total test scores in decision making. His recommendation was to use instruments' total scale composite scores (for example, a composite IQ)
From page 108...
... , these global scores tend to be highly correlated and share a common source of variance: general intelligence or psychometric g. In this sense, there is little practical difference between what the total test scores are called; they are all representations of overall intelligence and historically have been referred to as IQ or full scale IQ.
From page 109...
... Part Scores There are occasions when a total test score may not be the best indicator of an individual's overall intellectual functioning, and the examiner must resort to interpreting one of the instrument's part scores as the best indicator of overall intellectual functioning. In such cases, the instrument's total test score may offer little more than an awkward and artifactual "average" of a number of relatively disparate subtests or subscales (i.e., part scores)
From page 110...
... Differences of this magnitude, although statistically significant, are not unusual or rare occurrences in the general population. Before determining that a total test score is not an optimal representation of the examinee's overall intellectual functioning, the examiner must consider both the statistical significance of the difference
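The statistical-significance half of this check is commonly computed from the standard errors of measurement (SEM) of the two scores being compared. The sketch below assumes the conventional formula, in which the standard error of a difference is the square root of the summed squared SEMs. The SEM values, the observed gap, and the function name `critical_difference` are hypothetical and are not taken from any test manual. It does not address how rare the difference is in the population, which the passage treats as a separate consideration.

```python
from math import sqrt
from statistics import NormalDist

def critical_difference(sem_a, sem_b, alpha=0.05):
    """Smallest score difference that is significant at the two-tailed alpha level."""
    se_diff = sqrt(sem_a ** 2 + sem_b ** 2)      # standard error of the difference
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-tailed critical z value
    return z * se_diff

# Hypothetical SEMs for a verbal and a nonverbal scale (illustrative only).
threshold = critical_difference(sem_a=3.0, sem_b=4.0)
observed_gap = 12  # hypothetical difference between the two scale scores
is_significant = abs(observed_gap) >= threshold
```

A gap that clears this threshold is reliable in the measurement sense only; whether it is also unusual requires base-rate data from the standardization sample.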
From page 111...
... In such cases, the total test score would generally be considered invalid as a measure of the examinee's "true" overall intellectual functioning because limited English facility, and not limited overall intelligence, is likely to have adversely affected and rendered invalid the examinee's assessed verbal IQ and, consequently, the composite IQ. The client's language difficulty consequently would have had the adverse effect of reducing the composite IQ in direct proportion to its influence on the person's performance on the verbal scale.
From page 112...
... From a Thurstonian perspective, Gc maps closely onto the construct of verbal comprehension and Gf maps closely onto Thurstone's concept of reasoning. In a multiple-instrument factor analysis of the Woodcock-Johnson Psycho-Educational Battery and the Cognitive Assessment System, subtests from both of these broad ability factors load at moderate to high levels (.60s - .70s)
From page 113...
... Magnitude of the Total Test Score The last issue when considering whether a part score should be used in place of a total test score is the magnitude of the existing total test score. That is, when scale score discrepancies meet the previously mentioned criteria of significance and meaningfulness, the total test score may be simply too high to support a diagnosis of mental retardation.
From page 114...
... Therefore, the final criterion for deciding whether or not to use part scores in place of the total test score in the diagnosis of mental retardation is that, no matter how great the discrepancy between relevant subscales, individuals with total test scores greater than 75 should not be diagnosed as having mental retardation. Composite scores from intelligence tests should be used routinely in mental retardation diagnosis, except when the composite IQ validity is in doubt, in which case an appropriate part score may be used in its place.
From page 115...
... IQ from an individually administered intelligence measure. In common clinical practice, this usually results in the use of a Wechsler VIQ, PIQ, or FSIQ, a situation that unfairly privileges one set of intelligence tests and has the effect of discouraging innovation on the part of other test developers.
From page 116...
... The committee's examination of the structure of intelligence suggests that part scores that measure crystallized and fluid intelligence are the most appropriate part scores to use in these situations. Also, the committee recognizes that many of these abilities are measured by a large number of intelligence tests, not just Wechsler measures, and therefore recommends that SSA expand in its listings the use of examples of other appropriate tests that yield g-loaded part scores.
From page 117...
... The unidimensional Peabody Picture Vocabulary Test (PPVT; Dunn, 1959) originally reported an IQ as its total test score, a score that was once used for high-stakes placement and eligibility testing.
From page 118...
... Comprehensive intelligence tests assess multiple facets of the construct, and they more thoroughly sample the domain of intelligence. The instruments listed in Table 3-1 represent a current compendium of comprehensive measures of intelligence for infants, children, adolescents, and adults.
From page 119...
... The goal of intelligence test norms is to accurately represent the U.S. population because the goal of assessment is to identify the degree to which an individual deviates from normative expectations.
From page 120...
... Although large-scale group tests may involve 10,000 to 20,000 students per grade or age level, samples for individually administered intelligence tests generally are considerably smaller. Carefully drawn samples of 150 to 200 participants per grade or age level are typically considered appropriate and are frequently employed with individually administered tests.
From page 121...
... Intelligence tests used for the diagnosis of mental retardation should include carefully selected samples that fully represent these important demographic characteristics to the degree that they are found in the general population. Many intelligence tests also appropriately include individuals with handicapping conditions and educational exceptionalities in their normative samples.
From page 122...
... Weighting is not necessary with most carefully normed intelligence tests; however, sometimes weighting is done to "correct" errant samples when the stated goals of the sampling plan have not been adequately met. When specific demographic strata have been undersampled, score weighting is sometimes used to statistically correct this methodological slight.
From page 123...
... The standardization sample should be current. Research suggests that intelligence in the entire population increases at a rate of approximately 3 IQ points per decade, which approximates the standard error of measurement for most comprehensive intelligence tests.
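The arithmetic behind this concern is simple enough to sketch. The example below assumes that the drift is linear at the rate quoted in the text (about 3 IQ points per decade); the linearity, the example years, and the function name `expected_norm_inflation` are assumptions made for illustration, and the result is an estimate of drift rather than a recommended score correction.

```python
def expected_norm_inflation(norm_year, test_year, points_per_decade=3.0):
    """Estimated upward drift in obtained IQ attributable to aging norms."""
    years_elapsed = max(0, test_year - norm_year)  # no drift before the norms exist
    return years_elapsed * points_per_decade / 10.0

# Hypothetical case: norms collected in 1991, test administered in 2003.
inflation = expected_norm_inflation(norm_year=1991, test_year=2003)
```

Under these assumptions, norms a decade old would be expected to overstate scores by roughly one standard error of measurement, which is why the currency of the standardization sample matters near a diagnostic cutoff.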
From page 124...
... Similarly, new instruments just entering the field typically produce total test scores that are significantly lower than the scores obtained on the traditional "old standards" used in convergent validity studies, leading to criterion contamination as a major threat to the validation of the newer instrument. For these reasons, professional
From page 125...
... and long-term memory continue to improve into advanced years, but fluid abilities like novel problem solving and clerical speed generally decline fairly rapidly after peaking in adolescence (Horn, 1985). Therefore, during the infant and toddler years, when cognitive growth and development are most rapid and consequently least stable, total test scores should be obtained at the time they are to be used in
From page 126...
... After age 50, total test scores might be considered reasonably valid for three years, but separate intellectual abilities, like Gf-Gc, might become important considerations. This lack of stability in elderly individuals' specific cognitive abilities is typically due to debilitating factors associated with aging, and, although their IQs may change over the years, their diagnostic status is unlikely to change.
From page 127...
... Although not pertinent to the diagnosis of mental retardation, intelligence tests should also have ceilings that are sufficiently high to differentiate the extreme upper 3 percent from the lower 97 percent. Evidence of Test Score Validity The validity of a test is characterized by the extent to which it exclusively measures its targeted constructs (construct validity)
From page 128...
... Content validity can be described as the degree to which a test adequately samples the domains of interest. Content validity varies with the purpose of the test and the nature of the inferences that may be drawn from test scores (Messick, 1993).
From page 129...
... suggest that exploratory factor analyses for clinical assessment instruments should routinely report principal component analysis or common factor analysis, initial communality estimates (or squared correlations of observed variables with the factors), the method of factor extraction, the criteria for retaining factors, the eigenvalues and the percentage of variance accounted for by the unrotated factors, the rotation method and rationale, all rotated factor loadings, factor intercorrelations, and the variance explained by the factors after rotation.
From page 130...
... An intelligence test that is proposed for use in the process of diagnosing mental retardation should demonstrate convergent validity with other extant intelligence tests before the instrument is accepted for this purpose.
From page 131...
... Examiners who wish to use tests for purposes not stated or supported in the examiner's manual, such as using a language instrument for discerning levels of cognitive functioning, must demonstrate the validity of the new application prior to its application. Test Score Reliability The reliability of test scores refers to the reproducibility (precision, consistency, and repeatability)
From page 132...
... original standards, Bracken (1987, 1998; Wasserman & Bracken, 2002) recommended that total test or total scale internal consistency of high-stakes test applications, such as for clinical diagnosis or eligibility decision making, should equal or exceed .90 when averaged across the age levels.
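A brief sketch of how this criterion might be checked, and how reliability translates into measurement error, follows. It assumes a simple arithmetic mean of the age-level coefficients (other averaging methods exist), the standard relation SEM = SD × sqrt(1 − reliability), and the common IQ metric with a standard deviation of 15. The reliability values and both function names are hypothetical.

```python
from math import sqrt
from statistics import mean

def meets_high_stakes_criterion(reliabilities_by_age, threshold=0.90):
    """True if internal consistency, averaged across age levels, reaches the threshold."""
    return mean(reliabilities_by_age) >= threshold

def standard_error_of_measurement(reliability, sd=15.0):
    """SEM implied by a reliability coefficient on a scale with the given SD."""
    return sd * sqrt(1.0 - reliability)

# Hypothetical internal-consistency coefficients for three age levels.
age_level_reliabilities = [0.88, 0.91, 0.93]
adequate = meets_high_stakes_criterion(age_level_reliabilities)
sem_at_criterion = standard_error_of_measurement(0.90)
```

On these assumptions a coefficient of exactly .90 corresponds to an SEM of about 4.7 points, while an SEM near 3 points requires reliability of about .96.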
From page 133...
... Test scores must be reasonably stable to have practical utility when diagnosing known stable conditions such as mental retardation and to be predictive of future performance. Stability is typically estimated through use of test-retest stability (correlation)
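As a minimal sketch of the test-retest approach, the stability coefficient is the Pearson correlation between scores from two administrations of the same test to the same examinees. The score lists below are invented for illustration, and `pearson_r` is a hypothetical helper written out in full so the computation is visible.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical scores for five examinees tested on two occasions.
first_testing = [68, 72, 85, 100, 115]
second_testing = [70, 71, 88, 98, 117]
stability = pearson_r(first_testing, second_testing)
```

A high coefficient shows that examinees keep their rank order across occasions; it does not by itself show that mean scores are unchanged, so practice effects need to be examined separately.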
From page 134...
... Individuals who speak English as a second language also may be disadvantaged by traditional language-loaded intelligence tests, even on performance-based measures like the Wechsler Performance Scale that include lengthy and conceptually laden test directions (Bracken & McCallum, 1998; Duran, 1989; Geisinger, 1992; Oakland & Parmelee, 1985). In addition, measures of crystallized ability and knowledge are inextricably linked to culture (Carroll, 1997)
From page 135...
... refers to a family of statistical procedures used to identify whether test items display different statistical properties in different group settings after controlling for differences in the abilities of the comparison groups (Angoff, 1993). The concept of DIF has been extended by Shealy and Stout (1993)
From page 136...
... 25). The demonstration of comparable reliabilities across samples that differ on the basis of gender, race, or ethnicity has been studied in some current-generation intelligence tests with positive outcomes (Bracken & McCallum, 1998; Matazow et al.
From page 137...
... Research also suggests that intelligence in the entire population increases at a rate of approximately 3 IQ points per decade, which approximates the standard error of measurement for most comprehensive intelligence tests. Thus, tests with norms older than 10 to 12 years will tend to produce inflated scores and could result in the denial of services to significant numbers of individuals who would have been eligible for them, if more recent norms had been used.
From page 138...
... Composite scores from intelligence tests should be used routinely in mental retardation diagnosis, except when the validity of a composite IQ above 70 is in doubt, in which case an appropriate part score
From page 139...
... Many intelligence tests assess several facets of intelligence, but not all facets are equally important or predict life events equally well. Those intellectual facets that are heavily g-saturated provide the best sources for replacing the composite IQ score when its validity is questionable.
From page 140...
... Therefore, a score of 70 or below on either of these part scores from any standardized, individually administered intelligence test that reports such scores should be deemed sufficient to meet the listings for low general intellectual functioning regardless of the level of the composite score, providing that the part scores have adequate psychometric properties (e.g., high reliability, low standard error of measurement)


This material may be derived from roughly machine-read images, and so is provided only to facilitate research.