
Suggested Citation:"Neurobehavioral Tests: Problems, Potential, and Prospects." National Research Council. 1990. Behavioral Measures of Neurotoxicity. Washington, DC: The National Academies Press. doi: 10.17226/1352.



Neurobehavioral Tests: Problems, Potential, and Prospects

J. Graham Beaumont

There seems to be general agreement that any monitoring of the effects of environmental and occupational exposure to neurotoxins should include behavioral measures. An important element in the effects of known toxins is the response of the nervous system, including peripheral sensory and motor components and higher central effects upon the function of the forebrain. This response has clear behavioral aspects following gross acute exposure and significant chronic exposure to a range of neurotoxins. There are considered to be more subtle behavioral effects of less severe acute exposure or of sustained exposure to lower levels of the relevant substances.

The assessment of behavioral effects is considered to be the primary approach to the systematic monitoring of neurotoxic exposure, and where mass screening is considered for large populations at risk, it may be the only practicable approach, at least for initial selection. It is obvious that automated screening by the use of computer-based assessment could contribute significantly to the development of appropriate techniques.

The essential context for the adoption of acceptable assessment techniques is that the potential behavioral changes should have been identified and that reliable measures of these changes should be available, measures which have been demonstrated to be valid and for which appropriate normative data are available. It may also be desirable that the test be stable under conditions of repeated testing. Particularly when relatively subtle changes, with a low base-rate in the population (as may be typical of mass screening), are to be detected, it is essential that the validity (and therefore the reliability) be exceptionally high.

None of this is in conflict with the preceding chapters; indeed, there is remarkable agreement as to the current state of the field, the methodological principles that apply, and the standards that should be adopted. Where there is some potential disagreement, it concerns whether the currently available tests are sufficient for their purposes and whether the introduction of new test instruments is to be encouraged. This chapter therefore concentrates principally upon those issues.

TESTS CURRENTLY IN USE

The last three chapters have covered the history and description of current tests, with particularly helpful tabulations by Hanninen and Anger, and it would be redundant to repeat much of this material. It is worth, however, drawing attention to the version of the World Health Organization's Neurobehavioral Core Test Battery (WHO-NCTB) in a computer-based form developed by the Institute of Occupational Health at the University of Milan. The battery is much as in its original form except that the Santa Ana Rotation Test of the original battery has, for pragmatic reasons, been replaced by a test which assesses rather different cognitive functions, and the modality of the Digit Span task has been changed in a way that is known to alter the cognitive functions involved (Beaumont, 1985).

A preliminary study of the psychometric characteristics of this implementation of the NCTB has been reported (Camerino, 1987). This indicates that there are some serious questions concerning the validity of these tests in terms of their suitability for the assessment purposes under consideration.
A group of 30 volunteers (young, relatively well-educated adults) were retested at weekly intervals on certain of the tests [excluding Benton Visual Retention Test (VRT) and Aiming Pursuit], and estimates of the reliability and validity of the measures were made from the results. It is a little unclear what reliability should be expected from an instrument that assesses mood "over the past week," given at weekly intervals: the range of values of r from -0.24 to +0.87 on the various individual scales is probably not remarkable. The reliabilities on the cognitive tasks are more acceptable, being in the range 0.62 to 0.89, if reaction time (RT) variability and Digit Learning are excluded. The reliabilities of the Digit Learning test at 0.40 and 0.19 (for occasions 1-2, 2-3, respectively) are clearly quite inadequate and suggest that the test should be abandoned as part of this assessment.
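To make the reliability figures above more concrete, the following minimal sketch computes a test-retest reliability as the Pearson correlation between two testing occasions, and then applies the classical test theory ceiling under which a test's validity cannot exceed the square root of its reliability (assuming, generously, a perfectly reliable criterion). The scores and the names `pearson_r` and `validity_ceiling` are hypothetical and introduced only for illustration; they are not data or procedures from the Milan study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between paired scores from two occasions."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long lists with at least 2 scores")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def validity_ceiling(reliability, criterion_reliability=1.0):
    """Upper bound on validity implied by classical test theory.

    Assumption: the bound sqrt(r_xx * r_yy) applies, with the criterion
    taken as perfectly reliable unless stated otherwise.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return math.sqrt(reliability * criterion_reliability)


# Hypothetical scores for five people tested one week apart.
occasion_1 = [12, 15, 9, 20, 17]
occasion_2 = [13, 14, 11, 19, 18]
retest_r = pearson_r(occasion_1, occasion_2)
print(f"illustrative test-retest r = {retest_r:.2f}")

# Ceilings implied by reliabilities of the size quoted in the text.
for r in (0.89, 0.62, 0.40, 0.19):
    print(f"reliability {r:.2f} -> validity at most {validity_ceiling(r):.2f}")
```

On this reading, a reliability of 0.40 caps validity at about 0.63 even before any other source of error is considered, which is one way of seeing why the Digit Learning figures are described as inadequate.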

Correlations were also calculated with paper-and-pencil versions, as a crude measure of construct validity. Correlations were modest, ranging from 0.55 (Serial Digit) to 0.79 (Benton VRT). These values are not atypical of values that might be expected on psychometric tests of this type. However, these results do raise certain doubts about the psychometric suitability of these tests to the purposes for which they are being employed.

For purposes of debate, assume that the validity of the measures is on average about 0.75. This is probably rather generous: reliability limits the upper extent of validity, and reliabilities are in some cases below this level. In addition, the sample employed was likely to provide relatively high levels of reliability and validity. At this level of validity, if we are trying to identify pathological effects which are present in 50 percent of those tested, the best that the test can theoretically achieve is 77 percent correct classification of the test subjects. In practice, a much more unfavorable base-rate of the condition is likely to apply in the test population. If the incidence to be detected falls to 1 in 10, the theoretical maximum achievement of the test will be 90 percent overall correct classification, but of those affected only 50 percent will be correctly identified. Of those achieving "positive" results on the test, half will be misclassified because they are false positives. As the base-rate or the validity falls, these statistics become even more unacceptable.

It should be clear that in psychometric terms, these tests as implemented in the Milan study are insufficiently powerful to allow any valid assessment of the neurobehavioral functions under study. This must be a serious concern because (quite reasonably) the NCTB has been adopted in a number of centers around the world.
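The classification figures above can be approximated with a short calculation. The sketch below assumes a model that the chapter does not state explicitly: the test score and the underlying condition are treated as standard normal variables whose correlation equals the validity, and the test flags the same proportion of people as are truly affected. Under that assumption the 50 percent base-rate case has a closed form, 0.5 + arcsin(0.75)/pi, which is about 0.770. The function names `joint_upper_tail` and `screening_outcomes` are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def joint_upper_tail(cut_test, cut_state, validity, steps=4000):
    """P(test > cut_test and state > cut_state) for a bivariate normal pair.

    Integrates the test-score density times the conditional probability
    that the underlying state exceeds its cut (trapezoid rule).
    """
    residual_sd = math.sqrt(1.0 - validity ** 2)
    width = 10.0  # the density beyond cut_test + 10 is negligible
    h = width / steps
    total = 0.0
    for i in range(steps + 1):
        x = cut_test + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        p_state = 1.0 - STD_NORMAL.cdf((cut_state - validity * x) / residual_sd)
        total += weight * STD_NORMAL.pdf(x) * p_state
    return total * h


def screening_outcomes(validity, base_rate, selection_ratio=None):
    """Accuracy, sensitivity and positive predictive value of a screen.

    Assumption: if no selection ratio is given, the test flags the same
    proportion of people as the base rate of the condition.
    """
    if selection_ratio is None:
        selection_ratio = base_rate
    cut_state = STD_NORMAL.inv_cdf(1.0 - base_rate)
    cut_test = STD_NORMAL.inv_cdf(1.0 - selection_ratio)
    true_pos = joint_upper_tail(cut_test, cut_state, validity)
    false_pos = selection_ratio - true_pos
    false_neg = base_rate - true_pos
    true_neg = 1.0 - true_pos - false_pos - false_neg
    return {
        "accuracy": true_pos + true_neg,
        "sensitivity": true_pos / base_rate,
        "ppv": true_pos / selection_ratio,
    }


for rate in (0.5, 0.1):
    result = screening_outcomes(validity=0.75, base_rate=rate)
    print(rate, {key: round(value, 3) for key, value in result.items()})
```

With a validity of 0.75, the 50 percent case reproduces the 77 percent figure, and the 1-in-10 case comes out close to the 90 percent overall accuracy and roughly 50 percent detection quoted in the text. Lowering either `validity` or `base_rate` shows how quickly the proportion of correctly identified cases deteriorates.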
Swedish studies conducted at the National Board of Safety and Health (Iregren, 1986) have used these tasks among a battery of others administered in both traditional and computer-based formats, as well as some other automated modes. The computer-based tests include Memory Reproduction (letter and digit sequences, rather like Digit Span), Simple and Choice Reaction Time, and Color Word Vigilance, and others are under development. Studies conducted with the full range of tests have demonstrated some significant and interesting findings between criterion groups selected for contrast on relevant variables, mostly relating to solvent exposure. Some reliability data are reported by Iregren in this volume. The methodological rigor of this approach is to be applauded, and the data show reliabilities for certain of the assessments that are significantly higher than the Milan data for their battery. Nevertheless, with assessments of higher cognitive functions and of affect, the psychometric adequacy of the instruments remains a problem.

A battery that shares some provenance with the WHO-NCTB, although strictly it just predates it, is Baker and Letz's Neurobehavioral Evaluation System (NES). The NES has been adopted by a major study being carried out by the Institute of Occupational Health in Birmingham, United Kingdom (Spurgeon and Harrington, 1987). This study will use the Clinical Interview Schedule together with the Hogstedt Symptom Questionnaire, Stress and Arousal Checklist, Cognitive Failures Questionnaire, and Prospective Memory Test, in addition to the NES tests. At present only preliminary pilot data are available.

Of course there have been a large number of other studies published in the literature which have employed a wide variety of tests. A survey of the literature of the effects of lead on intelligence reveals the WISC-R to be the most popular test in a traditional format to have been employed in this research (Yule and Rutter, 1985). A great variety of more specific tests of individual functions have also been employed (Anger, 1985).

Further contributions concerning the use of computer-based assessment in this domain are to be found in Branconnier (1985). A useful collection of papers concerned more generally with the issues raised by computer-based assessment appeared in Applied Psychology (e.g., see Huba, 1987).

The preceding chapters seem to be in agreement that (1) there are problems evident in the construction of various batteries, (2) most of the tests currently in use are relatively inadequate, and (3) there is poverty in the current psychological descriptions of neurotoxic syndromes.

SOME SPECIFIC POINTS

Some specific points made in the preceding chapters are highlighted here.

Methods in Behavioral Toxicology (Hanninen)

The problems concerning the definition and description of the neurotoxic deficit are well taken: this is clearly of crucial significance for any advance in the field and emphasizes the need for more fundamental research into the cognitive processes affected.

The suggestion that the in-depth study of individual patients might be profitable is also a valuable one. There are now many good single-case experimental designs that might be appropriately deployed in this area, and they should be considered in order to further clarify the description of the relative deficits.

The dilemma that Hanninen discusses between the "conservative" and "progressive" approaches is a real one and, to some extent, is fundamental to much of the discussion that follows. It is of central importance to decide whether to make the best of the rather poor tests that are currently in use, or whether to adopt a more radical reevaluation of current tests and the potential new instruments that might be created.

Current Status of Test Development (Williamson)

Williamson sensibly highlights the potential for tests that relate explicitly to psychological theory (although the distinction between those that relate to "cognitive structure" and those that are "theory-based" may not be so easy to sustain). If it becomes possible to elaborate our understanding of the psychological processes (and, perhaps, as a contribution to that understanding), there is obvious merit in the use of such tests.

The "potential barrier" of computer-based testing must be taken seriously. There is clearly no value in developing computer-based tests if they confer few advantages, introduce extraneous sources of error, and hinder the wide application of tests. There may be benefits from the application of computers that outweigh these disadvantages, at least in parts of the world where they can practicably be used, but it is important to be clear about the advantages in any given case.

The need for more basic research is again emphasized, and the proposal that the adaptive nature of some of the changes which take place be considered may be a particularly useful insight.

Human Neurobehavioral Tests (Anger)

Anger's useful and authoritative view is clear and correct about the potential contribution that the test batteries may make in this field. It is necessary, however, to ensure that this potential is realized in practice. It is certainly possible that the relevant changes could be detected. It is much less certain that current batteries are capable of detecting the changes (and there is some reason to believe that they are not).
The case also has to be argued more clearly for the value of cross-cultural data collection. It is naturally important, indeed essential, that appropriate local norms be available. However, given that there are inevitably differences among cultures in education, cognitive processes, cultural experience, exposure to testing and test materials, and even (some believe) in intelligence, test performance will differ in different cultures and subcultural contexts. In this situation, differences underlying test performance and exposure to toxins will undoubtedly be confounded.

The results will be difficult, perhaps impossible, to interpret, and little will have been gained by international comparisons. The idea of a worldwide pool of test results may be superficially attractive, yet it is not grounded in the psychometric realities of the situation.

THE ADEQUACY OF CURRENT TESTS

The problems inherent in current assessment batteries appear to be twofold. First, the tests employed have been selected on the basis of their previous use in experimental studies of the effects of exposure to neurotoxins. It is natural that, when a test has been shown to distinguish between a criterion group of exposed individuals and a control group, this test should be considered suitable for inclusion in an assessment battery. This is, however, not necessarily the case. Only if the test can be shown to have sufficient psychometric power for the role of general screening can it be considered useful in this way. It is important throughout to maintain a careful distinction between tests that are useful for group experiments and those that may be used for individual screening.

Second, there is a temptation to select tests that are generally considered to be capable of indicating central nervous system (CNS) dysfunction. Here the temptation has been to take tests that are believed capable of revealing the effects of dementia, cerebral disease, or gross trauma, and to adopt them for detection of the effects of neurotoxins. This procedure is open to two misconceptions: that the effects of neurotoxins will be the same (in cognitive terms) as the effects of dementia, cerebral disease, or trauma, and that there are tests capable of simply discriminating among these other disorders. There seems to be little basis for accepting either of these proposals. CNS poisoning is no more likely to resemble other cerebral pathology in its effects than, say, dementia is to resemble trauma.
The history of neuropsychology is littered with failed attempts to identify, by means of a single measure or small group of measures, general cerebral pathology. In particular, if the effects are relatively diffuse, the problem is especially difficult. An example is the difficulty of distinguishing, by cognitive measures alone, dementia of the Alzheimer type in the elderly (at least in its early stages) from either functional psychiatric illness or acute systemic illnesses. Much the same problem must apply to the effects of neurotoxins.

It is therefore not surprising that the battery of tests now generally employed is not of strong validity and is probably inadequate for the general detection of the behavioral effects of neurotoxins. There is simply insufficient power in the basic psychological instruments being employed.

The critical problem is the psychometric power of the tests, and the critical question is, Is the WHO-NCTB (including related batteries such as the NES) adequate to the task? It is important at this point to be clear as to what the task is: either to conduct group experiments or to undertake individual screening.

If the task is to investigate the differences between criterion groups, then the NCTB may be adequate to the task. Its psychometric power is still weak, and there might well be better tools available. It is probably, as a psychometric instrument, best described as "premature." Nevertheless, the fact that it is available, and already quite widely adopted, is of some importance, and it is clearly capable of discriminating between carefully selected groups under favorable conditions. Its use is certainly justified in this context, although efforts should be made to dramatically increase the size of the standardization samples available and to improve the basic reliability of the tests. In the context of such studies using the NCTB, it might be that computers are an impediment and that administration in the standard form is to be preferred.

However, if the aim is to carry out screening for exposed and affected individuals, the NCTB is likely to be quite inadequate on psychometric grounds. As discussed above, the available data suggest that the battery is not reliable enough to permit sufficiently accurate classification of affected and nonaffected individuals. This implies that if screening is a goal of the research (or if significant improvements are to be made in the sensitivity of the tests for detecting differences between criterion groups), then the whole basis of the assessments currently employed needs to be reexamined.
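The gap between group experiments and individual screening can be illustrated numerically. The sketch below assumes, purely for illustration, two equally sized groups whose scores are normally distributed with equal variance and whose means differ by a standardized effect size d; the value d = 0.5 and the sample size of 100 per group are hypothetical, not taken from any study cited here. Under those assumptions, the best single cut-off lies midway between the group means and classifies a proportion Phi(d/2) of individuals correctly, while the expected z statistic for the difference between group means grows with the square root of the sample size.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def best_cut_accuracy(effect_size_d):
    """Proportion of individuals correctly classified by the midpoint cut.

    Assumes two equal-sized, equal-variance normal groups whose means
    differ by effect_size_d standard deviations.
    """
    return STD_NORMAL.cdf(effect_size_d / 2.0)


def expected_z(effect_size_d, n_per_group):
    """Expected two-sample z statistic for the difference in group means."""
    return effect_size_d * math.sqrt(n_per_group / 2.0)


def two_sided_p(z):
    """Two-sided p value for a standard normal test statistic."""
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))


d = 0.5    # hypothetical standardized difference between the groups
n = 100    # hypothetical number of people per group
z = expected_z(d, n)
print(f"expected z for the group comparison: {z:.2f}, p = {two_sided_p(z):.4f}")
print(f"individuals correctly classified at the best cut: {best_cut_accuracy(d):.1%}")
```

In this hypothetical case the group difference would be detected very reliably, yet only about six in ten individuals would be classified correctly, which is the sense in which a test can serve group experiments well and still be unsuitable for screening.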
Better fundamental research is needed to generate a psychological description of the deficits and better models of the effects which can be related to that description. In achieving this it may well be advantageous to make better use of new developments in psychometrics and in the explicit models of cognitive performance. It is at this point that computers might well be introduced. One way in which this might be done is described below.

SOME NEW DEVELOPMENTS IN COMPUTER-BASED ASSESSMENT

It seems worth inquiring whether there are alternative approaches that could potentially provide more satisfactory solutions to the assessment of cognitive performance. There seem to be at least two potentially fruitful avenues of exploration. One is rather better charted: the use of adaptive testing systems, although it is not considered further here. The other is through the explicit incorporation of cognitive models into intelligent assessment systems. Such systems would not radically overthrow the traditional psychometric approaches, but would complement and extend such approaches so that the advantages of both could contribute to the power inherent in the assessment procedure.

If an intelligent and powerful assessment system is to be developed, it must incorporate appropriate psychometric models of the reference domain as well as a psychological (cognitive) model of that domain. The solution may well come from a progressive integration of psychometric theory, together with selection of those methods with greatest utility on the basis of empirical study. There is, after all, no reason why more than one psychometric model should not be operated concurrently, and the respective processes cross-referenced, as long as the assumptions of each are properly respected.

Cognitive Componential Models

Functional models, increasingly explicit in the cognitive domain, might allow an assessment system to possess an internal representation of the function that is under examination. One of the fruits of the growth of cognitive information-processing approaches into the dominant zeitgeist of contemporary psychology has been the production of explicit functional models. Some of these models are now presented in a sufficiently well-articulated form to make them useful in the description of functional status. Such descriptions can, in turn, be used in the identification of dysfunctional elements in performance and in the design and monitoring of instructional and remedial schemes.

Perhaps the most well known of these models relate to reading ability.
Here the interaction between the developmental study of normal reading ability and neuropsychological investigation of the dysfunctions to be observed in brain-injured patients has stimulated the production of general models of reading competency. Over the past few years the analyses of developmental dyslexia and of adult acquired dyslexia have converged into a common view of the processes that may be defective in reading failure.

The point about this and similar models is that each component is capable of identification by manipulations in an explicit experimental paradigm. The evidence is derived from studies on normal subjects by which the processing components can be inferred and from study of clinical patients in whom the failure of one component of the system can be identified.

A number of models in a variety of domains (spelling, arithmetic functions, algebra, reasoning, number-series identification, map interpretation) illustrate how human abilities can be analyzed in terms of componential subprocesses. The relationships among the subprocesses are described in the model. The components in each model, both functional elements and channels of information transfer, can be assessed by experimental paradigms that are amenable to automated implementation. A system which incorporated an explicit model about the function under investigation should be capable of intelligently describing the nature and level of that function in the psychological domain within which the model has been created.

Inferential Systems

It remains to be shown how explicit cognitive functional models, in association with adaptive testing systems technology, might be incorporated into a practical and intelligent assessment system. The way in which this might be achieved is through the use of an intelligent knowledge-based systems approach.

The differences between traditional psychometrics and "expert systems" are not as fundamental as might be supposed. Although expert systems as commonly expressed within a rule-based programming environment appear very different from a psychometric test instrument, they have several fundamental constructs in common. The parallels become more clear if the elements of each procedure are considered. The objectives of the expert system are the test items of the conventional test; the values, the responses; the questions and user interface are equivalent to the administration procedures; the rules are represented in the scoring norms; the inference engine is matched by the psychometric model being employed. The goal that the expert system is set is, of course, the test result of the conventional test instrument. It is possible to establish the validity of these parallels.
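The correspondence just listed can be sketched in a few lines of code. The example below is a minimal illustration rather than a description of any real expert system shell or published test: the class names `Objective` and `Rule`, the function `run_assessment`, the three synonym items, and the cut-off of 3 are all hypothetical. Objectives stand in for test items, the values gathered for them are the responses, the `ask` callback plays the part of the administration procedure, the rules encode scoring norms, and the loop that fires the first applicable rule is a deliberately trivial inference engine.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Objective:
    """A fact the system must establish; corresponds to one test item."""
    name: str
    question: str
    keyed_response: str


@dataclass(frozen=True)
class Rule:
    """A scoring norm: raw scores at or above min_raw_score give this conclusion."""
    min_raw_score: int
    conclusion: str


def run_assessment(
    objectives: List[Objective],
    rules: List[Rule],
    ask: Callable[[str], str],
) -> Tuple[int, str]:
    """Establish a value for every objective, then fire the first matching rule."""
    values: Dict[str, str] = {}
    for objective in objectives:
        # The 'user interface': put the question and record the response.
        values[objective.name] = ask(objective.question).strip().lower()
    raw_score = sum(1 for o in objectives if values[o.name] == o.keyed_response)
    # The 'inference engine': try rules from the most to the least demanding.
    for rule in sorted(rules, key=lambda r: r.min_raw_score, reverse=True):
        if raw_score >= rule.min_raw_score:
            return raw_score, rule.conclusion
    raise ValueError("rule base does not cover this raw score")


# Hypothetical items and norms, invented for this sketch.
ITEMS = [
    Objective("item_1", "Closest in meaning to 'begin': start or finish?", "start"),
    Objective("item_2", "Closest in meaning to 'rapid': slow or quick?", "quick"),
    Objective("item_3", "Closest in meaning to 'conceal': hide or show?", "hide"),
]
NORMS = [
    Rule(3, "at or above the illustrative cut-off"),
    Rule(0, "below the illustrative cut-off"),
]

# Canned responses stand in for a live test subject.
canned = {
    ITEMS[0].question: "Start",
    ITEMS[1].question: "slow",
    ITEMS[2].question: "hide ",
}
result = run_assessment(ITEMS, NORMS, lambda question: canned[question])
print(result)
```

Replacing the single raw-score rule with rules that refer to components of a cognitive model would be the natural next step in the direction the text goes on to propose.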
The author has a demonstration system, created under a popular expert system "shell," that administers the Mill Hill Vocabulary Test in a form indistinguishable from a number of computer-based implementations of that test which have been realized by procedural programming systems that simply simulate the conventional administration of the test. It may well not be the most efficient way to achieve this result, and the use of the expert system shell may be to some degree artificial, but it nonetheless provides evidence for the parallels that are being proposed between these kinds of systems.

Given these parallels, it is a short step to suggest that a cognitive componential model might be explicitly incorporated within a knowledge-based system to permit intelligent assessment of the cognitive function modeled. This would simply require that the model be sufficiently well articulated to be expressed in terms of the contents of a rule base. A variety of procedures will, of course, also be defined which permit data to be established pertinent to the rule-based inferences that are to be made. These procedures may be prior values held within the system, they may be the responses to questions put to the test subject or to the test examiner, or they may be the results of ancillary procedures (including independent subprocedures defined within a procedural programming environment). The procedures may operate at the level of individual test "items" or may refer to a higher level of "subtest" investigation.

These subprocedural levels may reflect the structures that have already been developed within the adaptive testing context. The statistical procedures that have been derived for use within adaptive testing systems may also operate at this level of the organization of the system. Traditional psychometric (statistical) techniques may be applied at this level, within the lower level subprocedures, or at the level of the implementation of the rule base. The statistical procedures may operate within the definition of the rules derived from the cognitive model, or else be applied in parallel with the cognitive model, so that estimates derived from each inferential process may be compared and combined in generating the overall test outcome (see Huba, 1987).

This is, after all, no more than a formalization of what an expert human examiner does in performing an assessment. Elements of the assessment procedure are composed into the battery of tests to be applied, according to some model (often implicit) that the examiner maintains of the functions to be assessed.
The individual tests are then administered, often with some degree of selection and modification of the battery, depending upon earlier test results. Statistical estimates derived from the test are obtained and interpreted in line with hypotheses generated from the functional model that the test examiner holds. A psychological description (the "report") is generated which is relevant to the assessment question being investigated.

The potential advantages of the kind of scheme envisaged above are that the internal cognitive model is explicit and can be more rigorously applied (and improved); the investigation of data relevant to the inferences being tested is systematic and should therefore be more efficient; and intelligence, in the form of the inferencing procedures, is automatically and consistently applied to the problem. In addition, the behavioral description generated from the system is inevitably formulated in terms of the cognitive model being maintained: it is a psychological and not a statistical description. It must therefore be relevant to the application for which the test is being employed and be more useful in response to questions about diagnosis, management, treatment, selection, or adjustment.

IMPLICATIONS FOR NEUROBEHAVIORAL ASSESSMENT

The adoption of techniques such as these implies that a number of conditions should be met before the development of assessment procedures in this area can advance.

The first is a better understanding of the psychological functions affected by neurotoxins. A vague formulation in terms of effects upon psychomotor performance, slowing of response, impaired eye-hand coordination, diminished concentration, recent memory, and affective state, is insufficient. A better model is needed of the physiological vectors that are generating these effects, as well as a better-elaborated description of the effects in behavioral terms.

Second, these functions should be summarized in the form of a psychological description of the general dysfunctional state that follows from exposure to neurotoxins. This needs to be sufficiently detailed to allow a clear account of the psychological processes implicated in these functions to be deduced.

Third, these processes should be formulated in terms of a cognitive componential model of the relevant functions, in a sufficiently coherent form to allow decomposition of the observed performance and analysis of the functional status of the subject.

Fourth, this should be translated into an assessment system in terms of the individual component elements of performance. These should be assessed separately by testing routines (either criterion- or norm-referenced) that will allow an intelligent computer-based diagnostic analysis of performance.

This analysis will probably be dismissed by those whose immediate concern is for an instrument that can be used now to address the very real problems of assessing current levels of neurotoxic exposure.
However, if a valid assessment is needed, one is forced to conclude that no adequate instrument is currently available. Radical improvements must be made in our understanding of the target behavioral effects, which must be based on more extensive fundamental research. These can then be translated into effective assessment instruments. Such an approach has already been shown to yield dividends in other neuropsychological areas [particularly in assessing reading disorders (Seymour, 1987) and errors in arithmetic processing], and could well be profitably applied to the neurobehavioral testing of exposure to toxins.

A final and unrelated thought: no one seems to have taken seriously the possibility of the individual baseline testing of workers potentially open to exposure. Such an approach could completely transform what could be achieved by psychological assessment. Many of our psychometric difficulties would be eliminated at a stroke if data on each worker before exposure were available. Compare the advances made in neuropsychology during World War II, largely because of the availability of psychological data collected upon induction. If a battery were administered on recruitment (and perhaps every 5 or 10 years subsequently), it would be possible to establish the effects upon cognitive functioning in an individual worker with relative ease and to dramatically improve our understanding of the relevant processes in general. Even if legislation to introduce this is unattainable, the introduction of such a system by a number of major employers could at least make a worthwhile contribution. The suggestion is no doubt naive, but of such immense potential value that it deserves to be discussed.

REFERENCES

Anger, W. K. 1985. Neurobehavioral tests used in NIOSH-supported worksite studies, 1973-1983. Neurobehavioral Toxicology and Teratology 7:359-368.

Beaumont, J. G. 1985. The effect of microcomputer presentation and response medium on digit span performance. International Journal of Man-Machine Studies 22:11-18.

Branconnier, R. J. 1985. Dementia in human populations exposed to neurotoxic agents: A portable microcomputerized screening device. Neurobehavioral Toxicology and Teratology 7:379-386.

Camerino, D. 1987. [Title largely illegible in the source text; it ends "of WHO-NCTB."] Institute of Occupational Health, University of Milan.

Huba, G. J. 1987. On probabilistic computer-based test interpretations and other expert systems. Applied Psychology 36:357-373.

Iregren, A. 1986. Effects of Industrial Solvent Interactions: Studies of Behavioral Effects in Man. Arbete och Hälsa (ISSN 0346-7821). Solna, Sweden.

Seymour, P. H. K. 1987. Individual cognitive analysis of competent and impaired reading. British Journal of Psychology 78:483-506.

Spurgeon, A., and J. M. Harrington. 1987. The Neuropsychological Effects of Long-Term Exposure to Organic Solvents. Institute of Occupational Health, University of Birmingham, U.K.

Yule, W., and M. Rutter. 1985. Effects of lead on children's behavior and cognitive performance: A review. In Dietary and Environmental Lead: Human Health Effects, K. R. Mahaffey, ed. Amsterdam: Elsevier.
