Read "Behavioral Measures of Neurotoxicity" at NAP.edu

Suggested Citation:"Computerized Performance Testing in Neurotoxicology: Why, What, How, and Whereto?." National Research Council. 1990. Behavioral Measures of Neurotoxicity. Washington, DC: The National Academies Press. doi: 10.17226/1352.

Computerized Performance Testing in Neurotoxicology: Why, What, How, and Whereto?

Francesco Gamberale, Anders Iregren, and Anders Kjellberg

WHY MEASURE PERFORMANCE?

Behavioral performance tests have been developed primarily to assess the psychophysiological efficiency of the individual. Most frequently, the tests have been used for diagnostic purposes in clinical contexts or for the purpose of personnel selection. During the last 15 to 20 years, however, performance tests have been applied with increasing frequency to assess functional changes in the central nervous system (CNS) induced by exposure to unfavorable work environmental conditions. Ever since the early 1970s, when the use of psychometric techniques made it possible to link together deterioration in human performance and the inhalation of solvent vapor (Astrand and Gamberale, 1978), psychometric tests have been widely and successfully used in many countries in the study of solvent toxicity (Anshelm Olson, 1985; Gamberale, 1985; Iregren, 1986b) as well as in the study of the toxicity of numerous other chemical compounds including anesthetic gases (Biersner, 1972), agricultural chemicals (Rodnitzky et al., 1975), and metals (Roels et al., 1987).

The growing interest in the measurement of performance is most probably due to the sensitivity shown by these methods in unveiling changes in the human organism that otherwise would not be detected. By now, the evidence that these changes are some of the earliest indicators of the occurrence of health effects has become unequivocal. As a consequence, the measurement of performance has come to be regarded by many as a device of major importance for monitoring hazards to health and safety in the work environment. This development appears to agree with the ideas promulgated by the World Health Organization (WHO) that health does not mean only absence of disease but also optimum physical, mental, and social well-being and, moreover, that health means not only freedom from pain and disease but also freedom to maintain and develop one's functional capabilities.

At our institute, the measurement of performance has undoubtedly constituted the main method of studying the effects on the CNS of low-dose exposure to the chemical substances that are frequently found in the work environment. Thus, we have applied performance tests of various kinds in experimental laboratory studies as well as in field studies and in cross-sectional epidemiological investigations. Through the years, we have made use of performance tests in experimental inhalation studies of industrial solvents such as toluene (Gamberale and Hultengren, 1972; Iregren, 1986a), methylchloroform (Gamberale and Hultengren, 1973), styrene (Gamberale and Hultengren, 1974), white spirit (Gamberale et al., 1975a), methylene chloride (Gamberale et al., 1975b), trichloroethylene (Gamberale et al., 1976), xylene and ethylbenzene (Gamberale et al., 1978), methyl isobutyl ketone (Wigaeus-Hjelm et al., 1990), and toluene in combination with p-xylene (Anshelm Olson et al., 1985) or with ethanol (Iregren et al., 1986).
Psychometric tests have been applied directly at the worksite in two investigations of workers in the plastic boat industry exposed to styrene (Gamberale et al., 1976b; Kjellberg et al., 1979), in studies of steelworkers (Anshelm Olson et al., 1981) and of workers in the paint industry (Anshelm Olson, 1982) exposed to solvent mixtures, and in two investigations of nurse anesthetists exposed to anesthetic gases (Gamberale and Svensson, 1974; Kjellberg and Strandberg, 1979). Furthermore, we have used comprehensive batteries of behavioral tests to investigate possible long-term effects of chronic exposure to organic solvents among car and industrial spray painters (Elofsson et al., 1980), workers in a jet motor factory (Knave et al., 1978), and rotogravure printers (Iregren, 1982).

Behavioral tests are also used to an ever-increasing extent to study the effects of work environmental conditions other than exposure to neurotoxic substances. Thus, unfavorable effects on performance of environmental factors such as noise, vibration, cold, heat, electric and magnetic fields, and physical work load have been demonstrated in laboratory experiments as well as in field studies.

Our experience with the use of psychometric tests to study the effects of nonchemical agents in the physical work environment is relatively modest. However, behavioral tests have been successfully applied at our laboratory in experiments on the effect on performance of exposure to noise (Kjellberg and Wide, 1988) and to moderate cold (Enander, 1987). A series of experimental studies on the effects of different climatic conditions on performance is now in progress (Gamberale et al., 1988b). Finally, we have used psychometric tests to investigate the possible effects on workers of acute (Gamberale et al., 1988a) and chronic exposure (Knave et al., 1979) to electric and magnetic fields.

HOW TO MEASURE PERFORMANCE?

A working group within the WHO has recommended (WHO, 1987) a battery of tests to use in the search for neurotoxic effects in working populations. The main criterion applied by the WHO in selecting the tests to be included in the battery was that the tests should have proven their sensitivity in empirical investigations. To facilitate a widespread application of the methods, no tests that required complicated technical equipment for their administration were included in the battery. A further requirement was that the tests should be selected among those commonly used by the clinical practitioner for diagnostic purposes. In practice, these requirements limited the choice to the tests in the Wechsler Adult Intelligence Scale (WAIS battery). Thus, no attention was paid to tests developed especially for use in laboratory experiments and quasi-experimental field studies of the effects of exposure to neurotoxic substances.

In our opinion the tribute paid to the clinical practitioner by selecting among traditional manual or paper-and-pencil tests has had a negative effect on the sensitivity of the WHO battery to detect neurotoxic effects. Against this background it is difficult to understand why some groups working with the development of computerized tests feel the need to refer to this WHO list of tests as a rationale for their test implementations (Cassito, 1985; Letz and Baker, 1986). It is obvious that such a strategy leads to inadequate utilization of the possibilities offered by computerized testing.

Another problem should be considered when implementing existing traditional tests on computers, namely, that the correlations between the results obtained with the two versions of the tests (i.e., the computerized and the paper-and-pencil tests) often are quite low. This low correspondence may be due to several inevitable differences between the resulting test protocols with regard to stimulus presentation as well as response input. To mention one example, Beaumont (1985) investigated the effects of various response modes on the results in a computerized Digit Span test. He found substantial differences between responses entered via the ordinary keyboard, an external keypad, and a touch-sensitive screen.

Several of the testing systems applied within the area of neurotoxicology use fairly simple psychomotor tasks, and reaction time or response latencies are generally used as the outcome variables. It has been argued, however, that performance on more complex cognitive tasks should be more sensitive to disruption by exposure to toxicants. Still, in the experience at our laboratory, a test of Simple Reaction Time (SRT) has proved to be generally the most sensitive test.

The greater sensitivity demonstrated by the tests of relatively simple mental functions does not necessarily imply that these tests tap the CNS functions most vulnerable to neurotoxic substances. Instead this greater sensitivity may be due primarily to the higher reliability of these tests compared to tests measuring complex cognitive functions. A substantial contribution to the reliability of the tests of simple mental function stems from the fact that performance parameters in these tests are usually based on a large number of items. These circumstances concerning the sensitivity and reliability of different types of tests should be taken into consideration especially when analyzing results in terms of possible differential deficits (e.g., Chapman and Chapman, 1978).

Several groups have recently developed computer-based performance evaluation systems for use in neurotoxicology (e.g., Baker et al., 1985; Braconnier, 1985; Cassito, 1985; Eckerman et al., 1985; Iregren et al., 1985; Laursen and Jorgensen, 1985), and some laboratories use similar systems in related fields (e.g., Bittner et al., 1985; Irons and Rose, 1985).
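The earlier point that tests based on many items tend to be more reliable can be illustrated with the Spearman-Brown formula from classical test theory. The formula is not cited in this chapter and is used here only as an assumed, standard illustration; the function name and the single-item reliability of 0.20 are hypothetical.

```python
def spearman_brown(single_item_reliability: float, n_items: int) -> float:
    """Predicted reliability of a score built from n_items parallel items.

    Assumes the items are parallel measures (equal reliability), which is
    an idealization and not a property claimed for any specific test.
    """
    r = single_item_reliability
    if not 0.0 <= r <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    if n_items < 1:
        raise ValueError("n_items must be at least 1")
    return n_items * r / (1.0 + (n_items - 1) * r)


if __name__ == "__main__":
    # Hypothetical single-item reliability of 0.20, for illustration only.
    for n in (1, 5, 20, 80):
        print(n, round(spearman_brown(0.20, n), 3))
```

Under these assumptions, predicted reliability rises steeply with the number of items, which is consistent with the argument that a reaction time test with many trials can be more reliable than a complex test with few items.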
Furthermore, there are of course several computer-based systems that have been applied in clinical use, two examples of which are those of Acker (1983) and Beaumont and French (1987).

WHAT ELSE TO MEASURE?

The methods of value in early detection of neurotoxic effects or of the effects on the CNS of exposure to other unfavorable physical environmental agents include, besides performance tests, neurophysiological and neurological testing, as well as questionnaires for the assessment of subjective experience. In most investigations, a systematic collection of the subjects' experience when exposed to different experimental or occupational conditions may constitute an invaluable source of information. Most questionnaires for this type of assessment can easily be computerized with maintained reliability and validity (see, e.g., Carr et al., 1981; Lucas, 1977).

With the Swedish Performance Evaluation System (SPES), it is possible to collect three types of self-report data: (1) symptoms of acute as well as long-term exposure, (2) self-rating of mood, and (3) self-rating of performance. The first two types of data aim at the detection and description of possible environmentally induced changes in the subjects' perceptions of their physical and psychological states. The third type of perceptual data are motivational in character and refer to the subjects' motivation, confidence, and effort expended during testing.

PROS AND CONS OF AUTOMATED TESTING

Automated testing in general implies some advantages over traditional paper-and-pencil testing. The automation of tests gives

· an excellent opportunity for strict standardization of test procedures (e.g., instructions, test protocols, and evaluation variables), thus increasing the possibilities for comparisons across studies;
· the ability to perform detailed measurement and analysis of single responses or response components (this type of microanalysis has greatly enhanced the sensitivity of performance tests); and
· increased precision in the measurement procedure; by reducing the influence of the investigator, the reliability and the validity of the results are increased.

Because an automated test may be administered by a technician or a nurse, there is the possibility of reducing the work load of psychologists, who therefore can make better use of their skills. It has also been suggested that automation of the testing procedure would render the testing situation less threatening.

Fully computerized testing procedures provide some additional possibilities as well:

· Computers are flexible, and one system can be used to administer a variety of tests, as well as to perform other routine tasks in the laboratory or clinic.
· Computers facilitate prompt scoring and evaluation of even very complex tests and questionnaires.
· Computers make it possible to adapt the choice of items according to the performance capacity of the individual.
· Computers offer communication possibilities, making data transference for statistical computations or other purposes very convenient.
· Computers are transportable and, in the extreme case, even portable, thus making field testing feasible.

Some of the most frequently mentioned critical comments regarding computerized testing are (1) the supposedly poor rapport established between the subject and the machine; (2) the difficulties in testing large groups; (3) the static form of computerized approaches; (4) the restricted range of stimuli that can be presented; and (5) the restriction in the choice of response media.

With regard to the rapport established, investigations pertaining to this problem generally indicate no difficulties (see, e.g., Carr et al., 1981; French and Beaumont, 1987; Lucas, 1977; or Lukin et al., 1985). The "user friendliness" of a system is not dependent upon whether it is computerized or not, but on the careful design of the system (Heal et al., 1973). As pointed out by Beaumont (1982), the single most important requirement for successful design is the predictability of the system. Thus, no action on the part of the subject should result in accidental termination of the test.

Due to the still relatively high purchase costs of even a microcomputer system, it is difficult to test large groups of people simultaneously. On the other hand, computerized tests often provide much more information than traditional tests within a specified time period. Thus, the ability to perform simultaneous testing of large groups is not as important with computerized tests.

The objection concerning the static form of computerized tests is by now invalid. There have been successful attempts at making "tailored" (i.e., adaptive) tests, where the test items administered are contingent upon the performance of the subject (see, e.g., Weiss and Vale, 1987). One of the memory tests available with our system, a version of the Digit Span test, functions in an adaptive way.
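A minimal sketch of how an adaptive Digit Span procedure of the kind just mentioned might work. The rule used here (lengthen the sequence after a correct recall, stop after two consecutive failures at the same length) is an assumption chosen for illustration and is not the documented SPES algorithm; `adaptive_digit_span`, `recall`, and all default values are hypothetical.

```python
import random
from typing import Callable, List, Optional


def adaptive_digit_span(
    recall: Callable[[List[int]], List[int]],
    start_length: int = 3,
    max_length: int = 12,
    max_failures: int = 2,
    rng: Optional[random.Random] = None,
) -> int:
    """Return the longest sequence length recalled correctly (0 if none).

    The next item depends on the previous response: a correct recall
    lengthens the sequence, a failure repeats the same length.
    """
    rng = rng or random.Random()
    length, failures, best = start_length, 0, 0
    while length <= max_length and failures < max_failures:
        digits = [rng.randint(0, 9) for _ in range(length)]
        if recall(list(digits)) == digits:
            best = length
            length += 1
            failures = 0
        else:
            failures += 1
    return best


def simulated_subject(digits: List[int]) -> List[int]:
    """Hypothetical subject who recalls up to six digits and drops the last
    digit of any longer sequence."""
    return digits if len(digits) <= 6 else digits[:-1]


if __name__ == "__main__":
    print(adaptive_digit_span(simulated_subject, rng=random.Random(0)))
```

Because the test stops as soon as the subject's limit is reached, the number of items administered varies between subjects.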
Until today, most efforts at the implementation of performance tests on computers have used visual stimuli, because the administration of auditory or tactile stimuli, for example, requires the use of rather complicated (probably custom-made) external equipment in addition to the computer. Such additions would naturally make it very difficult to standardize the tests to allow for widespread use.

The few response media available present a similar problem with respect to the development of tests. Because input to a computer is normally made via the keyboard, most testing systems presently use this medium. Attempts to use other means of input usually also imply manual performance, as for example, with joysticks or touch-sensitive screens. At present, the choice of response mode is probably the most limiting factor in the development of computerized tests because the response medium may easily affect test results in unintended ways. A fast and reliable system for processing speech input would in many instances provide the only acceptable solution.

The restrictions regarding stimulus presentation and response input are, of course, steadily diminishing because continuous technical developments make computers increasingly competent. However, standardized tests using new technical possibilities (e.g., speech input) are still several years off because the development of well-standardized tests is a laborious, long-term project.

In spite of the restrictions mentioned, efforts at developing computerized tests and testing systems are steadily increasing, and this type of assessment is currently in use in a wide variety of settings. Computerization has made complicated psychometric techniques available even to persons lacking training as professional psychologists. Therefore, this trend accentuates ethical demands for control of the construction, distribution, and use of these methods (Matarazzo, 1983). The American Psychological Association (APA, 1986) and the British Psychological Association (Bartram et al., 1987) have published guidelines for this purpose.

The experience gained at our laboratory in using computerized tests has generally been positive, although there are of course difficulties (Iregren et al., 1985). One significant problem, which applies equally to traditional tests, relates to the time and effort needed for successful test development. Furthermore, computers are still technically complicated machines, thus requiring special skills of the psychologists and technicians engaged in this development.

Several recent reviews have treated various aspects of computerized testing, and the reader is referred to those by Bartram and Bayliss (1984), McArthur and Choppin (1984), Space (1981), and Thompson and Wilson (1982).
DEVELOPMENTS IN SWEDEN

Since 1970, the Division of Psychophysiology, National Institute of Occupational Health (NIOH), Solna, Sweden, has been concerned with the development of psychometric methods suitable for the study of adverse effects of environmental stressors, primarily neurotoxic substances. The first test to be standardized for use in environmental research was an SRT test (Lisper and Kjellberg, 1972). At the start, this test was administered with an electronic apparatus, consisting of a timer, a tape recorder, and a stimulus/response panel. The test was then implemented on other types of testing equipment, and it has been used in most of our investigations.

Special equipment, developed in 1973, was used solely in laboratory experiments. This apparatus comprised a paper-tape-controlled, solenoid-operated stimulus/response panel and a teletype printer, and it was used for the first time in an investigation of the toxicity of white spirit (Gamberale et al., 1975a). Besides the above-mentioned SRT test, tests of Choice Reaction Time (CRT), numerical ability, and memory were implemented on this equipment.

A further step in the development of testing equipment was taken by the introduction of a new stimulus/response panel in 1975. Stimuli were presented on eight rows of 32-LED displays, each capable of showing any alphanumeric character. Responses were entered on a full QWERTY keyboard. This equipment, which made possible the presentation of more complex stimuli as well as the registration of written responses, was used to test different cognitive functions. It was used in several cross-sectional epidemiological investigations on workers exposed to industrial solvents and electromagnetic fields (Elofsson et al., 1980; Iregren, 1982; Knave et al., 1978, 1979). The major disadvantages of this type of equipment were the laborious programming procedure, the fragility of the paper tape, and the time-consuming evaluation of the results. For a review of similar attempts at noncomputerized automated testing, see Denner (1977).

Some of these disadvantages could be overcome when computers became more easily available. However, early computers had other shortcomings with respect to automated testing. They were expensive and difficult to program or handle, and the access via time-sharing terminals made exact timing of response latencies impossible.

With the advent of the microcomputer, new approaches became possible. For the first time, fully automated testing could be performed, with administration of instructions and test items, as well as response registration with precise timing of response latencies and data storing. New demands were made on our performance assessment methods by the acquisition of an exposure chamber, which required a fully automated procedure in the solvent inhalation studies.
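The precise timing of response latencies mentioned above can be sketched with a monotonic high-resolution clock. This is an illustration in Python under stated assumptions, not the authors' implementation: `present_stimulus`, `wait_for_response`, and `measure_latency_ms` are hypothetical names, and the clock is passed in as a parameter so that the logic can be checked without real hardware.

```python
import time
from typing import Callable, List


def measure_latency_ms(
    present_stimulus: Callable[[], None],
    wait_for_response: Callable[[], None],
    clock_ns: Callable[[], int] = time.perf_counter_ns,
) -> float:
    """Return the time in milliseconds between stimulus and response."""
    present_stimulus()
    start = clock_ns()       # timestamp taken right after stimulus onset
    wait_for_response()      # blocks until the subject responds
    return (clock_ns() - start) / 1_000_000


def run_trials(
    n_trials: int,
    present_stimulus: Callable[[], None],
    wait_for_response: Callable[[], None],
    clock_ns: Callable[[], int] = time.perf_counter_ns,
) -> List[float]:
    """Collect one latency per trial, in presentation order."""
    return [
        measure_latency_ms(present_stimulus, wait_for_response, clock_ns)
        for _ in range(n_trials)
    ]


if __name__ == "__main__":
    # Stand-in callbacks: no real display or keyboard is involved here.
    latencies = run_trials(3, lambda: None, lambda: time.sleep(0.05))
    print([round(x) for x in latencies])
```

A software clock of this kind measures only the interval seen by the program; delays in the display and the input device are not captured, so the figures should be read as approximate.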
The equipment used in these experiments consisted of a computer with a black-and-white monitor, a dual disk drive, a modified numerical keyboard, a reaction time panel, and a printer. Three performance tests were used with this equipment. The previously used SRT test and a test of short-term memory were adapted to this computer system, and a new test of CRT was developed.

This system had several shortcomings owing to its technical limitations, e.g., poor picture quality due to low graphic resolution, and poor precision in the timing of response latencies. A small working memory and a slow BASIC language were also severely limiting factors with respect to test development. For a review of system requirements for computerized automated testing, see Beaumont (1982).

When a new generation of the same computer was introduced, the performance assessment system underwent further development. The new computer was equipped with high-resolution color graphics, a more flexible working memory, and a BASIC language that facilitated the construction of long sequences of tests. The requirement of a timing accuracy of at least 1 ms was met by using an external clock with program routines in Assembler language. Several of the previously used tests were implemented on this equipment (i.e., SRT, CRT, a memory test, and a test of numerical ability). Furthermore, new tests were developed for use with this computer, e.g., a Complex Reaction Time task using color words as stimuli.

The performance assessment system was further improved in 1984 by using a later version of the computer, which was equipped with greater working memory for graphics and a high-quality color monitor. The number of tests currently available on this system is 14. The SPES has now been transferred to IBM computers to facilitate its use by other research groups.

DESCRIPTION OF THE SWEDISH PERFORMANCE EVALUATION SYSTEM

The SPES consists of a number of semiautomated computerized performance tests and various scales for the subjective evaluation of performance on the tests, of mood, and of different kinds of symptoms. The system is designed to be dynamic and flexible. Thus, it allows the researcher or the practitioner to choose among the tests and the scales, adapting the battery to the specific purpose of the evaluation at hand. The system is also intended to undergo gradual improvement based on analyses of the results of ongoing empirical studies and on future experience with the use of the system.

With few exceptions, the performance tests are nonverbal, i.e., they can be used to assess performance independent of the language of the subjects.
Some of the tests, e.g., the Color Word Vigilance (SPES3:1), the Color Word Stress (SPES3:2), and the Verbal Reasoning test (SPES10), can be easily adapted to other languages and will only require translation of the text files used by the programs. These text files, which are easily edited, contain all the verbal communication with the subject (i.e., instructions on how to perform the test as well as the test items). The only test that requires a completely new construction and standardization if used with non-Swedish-speaking subjects is the Vocabulary test (SPES11).

Hardware

Any IBM or IBM-compatible PC, XT, or AT, equipped with an external clock card (SB11 multifunction card, Emulex Corp., 3545 Harbor Blvd, P.O. Box 6725, Costa Mesa, CA 92626), an overlay to the keyboard, an EGA graphics card, a color monitor (for some of the tests), and an optional printer may be used. A hard disk is not necessary to run the test battery. However, because the full system is too large to be contained within a single diskette, a hard disk is recommended.

Software

The programs, which are available in compiled form on diskette, are written in TURBO Pascal. The system includes a master program referring to a number of different test programs, which can be combined to any preferred sequence. This sequence may in turn be repeated any number of times, according to the design of the investigation at hand. The graphic presentations within this system are developed with the aid of a graphic tools package (TURBO PAINT TOOLS, 1986, DATABITEN/P.S. DATAKRAFT).

At present, the system consists of the 14 tests listed in Table 1, in addition to four scales for the measurement of reported mood, symptoms (two scales), and self-rated performance. The tests are Simple, Choice, and Complex Reaction Time (four tests); Search and Memory Test; Symbol Digit; Digit Span; Logical Reasoning; Additions; Finger Tapping (two tests); Vocabulary; Digit Classification; and Digit Addition. A short description of each test may be found in the appendix to this chapter.

Anyone interested in further information about the SPES system should contact Anders Iregren, who is responsible for the system development and the distribution of SPES software.

EMPIRICAL BACKGROUND AND APPLICATIONS

Table 2 lists the investigations that have been performed so far by using SPES tests. These include experimental studies in the laboratory, occupational studies of effects from acute or long-term exposure to various agents, two investigations applying SPES tests in clinics of occupational medicine, and two studies directly aimed at the methodological evaluation of the tests.
Standardization Study

A standardization sample of 100 subjects went through SPES1, 2, 3:1, 4, 5, 6, 7, 10, and 12:1 (Kjellberg and Wisung, 1987). A large proportion of the subjects (i.e., 38 persons) were university students, and 62 were employees of NIOH. For 59 of the latter, testing was repeated four to five months later.

TABLE 1 Tests and Scales in the SPES

SPES                                No. of Items                                                                        Approximate
Code  Performance Tests             (+ Practice)   Parameters Extracted                                                 Time (min)
1     Simple Reaction Time          80 + 16        Mean, SD, decrement                                                  6
2     Choice Reaction Time          112 + 32       Mean, no. of errors                                                  9
3:1   Color Word Vigilance          192 + 16       Mean RT, no. of commissions, no. of omissions                        8
3:2   Color Word Stress             192 + 16       Mean RT, no. of commissions, no. of omissions
4     Search and Memory             3 * (10 + 1)   Mean RT/search level                                                 10
5     Symbol Digit                  6 + 4          Mean RT, no. of errors
6     Digit Span                    Varies         Length of memory span
7     Additions                     40 (+ 3)       Mean RT, no. of errors
8     Digit Classification          240            Mean RT, no. of errors, no. of lags
9     Digit Addition                120            Mean RT, no. of errors, no. of lags
10    Verbal Reasoning              64             Mean RT, no. of errors
11    Vocabulary                    45             No. of correct answers
12:1  Finger Tapping Speed          24 (3 + 1)     Mean no. of taps/hand                                                1
12:2  Finger Tapping Endurance                     Changes of movement time and resting time over test time

[The approximate times for the remaining performance tests appear in the machine-read text only as the unaligned values 4, 7, 8, 8, 5 and cannot be assigned to rows.]

SPES                                                                                                                    Approximate
Code  Self-Rating Scales            No. of Items   Parameters Extracted                                                 Time (min)
30    Performance                   1/test         Percent of maximum performance                                       1
31    Mood                          12             Activity score and stress score                                      3
32    Acute symptoms                17             No. of symptoms reported                                             4
33    Long-term symptoms            38             No. of symptoms reported                                             6

NOTE: SD = standard deviation; RT = reaction time.

[TABLE 2, which lists the investigations performed with SPES tests, is not recoverable from the machine-read text.]

The aim of this study was to assess (1) the reliability of the tests (test-retest as well as homogeneity); (2) the factor analytic structure of the tests; (3) learning and fatigue effects; and (4) sex and age differences, as well as differences among educational levels. Because the educational level of the majority of this group was high, the results could not be treated as norm data, at least not for the tests in which educational level proved to yield significant differences.

Simple Reaction Time Standardization

Data from previous field and epidemiologic studies using the SRT task were reanalyzed for standardization purposes (Soderman et al., 1982). The sample consisted of 730 industrial workers, 306 of whom were exposed to solvents in their work. For 83 of the nonexposed and 149 of the exposed workers, measurements were made twice, before and after a work shift. Numerous different performance indices were assessed with respect to their power to discriminate among exposed and nonexposed workers, age groups, and morning-afternoon performance.

Clinical Validation of the Battery

Six SPES tests (i.e., SPES1, 2, 3:1, 5, 6, and 10) were used as a complement to traditional tests for diagnosing occupational illness due to the chronic effects of long-term exposure to organic solvents (Iregren et al., 1987). A total of 148 cases with suspected solvent-induced illness were tested at four Swedish clinics of occupational medicine over 15 months. The aim of this study was to investigate whether these computerized tests were useful in a clinical situation, from a practical as well as from a psychometric point of view.

Clinical Trial with SPES1

During one year, the SRT test was administered to 51 consecutive patients referred to a clinic of occupational medicine with suspected solvent-induced occupational illness (Hagberg and Iregren, 1984). The performance of these patients was compared to that of a control group.
Other Studies Many of the tests have been used in several experimental and field studies, as well as in epidemiological investigations, conducted at the Institute of Occupational Health to assess the effects of different

COMPUTERIZED TESTING IN NEUROTOXICOLOGY 373 environmental factors (see Table 2~. Data from these studies have been used to assess the sensitivity and, in some cases, the reliability of the tests. Furthermore, in some cases the validity of the test is supported by research conducted with similar tests at other research institutions. RESULTS Standardization Study Table 3 shows means and standard deviations over repeated mea- surements of the different performance indices for each of the nine tests included in the standardization study. The mean response times might be viewed as indicators of the degree of difficulty of the tests. Thus, Tapping stands out as the easiest test (a frequency of 66.5/10 s corresponds to a response time of 150 ms), followed by Simple RT, Search and Memory (mean time for each of the 30 letters), Choice RT, Color Test Vigilance, Additions, Symbol Digit, and Reasoning. No response times were recorded in Digit Span. Reliability Coefficients Test-retest reliability coefficients and alpha coefficients for the tests are also given in Table 3. The test-retest coefficient shows to what extent the performance level is a stable characteristic of the indi- vidual, whereas the alpha coefficient reflects the precision of the individual measurement. The alpha coefficient could thus be viewed as an esti- mate of the test-retest coefficient, given that the measurement is repeated under identical circumstances. Mean response times in Choice RT, Color Test Vigilance, Additions, and Simple RT all proved to be highly reliable. The test-retest coefficients of Search and Memory, Symbol Digit, Digit Span and Reasoning were all below 0.80. As expected, response times were found to be more reliable than error rates, among which the error rate in Symbol Digit is notable as a very unreliable performance indicator. Sex Differences The men had a higher mean educational level, and sex differences therefore were analyzed within the group with academic education. 
This group contained 31 men and 30 women with a similar age distribution. The women tended to make somewhat more errors in the Choice RT test, to give fewer correct answers in the Reasoning test, and to have a lower tapping frequency in the dominant hand. However, only the difference in the Tapping test was significant at the 0.05 level.

[TABLE 3 Means, standard deviations, reliability coefficients, and between-session t-tests for the nine tests of the standardization study. The table body is not recoverable from the machine-read text.]

Age Differences

To minimize the influence of sex and education, age effects were also tested in the group with academic education. This group was divided into three age groups (29 years or younger, 30-39 years, and 40 years or older). Age differences in response times were tested with one-way analyses of variance, whereas error rates were tested with extended median tests. Significantly longer response times were obtained in the oldest age group for Choice RT and Symbol Digit. A similar tendency (p < 0.10) was found for Color Word Vigilance and Tapping (dominant hand). Error rates were very similar in the three groups, indicating that the prolonged response times did not simply reflect a more cautious strategy in the older group.

Educational Level

The composition of the standardization sample did not permit a division into groups with different educational levels and similar age and sex distributions. To gain an impression of the importance of educational level, the differences between groups at three educational levels (no academic education, lower academic education, doctoral degree) were analyzed with analyses of covariance using sex and age as covariates. No differences between educational groups were obtained in Simple RT, Color Word Vigilance, Choice RT, or Tapping. The group with the highest educational level performed significantly better than the other two with respect to response times in the Search and Memory test (p = 0.011), response times in Reasoning (p = 0.004), and length of the Digit Span (p = 0.014). The group with the lowest educational level made more errors in the Search and Memory test than the other two groups (p = 0.0006).
Response times in Symbol Digit and Additions and the number of errors in Reasoning were successively lowered with higher educational level.

Other Individual Differences

No performance differences were obtained between groups who were more or less experienced with work on computers or between subjects who had or had not participated in experiments using computerized tests.

Fatigue and Learning Effects

Training effects and other effects of increased experience with the tests were analyzed both within the tests and between the repeated testings. Within-test changes were analyzed by computing mean response times or errors for successive comparable periods. One-way analyses of variance were performed to test the significance of changes as a result of time on task. The conservative estimate of the level of significance recommended by Greenhouse and Geisser (1959) was used in these tests. The changes between repeated testings were analyzed in the group of 59 subjects who performed the tests twice and were tested with t-tests (Table 3).

In most tests no significant changes occurred after the prescribed training period. A significant training effect was obtained only in Additions, where response times were shortened from the first to the second half of the test (p = 0.00001). There was also a tendency for errors to diminish between the two halves of the Verbal Reasoning test (p = 0.06). The opposite effect was obtained in Simple RT, where response times were gradually prolonged (p < 0.00001), being 17 ms longer during the fifth than during the first minute. Similarly, in Color Word Vigilance (CWV), response times were fairly stable until the last minute of the test, when they were significantly prolonged (p = 0.018).

As shown in Table 3, performance improved between the two sessions in Search and Memory (two letters) and Reasoning (number of errors). The performance decrement in Simple RT was also smaller in the second session. Surprisingly, response times in Symbol Digit were prolonged in the second session.

Factor Analysis of Response Time

A factor analysis was performed on response latency data from all the tests except Digit Span. The model used was a principal factors analysis with an orthogonal varimax rotation (the resulting factor loading matrix is given in Table 4).
A two-factor solution was chosen, although the second factor was rather weak. Simple RT and Choice RT had the highest loadings in the first factor. Tapping also had a much higher loading in this factor than in the second one. Thus the first factor seemed to represent motor response speed. Search and Memory (SAM) had the highest loading in the second factor, but both Reasoning and RT Additions had higher loadings in this factor than in the first one. Thus, this factor seems to represent decision processes that require more complex information processing. However, it should be noted that both RT Additions and Reasoning had very low communalities. Thus, the response times in the two tests that required the most advanced information processing primarily reflected factors not shared with the other tests.

TABLE 4 Factor Analysis of Response Times: Factor Loading Matrix of an Orthogonal Two-Factor Solution

Test            Factor I   Factor II    h2
Simple RT          .74        .06       .56
Choice RT          .76        .33       .69
CWV                .52        .52       .54
SAM                .21        .83       .74
Symbol Digit       .45        .48       .44
RT additions       .05        .44       .20
Reasoning          .24        .48       .29
Tapping           -.57       -.19       .36
Eigenvalue        3.11        .70

SOURCE: Data from Kjellberg and Wisung (1987).

Correlations between the accuracy scores were much lower than between response latencies, primarily as a result of the small variation in accuracy measures in most tests. A factor analysis of these scores therefore did not yield any interpretable factor structure.

Other Data on the Stability of Performance Measures

The stability of performance on the tests can also be evaluated by analyzing data from experiments that used repeated-measurement designs. Table 5 shows the correlation coefficients between successive measurements using Simple RT, Choice RT, and Color Word Vigilance in an experimental study with exposure to toluene in combination with ethanol ingestion (Iregren et al., 1986). A total of 12 subjects were tested three times during each of four sessions, which were separated by intervals of two weeks.

Table 6 shows the consecutive correlations for data from Simple RT and Choice RT obtained in another experiment, evaluating the acute effects of toluene exposure on a sample of 26 spray painters (Iregren, 1986). In this study the subjects were tested in two sessions separated by one week, and the tests were repeated three times within each session.
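As a cross-check on Table 4, the communality h2 of a test under an orthogonal solution is the sum of its squared loadings on the two factors. The sketch below recomputes h2 from the rounded loadings printed in the table. It is only an illustration: the names `TABLE_4`, `communality`, and `check_table` are hypothetical and not part of SPES, and discrepancies of about .01 are to be expected because the published values were presumably computed from unrounded loadings.

```python
# Loadings and communalities as printed in Table 4:
# test -> (Factor I loading, Factor II loading, reported h2)
TABLE_4 = {
    "Simple RT":    (0.74, 0.06, 0.56),
    "Choice RT":    (0.76, 0.33, 0.69),
    "CWV":          (0.52, 0.52, 0.54),
    "SAM":          (0.21, 0.83, 0.74),
    "Symbol Digit": (0.45, 0.48, 0.44),
    "RT additions": (0.05, 0.44, 0.20),
    "Reasoning":    (0.24, 0.48, 0.29),
    "Tapping":      (-0.57, -0.19, 0.36),
}


def communality(loadings):
    """Sum of squared loadings across orthogonal factors."""
    return sum(loading * loading for loading in loadings)


def check_table(table):
    """Map each test to (h2 recomputed from rounded loadings, h2 as reported)."""
    return {
        test: (communality((f1, f2)), reported)
        for test, (f1, f2, reported) in table.items()
    }


if __name__ == "__main__":
    for test, (computed, reported) in check_table(TABLE_4).items():
        print(f"{test:<13} computed h2 = {computed:.3f}   reported h2 = {reported:.2f}")
```

Read this way, the low h2 values for RT Additions and Reasoning say that the two retained factors account for only about 20 and 29 percent of the variance in those response times.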

TABLE 5 Correlation Coefficients for the Relation Between Successive Measurements with Three SPES Tests in an Experimental Study of the Effects from Toluene Exposure and Ethanol Ingestion (decimal points are omitted)

                     Day I            Day II           Day III          Day IV
Testing occasions    1-2  2-3  3-1    1-2  2-3  3-1    1-2  2-3  3-1    1-2  2-3
SRT mean RT           63   97   86     74   97   77     93   91   93     87   91
CRT mean RT           91   90   80     85   83   73     94   73   67     81   75
CWV mean RT           47   79   74     78   92   85     88   91   78     94   91
CWV hits              86   94   80     67   69   74     83   79   93     86   87

SOURCE: Data from Iregren et al. (1986).

TABLE 6 Correlation Coefficients for the Relation Between Successive Measurements with Two SPES Tests in an Experimental Study of Acute Effects from Toluene Exposure in a Group of Spray Painters (decimal points are omitted)

Testing occasions (Days I and II): 1-2, 2-3, 3-1, 1-2, 2-3
(values as extracted; their assignment to columns is not recoverable)
SRT mean RT    79 70
CRT mean RT    84 90 70 82 91 91 90 89

SOURCE: Data from Iregren (1986).

Performance data from consecutive sessions in a field experiment on the acute effects of exposure to electric and magnetic fields (Gamberale et al., 1988a) are presented in Table 7, from which it can be seen that there are effects of learning as well as of time of day. In this study, Simple RT, Color Word Vigilance, Symbol Digit, and Digit Span were administered in the morning and in the afternoon of two consecutive days to a sample of 24 workers in the electrical industry. Table 8 presents the correlations between consecutive measurements in this study.

In evaluating the data presented in Tables 5, 6, and 8, one should bear in mind the small group sizes (N = 12, 26, and 24, respectively) as well as the relative homogeneity of the groups. Both these factors contribute to a restricted range and thereby a reduced variance within the groups, which limits the possibility of obtaining high correlation coefficients. Furthermore, the correlations presented in Table 5 are also lowered by the effects of ethanol in the study. If these facts are considered, the presented data are impressive, and especially for Simple RT, Choice RT, and Symbol Digit they indicate suitability for use with repeated-measurement designs.

TABLE 7 Performance on Four Tests in the Field Experiment of Acute Effects of Electric and Magnetic Fields

                                   Day 1             Day 2
                                a.m.    p.m.      a.m.    p.m.
Simple Reaction Time
  Reaction time (ms)    Mean    253     245       252     242
                        SD       29      28        34      29
  Variation             Mean     54      52        52      51
                        SD       15      15        16      18
  Decrement             Mean     16      10        10      12
                        SD       21      15        20      14
Color Word Vigilance
  Reaction time (ms)    Mean    540     517       519     501
                        SD       45      44        45      39
  No. correct           Mean   42.2    45.5      45.3    46.8
Symbol Digit
  Reaction time (ms)    Mean   29.9    25.5      24.9    23.5
                        SD      5.8     4.5       4.7     3.5
Digit Span
  No. correct digits    Mean    7.0     7.4       7.8     8.1
                        SD      1.2     1.2       1.0     1.9

SOURCE: Data from Gamberale et al. (1988a).

TABLE 8 Pearson Correlation Coefficients for the Correspondence Between Successive Measurements Using Four SPES Tests in a Field Study of Acute Effects of Exposure to Electric and Magnetic Fields (decimal points are omitted)

                      Day I            Day II
Testing occasions     1-2      2-1     1-2
SRT mean RT            89       92      84
CWV mean RT            68       67      50
Symbol Digit RT        86       89      80
Digit Span length      63       61      75

SOURCE: Data from Gamberale et al. (1988a).

Simple Reaction Time Standardization

Soderman et al. (1982) give a detailed report of the analyses made of the Simple RT data collected in a group of 730 workers.

Reaction Time Distribution

The distribution of RTs is given in Figure 1. The figure is based upon the 67,840 RTs obtained from the 424 workers who were not exposed to industrial solvents. As expected, the distribution is skewed positively, although not extremely so.

Effects of Time on Task

The RT is gradually prolonged in a way similar to that found in the main standardization study. However, the decrement is somewhat smaller, and the mean RT is about 20 ms longer than found in that study.

FIGURE 1 Distribution of reaction times in SPES1, Simple Reaction Time (N = 67,840; 424 individuals x 160 RT). Axes: percent of reaction times against reaction time in msec (100 to 700). SOURCE: Data from Soderman et al. (1982).
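The positive skew seen in Figure 1 is what a log-normal model of reaction times would produce: a long upper tail pulls the mean above the median. The following sketch simulates such a sample with Python's standard library. The location and spread parameters (a median of 250 ms and a log-scale standard deviation of 0.2) are illustrative assumptions, not estimates from Soderman et al. (1982); only the sample size, 424 individuals x 160 RTs, is taken from the figure. The function names are likewise hypothetical.

```python
import math
import random
import statistics

N_SUBJECTS = 424        # unexposed workers (Figure 1)
RTS_PER_SUBJECT = 160   # reaction times per worker (Figure 1)

# Hypothetical log-normal parameters, chosen only for illustration.
ASSUMED_MEDIAN_MS = 250.0
ASSUMED_LOG_SD = 0.2


def simulate_rts(n, median_ms=ASSUMED_MEDIAN_MS, log_sd=ASSUMED_LOG_SD, seed=1):
    """Draw n reaction times (ms) from a log-normal distribution."""
    rng = random.Random(seed)
    mu = math.log(median_ms)  # the log of the median is the log-scale mean
    return [rng.lognormvariate(mu, log_sd) for _ in range(n)]


def sample_skewness(values):
    """Moment-based skewness; positive when the upper tail is longer."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


if __name__ == "__main__":
    rts = simulate_rts(N_SUBJECTS * RTS_PER_SUBJECT)
    print("number of RTs:", len(rts))
    print("mean   (ms):", round(statistics.fmean(rts), 1))
    print("median (ms):", round(statistics.median(rts), 1))
    print("skewness   :", round(sample_skewness(rts), 2))
```

With the full pooled sample the skewness estimate is stable; with only 160 RTs per person it would be much noisier, which is consistent with the poor performance of skewness measures reported in the next section.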

Discriminatory Power of Different Performance Indices

Several statistical indices of reaction time performance were evaluated with respect to their statistical characteristics in a simulated log-normal distribution. Among other things, these analyses showed that all skewness measures gave very unsatisfactory estimates of population values. On the basis of these analyses, seven indices were selected for further evaluation by using the data from the worker group. The indices were the mean (X) and the median (Md) reaction time, the standard deviation (s) and the semi-interquartile range (Q), and the regression of the RTs as a function of time on task (Klin). Two measures of the zero point of the RT distribution were also computed but were found to be unsatisfactory.

The discriminatory power of the indices was evaluated with respect to age, time of day, and occupational exposure to solvents. In tests of the effects of age and time of day, only data from the unexposed group were used. To obtain a more detailed analysis of the effects on the RT distribution, the RTs of each person were sorted from the slowest to the fastest of the 160 RTs. The discriminatory power of each of these 160 steps in the RT distribution was tested.

Age effects were tested by classification of subjects into three age groups, the oldest group being about 5 years older than in the standardization study. The regression did not show any age effect, whereas the measures of variation and central tendency all discriminated about equally well between the age groups. Figure 2A shows that the differences between the age groups are largest in the upper part of the RT distribution. However, as is evident from Figure 2B, the discriminatory power peaks around the 90th percentile.

Analyses of the data from the 83 subjects who performed the task both in the morning and in the afternoon did not yield a significant effect of time of day in any of the measures.
However, the most consistent difference between the two times of testing was found for means and medians (p = 0.12) and Klin (p = 0.11). In all the measures, afternoon performance tended to be better than morning performance. Figure 3A shows that the difference between morning and afternoon was about the same in the whole RT distribution, with the exception of the lowest and highest percentiles. The discriminatory power (Figure 3B) peaked at the 10th percentile.

Corresponding data for the exposed and unexposed workers are given in Figures 4A and 4B. The mean, median, and standard deviation were all significantly larger in the exposed group. No consistent effect was found in Klin. Figure 4A shows that the difference between the two groups was most pronounced in the upper end of the RT distribution. The discriminatory power, however, decreased gradually from the lowest to the highest percentiles.

Thus, the parametric measures (mean and standard deviation) generally fared better than their nonparametric counterparts. Given that the maximum discrimination was found in different parts of the RT distribution for the three effects tested, it also seems wise to choose measures that are affected by the whole distribution. Although Klin showed no consistent effect of either age or solvent exposure, it might be worthwhile computing this measure because it was the one that gave the most consistent difference (although nonsignificant) between morning and afternoon performance.

FIGURES 2A and 2B Mean differences between three age groups in different parts of the reaction time distribution in SPES1 and the significance level for these differences. The lower curve shows the difference between the youngest group (35 years or younger, N = 129) and the middle age group (36-45 years, N = 97). The upper curve shows the difference between the oldest group (46 years or older, N = 196) and the middle age group. The figure on the right shows the F-values for the differences between the three age groups. Axes: (A) difference in msec against percentile; (B) F(2,421) against percentile. SOURCE: Data from Soderman et al. (1982).

FIGURES 3A and 3B Mean differences between morning and afternoon measurements in a group of workers not exposed to solvents (N = 83) in different parts of the reaction time distribution in SPES1; t-values for the differences are shown in the figure on the right. Axes: (A) difference in msec against percentile; (B) t(82) against percentile. SOURCE: Data from Soderman et al. (1982).
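The indices compared in this section can be computed from a single subject's sequence of reaction times in a few lines of code. The sketch below is a minimal illustration in Python, not the SPES implementation: the function names `rt_indices` and `percentile_profile`, the quartile method (the default of `statistics.quantiles`), and the use of trial number as the time-on-task variable for the regression slope (standing in for Klin) are all assumptions made for the example. The two zero-point measures are omitted because the text does not define them.

```python
import statistics


def rt_indices(rts_ms):
    """Summary indices for one subject's RTs (ms), given in presentation order."""
    n = len(rts_ms)
    if n < 4:
        raise ValueError("need at least four reaction times")
    q1, _, q3 = statistics.quantiles(rts_ms, n=4)  # quartile cut points
    trials = range(n)
    mean_trial = statistics.fmean(trials)
    mean_rt = statistics.fmean(rts_ms)
    # Least-squares slope of RT on trial number (ms per trial).
    sxy = sum((t - mean_trial) * (rt - mean_rt) for t, rt in zip(trials, rts_ms))
    sxx = sum((t - mean_trial) ** 2 for t in trials)
    return {
        "mean": mean_rt,
        "median": statistics.median(rts_ms),
        "sd": statistics.stdev(rts_ms),
        "semi_iqr": (q3 - q1) / 2,
        "slope": sxy / sxx,
    }


def percentile_profile(rts_ms):
    """Sort RTs so that the same rank can be compared across subjects.

    The source sorts from slowest to fastest; ascending order is used here
    so that a higher index corresponds to a higher percentile.
    """
    return sorted(rts_ms)


if __name__ == "__main__":
    # Synthetic subject: RT drifts upward by 0.5 ms per trial over 160 trials.
    synthetic = [200.0 + 0.5 * i for i in range(160)]
    for name, value in rt_indices(synthetic).items():
        print(f"{name:>9}: {value:.2f}")
```

Comparing groups rank by rank on the sorted profiles, as in Figures 2 to 4, then amounts to running the same two-group or three-group test once for each of the 160 positions.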

FIGURES 4A and 4B Mean differences between solvent-exposed (N = 292) and nonexposed workers (N = 424) in different parts of the reaction time distribution in SPES1; t-values for the differences are shown in the figure on the right. Axes: (A) difference in msec against percentile; (B) t against percentile. SOURCE: Data from Soderman et al. (1982).

CLINICAL VALIDATION STUDY

General Results

The test battery proved to be simple to use in clinics, and neither psychologists nor patients had any difficulty in utilizing the tests or the equipment. Furthermore, results indicated that computerized tests predicted the diagnosis slightly better than traditional tests.

Descriptive Statistics

Mean values for the different diagnostic groups on the various performance measures are presented in Table 9. The p-values for the group differences from a one-way analysis of variance are also given.

TABLE 9 Mean Values for Various Performance Measures in the Diagnostic Groups and p Values for Group Differences

                              Solvent-Induced Illness
Test/Variable                 No       Possibly   Yes       p
Simple RT
  Mean                        333      446        475       0.039
  Standard deviation          80       125        139       0.004
Choice RT
  Mean                        958      983        1,092     0.003
Color Word Vigilance
  Mean                        641      705        710       0.003
  No. of misses               6.9      12.5       14.4      0.081
  No. of alarms               8.1      7.2        8.6       0.836
Symbol Digit
  Mean                        45.7     50.4       55.9      0.142
  Estimated RT                37.6     44.1       50.5      0.033
  Errors                      0.83     0.66       0.79      0.931
Digit Span
  50% level                   6.1      5.7        5.1       0.002
Reasoning
  Mean RT                     7.9      8.4        7.7       0.723
  No. correct                 45.7     42.0       41.0      0.510

SOURCE: Data from Iregren et al. (1987).

Discriminatory Power

The predictive power of the computerized tests on the diagnosis was tested with a multiple regression analysis. The multiple correlation coefficients ranged from 0.54 to 0.81 for the three psychologists involved. Fairly low though they are, these correlations are still slightly higher than those obtained between traditional tests and diagnosis.

CLINICAL TRIAL WITH SPES1

Table 10 shows performance on the Simple Reaction Time test for the four diagnostic subgroups and the control group. The positive predictive value for the SRT on this diagnosis is 56 percent and the negative predictive value is 79 percent, given a cut-off limit from a 99 percent confidence interval derived from the control group data.

TABLE 10 Mean and Standard Deviation for Performance on the Simple Reaction Time Test, Group Size and Age for Various Groups in Clinical Try-out of SPES1

                                          Reaction Time               Age (years)
Diagnosed Group                           Mean (SD)    Variability    N     Mean   SD
Solvent-induced illness                   543 (206)    122            5     48     12
Possibly solvent-induced illness          388 (135)    104            17    51     8
Psychiatric illness                       315 (105)    73             10    44     12
Other diagnoses (e.g., low back pain)     268 (49)     59             19    46     14
Control group                             242 (25)     46             27    39     7

SOURCE: Data from Hagberg and Iregren (1984).

POSSIBLE FUTURE DEVELOPMENT OF SPES

The increasing technical capability of computers will certainly broaden the range of mental abilities that can be tested. However, because test development is laborious, well-standardized and validated new tests are still a few years off.

The current tests, which already have provided much useful information about the neurotoxic effects of many substances, will be applied in closer collaboration with representatives from other disciplines. Thus, we will be able to relate performance data to increasingly precise measurements of the exposure to toxic substances, as well as to more sophisticated physiological and neurochemical effect measures. In the long run, this development will increase our understanding of the biological mechanisms behind the functional changes that we observe and will provide still better validation for the performance measures.

However, the immense variety of performance measures in use is probably the single factor that at present most limits the rate of growth of knowledge.
The possibility of making comparisons across studies performed at separate laboratories and in different countries is of major importance, and initiatives to facilitate the standardization of computerized tests have been taken within the European Economic Community. One significant problem in this process is the slightly different primary uses of various tests and test systems, because the intended use of a test naturally affects the way in which it is implemented. However, the development of standardized test protocols is now in progress, and efforts to accomplish this task have been made at our laboratory as well as elsewhere. This volume is a good example of the current efforts.

APPENDIX

Description of Performance Tasks

Simple Reaction Time. SPES1 is a sustained attention task measuring response speed to an easily discriminated but temporally uncertain visual signal. The task is to press a key on the keyboard as quickly as possible when a red square is presented on the display. A total of 96 stimuli are administered during 6 min at intervals varying between 2.5 and 5.0 s. The first minute serves as practice, after which performance capacity is assessed for 5 min.

Choice Reaction Time. SPES2 is a four-choice RT task similar to SPES1 with the addition of response selection requirements. The stimuli consist of crosses displayed one at a time on the screen. One arm of the cross is always shorter, and the task is to indicate on one of four keys, placed in analogy to the arms of the cross, which arm is the shorter. A total of 144 stimuli are presented at the same intervals as in SPES1, and the first two minutes are excluded as practice trials.

Color Word Vigilance. SPES3:1 is a Choice Reaction Time task in which response selection is based on a more complex signal characteristic than in SPES2. It is a vigilance-type task, since a response is required only to a minority of the signals. The Swedish word for "red," "yellow," "white," or "blue" (all three-letter words) is presented on the screen. The text can be written in any one of the colors.
The task is to press a key as rapidly as possible when there is congruency between the meaning of the word and the color of the text. The interval between consecutive stimuli is 2.2 s, and the 16 possible combinations of words and color are randomly distributed within each sequence of 16 stimuli. Thus, the proportion of critical stimuli is 25 percent. A total of 256 items are presented, and the first 16 are regarded as practice trials.

Color Word Stress. SPES3:2 is a version of SPES3 which is constructed to provoke false alarms, and thus primarily measures the ability to inhibit such responses. The stimuli are the same, but the interval between subsequent stimuli is decreased to 1.5 s, and the proportion of critical stimuli has been increased from 25 to 75 percent.

Search and Memory. SPES4 measures the speed of comparing stimuli shown on the screen with a set of stimuli retained in memory. One, two, or three letters are presented on the screen for 1, 2, or 3 s, respectively. The task is to reproduce the letters on the keyboard after their disappearance. Following a successful reproduction, a row of 30 letters is presented. The task is to search this row as fast as possible for the occurrence of any of the critical letters, and each appearance is indicated by pressing a key. There may be anything from 0 to 3 critical letters in each row. Altogether there are 33 items, 11 for each number of search letters. The first trial at each level is regarded as practice.

Symbol Digit. SPES5 is a revised version of a traditional test of perceptual speed. In one row, a key to this coding task is given by the pairing of symbols with the randomly arranged digits 1 to 9. The task is to key in as fast as possible the digits corresponding to the symbols presented in random order in a second row. Each item consists of nine pairs of randomly arranged symbols and digits, and a total of ten items are presented in all. Performance is evaluated for the last six items of the test.

Digit Span. SPES6 is a traditional test of short-term memory capacity. Series of digits are presented on the screen. The digits are presented one at a time with a 1-s presentation time, and the task is to reproduce the series on the keyboard. Depending on the correctness of the answer, the length of the following series is either increased or decreased. The test starts with a series of three digits and is terminated after six changes from a correct to an incorrect answer.

Additions. SPES7 measures speed of simple mental arithmetic operations.
An addition task comprising three horizontally placed digits is presented on the screen for 1 s. The task is to add the digits as quickly as possible and to indicate the sum on the keyboard. The test includes a total of 43 items.

Digit Classification. SPES8 is a continuous CRT task. Digits ranging from 1 to 8 are presented one at a time on the screen. The task is to determine whether the digit presented is odd or even and to respond by pressing one of two appropriately marked keys. As soon as a response is given a new digit appears, and 240 digits are presented in all.

Digit Addition. SPES9 is a version of SPES8 requiring more complex processing of the signals. The digits are presented one at a time on the screen for 1.5 s at intervals of 1.8 s. The task is to add the digit currently presented to the previous digit and determine whether this sum is odd or even. The response is given by pressing one of two appropriately marked keys.

Verbal Reasoning. SPES10 measures the speed and accuracy of verbal reasoning. Sentences of varying syntactic complexity are presented on the screen. Each sentence describes a relation between the letters A and B, and it is followed by a combination of these letters. The task is to indicate with one of two keys whether the sentence gives a correct description of the relation between the letters A and B. There are 32 different items in a random series which is repeated twice.

Vocabulary. SPES11 is a traditional test of verbal understanding. The task is to indicate which of five alternatives is the synonym of a key word. A total of 45 items (15 nouns, 15 verbs, and 15 adjectives) are presented. The words have been selected from a 102-item vocabulary test which was distributed as a paper-and-pencil test to 164 subjects with varying educational background. The selection of words was made with the primary aim of achieving discriminatory power in a low-education group. The words are presented in ascending order of difficulty.

Finger Tapping Speed. SPES12:1 measures the maximum rate of repetitive movement. The task is to tap as rapidly as possible on a key at the keyboard with the index finger. The forearm is kept in a fixed position at the table. Eight 10-s trials, with a forced interval of 15 s, are performed while alternating between the preferred and nonpreferred hand. Four trials are given with each hand, and the first trial with each hand is regarded as a practice trial.

Finger Tapping Endurance. SPES12:2 is a version of SPES12 in which the change in tapping rate over time is assessed. The task is to tap as rapidly as possible with the index finger on a key.
A 1-min trial is performed with the dominant hand, and for each single tap the movement time and the resting time are registered separately. Performance is evaluated with respect to level and to changes over time.

Description of the Self-Rating Scales

Self-Rating of Performance SPES30. Within the system, it is possible to let the subject rate his performance directly after each test. In the standard version, the subject is asked to rate his actual performance in percent of his maximum performance. The question could, however, easily be rephrased.

Self-Rating of Mood SPES31. The scale consists of 12 mood-descriptive adjectives coupled to a six-category response scale. The response categories have verbal labels ranging from "not at all" to "very much." Ratings are given by typing the number of the appropriate response alternative. The questionnaire is based on two more comprehensive Swedish mood adjective check lists (Kjellberg and Bohlin, 1974; Sjoberg et al., 1979), each containing six subscales. Several authors have argued in favor of reducing these six dimensions of mood to two basic dimensions, an Activity or Energy dimension and a Stress or Tension dimension (Kjellberg and Bohlin, 1974; Sjoberg et al., 1979; Thayer, 1978; Watson and Tellegen, 1985). On the basis of previously reported factor analyses, six words were selected for each of the two dimensions. Words in the original questionnaires which have been found to be unfamiliar to, or at least unnatural to use by, nonstudent groups were excluded. A score in each subscale is computed as a mean of the ratings of the six adjectives in the scale.

Acute Symptoms SPES32. This questionnaire contains 17 items regarding symptoms of local irritation as well as symptoms from the CNS. The subject is asked to rate the present intensity of each symptom on a four-point scale.

Long-Term Symptoms SPES33. The questionnaire contains 38 items regarding a wide variety of symptoms, such as vegetative symptoms, concentration deficits, fatigue, tiredness, dizziness, and symptoms of peripheral neuropathy. The subject is asked to rate the frequency of occurrence of each symptom during the last six months on a four-point scale.

REFERENCES

Acker, W. 1983. A computerized approach to psychological screening: The Bexley-Maudsley Automated Psychological Screening and the Bexley-Maudsley Category Sorting Test. Int. J. Man-Machine Stud. 18:361-369.
American Psychological Association. 1986. Guidelines for computer tests and interpretations. Washington, D.C.
Anshelm Olson, B. 1982.
Effects of organic solvents on behavioral performance of workers in the paint industry. Neurobehav. Toxicol. Teratol. 4:703-708.
Anshelm Olson, B. 1985. Early detection of industrial solvent toxicity. The role of human performance assessment. Arbete Halsa National Board Occupational Safety Health 21:1-59.
Anshelm Olson, B., F. Gamberale, and B. Gronqvist. 1981. Reaction time changes among steel workers exposed to solvent vapor. A longitudinal study. Int. Arch. Occup. Environ. Health 48:211-218.
Anshelm Olson, B., F. Gamberale, and A. Iregren. 1985. Coexposure to toluene and p-xylene in man: Central nervous functions. Br. J. Ind. Med. 42:117-122.
Astrand, I., and F. Gamberale. 1978. Effects on humans of solvents in the inspiratory air: A method for estimation of uptake. Environ. Res. 15:1-4.
Baker, E. L., R. Letz, and A. Fidler. 1985. A computer-administered neurobehavioral evaluation system for occupational and environmental epidemiology. J. Occup. Med. 27:206-212.
Bartram, D., and R. Bayliss. 1984. Automated testing: Past, present and future. Occup. Psychol. 57:221-237.
Bartram, D., J. G. Beaumont, P. Cornford, P. L. Dann, and S. Wilson. 1987. Recommendations for the design of software for computer based assessment: Summary statement. Bulletin of the British Psychological Society 40:86-87.
Beaumont, J. G. 1982. System requirements for interactive testing. Int. J. Man-Machine Stud. 17:311-320.
Beaumont, J. G. 1985. The effects of microcomputer presentation and response medium on digit span performance. Int. J. Man-Machine Stud. 22:11-18.
Beaumont, J. G., and C. C. French. 1987. A clinical field study of eight automated psychometric procedures: The Leicester/DHSS project. Int. J. Man-Machine Stud. 26:661-682.
Biersner, R. J. 1972. Selective performance effects of nitrous oxide. Human Factors 43:187-194.
Bittner, A. C., M. G. Smith, R. S. Kennedy, C. F. Staley, and M. M. Harbeson. 1985. Automated portable test (APT) system: Overview and prospects. Behav. Res. Methods Instrum. 17:217-221.
Braconnier, R. J. 1985. Dementia in human populations exposed to neuro-toxic agents: A portable microcomputerized dementia screening battery. Neurobehav. Toxicol. Teratol. 7:379-386.
Carr, A. C., R. J. Ancill, A. Ghosh, and A. Margo. 1981. Direct assessment of depression by microcomputer. A feasibility study. Acta Psychiatr. Scand. 64:415-422.
Cassitto, M. G. 1985. Review on recent developments and improvements of neuropsychological criteria for human neurotoxicity studies. Pp. 20-24 in Neurobehavioral Methods in Occupational and Environmental Health. Copenhagen: WHO.
Chapman, L. J., and J. P. Chapman. 1978. The measurement of differential deficit. J. Psychiatr. Res. 14:301-311.
Denner, S. 1977. Automated psychological testing: A review. Br. J. Soc. Clin. Psychol. 16:175-179.
Eckerman, D. A., J. B. Carrol, D. Foree, C. M. Guillon, M. Lansman, E. R. Long, M. B. Waller, and T. S. Wallsten. 1985. An approach to brief field testing for neurotoxicity. Neurobehav. Toxicol. Teratol. 7:387-393.
Elofsson, S. A., F. Gamberale, T. Hindmarsh, A. Iregren, A. Isaksson, I. Johnsson, B. Knave, E. Lydahl, P. Mindus, H. E. Persson, B. Philipson, M. Steby, G. Struwe, E. B. Soderman, A. Wennberg, and L. Widen. 1980. Exposure to organic solvents: A cross-sectional epidemiologic investigation on occupationally exposed car and industrial spray painters with special reference to the nervous system. Scand. J. Work Environ. Health 6:239-273.
Enander, A. 1987. Effects of moderate cold on performance of psychomotor and cognitive tasks. Ergonomics 30:1431-1445.
French, C. C., and J. G. Beaumont. 1987. The reaction of psychiatric patients to computerized assessment. Br. J. Clin. Psych. 26:267-278.
Gamberale, F. 1985. The use of behavioral performance tests in the assessment of solvent toxicity. Scand. J. Work Environ. Health (Suppl. 1):65-74.
Gamberale, F., and M. Hultengren. 1972. Toluene exposure. II. Psychophysiological functions. Work Environ. Health 9:131-139.
Gamberale, F., and M. Hultengren. 1973. Methyl-chloroform exposure. II. Psychophysiological functions. Work Environ. Health 10:82-92.

Gamberale, F., and M. Hultengren. 1974. Exposure to styrene. II. Psychological functions. Work Environment and Health 11:86-93.
Gamberale, F., and A. Kjellberg. 1983a. Behavioral performance assessment as a biological control of occupational exposure to neurotoxic substances. Pp. 111-121 in R. Gilioli, M. G. Cassitto, and V. Foa, eds. Neurobehavioral Methods in Occupational Health. Oxford: Pergamon Press.
Gamberale, F., and A. Kjellberg. 1983b. Field studies of the acute effects of exposure to solvents. Pp. 117-129 in The Neuropsychological Effects of Solvent Exposure, N. Cherry and A. Waldron, eds. Hampshire, England: The Colt Foundation.
Gamberale, F., and G. Svensson. 1974. The effect of anaesthetic gases on the psychomotor and perceptual functions of anaesthetic nurses. Work Environment and Health 11:108-111.
Gamberale, F., G. Annwall, and M. Hultengren. 1975a. Exposure to white spirit. II. Psychological functions. Scand. J. Work Environ. Health 1:31-39.
Gamberale, F., G. Annwall, and M. Hultengren. 1975b. Exposure to methylene chloride. II. Psychological functions. Scand. J. Work Environ. Health 2:95-103.
Gamberale, F., G. Annwall, and B. Anshelm Olson. 1976a. Exposure to trichloroethylene. III. Psychological functions. Scand. J. Work Environ. Health 4:220-224.
Gamberale, F., G. Annwall, and M. Hultengren. 1978. Exposure to xylene and ethylbenzene. III. Effects on central nervous functions. Scand. J. Work Environ. Health 4:204-211.
Gamberale, F., B. Anshelm Olson, P. Eneroth, T. Lind, and A. Wennberg. 1988a. Acute effects of ELF electromagnetic fields. A field study on linemen working at 400 kV. Solna, Sweden: National Institute of Occupational Health.
Gamberale, F., H. O. Lisper, and B. Anshelm Olson. 1976b. The effect of styrene vapour on the reaction time of workers in the plastic boat industry. Pp. 135-148 in Adverse Effects of Environmental Chemicals and Psychotropic Drugs, M. Horvath, ed. Amsterdam: Elsevier.
Gamberale, F., A. Kjellberg, and S. Razmjou. 1988b. The Effects of Unfavorable Thermal Conditions on Performance. Solna, Sweden: National Institute of Occupational Health.
Greenhouse, S. W., and S. Geisser. 1959. On methods in the analysis of profile data. Psychometrika 24:95-112.
Hagberg, M., and A. Iregren. 1984. Simple reaction time as a diagnostic aid in psycho-organic syndrome induced by organic solvents. Proceedings from the International Conference on Organic Solvent Toxicity, Stockholm, October.
Hedl, J. J., H. F. O'Neil, and D. N. Hansen. 1973. Affective reactions toward computer based intelligence testing. J. Consult. Clin. Psychol. 40:217-222.
Iregren, A. 1982. Effects on psychological test performance of workers exposed to a single solvent (toluene). A comparison with effects of exposure to a mixture of organic solvents. Neurobehav. Toxicol. Teratol. 4:695-701.
Iregren, A. 1986a. Subjective and objective signs of organic solvent toxicity among occupationally exposed workers. An experimental evaluation. Scand. J. Work Environ. Health 12:469-475.
Iregren, A. 1986b. Effects of industrial solvent interactions. Studies of behavioral effects in man. Arbete Halsa National Board Occupational Safety Health 11:1-60.
Iregren, A., F. Gamberale, and A. Kjellberg. 1985. A microcomputer based behavioral testing system. Pp. 75-80 in Neurobehavioral Methods in Occupational and Environmental Health. Copenhagen: WHO.
Iregren, A., T. Akerstedt, B. Anshelm Olson, and F. Gamberale. 1986. Experimental exposure to toluene in combination with ethanol intake. Psychophysiological functions. Scand. J. Work Environ. Health 12:128-136.
Iregren, A., O. Almkvist, M. Klevegard, and U. Aslund. 1987. A clinical validation of six computerized tests for diagnosing solvent caused occupational illness (in Swedish). Arbete Halsa National Board Occupational Safety Health 13:1-37.
Irons, R., and P. Rose. 1985. Naval biodynamics laboratory computerized cognitive testing. Neurobehav. Toxicol. Teratol. 7:395-397.
Kjellberg, A., and O. Bohlin. 1974. Self-reported arousal: Further development of a multifactorial inventory. Scand. J. Psychol. 15:285-292.
Kjellberg, A., and M. Strandberg. 1979. The effects of anaesthetic gases on reaction time of anaesthetic nurses. Report No. 11. Solna, Sweden: National Board of Occupational Safety and Health.
Kjellberg, A., and P. Wide. 1988. Effects of simulated ventilation noise on performance of a grammatical reasoning task. Proceedings of the 5th International Congress on Noise as a Public Health Problem, Stockholm.
Kjellberg, A., and H. Wisung. 1987. Some metrical properties in a computer administered test battery for use in behavioral toxicology. Report No. 1. Solna, Sweden: National Board of Occupational Safety and Health.
Kjellberg, A., B. Wigaeus, J. Engstrom, I. Astrand, and B. Ljungquist. 1979. Long-term effects of exposure to styrene in a polyester plant. Arbete Halsa National Board Occupational Safety Health 18:1-25.
Knave, B., B. Anshelm Olson, S. Elofsson, F. Gamberale, A. Isaksson, P. Mindus, H. E. Persson, G. Struwe, A. Wennberg, and P. Westerholm. 1978. Long term exposure to jet fuel. A cross sectional epidemiologic investigation on occupationally exposed industrial workers with special reference to the nervous system. Scand. J. Work Environ. Health 4:19-45.
Knave, B., F. Gamberale, S. Bergstrom, E. Birke, A. Iregren, B. Kolmodin Hedman, and A. Wennberg. 1979. Long-term exposure to electric fields. A cross-sectional epidemiologic investigation of occupationally exposed workers in high-voltage substations. Scand. J. Work Environ. Health 2:115-125.
Laursen, P., and T. Jorgensen. 1985. Computerized neuropsychological test system. In Neurobehavioral Methods in Occupational and Environmental Health. Copenhagen: WHO.
Letz, R., and E. Baker. 1986. Computer-administered neurobehavioral testing in occupational health. Sem. Occup. Med. 1:197-203.
Lisper, H. O., and A. Kjellberg. 1972. Effects of 24-hour sleep deprivation on rate of decrement in a 10-minute auditory reaction time task. J. Exp. Psychol. 96:287-290.
Lucas, R. W. 1977. A study of patient attitudes to computer interrogation. Int. J. Man-Machine Stud. 9:69-96.
Lukin, M. E., E. Dowd, B. S. Plake, and R. Kraft. 1985. Comparing computerized versus traditional psychological assessment. Computers in Human Behavior 1:49-58.
Mahoney, E. C., P. A. Moore, E. L. Baker, and R. Letz. 1988. Experimental nitrous oxide exposure as a model system for evaluating neurobehavioral tests. Toxicology 49:449-457.
Matarazzo, J. D. 1983. Computerized psychological testing. Science 221:323.
McArthur, D. L., and B. H. Choppin. 1984. Computerized diagnostic testing. J. Educational Measurement 31:391-397.
Rodnitzky, R. L., H. S. Levin, and D. L. Mick. 1975. Occupational exposure to organophosphate pesticides. A neurobehavioral study. Archives of Environmental Health 30:98-103.
Roels, H., R. Lauwerys, J. P. Buchet, P. Genet, M. J. Sarhan, I. Hanotiau, M. deFays, and D. Stanescu. 1987. Epidemiological survey among workers exposed to manganese: Effects on lung, central nervous system and some biological indices. Am. J. Ind. Med. 11:307-327.

Sjoberg, L., E. Svensson, and L. O. Persson. 1979. The measurement of mood. Scand. J. Psychol. 20:1-18.
Soderman, E., A. Kjellberg, B. Anshelm Olson, and A. Iregren. 1982. Standardization of a simple reaction time test for use in behavioral toxicology. Report No. 27. Solna, Sweden: National Board of Occupational Safety and Health.
Space, L. G. 1981. The computer as psychometrician. Behav. Res. Methods Instrum. 13:595-606.
Thayer, R. E. 1978. Toward a psychological theory of multidimensional activation (arousal). Motivation and Emotion 2:1-34.
Thompson, J. A., and S. L. Wilson. 1982. Automated psychological testing. Int. J. Man-Machine Stud. 17:279-289.
Watson, D., and A. Tellegen. 1985. Toward a consensual structure of mood. Psychol. Bull. 98:219-235.
Weiss, D. J., and C. D. Vale. 1987. Adaptive testing. Appl. Psych. 36:249-262.
Wigaeus-Hjelm, E., M. Hagberg, A. Iregren, and A. Lof. 1990. Exposure to methyl isobutyl ketone (MIBK). Toxicokinetics and occurrence of irritative and CNS symptoms in man. International Archives of Occupational and Environmental Health. In press.
World Health Organization. 1987. Prevention of Neurotoxic Illness in Working Populations, B. L. Johnson, ed. New York: John Wiley & Sons.
