through self-reports, but native intellectual ability is difficult to measure in a convincing manner. Three approaches have been used to characterize it: the Armed Forces Qualifying Test (AFQT), which is administered on enlistment in the US military (e.g., Storzbach et al. 2001); the information subscale of the Wechsler Adult Intelligence Scale (WAIS), which measures knowledge attained over the course of life (e.g., Proctor et al. 2003); and the National Adult Reading Test (NART) (e.g., David et al. 2002). The WAIS and NART were administered after the veterans returned from the Persian Gulf War, so they yielded estimates based on current performance; the AFQT is the only truly premilitary measure. Each of those tests correlates with measures of overall intelligence, such as the full WAIS, but is not believed to be affected by exposures such as neurotoxic chemicals (such exposures have been proposed as one possible cause of the symptoms attributed to "Gulf War syndrome"). In addition, the examiners who administer the neurobehavioral or neurocognitive tests should be blind to the condition or status of the veterans and the control population. Blinding is of less concern when the tests are administered by computer.

Studies That Respond to Question 1 (Outcomes in Gulf War-Deployed Veterans vs Veterans Deployed Elsewhere or Not Deployed)

The committee identified two primary and five secondary studies that compared Gulf War-deployed veterans with veterans deployed elsewhere or not deployed (Table 5.3). David et al. (2002) compared the neurobehavioral test performance of 209 UK soldiers deployed to the Persian Gulf, 54 UK soldiers who served as peacekeepers in Bosnia, and 78 UK Gulf War-era nondeployed soldiers. A broad array of neurobehavioral tests was administered to all participants, but evaluation of the results is limited because the standard deviations of the mean test scores were not reported. No differences were found among the groups after adjustment for age, education, intelligence (estimated with the NART), and Beck Depression Inventory (BDI) score (Table 5.3).
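The missing standard deviations matter because a standardized effect size such as Cohen's d (Cohen 1992) divides the difference between group means by the pooled standard deviation; without the SDs, d cannot be computed and the practical size of any group difference cannot be judged. The following is a minimal Python sketch, assuming two independent groups and the common pooled-SD form of d (other variants exist). The means and SDs in the usage lines are hypothetical; only the group sizes (209 and 78) are taken from David et al. (2002).

```python
import math


def pooled_sd(sd1: float, n1: int, sd2: float, n2: int) -> float:
    """Pooled standard deviation of two independent groups (each variance weighted by n - 1)."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))


def cohens_d(mean1: float, sd1: float, n1: int,
             mean2: float, sd2: float, n2: int) -> float:
    """Cohen's d: difference in group means divided by the pooled SD."""
    return (mean1 - mean2) / pooled_sd(sd1, n1, sd2, n2)


if __name__ == "__main__":
    # Hypothetical test scores: only the group sizes (209 deployed,
    # 78 nondeployed) come from David et al. (2002).
    n_deployed, n_nondeployed = 209, 78
    # The same 2-point mean difference under two hypothetical spreads.
    print(round(cohens_d(50, 10, n_deployed, 48, 10, n_nondeployed), 2))  # prints 0.2
    print(round(cohens_d(50, 4, n_deployed, 48, 4, n_nondeployed), 2))   # prints 0.5
```

An identical difference in means yields a small effect (d = 0.2) when scores are widely spread and a medium one (d = 0.5) when they are tightly clustered, which is why reported means alone do not allow the size of group differences to be assessed.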

Proctor et al. (2003) studied 143 Gulf War veterans and 72 nondeployed veterans of the Danish military. A broad array of neurobehavioral tests was administered to participants and the results were analyzed. Like David et al. (2002), Proctor et al. (2003) found no differences in the overall analysis of neurobehavioral test performance (Table 5.3).

Three secondary studies addressed whether deployed veterans differed from nondeployed veterans (Axelrod and Milner 1997; Vasterling et al. 2003; White et al. 2001). Only one of the three (Axelrod and Milner 1997) found reliable differences in neurobehavioral test performance between the groups after correction for age and education (Table 5.3). The study by White et al. (2001) was considered a secondary study for this outcome because the Gulf War group combined two demographically heterogeneous samples—one from Fort Devens and the other from New Orleans—and because the comparison population, Germany-deployed veterans, was small.

In its evaluation of those studies, the committee was concerned that the investigators' analyses might have masked differences. For example, David et al. (2002) adjusted the results for depression because depression often co-occurs with poorer performance on cognitive measures (e.g., Brown et al. 1994). That adjustment could have made it impossible to detect cognitive differences. David et al. (2002), White et al. (2001), and Vasterling et al. (2003) used an overly conservative Bonferroni adjustment to correct for multiple comparisons (Sterne and Davey Smith 2001), which also might have masked differences. The committee therefore estimated the effect size (d) (Cohen 1992) for test differences (or trends) that were significant either before or after correction, and looked for a pattern or consistency of results among the studies. The percentage of the administered neurobehavioral tests that were reported as significant is listed in column 6; it varied from

