between those POWs and a comparable group of WW II veteran controls (Goulston et al., 1985); psychiatric findings were much more prominent (Tennant et al., 1986; Dent et al., 1987). Beebe's earlier (1975) study, however, identified a number of conditions with significantly higher hospitalization rates among POWs; more generally, he also found that PWP illness rates were slightly higher than PWK rates and much higher than PWE rates. Thus, question C focused broadly on levels of illness, generally specified, and the main thrust of the question was directed toward determining whether Beebe's earlier findings (1975) on illness differentials still held.
Question D was in the broadest sense a holdover from the 1984–1985 follow-up. At the time of the follow-up, there were nagging questions about the quality of the self-reported data on illnesses, and it was decided that analysis of those data should be postponed until the more solid examination data were available for comparison. Thus, question D was formulated to emphasize the comparison of self-reported and examination data, and the data presented on question D generally support the original reluctance to analyze the self-reported data alone.
Question E, like others already discussed, was framed with findings from the 1984–1985 follow-up in mind. That follow-up had shown that malnourished WW II prisoners of the European theater had significantly elevated depressive symptoms compared with other European prisoners. The malnourished group, however, had not been included in Beebe's 1967 follow-up; consequently, less was known in detail about their physical health. Question E was thus formulated to study broadly the physical health of malnourished PWE.
Before turning to each of the above questions, some discussion of statistical testing is necessary. As was the case in Chapter 4, a useful examination of the descriptive data on POW and control lifetime prevalence rates requires some knowledge of the stability or, conversely, variability of these rates. Statistical tests are customarily used to compare such rates because a statistical test takes into account not only the magnitude of differences in rates but also the variability of the rates based on the size of the sample. Given the low response rates (see Chapter 3), however, the customary use of statistical tests would be inappropriate here. As in Chapter 4, statistical tests will nonetheless be used as indicators of whether a given difference in prevalence rates is "noteworthy" or "appreciable." In doing this, it is recognized that a statistical test done in this setting has no valid inferential use but is instead merely a way of marking such noteworthy or appreciable differences. The additional comments made in Chapter 4 on the limitations of these tests all apply here as well; that is, the tests only mark a difference as noteworthy or not, without gradation; they may conclude that a difference is "not noteworthy" simply because sample sizes are small ("low pow-