Appendix B
Design Issues in the Gulf War Veterans Health Study

Prepared for the Institute of Medicine Committee on Measuring the Health of Gulf War Veterans.

Naihua Duan

RAND, Santa Monica, California

and

Robert O. Valdez

School of Public Health, University of California at Los Angeles

Abstract

The design for the Gulf War Veterans Health Study poses a variety of challenges. In order to study changes in the Gulf War (GW) veterans' health status over time, a panel design (also known as a prospective cohort design) is indicated. Since the study also aims to examine the levels of the GW veterans' health status at various time points, consideration needs to be given to the cross-sectional representativeness of the panel; thus a repeated panel design should be considered as a potential alternative to a permanent panel design. Due to the anticipated deterioration in the quality of the locating information on the GW veterans, recruiting the GW veterans will likely require a substantial effort to track and trace the sampled participants, making it unattractive to use designs such as a rotating panel that require repeated recruitment of new panels. Given the closed nature of the GW veteran population (there are no new entries), it is important for the study to provide timely information of value to the GW veterans during their lifetime. Thus, the study should be designed with more frequent data collection in the early years, when the information obtained has a longer "useful life." Based on the consideration of various trade-offs, three of the most promising designs are the permanent panel design, the repeated panel design, and a combination of the two. A promising design is to recruit an initial panel and follow it every 3 years for three waves. An assessment should be made after the third wave to evaluate the quality of the panel and to determine whether to continue following the same panel, to switch to a new panel, or to take a combination of the two. The survey frequency might be reduced in the second decade and beyond.




Introduction

The design of a study is usually determined by the research questions that need to be addressed and the target population to be studied. The proposed Gulf War Veterans Health Study (GWVHS) aims to address the following research questions:

1.  How healthy are Gulf War veterans?
2.  In what ways does the health of Gulf War veterans change over time?
3.  Now and in the future, how does the health of Gulf War veterans compare with:
    - The general population?
    - Persons in the military at the time of the Gulf War but not deployed?
    - Persons in the military at the time of the Gulf War who were deployed to nonconflict areas?
    - Persons in the military deployed to other conflicts, such as Bosnia, Somalia, and so on?
4.  What individual and environmental characteristics are associated with observed differences in health between Gulf War veterans and comparison groups?

Since the study aims to address both the levels at specific time points (the first and third research questions) and changes over time (the second and fourth research questions) in Gulf War veterans' health status, we will consider study designs appropriate for both types of research questions. In particular, we will consider repeated cross-sectional surveys and various panel survey designs (also known as prospective cohort designs).

Unlike the general population (either civilian or military), Gulf War (GW) veterans are a closed population: there is no birth, enlistment, or migration into this population. Membership in this population was determined by participation in the Persian Gulf War, that is, by service in the Gulf War theater of operations between August 2, 1990, and June 13, 1991. The closed nature of the GW veteran population has important implications for the study design, such as the merits of replenishing the panel.1 Further discussion is given in the section on temporal structure below.

Given the closed nature of the GW veteran population, it is important for the GWVHS to provide timely information for the GW veterans. This objective gives the GWVHS a stronger focus on public health than on basic science research. The information obtained in the GWVHS will be of value to the GW veterans only during their lifetime. Therefore, timeliness of information should be taken into consideration in the design of the GWVHS. In other words, the focus of the GWVHS is more on public health (to serve the GW veterans) than on basic science research (to obtain scientific knowledge applicable to future patients). This important feature indicates that the GWVHS should be designed with more frequent data collection in the early years, when the information obtained has a longer "useful life" to the GW veterans, and less frequent data collection in later years, when the information has a shorter "useful life." Further discussion is given in the section on Survey Frequency.

In order to strengthen our ability to understand the health problems of GW veterans and their trajectories, it is worth considering a case-control component in the GWVHS. The comparison group would consist of patients with medically unexplained physical symptoms in the same geographical areas as the GW veterans in the GWVHS sample. Geographical matching has an important implication for the study design, namely, the extent to which the GWVHS sample should be clustered geographically. Further discussion is given in the section on Survey Modality and Geographical Clustering.

Another important and unique feature of the GW veteran population is that our ability to locate individuals in this population is likely to deteriorate over time. While the Department of Defense (DoD) maintains locating information on record for all veterans, including GW veterans, the accuracy of this information is likely to decrease over time. As this report is being written, 8 years have elapsed since the Gulf War. We anticipate, therefore, that a substantial tracking and tracing effort will be necessary to recruit a representative sample of GW veterans, and that a substantial level of nonresponse will still occur despite this effort. This feature has important implications for the design of the GWVHS, making it less appealing to recruit multiple cohorts into the study.

In population studies of Gulf War veterans conducted to date, response rates ranged from a low of 31% in the study conducted by Stretch et al. (1995) to 97% of those located in a survey of women who served in the U.S. Air Force during the Gulf War conducted by Pierce (1997). Further details are given in Chapter 5 of the report. A low response rate is a concern for the validity of the data; therefore, it is important to engage in efforts to increase the response rate, using survey research tools such as tracking and tracing of participants, and incentives. Further discussion of nonresponse, tracking, and tracing can be found in the section "Nonresponse, Attrition, Tracking, and Tracing."

In order to help interpret the health status of GW veterans (especially the changes over time), several comparison groups (listed under the third research question above) will be included in the GWVHS. Those comparison groups will be recruited and surveyed using the same design to be used for the GW veterans, to maximize comparability. Since membership in the GW veteran population versus the comparison groups is not randomized, the comparison will be vulnerable to the potential selection bias problems common to all observational studies: GW veterans might differ from the comparison groups even in the absence of the Gulf War experience. In order to account for such differences, the best possible effort needs to be made to collect data on potential confounding factors.

Based on the consideration of various trade-offs in the ability of various study designs to accomplish the objectives of the GWVHS, three of the most promising designs are the permanent panel design, the repeated panel design, and a combination of the two. A promising design is to recruit an initial panel and follow them every 3 years for three waves. An assessment should be made to evaluate the quality of the panel at the end of the third wave, to determine whether to continue following the same panel, to switch to a new panel starting in year 10,2 or to take a combination of the two. The survey frequency might be reduced in the second decade and beyond. Note that the final design decision can be made after the first three waves have been fielded; thus it can be (and should be) based on the actual cost data obtained in the field.

In order to facilitate the recruitment of the second panel if warranted, it is worth considering recruiting a "reserve" sample along with the initial panel, giving them a brief enrollment interview to collect contact information, and maintaining contact with them over time through tracking. This provision will reduce the potential deterioration in our ability to locate and contact the GW veterans who were not sampled in the initial panel.

Temporal Structure for Survey Studies

Given the dual goals of the GWVHS to study both levels and changes in GW veterans' health status, it is important to consider the structure of the survey study design both in terms of the units (individuals) to be surveyed and the time(s) those units are to be surveyed. We discuss in this section the candidate designs, the pros and cons of those designs, and specific considerations for the GWVHS.

Taxonomy of Survey Studies

The temporal structure for survey study design can be classified according to the following taxonomy,3 listed in order of increasing emphasis on temporal versus unit-specific data collection.

1   Discussions about the use of refreshment samples in a panel study with attrition are given in Hirano et al. (1998).
2   Starting the second panel a year after the end of the first panel, instead of following the regular 3-year interval between waves, would facilitate the opportunity to "splice" across the panels.
3   Similar taxonomies and discussions about the trade-offs among those designs are given in Duncan and Kalton (1987), Bailar (1989), and Kalton and Citro (1993). General discussions about design and analysis issues for panel studies are also given in Duncan et al. (1984) and Hsiao (1985, 1986).

1.   Single cross-sectional survey. A cross-sectional sample of observation units is identified at one time point and surveyed once. This design only allows for the estimation of level parameters, such as the prevalence of medically unexplained physical symptoms, at the specific time point, and usually does not allow for the assessment of prospective changes over time. The survey might include retrospective data items to inquire about past events; thus, it might provide some information on past changes, although the reliability of the retrospective data might be compromised. Additionally, the validity of retrospective data might be affected by mortality and other forms of exit from the target population. For example, in order for patients who experienced a life-threatening disease to be available to report on past changes, they must have survived the disease. Therefore the survival rate estimated from the retrospective data is likely to be very biased.

2.   Repeated cross-sectional surveys without unit overlap. A cross-sectional sample of observation units is identified at each of several time points; each sample is surveyed once. The samples are drawn with no provisions for overlap; a few overlap cases might occur if the samples are drawn with replacement and the sampling rates are sufficiently high. This design allows for the estimation of level parameters at each of the selected time points, as well as the average of the level parameters over time. In addition, this design also allows for the estimation of net changes in the population parameters over time, such as the increase or reduction in the prevalence of medically unexplained physical symptoms over time. It usually does not allow for the assessment of individual gross changes, such as the persistence of these unexplained symptoms, whether the same patients are affected over time, and the amount of turnover. (Retrospective inquiries might provide some information, but it might be of compromised quality.)

3.   Repeated cross-sectional surveys with unit overlap. This design is similar to (2), with the exception that a portion of each subsequent sample is drawn from the previous sample. In other words, membership in the previous sample is used as a stratifying variable in the subsequent sample, with the units in the previous sample being oversampled relative to the new units. This design has capabilities similar to those of (2); it also allows for the estimation of gross changes on the individual level, using the portion of the sample that overlaps with the previous sample.

4.   Repeated panel surveys with temporal overlap. This design is similar to (2), with the exception that each sample is surveyed several times, usually at regular time intervals; thus each sample serves as a panel. The temporal spans of the panels overlap in time: a new panel is initiated before the previous panel has retired, so several panels are usually active in the field at the same time. This design is essentially the same as the rotating panel survey design. (There are some fine distinctions between the two designs, but the similarity dominates the differences.) For each time point, the level parameters can be estimated using the panels active at the time, usually including a new panel (just initiated) and several ongoing panels. The quality of the ongoing panels for the level estimates might be compromised by panel artifacts such as attrition and panel conditioning, to be discussed later. (A similar compromise might also occur in [3] among the overlap portion of the subsequent samples.) However, the presence of multiple panels in different stages of progression might help mitigate some of those problems. On the other hand, this design usually provides more precise estimates of changes (both net and gross) than the previous designs.

5.   Repeated panel surveys without temporal overlap. This design is similar to (4), with the exception that the panels do not overlap in time: a new panel is initiated after the previous panel is retired. (Thus the temporal span of each panel is distinct and does not overlap with other panels.) Each panel allows the estimation of level parameters at the time of its initiation, as well as the estimation of those parameters at each follow-up. Similar to (4), this design might also be vulnerable to attrition and panel conditioning, and it is likely more vulnerable than (4) at follow-up waves because there are no other panels active at the same time to help mitigate those problems.

6.   Permanent panel survey. A single sample is drawn at one specific time point, then followed for the entire duration of the study. This design provides more information on changes, especially on long-term gross changes on the individual level. (The ability of [4] and [5] to provide direct information on long-term gross changes is limited by the duration of the individual panels. It might be possible to "splice" distinct panels to assess long-term gross changes; this usually requires strong assumptions, such as Markovian properties in the nature of the gross changes.) On the other hand, this design is more vulnerable than (4) and (5) to attrition and panel conditioning.

7.   Time series study. A single unit is chosen and followed intensively for the entire duration of the study. This design provides the most intensive information on gross changes, and practically no information on level parameters.

There are many possible variations on and combinations of the designs listed above. For example, a study focused on a specific disease or condition might follow all cases with the condition to assess their trajectories, and follow only a subsample of the noncases to assess incidence rates.

Pros and Cons for Alternative Study Designs

Given the dual goals of the GWVHS to obtain both level and change estimates, designs (1) and (7) should be ruled out. Among the remaining candidate designs, (2) through (6), the sequence in which they appear in the Taxonomy section above is ordered by increasing emphasis on repeated measurements of the same individuals, that is, on the overlap of the cohort over time.

Generally speaking, the more overlap there is across time in the units surveyed, the more information is available for estimating changes. This premise is self-evident for gross changes on the individual level: we observe gross changes only among the units that overlap in time.4 The permanent panel survey is therefore the preferred design for studies focused on gross changes.

For net changes, the overlap is usually viewed as a plus because the same individuals serve as their own controls over time. This is illustrated in the following simple model:

    Y_it = α + tβ + θ_i + ε_it,    i = 1, ..., n;  t = 0, 1,

where Y denotes the outcome of interest, α denotes the baseline population status, β denotes the net change of interest, θ denotes the time-invariant individual variation, and ε denotes the temporal random error. With less overlap, such as in repeated cross-sectional surveys, distinct individuals are observed at times 0 and 1. The net change is estimated using the difference between the two sample means,

    β̂ = Ȳ_1 − Ȳ_0,

where the subscripts 0 and 1 refer to the distinct cross-sectional samples. The uncertainty in the estimated net change includes both the sampling error in the cross-sectional samples, θ, and the temporal random error, ε. With more overlap, such as in panel surveys, the same individuals are observed at times 0 and 1; thus the sampling error θ cancels when we compare times 0 and 1, resulting in a more precise estimate of the net change.5

For estimating levels, the overlap is usually a disadvantage.6 More specifically, consider the estimation of the average level of Y across the population and also over time. This is usually estimated using the grand mean of all observed Y_it's. With repeated cross-sectional surveys, the sampling error variance is reduced by a factor of 2n, because two distinct (and independent) samples are drawn at times 0 and 1. For the panel survey, the sampling error variance is reduced by a factor of only n, because the same sample is used at times 0 and 1. Therefore the estimate based on the panel survey is less precise.7 This comparison is important, for example, for detecting persistent rare conditions: the chance of detecting such conditions is much higher with repeated cross-sectional surveys than with panel surveys, because more individuals are surveyed.

4   It is possible to use repeated cross-sectional surveys to estimate gross change parameters such as survival rates under various assumptions. The precision of those estimates is usually substantially lower than that obtained using panel surveys.
5   For simplicity of illustration, we restricted ourselves here to a two-wave design and an analysis of changes. With more waves of data collected on the same participants, a rich variety of longitudinal analyses (time trend analysis, growth curve analysis) can be applied (see, e.g., Hsiao, 1985, 1986; Diggle et al., 1994). Discussions of the pros and cons of cohort designs versus repeated cross-sectional designs for community intervention studies are also given in Diehr et al. (1995) and Gail et al. (1996).
6   An important implication is that less overlapped designs, such as repeated cross-sectional surveys without unit overlap, are more capable of identifying cases with rare attributes such as rare disease conditions.
7   This comparison assumes the same sample size under the two designs. This is somewhat unfair: the sample size available under the panel design will likely be larger due to its lower cost per person-wave.
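To make the precision comparison above concrete, the following minimal simulation sketch generates data from the two-wave model and compares the empirical standard errors of the net-change estimate under the two designs. The sample size and variance components are hypothetical illustration values, not quantities from the GWVHS planning.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000                       # hypothetical sample size per wave
beta = 0.5                     # true net change
sd_theta, sd_eps = 2.0, 1.0    # hypothetical variance components
reps = 2000

panel_est, cross_est = [], []
for _ in range(reps):
    # Panel design: the same individuals (same theta_i) observed at t = 0 and t = 1.
    theta = rng.normal(0, sd_theta, n)
    y0 = theta + rng.normal(0, sd_eps, n)
    y1 = beta + theta + rng.normal(0, sd_eps, n)
    panel_est.append((y1 - y0).mean())     # theta_i cancels in the difference

    # Repeated cross-sections: independent samples (independent thetas) at each wave.
    y0c = rng.normal(0, sd_theta, n) + rng.normal(0, sd_eps, n)
    y1c = beta + rng.normal(0, sd_theta, n) + rng.normal(0, sd_eps, n)
    cross_est.append(y1c.mean() - y0c.mean())

print("SE of net change, panel design:          %.4f" % np.std(panel_est))
print("SE of net change, repeated cross-secs:   %.4f" % np.std(cross_est))
# Expected under these assumed values: sqrt(2*sd_eps^2/n) ~ 0.045 for the panel
# versus sqrt(2*(sd_theta^2 + sd_eps^2)/n) ~ 0.10 for the cross-sections.
```

Under these assumed values, the panel estimate carries only the temporal error ε, while the repeated cross-sections also carry the between-person variation θ, as derived above.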

A related advantage of the panel survey design is its ability to control for time-invariant unobserved confounding factors. As an illustration, consider the following extension of the earlier model:

    Y_it = α + tβ + x_it γ + w_i δ + θ_i + ε_it,    i = 1, ..., n;  t = 0, 1,

where x denotes the predictors of interest and w denotes unobserved confounding factors (assumed to be time-invariant). If the confounding factors w were observed, it would be possible to control for them in cross-sectional data. With panel data, it is possible to control for unobserved time-invariant confounding factors by taking the difference across time, resulting in the following difference model:

    Δ_i = Y_i1 − Y_i0 = β + (x_i1 − x_i0)γ + (ε_i1 − ε_i0),    i = 1, ..., n.

We then regress the change in Y (Δ) on the change in x. Assuming that the confounding factors w are time-invariant, they are cancelled out in the difference model, along with the individual effects θ_i. It should be noted, though, that the ability of the difference model to control for confounding factors and estimate the effects of interest depends critically on the temporal variation in the predictors of interest. If the predictors x do not vary over time, the difference model does not allow us to estimate their effects. Even if the predictors x do vary over time, the precision of the estimated effects might be poor if the temporal variation in x is small.
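As a sketch of how the difference model might be estimated in practice, the code below simulates panel data with an unobserved time-invariant confounder w and recovers γ by regressing the change in Y on the change in x. All parameter values are hypothetical, and ordinary least squares is used purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
beta, gamma, delta = 0.5, 1.5, 2.0   # hypothetical true parameters

w = rng.normal(size=n)               # unobserved time-invariant confounder
theta = rng.normal(size=n)           # time-invariant individual effects
# The predictor is correlated with w, so a cross-sectional regression is biased.
x0 = 0.8 * w + rng.normal(size=n)
x1 = x0 + rng.normal(size=n)         # predictor varies over time
y0 = x0 * gamma + w * delta + theta + rng.normal(size=n)
y1 = beta + x1 * gamma + w * delta + theta + rng.normal(size=n)

# Cross-sectional OLS of y on x at t = 0 (biased by the omitted confounder w).
b_cross = np.polyfit(x0, y0, 1)[0]

# First-difference OLS: regress (y1 - y0) on (x1 - x0); w and theta cancel.
b_diff = np.polyfit(x1 - x0, y1 - y0, 1)[0]

print(f"cross-sectional estimate of gamma: {b_cross:.3f}  (true value {gamma})")
print(f"first-difference estimate:         {b_diff:.3f}")
```

The cross-sectional slope is distorted by the omitted confounder, while the first-difference slope recovers γ, precisely because w and θ drop out of the differenced equation.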

Another important advantage of the panel design is that it might help improve the quality of recall, by using the events observed in earlier waves to bound the time frame of recall in future waves (see, e.g., Neter and Waksberg, 1964). However, since the anticipated between-wave lags are much longer than the recall period for the GWVHS, this feature is unlikely to be applicable. Further discussion of bounding the time frame and related recall error issues can be found in the section Measurement Error.

While the panel design has much merit, it also has some limitations. One important limitation is that the panel design can be especially vulnerable to nonresponse. Nonresponse is an important limitation of all survey studies under both cross-sectional and panel designs. Almost all survey studies fail to obtain complete data on some sampled subjects for various reasons: some subjects cannot be located or reached, some are too sick to be interviewed, and some refuse to be interviewed.

It is usually plausible that the nonrespondents differ from the respondents in terms of the attributes of interest. For example, Groves and Couper (1998) reported that households with many members and households with elderly persons or young children are easier to contact, and that urban households are more difficult to contact than rural households; once contacted, those in military service, racial and ethnic minorities, and households with young children or young adults are more likely to cooperate with surveys. Given the potential for respondents to differ from nonrespondents, analyses based on the respondents might provide biased estimates for the target population.

The severity of the nonresponse bias is usually associated with the nonresponse rate (see, e.g., Kish, 1965, Section 13.4B). If the nonresponse rate is low, say, less than 10%, the nonresponse bias is likely to be small and negligible. If the nonresponse rate is high, say, more than 30%, there is a potential for the nonresponse bias to be serious; thus the conclusions based on the respondents might be flawed. There are a number of statistical and econometric techniques that can be used to mitigate the impact of nonresponse, such as nonresponse weighting, (multiple) imputation, pattern mixture modeling, and selectivity modeling. The section Nonresponse, Attrition, Tracking, and Tracing provides further discussion.

While nonresponse is usually a limitation for both cross-sectional and panel designs, it is usually a bigger problem for panel designs because the nonresponse can accumulate over time. A panel study that is designed and implemented well usually holds the attrition over time to a very low level, such as 5 to 10 percent in each wave. Furthermore, some sampled subjects who did not respond to an early wave might be "resurrected" in a later wave. However, the nonresponse usually accumulates across waves, and reaches a substantial level after multiple waves. For example, the wave nonresponse accumulated to 19% by wave 7 in the 1987 Survey of Income and Program Participation (SIPP) panel (Jabine et al., 1990), one of the exemplary panel studies. Therefore, the potential for nonresponse bias becomes more severe in later waves.8

8   The ability of the statistical techniques to mitigate the potential nonresponse bias also improves over time with the panel design, because we have more data on the sampled subjects who responded to earlier waves and dropped out in a later wave.
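The compounding of wave-level attrition can be illustrated with a short calculation. The per-wave retention rates below are hypothetical values consistent with the 5 to 10 percent early-wave attrition cited above and with attrition leveling off in later waves; under these assumptions, cumulative nonresponse approaches 20% by wave 7, comparable to the SIPP figure.

```python
# Cumulative retention when per-wave attrition compounds across a panel.
# Hypothetical per-wave retention rates: attrition is assumed heaviest in the
# early waves and to level off later, as observed in many panel studies.
retention_per_wave = [0.92, 0.95, 0.97, 0.98, 0.98, 0.98]

cumulative = 1.0
for wave, r in enumerate(retention_per_wave, start=2):
    cumulative *= r
    print(f"wave {wave}: cumulative retention {cumulative:.1%} "
          f"(cumulative nonresponse {1 - cumulative:.1%})")
```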

A related limitation of the panel design is the omission of new members of the target population, due to birth, enrollment, immigration, and so on. While the panel was designed to be representative of the target population at the beginning of the study, the panel ages over time and does not represent the new members. Therefore the representativeness of the panel becomes compromised in subsequent waves, both because of the omission of new members and because of the cumulative attrition discussed earlier. Some panel studies refresh the sample by adding a sample of new members who joined the target population since the original sample was drawn. This can be costly unless there is an easy way to identify the new members. However, since the GW veterans are a closed population, there are no new members entering the target population; thus the omission of new members is not an issue.

Another important limitation of the panel design is panel conditioning: the observed responses might be affected by participation in the panel, thus compromising the validity of the data obtained in later waves. Further discussion of panel conditioning can be found in the section Measurement Error Issues Specific to Panel Surveys.

Gulf War Veterans Health Study Design

Repeated panel surveys with temporal overlap (rotating panel surveys) are the commonly used compromise when both levels and changes are of interest. However, this design requires recruiting new cohorts from the target population regularly; thus, it might not be appropriate for the GWVHS. Given the anticipated deterioration in the quality of the DoD records data on the GW veterans, the recruitment cost is likely to be high for the GW veteran population, requiring substantial tracking and tracing efforts. In order to economize the design, it would be desirable to reduce the need to recruit new cohorts.

Based on those considerations, either a permanent panel design or repeated panel surveys without temporal overlap would be the preferred choice for the GWVHS, to avoid conducting costly recruitment on a regular basis. The permanent panel design has the advantage that it allows the direct assessment of long-term gross changes on the same individuals; with repeated panel surveys without temporal overlap, we need to "splice" the trajectory from different panels to assess changes across waves that fall under different panels. However, the permanent panel design will be more vulnerable to cumulative attrition and panel conditioning. Therefore, a promising design also worth considering for the GWVHS is repeated panel surveys without temporal overlap.

The study should review the quality of the first panel after the third wave for the initial panel,9 to determine the extent to which the validity of the inference based on the panel is compromised by cumulative attrition and panel conditioning.10 If the quality of the panel is judged to be satisfactory, the study would continue following the same panel. If the validity is judged to be unsatisfactory, the study would switch to a new panel. If the validity is judged to be marginal, it is conceivable that a hybrid design analogous to a rotating panel design could be used, continuing to follow a random subsample of the initial panel, and drawing a new panel to make up for the discontinued portion of the initial panel.

9   Cumulative attrition usually levels off after the first two or three waves in existing panel studies. Therefore it is reasonable to conduct the assessment after the third wave and to assume that further attrition will be small. It is reasonable to assume that panel conditioning will also level off after two or three waves, although there is less empirical evidence.
10   The assessment of cumulative attrition is straightforward. The assessment of panel conditioning is more involved, and requires differentiating the true changes over time from panel conditioning. Ideally, the assessment should use a new sample, and compare the distribution of outcome measures between the new sample and the ongoing panel. However, since we anticipate that panel conditioning is unlikely to have a major impact on the GWVHS participants, it might not be worthwhile to devote a substantial amount of resources to this assessment.

In order to facilitate the recruitment of a second panel if it is warranted, it is worth considering that a "reserve" sample be recruited at the same time as the initial sample. The "reserve" sample will be enrolled into the study and given a brief survey to collect contact information.11 This sample will then be sent into "hibernation," and will be reactivated if a decision is reached later to recruit a second panel. While the "reserve" sample is in "hibernation," we will maintain tracking to make it feasible to reactivate this sample if needed.12 This provision will require a nontrivial amount of resources, but will guard against the risk that the contact information in the DoD records will deteriorate further during the tenure of the initial panel, making it impossible to recruit a second panel.

If the "reserve" sample is implemented successfully, the rotating panel design might be a viable option. We can activate a third of the "reserve" in waves 2, 3, and 4, respectively, and retire a corresponding portion of the original panel, as sketched below. The maintenance cost for the "reserve" sample will be lower under the rotating panel design, because the size of the "reserve" sample is reduced over time. On the other hand, the recruitment cost is still likely to be substantial even with the "reserve" sample; therefore it might be more economical to activate the "reserve" sample in one lump sum instead of in pieces.

There are important analytic trade-offs between those designs. With the repeated panel design without temporal overlap, the entire panel is available in the first three waves, allowing more precise estimates of changes (both net and gross). Furthermore, this design allows the option of switching to a permanent panel design (without activating the "reserve" sample), either in part or in full, if warranted. However, we cannot estimate gross changes between the two panels, say, between the third and fourth waves. On the other hand, the rotating panel design includes less overlap across the first three waves; thus, it provides less precision for estimating changes (especially gross changes) among those waves. However, it does allow for the estimation of gross changes in later waves, say, between the third and fourth waves. As discussed in the Introduction to this appendix, the public health focus of the GWVHS indicates that it is more important to assess changes across the first three waves (repeated panel surveys without temporal overlap are preferable for those objectives) than to assess changes that occur in later waves (the rotating panel design is preferred for those objectives). See Survey Frequency for further discussion.

11   We might as well collect a minimal amount of health status data at the same time the contact information is collected.
12   Since the "reserve" sample will be in "hibernation" for many years before reactivation, the tracking should be conducted in a cost-effective way, using low-cost procedures only. A more comprehensive tracing effort will be conducted when the sample is reactivated.
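As a minimal sketch of the rotation scheme described above, the following code tabulates which groups would be active in each of the first four waves, assuming the original panel and the "reserve" sample are each divided into thirds. The group labels and the four-wave horizon are illustrative assumptions, not a GWVHS specification.

```python
# Panel composition by wave under the rotation scheme sketched above:
# thirds of the original panel (O1-O3) retire as thirds of the
# "reserve" sample (R1-R3) are activated in waves 2, 3, and 4.
original = ["O1", "O2", "O3"]
reserve = ["R1", "R2", "R3"]

for wave in range(1, 5):
    k = wave - 1                      # number of rotations completed so far
    active = original[k:] + reserve[:k]
    print(f"wave {wave}: active groups {active}")
```

Each wave overlaps with the next in two-thirds of its groups, which is what allows the rotating design to estimate gross changes between adjacent waves throughout the study.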

subsequent follow-ups, and so on. This practice is usually based on the expectation that the nonrespondents are unlikely to convert into respondents in future waves.17 While this expectation is not unreasonable, this practice usually results in a substantial accumulation of nonresponse across waves. In order to reduce the level of cumulative attrition, we believe it would be appropriate to make efforts to survey the nonrespondents to earlier waves (including the baseline nonrespondents), unless they have given explicit instructions not to be contacted again.

There are a number of procedures commonly used in panel studies to reduce nonresponse, namely, tracing and tracking (see, e.g., Burgess, 1989).18 Given the anticipated difficulties in recruiting the GW veterans, those procedures are important to help strengthen the quality of the GWVHS data.

Tracing is, in essence, looking for a missing person using available information. Prospective tracing is necessary at the baseline to locate the participants who cannot be located using the record data from the DoD. Retrospective tracing is necessary for participants who were surveyed in an earlier wave but could not be located at a subsequent follow-up wave. A variety of public information sources are usually used for tracing, such as telephone directories, credit records, property records, court records, mortality records (to identify deceased sampled subjects), and so on. It is important in the tracing procedure to verify the identity of the subjects located, to avoid false identification. Customized tracing procedures can also be utilized, such as visiting the subject's prior residences, neighbors, and known and possible associates. Those procedures are labor intensive, and thus are likely to be too costly for the national scope of the GWVHS sample. It is conceivable that those procedures could be utilized for participants clustered in a limited number of geographical areas if a (partially) clustered design is used for the GWVHS.

While tracing is used to locate missing subjects, tracking is used to maintain contact with subjects already located. During the baseline survey, the interviewer collects contact information (both primary and secondary) from the participants, to help locate the participants for subsequent follow-up surveys. The contact information is usually updated during each follow-up survey. For studies with infrequent surveys, additional tracking procedures are usually deployed to maintain contact with the participants between waves. These include sending postcards, birthday cards, and newsletters to the participants at regular intervals, requesting postal notification of change of address, and requesting the participants to submit change-of-address information to the study (an incentive is usually offered to encourage the participants to provide this information). In addition to updating the contact information, some of those procedures (e.g., birthday cards and newsletters) might also enhance goodwill among the participants, so as to facilitate their cooperation at subsequent follow-up surveys.

Given the long lag between waves for the GWVHS, additional procedures can be utilized to help maintain the contact information. One possibility is to conduct "light-duty" tracing procedures (such as retrieving easy-to-access public records) on the participants regularly. Burgess (1989) recommended that "If the intersurvey period is five years, . . . it may be more cost-effective to trace a person five times over five years than once after five years." Another possible procedure is to make brief telephone contacts with participants between waves, to greet the participants (hopefully enhancing goodwill) and to request updates on contact information. It is conceivable that more intensive tracing procedures could also be used between waves to maintain the contact information. Those procedures are more costly; therefore it might be appropriate to restrict them to participants anticipated to be more difficult to follow, such as those who were difficult to locate during an earlier wave.

17   Another reason for the "monotonic follow-up" might be the anticipation that cases with incomplete data will be eliminated from the analysis. While this might have been true for the way longitudinal data were traditionally analyzed, analysis techniques developed in recent years, such as multilevel modeling, do not require all respondents to be observed at the same time points; thus cases missing some waves can be used in the analysis under appropriate assumptions.
18   We distinguish tracking as following those with whom we have active contact, and tracing as locating those with whom we do not have active contact.

Measurement Error

Empirical data are almost always subject to measurement error. Survey data are no exception. Some types of measurement errors are general, and apply to both cross-sectional and panel surveys. Some types of measurement errors are specific to panel surveys. We discuss both types of measurement errors, and remedies that can help mitigate problems resulting from those measurement errors (see, e.g., Bailar, 1989; Groves, 1989; Groves and Couper, 1998; Kish, 1965, Chapter 13; and Lessler and Kalsbeek, 1992).

General Measurement Error Issues

Part of the measurement error can be attributable to the respondent. The respondent might intentionally provide an inaccurate response to a survey question. For example, the respondent might intentionally provide a socially desirable response, or refuse to report a stigmatized condition. An inaccurate response might also be given unintentionally, because the respondent does not have the necessary information, or does not want to make the effort to compile the necessary information.

Part of the measurement error can be attributable to the interviewer; this applies to face-to-face and telephone interviews delivered by an interviewer, but not to self-administered mail surveys. For example, the interviewer might not accurately follow the branching logic to deliver the appropriate survey questions to the respondent; might not convey a survey question clearly to the respondent; might not guide and motivate the respondent to compile and process the information necessary to provide accurate responses; might record the respondent's responses erroneously; or might not be alert in identifying inconsistencies in the respondent's responses and requesting the respondent to confirm them. In the worst scenario, the data might be forged, in part or in its entirety, by the interviewer.

Part of the measurement error can be attributable to the survey instrument. For example, the branching logic in the survey instrument might be inappropriate, leading the respondent to miss applicable questions; the survey questions might not be organized in a user-friendly sequence that makes it easy for the respondent to compile and process the information accurately; the wording of a survey question might not be cognitively clear, resulting in confusion and misinterpretation by the respondent; or the response categories might not be defined clearly enough (mutually exclusive and exhaustive) for the respondent to classify his or her status according to the given categories.

Finally, part of the measurement error might be attributable to data processing subsequent to the interview, such as data entry errors, coding errors, secondary errors introduced in data editing, errors in matching records, and so on.

The nature of measurement error can usually be classified into systematic error and random error. The quality of survey responses is usually characterized using validity and reliability: validity measures the level of systematic error; reliability measures the level of random error. We use the term "accuracy" below to refer to the combination of validity and reliability. The level of measurement error can be evaluated using a number of techniques, such as test-retest comparisons (to assess reliability), comparisons with alternative data sources such as records data (to assess validity), and so on.

Systematic error occurs when similar measurement error persists across multiple waves of surveys, and/or when similar measurement error occurs across respondents. For example, the respondents might systematically overreport outpatient medical visits. Systematic error usually leads to bias in estimated population parameters such as the prevalence of a disease condition or the average level of outpatient service use. Random error usually varies over time across multiple waves of surveys, and/or varies across respondents. For estimating aggregate population parameters such as disease prevalence or average service use, random error usually results in reduced power and precision, but does not result in bias. However, random error might result in overestimation of individual-level gross changes.

There are many techniques and procedures that can be used to mitigate measurement error in survey data. We describe several below. Many of the general sources of measurement error can be mitigated with computer-assisted survey techniques. For example, a computerized survey instrument usually incorporates built-in branching logic, thus avoiding interviewer mistakes in following the branching logic. It is of course still crucial that the branching logic be designed and programmed accurately. Data entry errors are essentially eliminated in computerized surveys, to the extent that the interviewer records the respondent's responses accurately.

Thorough interviewer training and monitoring are essential to mitigate measurement error. In addition, a match between the interviewer and the respondent can help improve rapport in the interview, such as a match in race and ethnicity, or the use of HIV-positive interviewers in surveys of HIV-positive respondents.

In-depth cognitive testing of the survey instrument can be used to identify ambiguities in the wording of the survey questions and response categories. The results of the cognitive testing can be used to revise the instrument, improve its clarity, and reduce measurement error. Similar laboratory-based testing of other design features of survey questions and instruments, such as the sequential order of survey questions in an instrument, can also help address potential measurement error issues. An alternative to laboratory-based testing prior to the deployment of the survey is to include substudies in the survey study to assess important measurement error issues. As an example, the RAND Health Insurance Experiment (Newhouse et al., 1993) included a substudy on the frequency of health reports, randomizing the respondents to various levels of reporting frequency, to assess the potential that the health report might prompt the respondents to seek medical care.

Some sources of measurement error can be mitigated with the appropriate choice of survey modality. For example, audio-assisted interviewing can be incorporated into the face-to-face modality for sensitive and stigmatized topics, to reduce the respondent's concern about providing socially undesirable responses to the interviewer. Sometimes a randomized response design (Horvitz et al., 1967) is used, in which the respondent's response is randomized to help alleviate his or her concerns.
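A minimal sketch of the unrelated-question randomized response design may help illustrate how a sensitive prevalence can be recovered even though the interviewer never learns which question a given respondent answered. The design probability and prevalence values below are hypothetical illustration values.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10000
p = 0.7          # probability of receiving the sensitive question (design choice)
pi_s = 0.15      # true prevalence of the sensitive trait (hypothetical)
pi_u = 0.50      # known "yes" rate of the unrelated, innocuous question

# Each respondent privately draws which question to answer; the interviewer
# sees only the yes/no answer, never which question was asked.
gets_sensitive = rng.random(n) < p
truth = rng.random(n) < pi_s
unrelated = rng.random(n) < pi_u
answer = np.where(gets_sensitive, truth, unrelated)

# Moment estimator based on E[answer] = p*pi_s + (1 - p)*pi_u.
lam = answer.mean()
pi_s_hat = (lam - (1 - p) * pi_u) / p
print(f"estimated sensitive prevalence: {pi_s_hat:.3f} (true value {pi_s})")
```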

For survey questions that inquire about the respondent's past experience or future anticipation, the level of measurement error is determined by the reference period (see below). Therefore it is important to make an effort to choose an appropriate reference period to reduce the measurement error.

Reference Period

The nature of the measurement error for a specific attribute usually depends on the time frame for the trait being measured. Some traits are usually time-invariant, such as birth year, gender, race, and ethnicity; the time frame is usually irrelevant for those traits. Other traits vary over time, so the specific measure needs to take the time frame into consideration. Some measures are specific to the current status and thus can be viewed as snapshots, such as current health status (excellent, good, fair, poor), current marital status, and current employment status. Some measures inquire about events that occurred during a specified time interval (the reference period), such as the number of outpatient medical visits during the last 6 weeks. Some measures inquire about the accumulation or the central tendency of a time-varying trait over a reference period, such as household income during the calendar year 1998 (accumulated over time), the number of cigarettes smoked each day (on average) over the last 6 weeks, and the level of satisfaction with the primary care provider during the last 6 weeks (presumably the central tendency over this time interval).

Most survey questions are retrospective and inquire about reference periods in the past. Some might be prospective and inquire about the respondent's anticipation of the future. The reference period might be a fixed time interval determined by the calendar (such as the calendar year 1998), a fixed time interval defined relative to the time of the interview (the last 6 weeks, the next 2 weeks), or a time interval defined relative to an easily recognized milestone (since the last interview, since the most recent discharge from a hospital, until the anticipated surgery). Note that the duration of a milestone-based reference period might vary from respondent to respondent; it might even be unknown (for prospective milestones). The analysis needs to take those variations into account: there is more chance for events to occur in a longer reference period. A special type of milestone-based reference period is the lifetime experience (since the respondent's birth), or lifetime anticipation (until the respondent's death).19

The nature of the reference period for a specific survey measure has important implications for the measurement error (see, e.g., Bailar, 1989; Neter and Waksberg, 1964). The respondent might misclassify the time of specific events relative to the reference period, resulting in telescoping (inclusion of events that occurred outside the reference period) and omission of events that occurred inside the reference period. Omission can also occur irrespective of the reference period: the reason for the omission might be the respondent's failure to recognize or report a specific event, rather than the respondent's failure to classify the time of the event accurately. "Fabrication" of nonexistent events can also occur irrespective of the reference period.

The accuracy of the survey response usually decreases with the length of the reference period: both telescoping and omission are more likely to occur when the respondent is required to recall or anticipate events distant in the past or future. Exceptions to this general rule might occur if the longer reference period is easier for the respondent to recognize. For example, it might be easier for the respondent to report taxable income for the calendar year 1998 (the available information is likely organized by calendar year) rather than for the last 6 months (the respondent might have difficulty determining whether specific payments were received within or prior to the last 6 months). The presence of milestones might also help the respondent respond accurately, even though the reference period might be longer than an alternative shorter fixed reference period.

19   The analysis of lifetime experience data needs to take duration into consideration. For example, the lifetime prevalence of a specific disease condition is likely lower for respondents in their 30s than for respondents in their 40s, because the latter group has had more time to develop the condition.

The appropriate reference period to be used in a survey question depends on the trait being measured. For major events such as hospitalization (to be more specific, discharge from a hospital), the accuracy of the respondent's recall usually remains high even for time periods as long as 6 months or a year. For less "impressive" events such as outpatient visits, the accuracy might deteriorate substantially beyond a few weeks.

In addition to the accuracy of the survey measure, the choice of the appropriate reference period should also take into consideration the sampling variation associated with the time frame. In the absence of measurement error, the statistical information in the survey measure increases with the length of the reference period. In a sense, the effective sample size should be measured in terms of person-time: a longer reference period allows more events to be accumulated, thus providing more information. For example, it is conceivable that we could obtain nearly perfect reporting of hospital discharges during the last 7 days. However, very few respondents experience a hospital discharge during such a short reference period; therefore the precision of the estimated rate of hospitalization will be poor due to the high sampling error. The rate of hospitalization needs to be defined relative to time, such as the number of hospital discharges per thousand person-years. A sample of a thousand individuals asked about a 7-day reference period contributes only about 20 person-years' worth of data. The same sample asked about a 12-month reference period will contribute a thousand person-years' worth of data. The latter design might be preferable even if the measurement error is larger with the 12-month reference period. The ultimate choice of the time interval needs to be based on the trade-off between the reduction in the sampling error and the increase in measurement error due to the use of the longer time interval.
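The person-time trade-off can be sketched numerically. Assuming a hypothetical discharge rate and a Poisson event model (an assumption made here for illustration, not stated in the text), the standard error of the estimated hospitalization rate shrinks as the reference period lengthens, even before any offsetting measurement error is considered:

```python
import math

n = 1000                    # respondents
true_rate = 0.1             # hypothetical: hospital discharges per person-year

for label, years in [("7-day reference period", 7 / 365),
                     ("12-month reference period", 1.0)]:
    person_years = n * years
    expected_events = true_rate * person_years
    # Under a Poisson event model, var(estimated rate) = rate / person-time.
    se = math.sqrt(true_rate / person_years)
    print(f"{label}: {person_years:.0f} person-years, "
          f"~{expected_events:.1f} expected events, SE of rate {se:.4f}")
```

Under these assumed values, the 12-month period yields roughly a sevenfold reduction in the standard error, which must then be weighed against its larger recall error.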

Most data items to be used in the GWVHS will likely be standard, with known properties in the quality of the recall measures. If some recall data elements are new, or if there are concerns about the recall properties among the GW veterans for some existing data elements, one might consider a substudy that uses different recall periods for randomly partitioned subsamples, say, inquiring about 3 months for a random subsample and 6 months for the others. The consistency between the two versions of the survey instrument can then be tested using the two subsamples.

Measurement Error Issues Specific to Panel Surveys

One of the advantages of using panel surveys is that the previous interview and/or events reported during the previous interview can be used as milestones to bound the reference period, to help improve the accuracy of the respondent's recall. This advantage is applicable when the reference period coincides with the lag between successive waves of surveys. However, it is unlikely to be applicable to the GWVHS: the anticipated lag time between waves for the GWVHS is much longer than the reference periods appropriate for most health outcome measures; therefore it is unlikely that prior interviews can be used as milestones to bound the reference period for subsequent interviews.

A unique measurement error issue for panel surveys is panel conditioning: the observed responses might be affected by participation in the panel, thus compromising the validity of the data obtained in later waves (see, e.g., Bailar, 1975, 1989; Cantor, 1989; Corder and Horvitz, 1989; Holt, 1989; Presser, 1989; Silberstein and Jacobs, 1989; and Waterton and Lievesley, 1989). There are a number of possible interpretations of panel conditioning.

First, participation in the panel might affect the respondents' actual behavior. For example, the survey might serve as a prompt for the participants to attend to their health care needs. Under this scenario, the survey responses in the subsequent waves might reflect the actual behavior and its consequences, but the behavior might not be representative of what would take place in the general population in the absence of the earlier survey. The potential for this participation effect is especially important if a physical examination is conducted on a subsample of the GWVHS participants: the physical examination might reveal a health condition that requires medical care, thus having an impact on the health status of the participants in this subsample. The impact might be both short term and long term: the medical care received might affect the trajectory of the health status.

Second, the participants might learn from earlier waves that certain "trigger" items lead to additional items; they might avoid the burden by responding to the "trigger" items negatively in future waves to avoid the additional items. Under this scenario, the survey responses in the subsequent waves would be biased toward underreporting of the "trigger" conditions.

Third, the participants might learn from earlier waves what information is required for the survey; thus, they become more capable of compiling and processing the information required to provide accurate responses to the survey questions. Under this scenario, the panel conditioning will reduce the measurement error.

The presence of panel conditioning is easy to detect under the rotating panel design. For each wave of the survey, we have respondents at various levels of "seniority" on the panel: some are new, some have had some experience on the panel in earlier waves, and some have completed their tenure on the panel and are ready to retire from it. We can therefore compare the responses given by respondents at various levels of "seniority" on the panel to assess the presence of panel conditioning, as sketched below. (It is important, though, to control for attrition in those comparisons.) If panel conditioning is judged to be important, the rotating panel design should be considered, to make it easy to address panel conditioning.

It is much more difficult to assess panel conditioning with either the permanent panel design or repeated panel surveys without temporal overlap. The comparison across waves cannot be used to assess panel conditioning because it is confounded with true changes over time. It is conceivable that some comparisons with records data could be made, perhaps for a subsample, to assess the measurement error due to panel conditioning. This will not address the impact of panel conditioning on actual behavior.
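As a sketch of the rotating-panel "seniority" comparison described above, the code below simulates a single survey wave in which panel conditioning shifts the responses of experienced panel members, and applies a two-sample test of mean responses by tenure. The effect size, group sizes, and the use of a t-test are illustrative assumptions, not a prescribed GWVHS analysis.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n_per_group = 800
conditioning_shift = -0.2   # hypothetical: experienced members underreport

# Responses collected in the same wave, by tenure on the panel (rotation group).
new_members = rng.normal(loc=0.0, scale=1.0, size=n_per_group)
experienced = rng.normal(loc=conditioning_shift, scale=1.0, size=n_per_group)

# Both groups face the same calendar time, so a mean difference by tenure
# suggests panel conditioning (assuming attrition has been accounted for).
t_stat, p_value = stats.ttest_ind(new_members, experienced)
print(f"mean (new) = {new_members.mean():.3f}, "
      f"mean (experienced) = {experienced.mean():.3f}, p = {p_value:.4f}")
```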

It is also possible to assess panel conditioning by using a substudy that varies the follow-up frequency. For example, we could take a random subsample, interview its members at a higher frequency, say annually, and compare their responses to those of the rest of the sample. This might be too costly to be worthwhile, although the subsample would also yield more data, which might allow a reduction in the overall sample size.

The lag between waves is anticipated to be fairly long for the GWVHS, so panel conditioning is unlikely to arise, with the exception of the long-term impact of the physical examination. The assessment of panel conditioning should therefore focus on the impact of the physical examination, with low priority given to the other components. More specifically, if no long-term impact of the physical examination is detected, it would be reasonable to assume the absence of the other components of panel conditioning. If a long-term impact is detected, it might be necessary to consider either rotating the panel or switching to a new panel.

If a physical examination is to be conducted in the GWVHS, it should be designed as a randomized substudy, with a random subsample assigned to receive the examination. The long-term impact of the examination can then be assessed easily by comparing the health status of the examined subsample with that of the rest of the sample.
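As a minimal sketch of that randomized comparison (the scores, group sizes, and function name are hypothetical illustrations), a simple two-sample contrast on a later-wave health measure suffices, since randomization of the examination makes the unadjusted difference an unbiased estimate of its impact:

    from math import sqrt
    from statistics import mean, stdev

    def welch_t(exam_group, control_group):
        """Welch two-sample t statistic comparing a follow-up health
        score between the randomized examination subsample and the
        rest of the sample."""
        m1, m2 = mean(exam_group), mean(control_group)
        v1 = stdev(exam_group) ** 2 / len(exam_group)
        v2 = stdev(control_group) ** 2 / len(control_group)
        return (m1 - m2) / sqrt(v1 + v2)

    # Hypothetical SF-36-style scores at a later wave.
    examined = [72, 68, 75, 80, 66, 71, 77, 69]
    not_examined = [70, 65, 74, 78, 64, 69, 60, 73]
    print(f"t = {welch_t(examined, not_examined):.2f}")

In practice the comparison would use the full subsample sizes and appropriate degrees of freedom; the point is simply that randomizing the examination reduces the assessment of its long-term impact to a direct between-group contrast.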

References

Armitage, P. 1960. Sequential Medical Trials. Springfield, Illinois: Thomas.
Bailar, B.A. 1975. The Effects of Rotation Group Bias on Estimates from Panel Surveys. Journal of the American Statistical Association 70(349):23–30.
Bailar, B.A. 1989. Information Needs, Surveys, and Measurement Errors. In: Panel Surveys (Eds. D. Kasprzyk, G. Duncan, G. Kalton, and M.P. Singh). New York: John Wiley. Pp. 1–25.
Brick, J.M., and Kalton, G. 1996. Handling Missing Data in Survey Research. Statistical Methods in Medical Research 5:215–238.
Burgess, R.D. 1989. Major Issues and Implications of Tracing Survey Respondents. In: Panel Surveys (Eds. D. Kasprzyk, G. Duncan, G. Kalton, and M.P. Singh). New York: John Wiley. Pp. 52–75.
Cantor, D. 1989. Substantive Implications of Longitudinal Design Features: The National Crime Survey as a Case Study. In: Panel Surveys (Eds. D. Kasprzyk, G. Duncan, G. Kalton, and M.P. Singh). New York: John Wiley. Pp. 25–51.
Coad, D.S., and Rosenberger, W.F. 1999. A Comparison of the Randomized Play-the-Winner Rule and the Triangular Test for Clinical Trials with Binary Responses. Statistics in Medicine 18:761–769.
Copas, A.J., and Farewell, V.T. 1998. Dealing with Non-Ignorable Non-Response by Using an "Enthusiasm-to-Respond" Variable. Journal of the Royal Statistical Society, Series A 161(3):385–396.
Corder, L.S., and Horvitz, D.G. 1989. Panel Effects in the National Medical Care Utilization and Expenditure Survey. In: Panel Surveys (Eds. D. Kasprzyk, G. Duncan, G. Kalton, and M.P. Singh). New York: John Wiley. Pp. 304–319.
Day, N.E. 1969. Two-Stage Designs for Clinical Trials. Biometrics 25:111–118.
Diehr, P., Martin, D.C., Koepsell, T., Cheadle, A., et al. 1995. Optimal Survey Design for Community Intervention Evaluations: Cohort or Cross-Sectional? Journal of Clinical Epidemiology 48(12):1461–1472.
Diggle, P.J. 1989. Testing for Random Dropouts in Repeated Measurement Data. Biometrics 45:1255–1258.
Diggle, P.J., and Kenward, M.G. 1994. Informative Drop-Out in Longitudinal Data Analysis. Applied Statistics 43(1):49–93.
Diggle, P.J., Liang, K.-Y., and Zeger, S.L. 1994. Analysis of Longitudinal Data. Oxford: Clarendon Press.
Duncan, G.J., and Kalton, G. 1987. Issues of Design and Analysis of Surveys across Time. International Statistical Review 55(1):97–117.
Duncan, G.J., Juster, F.T., and Morgan, J.N. 1984. The Role of Panel Studies in a World of Scarce Research Resources. In: The Collection and Analysis of Economic and Behavior Data (Eds. S. Sudman and M.A. Spaeth). Champaign, Ill.: Bureau of Economic and Business Research & Survey Research Laboratory. Pp. 94–129.
Gail, M.H., Mark, S.D., Carroll, R.J., Green, S.B., and Pee, D. 1996. On Design Considerations and Randomization-Based Inference for Community Intervention Trials. Statistics in Medicine 15:1069–1092.
Groves, R.M. 1989. Survey Errors and Survey Costs. New York: John Wiley.
Groves, R.M., and Couper, M.P. 1998. Nonresponse in Household Interview Surveys. New York: John Wiley.
Heckman, J.J. 1979. Sample Selection Bias as a Specification Error. Econometrica 47:153–161.
Heckman, J.J., and Robb, R. 1989. The Value of Longitudinal Data for Solving the Problem of Selection Bias in Evaluating the Impact of Treatments on Outcomes. In: Panel Surveys (Eds. D. Kasprzyk, G. Duncan, G. Kalton, and M.P. Singh). New York: John Wiley. Pp. 512–539.
Hedeker, D., Gibbons, R.D., and Waternaux, C. 1999. Sample Size Estimation for Longitudinal Designs with Attrition: Comparing Time-Related Contrasts Between Two Groups. Journal of Educational and Behavioral Statistics, in press.
Hirano, K., Imbens, G.W., Ridder, G., and Rubin, D.B. 1998. Combining Panel Data Sets with Attrition and Refreshment Samples. NBER Technical Working Paper No. 230. Pp. 1–37.
Holt, D. 1989. Panel Conditioning: Discussion. In: Panel Surveys (Eds. D. Kasprzyk, G. Duncan, G. Kalton, and M.P. Singh). New York: John Wiley. Pp. 340–347.
Horvitz, D.G., Shah, B.V., and Simmons, W.R. 1967. The Unrelated Question Randomized Response Model. American Statistical Association: Proceedings of the Social Statistics Section. Pp. 67–72.
Hsiao, C. 1985. Benefits and Limitations of Panel Data. Econometric Reviews 4(1):121–174.
Hsiao, C. 1986. Analysis of Panel Data. New York: Cambridge University Press.
Jabine, T.B., King, K.E., and Petroni, R.J. 1990. Survey of Income and Program Participation Quality Profile. Washington, D.C.: Bureau of the Census, U.S. Department of Commerce.
Kalton, G. 1986. Handling Wave Nonresponse in Panel Surveys. Journal of Official Statistics 2(3):303–314.

Kalton, G., and Citro, C.F. 1993. Panel Surveys: Adding the Fourth Dimension. Survey Methodology 19(2):205–215.
Kish, L. 1965. Survey Sampling. New York: John Wiley.
Kyriazidou, E. 1997. Estimation of a Panel Data Sample Selection Model. Econometrica 65:1335–1364.
Lai, T.L., Levin, B., Robbins, H., and Siegmund, D. 1980. Sequential Medical Trials. Proceedings of the National Academy of Sciences USA 77(6):3135–3138.
Laird, N.M. 1988. Missing Data in Longitudinal Studies. Statistics in Medicine 7:305–315.
Lepkowski, J.M. 1989. Treatment of Wave Nonresponse in Panel Surveys. In: Panel Surveys (Eds. D. Kasprzyk, G. Duncan, G. Kalton, and M.P. Singh). New York: John Wiley. Pp. 348–374.
Lessler, J.T., and Kalsbeek, W.D. 1992. Nonsampling Error in Surveys. New York: John Wiley.
Little, R.A. 1993. Pattern-Mixture Models for Multivariate Incomplete Data. Journal of the American Statistical Association 88:125–134.
Little, R.A. 1994. A Class of Pattern-Mixture Models for Normal Incomplete Data. Biometrika 81(3):471–483.
Little, R.A., and Rubin, D.B. 1987. Statistical Analysis with Missing Data. New York: John Wiley.
Maxwell, S.E. 1998. Longitudinal Designs in Randomized Group Comparisons: When Will Intermediate Observations Increase Statistical Power? Psychological Methods 3(3):275–290.
McHorney, C.A., Kosinski, M., and Ware, J.E. 1994. Comparisons of the Costs and Quality of Norms for the SF-36 Health Survey Collected by Mail versus Telephone Interview: Results from a National Survey. Medical Care 32(6):351–367.
Neter, J., and Waksberg, J. 1964. A Study of Response Errors in Expenditure Data from Household Interviews. Journal of the American Statistical Association 59:18–55.
Newhouse, J.P., and the Insurance Experiment Group. 1993. Free for All? Lessons from the RAND Health Insurance Experiment. Cambridge, Massachusetts: Harvard University Press.
Overall, J.E., and Doyle, S.R. 1994. Estimating Sample Sizes for Repeated Measurement Designs. Controlled Clinical Trials 15:100–123.
Pierce, P. 1997. Physical and Emotional Health of Gulf War Veteran Women. Aviation, Space, and Environmental Medicine 68.
Presser, S. 1989. Collection and Design Issues: Discussion. In: Panel Surveys (Eds. D. Kasprzyk, G. Duncan, G. Kalton, and M.P. Singh). New York: John Wiley. Pp. 75–79.
Robbins, H. 1974. A Sequential Test for Two Binomial Populations. Proceedings of the National Academy of Sciences USA 71:4435–4436.
Rubin, D.B. 1987. Multiple Imputation for Nonresponse in Surveys. New York: John Wiley.
Rubin, D.B. 1996. Multiple Imputation after 18+ Years. Journal of the American Statistical Association 91:473–489.
Schafer, J.L. 1997. Analysis of Incomplete Multivariate Data. London: Chapman and Hall.
Silberstein, A.R., and Jacobs, C.A. 1989. Symptoms of Repeated Interview Effects in the Consumer Expenditure Interview Survey. In: Panel Surveys (Eds. D. Kasprzyk, G. Duncan, G. Kalton, and M.P. Singh). New York: John Wiley. Pp. 289–303.

Stretch, R.H., Bliese, P.D., Marlowe, D.H., Wright, K.M., Knudson, K.H., and Hoover, C.H. 1995. Physical Health Symptomatology of Gulf War-Era Service Personnel from the States of Pennsylvania and Hawaii. Military Medicine 160:131–136.
Waterton, J., and Lievesley, D. 1989. Evidence of Conditioning Effects in the British Social Attitudes Panel. In: Panel Surveys (Eds. D. Kasprzyk, G. Duncan, G. Kalton, and M.P. Singh). New York: John Wiley. Pp. 319–339.
Wei, L.J., and Durham, S. 1978. The Randomized Play-the-Winner Rule in Medical Trials. Journal of the American Statistical Association 73:840–843.
Weinberger, M., Nagle, B., Hanlon, J.T., Samsa, G.P., et al. 1994. Assessing Health-Related Quality of Life in Elderly Outpatients: Telephone versus Face-to-Face Administration. Journal of the American Geriatrics Society 42:1295–1299.
Weinberger, M., Oddone, E.Z., Samsa, G.P., and Landsman, P.B. 1996. Are Health-Related Quality-of-Life Measures Affected by the Mode of Administration? Journal of Clinical Epidemiology 49(2):135–140.
Weinstein, M.C. 1974. Allocation of Subjects in Medical Experiments. New England Journal of Medicine 291:1278–1285.
Whitehead, J. 1997. The Design and Analysis of Sequential Clinical Trials, Revised 2nd Edition. New York: John Wiley.
Wu, A.W., Jacobson, D.L., Berzon, R.A., Revicki, D.A., et al. 1997. The Effect of Mode of Administration on Medical Outcomes Study Health Ratings and EuroQol Scores in AIDS. Quality of Life Research 6:3–10.
Zelen, M. 1969. Play the Winner Rule and the Controlled Clinical Trial. Journal of the American Statistical Association 64:131–146.