
Evaluating AIDS Prevention Programs: Expanded Edition (1991)

Chapter: D Sampling and Randomization: Technical Questions about Evaluating CDC's Three Major AIDS Prevention Programs

Suggested Citation:"D Sampling and Randomization: Technical Questions about Evaluating CDC's Three Major AIDS Prevention Programs." National Research Council. 1991. Evaluating AIDS Prevention Programs: Expanded Edition. Washington, DC: The National Academies Press. doi: 10.17226/1535.


D Sampling and Randomization: Technical Questions about Evaluating CDC's Three Major AIDS Prevention Programs

Following the release of the first edition of Evaluating AIDS Prevention Programs, CDC program personnel met with the panel and raised a number of questions about the report. This appendix deals with essentially technical matters relating to the implementation of some of the report's suggestions, in particular, questions about sampling and the random assignment of treatment and control groups. Appendix E deals with the evaluation of projects that are ancillary to, emerging from, or related to those discussed in Chapters 3 and 5.

The first section of this appendix treats the following technical issues related to sampling: the number of case studies to be used in a process evaluation of the counseling and testing program; the sample sizes needed to evaluate the effectiveness of all three programs; suggestions for controlling attrition; and the comparison of convenience samples and probability samples. The second section addresses two aspects of using randomized experiments to evaluate a project's effectiveness: successful experiments in the AIDS prevention arena and the ethics of no-treatment controls.

The panel's objective is to outline some of the general principles involved with sampling and randomization as part of research design. Because the panel's original task was one of developing overall evaluation strategy rather than rendering detailed technical advice, we have been reluctant to provide any kind of specificity on questions of sample size, the use of convenience samples, and so on. The panel believes strongly that technical advice of this sort is so context-driven that each set of evaluation objectives warrants its own response. Such advice is best fashioned by statistical and subject matter experts who can assess each evaluation problem on its own terms. Thus, our foremost recommendation is that CDC either develop the requisite in-house expertise among personnel responsible for evaluation research or contract for expert services when these types of questions arise. Nonetheless, we offer the following general information in the hope that it will prove useful.

SAMPLING ISSUES

Personnel from the National AIDS Information and Education Program (NAIEP) and the Center for Prevention Services (CPS) raised questions about the optimal number of case studies and sample sizes needed for evaluating a project's effectiveness. Related to the issue of sample size is the question of how to control attrition. In addition to addressing these sampling issues, this section includes some thoughts about using convenience samples when it is not possible to carry out probability sampling.

Number of Case Studies

The purpose of conducting case studies of counseling and testing sites is to identify the variables to be considered in evaluating how well services are delivered: i.e., who is being served, do they complete the service protocol, what are the barriers, and so on. The question is simple: How many case studies need to be performed? The answer is complicated: As many as it takes to identify the relevant variables and no more. Unfortunately, it is impossible to predict how many case studies this will entail. Moreover, no "optimal number" exists, and it is impossible to recognize that a satisfactory number has been covered until that number has been exceeded.
In other words, when the researcher recognizes that additional case studies are shedding no more significant light on one's understanding of service delivery, it no longer makes practical sense to continue such field research.[1]

[1] Obviously, if the goals of service delivery or the needs of evaluation research change, new case studies will again become necessary.

A good sampling scheme is important in making a correct decision. The panel believes that a stratified sample of counseling and testing sites is the best method for gathering case data on service delivery variables. In Chapter 4, the panel suggested a 2 x 2 x 2 matrix (stratifying by seroprevalence rates, activity, and target group) for case studies of a sample of community-based organization (CBO) projects. This scheme would require a minimum of 8 case studies.[2] The panel believes that the sample of case studies of counseling and testing projects should be similarly laid out but will be larger because of the greater number of site variations.

A stratification scheme would probably best be planned by CDC program personnel, who are familiar with key variables in the different projects that the agency funds as well as the distribution of those variables. Nonetheless, the panel suggests the following stratification variables:

• Type of facility, e.g., health department, family planning clinic, drug treatment center, clinic for treatment of sexually transmitted disease, and so on (already the matrix is larger than that proposed for CBOs because of the diversity in types of setting);
• Seroprevalence rates or number of AIDS cases (i.e., low, middle, or high prevalence areas);
• Type of region (i.e., urban or rural).

Sites should be selected on the basis of the important service variables, not simply because they are convenient or their staffs are cooperative. The important dimensions should incorporate the diversity among counseling and testing sites. As case studies are conducted, program staff at CDC or the staff's evaluation consultants will need to carefully assess information as it is collected from site visits to determine when it is sufficient, so that resources are not needlessly spent on gathering redundant information. Staff will also need to keep abreast of organizational and goal-related changes at the project level, so that the information does not become outdated.

Estimating Sample Sizes

In the panel's experience, sample size calculations rarely overestimate the number of units required and quite commonly underestimate them. To determine sample size for use in an efficacy or effectiveness trial, the investigator must first specify four factors.
For the simplest possible case,[3] they are:

[2] As mentioned in Chapter 4, some cells in the matrix may be empty, in which case the number of case studies required would be smaller.
[3] This explication presents a simplified overview that omits many complications not treated here. Detailed treatments of this topic can be found in Lipsey (1990), Cohen (1988), and Kraemer and Thiemann (1987).

1. The kind of analysis that will be performed (e.g., t test, regression, estimation of a proportion or difference in proportions, etc.) and the statistical model that is assumed to describe the data distribution.

2. The minimum effect the experimenter wishes to detect, if indeed there is an effect. For instance, if an investigator's outcome variable is a respondent's number of sexual partners, the desired "effect size" might be a reduction by a factor of 2. There are standard ways to express the effect size, and they depend on the type of analysis that will be performed. For example, for comparing two group means, the effect size (ES) is usually defined as the difference between the two population means (μ1, μ2), in units of standard deviations (σ). The formula is:

   ES = (μ1 − μ2) / σ

3. The "level of significance" (α) at which the test will be performed. The level of significance is the probability of concluding there is an effect when none really exists. Conventional values are .01 and .05.[4]

4. The desired "power" for the test. Power refers to the probability that the minimum effect specified (or a larger effect) will be detected, if it exists.

[4] It should be recognized that besides merely testing the null hypothesis (that the difference between groups is zero), one may have a particular interest in obtaining an estimate of the magnitude of the treatment effect with a given degree of precision. The texts noted in the preceding footnote will provide detailed guidance in this regard.

These factors are sufficient to determine the needed sample sizes, but they must be specified in light of their possible uses. Although the last three factors may be specified by a sophisticated investigator, combining the four factors to derive the implied sample size is best left to a statistician. For example, a statistician might suggest careful research designs such as blocking and matching to reduce the size of the standard deviation (σ), which in turn can either increase the size of the effect one will detect or reduce the size of the sample necessary for the study. As intuition may suggest, the appropriate sample sizes increase as smaller effect sizes or smaller α's are chosen and/or as the desired power increases.

Statistical texts can help investigators determine these various factors and then use them to find the appropriate sample size, but even these specialized texts and their reference tables require an understanding of several statistical concepts. Moreover, tables do not exist for comparisons more complicated than t tests and analyses of variance. To resolve the matter, the panel advises that the investigator specify the desired factors (effect size, α, power level) and then consult with a statistical expert to determine the necessary sample size.

Controlling Attrition

In Chapter 6, the panel discussed attrition and a lack of compliance as potentially important detractors from a project's effectiveness, and we noted that such phenomena are useful endpoints to be studied because they can reveal whether a project is too unattractive to retain its participants. A project that cannot motivate its participants to comply or to stick with the protocol cannot be practically effective. In some cases, however, attrition occurs for reasons unrelated to project attractiveness.

The loss of data through attrition is a potentially serious source of bias, with the level of the problem depending in part on the amount of attrition and in part on other factors. Attrition becomes more serious, for example, if the outcome variable is relatively uncommon, if the treatment is causing experimental group members to drop out, or if the lack of treatment is causing control group members to drop out. Whether attrition is a problem depends on a given situation, so that no "standard appropriate attrition rate" can be advised.

The panel wishes to suggest ways to help contain attrition. In Chapter 4, the panel mentioned two ways: confidentiality guarantees and some form of compensation to respondents. We further discuss these and other suggestions below.

Confidentiality Guarantees

Assurances of confidentiality, which should be fairly easy to guarantee in any CBO study, typically have been found to decrease attrition and item nonresponse (Singer, 1978; Panel on Privacy and Confidentiality as Factors in Survey Response, 1979).
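Returning for a moment to the sample-size discussion above: for the simplest case of a two-sided comparison of two group means, the four factors combine through the standard normal approximation, n = 2[(z(1−α/2) + z(power)) / ES]² per group. The sketch below is purely illustrative; the function name and the inputs shown are hypothetical examples, not recommendations, and do not substitute for the statistical consultation the panel advises.

```python
import math
from statistics import NormalDist

def sample_size_two_means(effect_size: float, alpha: float = 0.05,
                          power: float = 0.80) -> int:
    """Approximate per-group n for detecting a standardized effect
    ES = (mu1 - mu2) / sigma with a two-sided two-sample z test.
    Illustrative sketch only; the default alpha and power are
    conventional values, not prescriptions."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value for the test
    z_power = z.inv_cdf(power)           # quantile giving the desired power
    n = 2 * ((z_alpha + z_power) / effect_size) ** 2
    return math.ceil(n)

# A "medium" standardized effect at alpha = .05 and 80 percent power:
n = sample_size_two_means(0.5)   # 63 per group
```

Note that halving the effect size roughly quadruples the required n, consistent with the text's observation that sample sizes grow as smaller effects are sought or higher power is demanded.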
Anonymity may be still more successful in reducing nonresponse (Moore, Lessler, and Caspar, 1989), but it obviously hinders follow-up. To meet this challenge, researchers involved in CDC's community demonstration projects have tried a code name system to track individuals. Under this system, media announcements summon respondents by code groups for periodic reinterviews. Return rates, however, have been modest, ranging from 50 to 80 percent of project participants, perhaps because the media approach is not sufficiently visible or persuasive. More intensive follow-up would require names and other locator information. Tanur (1982) has proposed one method for protecting the anonymity of respondents for whom this information is known, i.e., making telephone follow-ups asking respondents to anonymously call back the interviewer later.

Compensation

Under certain conditions, compensation has been shown to be an effective method of curbing attrition. Ferber and Sudman (1974) reviewed the effects of compensation on response rates in consumer surveys and found a mix of outcomes. For example, some form of compensation (cash, gifts) appeared to contribute to a higher response rate when the study conditions were burdensome to participants (e.g., when they were asked to participate in a longitudinal study, to keep records or diaries, or to come in to a study site). In less burdensome settings (such as one-time interviews with no written records), compensation was not particularly helpful in increasing response. Moreover, compensation seemed to be more effective among certain groups of respondents than others (e.g., it was more effective among participants with lower incomes and education than it was among middle class participants).

Cannell and Henson (1974) posit that survey participants must be sufficiently motivated to provide information. When the purpose of the study is perceived to be compatible with personal or social goals, compensation may not be important; however, money might motivate the respondent who feels that no other goal for participating exists.[5]

In the AIDS prevention arena, few studies have examined the impact of compensation on completion of a study's protocol, but the studies that are available suggest that the goals or motivations of participants are indeed important in recruitment and attrition. For example, Fox and Jones (1989) found that participants in the Baltimore MACS study reported that prizes were not a major reason for continuing to volunteer.
In fact, 74 percent of recruits returned for follow-up after a single mailed notice was sent (follow-up was even higher, 84 to 90 percent, after telephone requests for return). Carballo-Dieguez and colleagues (1989) provide an example in which compensation was counterproductive. They found that a $10/hour payment appeared to be the major motivation for participation in a 5-year study on HIV disease progression among intravenous drug users and led some candidates to become what investigators characterized as aggressive and manipulative to secure enrollment.

[5] The most important source of motivation appeared to be the investigators' interactions with the participants (Cannell and Henson, 1974).

In an interesting contrast, Davis, Faust, and Ordentlich (1984) turned the concept of financial incentives on its head in a successful plan to reduce dropout rates in a smoking cessation program. In a randomized alternative treatments study, researchers obtained $20 deposits from volunteers who received one of four different self-help packages; these deposits were refunded only after five follow-up interviews were completed (regardless of outcome). The follow-up rate was 95 percent. Implementing this suggestion clearly would not be feasible in low-income community or outreach projects; it might, however, be possible in other types of projects, such as one that provides valuable resources like counseling activities to middle class gay males.

Stabilization Funds

To retain respondents in an intervention study, some CDC personnel suggested providing emergency "stabilization" funds to project participants. Such funds go beyond a token compensation and, for some participants, can mean the difference between leaving the study area or staying to receive an intervention and participate in its evaluation.

The panel looked at this suggestion from two sides. On the one hand, we saw that it may at times be necessary to alter the social environment to provide the intervention. Conversely, however, if the purpose is to evaluate the intervention, it is damaging to alter the environment because it may contaminate the intervention with an "additional value" (here, the emergency funds). One method of incorporating such incentives into a research design is to provide the additional value to the whole pool of program candidates before assigning individuals to the experimental and control groups.[6] In this way, the samples are drawn from a homogeneous population.
In fact, this method could enhance the feasibility of randomization because an investigator is likely to have a larger pool of willing study participants once members of a community learn that a given evaluation will provide them additional funds. However, the provision of the additional value would change the evaluation from a study of effectiveness to a study of efficacy.[7]

[6] Designs that provide incentives only to participants and not to controls confound the incentive with the treatment. The results of such designs would be particularly difficult to interpret.
[7] See Chapter 3 for a full discussion of "efficacy" versus "effectiveness."

Cultivating and Tracking Respondents

Other ways that have helped to avoid attrition include: familiarizing respondents at first contact with the importance and purpose of the study and cultivating their participation; proxy reporting (less likely but possible when sex and drug partners are recruited into a project); participant screening (described in Chapter 6); and rigorous follow-up. The last is strengthened when the researcher gathers all relevant information about all respondents (experimental and control group members alike) at the time of initial contact and, if multiple follow-ups are anticipated, every time that respondents are recontacted. "All relevant information" is meant to include locator information, characteristics of the target population, and information about variables affecting the content of the treatment that are related to the outcome.

Gathering such comprehensive information serves multiple purposes, but it is especially important for tracking respondents. In addition to routine information such as respondents' names and addresses, nontraditional identifying data, such as alternative locator information, social security number, and date of birth, should be collected. Follow-up is facilitated when the researcher has the name and address of several persons who are likely to know a respondent's whereabouts; alternative locator information is especially useful in situations that involve mobile populations such as intravenous drug users or prostitutes. Access to respondents' social security numbers and birth dates also facilitates locating them through archival records such as voter registries, tax rolls, motor vehicle records, credit bureaus, marriage licenses, real estate records, death certificates, wills, and the like. Federal agencies that remain in touch with some respondents include the Veterans Administration, the Social Security Administration, and the Internal Revenue Service. Successful follow-up by mail includes the use of postal services such as forwarding and record updates.
Telephone techniques include the use of directories, local and long distance operators, and reverse directories that provide numbers of former neighbors. Telephone searches may be made directly for the respondent, for known relatives or alternative locators, or for persons with the same last name who may be related to the respondent.

Tracking adolescents poses some particular problems. For one thing, state law sometimes prevents a researcher from gaining access to adolescents without parental consent. In some contexts, when the adolescent comes to the researcher, the teen is legally considered an "emancipated minor" with power to consent. For an adolescent to initiate and persist in these contacts, he or she must be strongly motivated to participate in the research. In cases such as these, compensation may be an effective motivating tool.

On the other hand, where parental consent is given, investigators can actively follow up their adolescent participants. Pirie and colleagues (1989) report on studies for which good background data on the adolescents and their parents and guardians (names and social security numbers) were helpful, as was the cooperation of school districts in tracking students and transfers. Other public records were generally not successful sources for tracking adolescents, with the exception of driver's license records in a study centering on suburban adolescents. Telephone tracking was important, but had to be modified to focus on parents or guardians; telephone tracking was particularly important in rural areas, where listings for a given name often led to persons who knew or were related to the respondents.

Personnel for Tracking Respondents

Even with good information for tracking respondents, investigators must use sheer persistence to approach the goal of 100 percent follow-up. For maximum follow-up, the research team has to have the personnel necessary to do the tracking (which implies having the necessary resources to support such personnel). Trackers need not be research investigators, but they do need to be trained in the search techniques discussed above.

The time necessary to follow up respondents should not be underestimated. In addition to tracking participants through a paper trail of records, field work is necessary to locate many individuals. Field work involves tracking time and interview time, both of which need to be factored into each scheduled follow-up. In some cases, the tracking procedure may be more intensive than the intervention itself. For example, tracking participants from a drug treatment center may be much more labor intensive than providing the original counseling intervention.

Modeling Attrition

The second reason to get as much information about individuals in the study at intake is that, if attrition does occur, researchers will be able to estimate the characteristics of nonrespondents.
Since nonrespondents differ from respondents in their refusal to respond or their being untraceable, they may also differ in terms of the variable that the evaluator is trying to measure. When they do, the validity of evaluation results will be subject to question. Where attrition occurs, there is a need to model the causes and distribution of nonresponse. Ignoring nonresponse altogether implies acceptance of a model that says that nonrespondents are distributed in the same way as respondents.[8] Alternatives to ignoring nonresponse all call for estimating, or guessing, the ways in which nonrespondents may differ. One way is to assume that they resemble some specified subset of the respondents. Other ways may exploit information about the nonrespondents (e.g., demographic features, responses at initial contact, etc.) to estimate or impute their later, missing responses. (For further discussion of missing data, see, e.g., Little and Rubin, 1987.) Another approach is to locate and interview the dropouts to find out what caused attrition and thus to model self-selection bias.[9]

[8] This admits of some refinement depending on the analytic strategy employed. A typical default assumption in a life-table analysis is that persons lost resemble persons followed from the time of loss onward, not from time 0.

(As noted in Chapter 6, it should be clearly understood that attrition and noncompliance in an experiment introduce uncertainties that directly parallel those that arise in nonexperimental studies. Modeling their effects, in turn, invites inferential uncertainties parallel to those that beset modeling effects in nonrandomized studies.)

Convenience and Probability Sampling

As noted in Chapter 5, the panel has recommended that CDC conduct population surveys that include potential and actual clients of counseling and testing services. By measuring a population's experience with and desire for these services, such surveys could be used to evaluate barriers to access and provide insight into perceived availability, needs for the services, and fears about the system. In addition to community and general population surveys, the panel recommended surveys of high-risk or hard-to-reach groups using probability samples whenever possible. We recognized that it may sometimes be difficult to construct the sampling frames from which probability samples of some high-risk groups can be drawn. The numbers and demographic profiles of gay men and intravenous drug users, for example, are not known with any certainty, nor are definitions of group membership always clear. The panel observed that because of the difficulty and cost involved with population-based samples, replicable convenience samples can sometimes be used.
The term "convenience sample" is not meant to convey a naive or effortless assemblage of study participants. Convenience samples are simply a type of nonprobability sample and can be devised in many ways, with some designs weaker or stronger than others. For example, "accidental" samples are drawn from subjects at hand and are rather easy to implement, but are far from being representative of the general population. "Purposive" samples are more carefully constructed to reflect the researcher's best judgment about what is typical of the larger population; they probably have a more substantive claim to an adequate coverage of the population. Regardless of construction design, estimates of population parameters from these sorts of convenience samples have unknowable amounts of bias and variance, and results are not generalizable to the constituents of the high-risk groups.

[9] Rossi and Freeman (1982) suggest community surveys as often the only feasible means of discovering nonparticipants; the panel considers this alternative more difficult than getting the important information at intake and tracking tirelessly.

For comparing interventions, however, convenience samples may be useful. Alternative interventions can be compared by assigning them to randomly chosen subsets of a convenience sample of persons, or clinics, or other relevant units of analysis. The random assignment solves the question of internal validity. The main hazard to external validity arises from the possibility of qualitative interaction between treatment effects and population subgroups. This risk cannot be dismissed, but it can typically be expected to be less threatening than the risk of large direct differences between population and convenience samples.

The panel did not discuss convenience samples at much length, but the parent committee in its first report (Turner, Miller, and Moses, 1989) did review a number of nonrandom or nonprobability sample studies of gay and bisexual men and of drug users. Briefly, the studies can be arranged along a spectrum of the strength of their sampling schemes. The panel believes that it would be helpful to highlight and update these examples here.

Sample Studies of Gay and Bisexual Men

Convenience samples recruited from narrowly circumscribed or "accidental" sources (e.g., STD clinics or gay bars) are frequently used, but their potential for inferring effects to the whole population is seriously flawed.
For example, gay men recruited from an STD clinic (e.g., Swarthout et al., 1989) will quite likely have different information needs as well as a different awareness of available testing facilities than the gay population at large; if they are seropositive they may have different medical referral needs as well. Samples that recruit volunteers through public notices (e.g., the Baltimore MACS sample10) are somewhat more useful. Although still a form of accidental sampling, such kinds of nonprobability samples are improved by casting a wider net, and the volunteers will probably have a more diverse base of needs for and concerns about counseling and testing services. Nevertheless, data derived from such a design will be biased by the self-selection of respondents into the sample.

Nonprobability samples and probability samples of narrow universes can be purposively enlarged to ensure differences among respondents by

10. MACS, the Multicenter AIDS Cohort Studies, are described in Chapter 6.

including presumably representative groups in a sample. An example of such a purposive sample is a cohort assembled by Martin (1986). The design began with a probability sample of men belonging to at least one gay organization; the sample was supplemented with self-selected volunteers, recruits from a Gay Pride festival, respondents from a public health clinic, and a snowball sampling11 from persons already enrolled in the study. This purposive sampling example is only illustrative, not definitive, as it likely overselects respondents with reasonably high knowledge of available services. For the purpose of measuring barriers to counseling and testing, it might be more helpful to diversify the sample, e.g., by supplementing the base probability sample with patrons of gay bars rather than users of health clinics.

Finally, a probability sample of gay and bisexual men is not impossible and of course would produce the most defensible data. One such sample was drawn by the San Francisco Men's Health Study from the Castro district of San Francisco, an area highly populated by gay men and having the highest incidence of AIDS cases in the city (see Winkelstein et al., 1987).12 The sample was representative of that community, and such a design might be quite appropriate in other high-profile gay communities for evaluating the accessibility of testing services.

Sample Studies of Intravenous Drug Users

Intravenous drug users are a difficult population to survey because of the clandestine nature of drug use activities and the difficulty of defining who is or has been a user at risk of HIV from needle-sharing. Nonetheless, the spectrum of sampling done among intravenous drug users is similar to that done among gay men. Surveys have been largely limited to accidental convenience samples of subpopulations, but purposive and probability sampling have been possible.
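The snowball component of a design like Martin's, in which each enrolled respondent is asked to nominate further candidates, can be sketched as a wave-by-wave expansion from initial seeds. The names and the referral map below are invented purely for illustration.

```python
def snowball_sample(seeds, referrals, waves=2):
    """Grow a sample from initial seeds by following each respondent's
    nominations for a fixed number of waves.

    `referrals` maps each person to the people they name; duplicates
    are enrolled only once.
    """
    sample = list(seeds)
    frontier = list(seeds)
    for _ in range(waves):
        nominated = []
        for person in frontier:
            for contact in referrals.get(person, []):
                if contact not in sample:
                    sample.append(contact)
                    nominated.append(contact)
        frontier = nominated  # only fresh recruits nominate in the next wave
    return sample

# Hypothetical referral data.
referrals = {
    "seed_a": ["p1", "p2"],
    "seed_b": ["p2", "p3"],
    "p1": ["p4"],
}
sample = snowball_sample(["seed_a", "seed_b"], referrals)
# sample grows from 2 seeds to 6 enrolled respondents over two waves
```

The sketch also makes the design's weakness visible: everyone enrolled after the first wave is reachable only through earlier respondents' social networks, which is exactly why such samples cannot support population estimates.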
Members of drug treatment centers constitute the most accessible populations for convenience samples, and numerous examples exist of research samples drawn from methadone and detoxification clinics. Assignment to treatment, however, is nonrandom, and results are not representative of the general drug-using population. Researchers would face

11. A snowball sample is a sampling method in which each person interviewed is asked to suggest additional people for interviewing.

12. The design was a clustered probability sampling of single men aged 25 to 55 in the 19 San Francisco census tracts that comprised the area of the city with the highest AIDS incidence rate. Note, however, that despite its scientific sampling plan, the representativeness of the survey could have been flawed by a fairly high nonresponse rate (41 percent), although investigators judged differences between respondents and nonrespondents to be "insufficient" to warrant that conclusion.

similar problems getting information about testing and counseling services from such a sample: treatment clientele are likely to have different testing needs and different perceptions about the availability of services than persons who choose to continue using drugs or cannot get into treatment. Other frequently used convenience samples include drug users in hospitals, emergency rooms, and health clinics.

A more varied but still accidental population is arrestees, of whom some 15-50 percent can be identified as drug users (Eckerman et al., 1976). Using a sample of arrestees would probably result in a more diverse group than a sample of clinic clients in terms of individuals' awareness of counseling and testing services and barriers to access. Because self-reports of drug use by arrestees may be unreliable, however, screening such as urinalysis may be necessary, making this design more difficult to implement because researchers cannot be sure of the individuals' consent. Moreover, arrestees may constitute a more desperate class of drug users, those who resort to crime to support their habit, than is representative of the population.

Purposive, nonprobability samples using street outreach to recruit IV drug users often attempt to draw on a broader cross-section and be more representative of the drug-using population than samples drawn from arrestees. For example, several studies have sampled IV drug users recruited from the streets of neighborhoods where drug use prevalence is high (e.g., Abdul-Quader et al., 1989; Inciardi et al., 1989; Wiebel et al., 1989). As with gay study recruits, street user recruits are probably more representative of the broader population than are institutional populations and, because they are active users and vulnerable to health problems, would likely have a variety of needs for counseling and testing services. Still, the conclusions from these samples cannot be generalized to the total population.
Some researchers have purposively enlarged nonprobability samples to ensure differences among respondents. Such purposive sampling cohorts have been assembled by researchers in Portland, Oregon (Sibthorpe et al., 1989, recruited from a corrections facility, county health clinics, private welfare organizations, and street outreach), at Johns Hopkins in Baltimore (e.g., Nelson et al., 1989, recruited from street outreach, clinics for sexually transmitted disease, emergency rooms, and drug treatment centers), and in New York (Carballo-Dieguez et al., 1989, recruited from a poster campaign, methadone clinics, and inpatient wards). Depending on their final composition, these samples may provide some good results. Nonetheless, they are not substitutes for probability sampling and will never provide wholly representative results.
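One simple check on the "final composition" of such a multi-source cohort is to tabulate what share of enrollment each recruitment channel contributes, so that overreliance on any single accessible source is visible early. The records below are invented for illustration; the source labels merely echo the kinds of channels named in the studies cited above.

```python
from collections import Counter

# Invented (respondent id, recruitment source) records for a
# hypothetical multi-source purposive cohort.
records = [
    ("r01", "street outreach"), ("r02", "STD clinic"),
    ("r03", "drug treatment"),  ("r04", "street outreach"),
    ("r05", "emergency room"),  ("r06", "street outreach"),
]

# Tally enrollments by recruitment channel and convert to shares.
composition = Counter(source for _, source in records)
shares = {src: n / len(records) for src, n in composition.items()}
# Here street outreach contributes 3 of 6 records, i.e. a 0.5 share.
```

Such a tally does not make the cohort representative, but it lets investigators report exactly how the sample was built and judge whether a supplementary recruitment channel is needed.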

Finally, a probability sample is possible, at least of street drug users. Such a sample can be drawn from a systematic mapping of drug-related activity that includes the enumeration of activities and individuals as well as the selection of potential informants (such as ex-users) to identify active users (e.g., McAuliffe et al. [1987] used this method to deliver AIDS education to randomized neighborhoods of intravenous drug users). This sort of probability sampling could provide good data for analyzing access and barriers to counseling and testing services on the part of noninstitutionalized IV drug users.

RANDOMIZATION

CDC staff members expressed interest in learning more about successfully randomized samples in the AIDS prevention arena. This section provides some additional examples of such samples, including examples of experiments with no-treatment control groups. The section also readdresses the ethics of implementing randomized trials with no-treatment controls.

Examples of Randomized Experiments

In Chapter 4, the panel recommended a strategy of evaluating health education/risk reduction through randomized experiments in the context of street outreach. Although few evaluation studies of this sort have been published, the strategy is certainly feasible. One example is the sample drawn by McAuliffe and colleagues (1987), who used ex-addicts to deliver AIDS health education to intravenous drug users in randomly assigned neighborhoods of Baltimore; the experimental group had significantly more knowledge at follow-up than did control group members who did not receive the intervention, although there were no significant behavioral differences.

Few formal examples of randomized evaluation of street outreach studies are available in the literature; however, anecdotal reports and discussions with community-based providers indicate substantial opportunities and support for systematically testing various strategies.
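A neighborhood-level randomization of the kind McAuliffe and colleagues used, with control neighborhoods scheduled to receive outreach on a delayed basis rather than never, might be sketched as follows. The neighborhood names are hypothetical, and this is a simplified sketch rather than a reconstruction of that study's actual procedure.

```python
import random

# Hypothetical neighborhoods serving as the units of randomization.
neighborhoods = [f"neighborhood_{i}" for i in range(10)]

def randomize_clusters(units, seed=2):
    """Split clusters at random into an immediate-outreach arm and a
    delayed-implementation control arm of equal size."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"immediate": shuffled[:half], "delayed": shuffled[half:]}

arms_by_phase = randomize_clusters(neighborhoods)
```

Randomizing whole neighborhoods rather than individuals keeps the intervention deliverable by street outreach workers and limits contamination between arms, at the cost of needing enough clusters to support the comparison.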
Community-based studies offer multiple units that can be randomized, such as street corners, street blocks, public housing communities, and less well-defined neighborhoods. Because of limited resources, outreach efforts in these communities sometimes have to be employed in a delayed implementation design. Alternative methods of outreach, such as comparing indigenous outreach workers with health professionals, may also be possible.

It may be that only a handful of street outreach projects will meet the criteria, discussed in Chapter 4, for randomizing to no-treatment. Similarly, the possibilities in other community-based settings

SAMPLING AND RANDOMIZATION ~ 331 may be few, but where they occur they should be creatively used. As an example of the latter, Bellingharn and Gillies (1989) recently reported on a successful randomized control trial In Great Bntain, In which six youth training centers were randomly assigned to receive an AIDS education comic book. No differences between the groups were detected at pretest, but at pastiest the knowledge scores of the experimental group were significantly higher. Randomizing alternative treatments may be easier still. A recent ex- ample of a "natural" experiment where student classes were randomly as- signed to alternative treatments was reported by Zither and Ziffer (19891. Investigators austere pretests and posttests to students enrolled in parapet one-semester courses on AIDS. The class that received a values and attitudes component In addition to a basic "facts" course showed significant attitudinal change compared with the class who received facts only. The experience of the panelists indicates that the research commu- nity is weB aware of most of the steps that lead to a well-controlled experiment. We wish to emphasize, however, a sometimes neglected step, which is to involve project practitioners in the development of the research protocol. Such involvement, we believe, ought to facilitate co- operation and active participation in experimental studies. By designating a particular program to help in the development of the protocol and to serve as the test site, a prototype is created for the experimental trials. Staff of the prototype project could also assist In the training of new randomly selected sites In the controlled experiment. The Ethics of No-treatment Controls Although in Chapter 5 the panel does not recommend randomized studies with no-treatment controls for evaluating counseling and testing, the pane] does recommend such a design in Chapter 4 for evaluating new CBO projects Panelists debated this issue long and hard. 
In not recommending the design for counseling and testing, the panel's conclusion is based on an ethical consideration that needs to be made more explicit, especially because it may sometimes apply to certain CBO projects: the panel believes that efficacious patient care is essential and, on ethical grounds, should not be withheld for purposes of evaluation. This aspect should be considered along with the other justifications listed in Chapter 4 for no-treatment control conditions: scarce resources with which to provide a service; interventions that are of unproven value; and availability of related services elsewhere.

The panel notes an important difference between many CBO interventions and CDC's larger counseling and testing program. Despite

332 ~ APPENDIX D CDC's characterization of counseling and testing as an AIDS prevention program, the services can and should be distinguished. Unlike other projects, it is not simply a behavioral intervention. Rather, it is a pro- gram that offers HIV testing. Although testing can provide important information to be used in decisions about sexuality, contraceptive use, and needie-shanng, the test itself is a diagnostic tool and is an important aspect of patient care that the panel believes should be available to all who seek it. Because of this distinction, HIV testing should never be withheld for purposes of evaluating its effectiveness. Similarly, although counseling alone might be considered an intervention to encourage be- havioral change, when it is joined to testing it becomes part of an effective medical care procedure (as a means of explaining test results and as a psychological and social support). Thus, counseling (in the context of counseling and HIV testings also should not be withheld for purposes of evaluation. The pane! therefore recommends an evaluation strategy for counseling and testing in which alternative counseling treatments are randomized and their relative effectiveness assessed. This strategy re- tains the essential parts of the service: the diagnostic technology of HIV testing and the counseling that is part of patient care.~3 At the same time, it allows for the evaluation of alternative counseling methodologies that may be found to have superior value in promoting behavioral change. In deciding whether to without a given CBO service, care must be taken to distinguish whether He service offered is an integral aspect of patient care or is an intervention of unproven worth that is available elsewhere. Consider, for example, a CBO project that provides bleach to intravenous Hug users. 
Bleach is known to be an effective agent for reducing HIV transmission; thus, information about the utility of bleach as a preventive tool should not be withheld. It is not known, however, whether providing a supply of bleach is effective in getting people to use it. Assuming that bleach is otherwise readily accessible, it would be ethical to randomly assign the provision of bleach samples across communities or organizations.

13. Although the panel believes it would be unethical not to offer counseling as a part of patient care, it is important to recognize that patients have the prerogative to refuse counseling.

REFERENCES

Abdul-Quader, A., Tross, S., Des Jarlais, D. C., Kouzi, A., Friedman, S. R., and McCoy, E. (1989) Predictors of Attempted Sexual Behavior Change in a Street Sample of Active Male IV Drug Users in New York City. Presented at the Fifth International Conference on AIDS, Montreal, June 4-9.

Bellingham, K., and Gillies, P. (1989) AIDS Education for Youth: A Randomised Controlled Trial. Presented at the Fifth International Conference on AIDS, Montreal, June 4-9.

Cannell, C. F., and Henson, R. (1974) Incentives, motives, and response bias. Annals of Economic and Social Measurement 3(2):307-317.

Carballo-Dieguez, A., El-Sadr, W., Gorman, J., Joseph, M., McKinnon, J., and Sorrell, S. (1989) Research with Intravenous Drug Users: Problems and Practical Recommendations. Presented at the Fifth International Conference on AIDS, Montreal, June 4-9.

Cohen, J. (1988) Statistical Power Analysis for the Behavioral Sciences. Rev. ed. Hillsdale, N.J.: Lawrence Erlbaum Associates.

Cook, T. D., and Campbell, D. T. (1979) Quasi-Experimentation: Design & Analysis Issues for Field Settings. Boston: Houghton Mifflin.

Davis, A. L., Faust, R., and Ordentlich, M. (1984) Self-help smoking cessation and maintenance programs: A comparative study with 12-month follow-up by the American Lung Association. American Journal of Public Health 74(11):1212-1217.

Eckerman, W. C., Rachal, J. V., Hubbard, R. L., and Poole, W. K. (1976) Methodological issues in identifying drug users. In Drug Use and Crime. Report of the Panel on Drug Use and Criminal Behavior (Appendix). Research Triangle Park, N.C.: Research Triangle Institute.

Ferber, R., and Sudman, S. (1974) Effects of compensation in consumer expenditure studies. Annals of Economic and Social Measurement 3(2):319-331.

Fox, R., and Jones, L. T. (1989) Maintaining followup in prospective epidemiologic studies of HIV infection: Experience in the Baltimore MACS study. Presented at the Fifth International Conference on AIDS, Montreal, June 4-9.

Inciardi, J. A., Chitwood, D., McCoy, C. B., and McBride, D. C. (1989) Needle sharing behaviors and HIV serostatus in Miami, Florida. Presented at the Fifth International Conference on AIDS, Montreal, June 4-9.

Kraemer, H. C., and Thiemann, S. (1987) How Many Subjects? Statistical Power Analysis in Research. Newbury Park, Calif.: Sage Publications.

Lipsey, M. W. (1990) Design Sensitivity: Statistical Power for Experimental Research.
Newbury Park, Calif.: Sage Publications.

Little, R. J. A., and Rubin, D. B. (1987) Statistical Analysis with Missing Data. New York: Wiley.

Martin, J. L. (1986) AIDS risk reduction recommendations and sexual behavior patterns among gay men: A multifactorial categorical approach to assessing change. Health Education Quarterly 13(4):347-358.

McAuliffe, W. E., Doering, S., Breer, P., Silverman, H., Branson, B., and Williams, K. (1987) An evaluation of using ex-addict outreach workers to educate intravenous drug users about AIDS prevention. Presented at the Third International Conference on AIDS, Washington, D.C., June 1-5.

Moore, R. P., Lessler, J. T., and Caspar, R. A. (1989) Technical Report: Results of Intensive Interviews to Study Nonresponse in the National Household Seroprevalence Survey. Research Triangle Park, N.C.: Research Triangle Institute.

Nelson, K. E., Vlahov, D., Solomon, L., Lindsay, A., and Chowdhury, N. (1989) Clinical symptoms and medical histories of a cohort of IV drug users: Correlation with HIV seroprevalence. Presented at the Fifth International Conference on AIDS, Montreal, June 4-9.

Panel on Privacy and Confidentiality as Factors in Survey Response (1979) Privacy and Confidentiality as Factors in Survey Response. Report of the NRC Committee on National Statistics. Washington, D.C.: National Academy Press.

334 ~ APPENDIX D Pirie, P. L., Thomson, S. J., Mann, S. L., Peterson, A. V., MuIray, D. M., Flay, B. R., and Best, J. A. (1989) Tracking and attrition in longitudinal school-based smoking prevention research. Preventive Medicine 18:249-256. Rossi, P. H., and Freeman, H. E. (1982) Evaluation: A Systematic Approach. 2nd ed. Beverly Hills, Calif.: Sage Publications. Sibthorpe, B. M., Fleming, D., McAlister, R., Klockner, R., and Gould, J. (1989) Needle sharing among IVDU's where needles are available without prescription. Presented at the Fifth International Conference on AIDS, Montreal, June 4-9. Singer, E. (1978) Informed consent: Consequences for response rate and response quality in social surveys. American Sociological Review 43:144-162. Swarthout, D., Gonsiorek, J., Simpson, M., and Henry, K. (1989) A behavioral approach to REV prevention among sero-negative or untested gay/bisexual men with a history of unsafe behavior. Presented at the Fifth International Conference on AIDS, Montreal, June 4-9. Tanur, J. M. (1982) Advances in methods for large-scale surveys and experiments. In R. M. Adams, N. J. Smelser, and D. J. Treiman, eds., Behavioral aru] Social Science Research: A National Resource. Part II. Report of the NRC Committee on Basic Research in the Behavioral and Social Sciences. Washington, D.C.: National Academy Press. Turner, C. F., Miller, H. G., and Moses, L. E., eds. (1989) AIDS, Sexual Behavior, and intravenous Drug Use. Washington, D.C.: National Academy Press. Jezebel, W., Altman, N., Chene, D., and Fritz, R. (1989) Risk talking and risk reduction among IV drug users in 4 US cities. Presented at the Fifth International Conference on AIDS, Montreal, June 4-9. W~nkelstein, W., Samuel, M., Padian, N. S., Wiley, J. A., Lang, W., Anderson, R. E., and Levy, J. A. (1987) The San Francisco Men's Health Study: m. Reduction in human i~rununodeficiency virus transmission among homosexuaVbisexual men, 1982-86. American Journal of Public Health 76~9~:685-689. 
Ziffer, A., and Ziffer, J. (1989) The need for psychosocial emphasis in academic courses on AIDS. Presented at the Fifth International Conference on AIDS, Montreal, June 4-9.
