5
Evaluating the Effects of AIDS Interventions

Previous chapters of this report have dealt with understanding the behaviors that transmit HIV, monitoring the spread of infection, and designing and implementing intervention programs to oppose the further spread of the disease. The committee has called for the implementation of planned variations of programs to determine how best to facilitate change in those behaviors associated with risk. Making those determinations requires sound evaluations of the different program variations. Yet evaluation is rarely part of a program's activities. In its review of existing intervention programs, the committee was distressed to find a dearth of associated evaluation activity. Committee members were also disappointed to see a lack of data on behavioral variables for those evaluations that had been conducted.

The committee believes that the time has come to make a commitment to the rational design of intervention strategies and to careful evaluation of the effectiveness of those strategies through controlled experiments that use carefully defined populations. Knowledge must be gained from current intervention programs to improve future efforts. Evaluation is the process that will enable us to learn from experience. The committee recommends that the Office of the Assistant Secretary for Health take responsibility for an evaluation strategy that will provide timely information on the relative effectiveness of different AIDS intervention programs.

The political realities of evaluation point to both positive and negative aspects of the process. On the one hand, good evaluations
can generate support for effective programs. Well-publicized findings of evaluation activities can legitimately defend successful programs that may be viewed as politically sensitive or controversial, while gestures that were merely symbolic can be shown to be ineffective. On the other hand, evaluation efforts are likely to show that programs are less effective than might be hoped. Perfect studies and absolute, permanent change in behavior are standards that are rarely, if ever, met. Every effort should be made to ensure that evidence of imperfect improvement is not used to overturn programs that may be viewed as politically undesirable. There is every reason to believe that the nation can and will do better in determining which interventions change behavior and which do not.

Discussions with many people who have been on the front lines of AIDS prevention activities since the early days of the epidemic reveal their desire for evaluation of their work. To date, however, most of these individuals have not conducted evaluations, not because they were unwilling but because they lacked the capability. The links between those who provide services and manage programs, on the one hand, and those who conduct research and evaluation, on the other hand, have not been strong in the past.

This chapter discusses basic approaches to and problems inherent in conducting controlled experiments and evaluations. The committee recognizes that it will not meet the needs of all readers: for some, it will be too basic. For those who are not yet familiar with the techniques of experimental design and evaluation, however, we hope it will be a useful introduction. Yet it must be noted that a document such as this cannot take the place of individual, program-specific consultation. It is therefore imperative to establish supportive, productive linkages among program and evaluation professionals so that future prevention efforts can result in sound, useful information.

DIMENSIONS OF EVALUATION

It is not always easy to learn from experience, but it is certainly possible. To increase the likelihood of such learning requires the advance planning of evaluations as well as the precise, controlled execution of programs. Good evaluation does not just happen; it must be planned for and arranged. Evaluation is a systematic process that produces a trustworthy account of what was attempted and why; through the examination of results (the outcomes of intervention programs) it answers the questions, "What was done?" "To whom,
and how?" and "What outcomes were observed?" Well-designed evaluation permits us to draw inferences from the data and addresses the difficult question: "What do the outcomes mean?"

Well-executed evaluations provide credible information about program effectiveness. Such information is critical to developing rational policies, allocating limited resources, and serving needs in a targeted, productive, and economical fashion. Ensuring complete, high-quality evaluation requires advance specification of the program. Thus, in preparing for an evaluation, the design of a program can sometimes be improved by increasing its specificity and establishing standards of performance at the outset. At its best, a process in which program innovations are informed by feedback from careful, prompt evaluations can lead to the timely elimination of ineffective concepts and designs and the selection and adoption of effective ones.

A successful evaluation of an intervention program must provide answers to several key questions:

1. What were the objectives of the intervention?
2. How was the intervention designed to be conducted?
3. How was the intervention actually conducted? Who participated? Were there any unexpected problems? What parts of the program were easier to conduct than was anticipated? What parts were harder?
4. What outcomes were observed, and how were they measured?
5. What were the results of the intervention?

These questions are presented in a logical order of progression; they are also ranked according to the ease with which they can be answered: those at the end are harder to answer than those at the beginning. It is not uncommon to find reports of programs that use the term evaluation to refer to activities that would answer only the first three questions. The committee does not follow that usage. Evaluation refers to the whole set of questions, with particular emphasis on the last two.

It is important to distinguish between the outcomes of the program and the results of the intervention: an outcome denotes what occurred; an evaluation seeks to determine whether the outcome resulted from the intervention or from some other external factor.

Program Objectives

An intervention program may start with a protocol: a document that includes a description of the intervention, the activities to be undertaken, the groups targeted to receive the intervention, and the roles and responsibilities of the individuals or groups undertaking the tasks. (Frequently, funding agencies require such a document from organizations or individuals who are applying for a grant or seeking a contract to support the program.) This protocol is an integral part of an intervention: it should spell out unambiguously the objectives of the program and how they will be measured, as well as the operational content of the work to be performed.

The program objectives are the desired outcomes. They do not, for example, specify an intent to spend a certain sum on an activity or to deliver a certain number of advertisements or pamphlets; rather, they relate to the reasons that motivate the program. Objectives state the outcomes the program seeks to achieve.

Program Design

Two key elements in program design are defining the measures of the outcome or effect of the program and selecting the treatment unit. A variety of outcome measures can be chosen for study. It may help to conceptualize them along two dimensions: (1) their relevance as indicators of program achievement and (2) the feasibility of actually measuring them. For example, both the behavior and knowledge of subjects may be affected by a program; yet the former, although harder to measure, may be more relevant to the program's objectives. Many AIDS prevention programs have chosen knowledge as the outcome measure; it is easier to gauge than behavior but less relevant to the process of preventing the spread of HIV infection. It is not unusual for programs to include multiple outcome measures that vary in importance.

The treatment unit refers to the body to which the intervention will be applied. Variations in this construct and in outcome measures are illustrated in the following four examples of intervention programs.

• Two different pamphlets on the same subject are prepared. They are sent to individuals calling an AIDS hot line and are distributed in an alternating sequence. The outcome to be measured is whether recipients return a card asking for more information.

• Two discussion/instruction curricula about AIDS and HIV infection are prepared for use in high school health education classes. The outcome to be measured is the score on a test of knowledge.

• A subset of all STD clinics in a large standard metropolitan statistical area is randomly chosen to introduce a change in fee schedules. The outcome to be measured is the change in patient load.

• A community-level prevention program establishes a coordinated set of interventions involving community leadership, social service agencies, the media, community associations, and other groups. The two outcomes to be measured are knowledge (as assessed by testing) and condom sales in the community's retail outlets.

These fictitious interventions are applied to very different treatment units. In the first example, the treatment unit is an individual person who receives either pamphlet A or pamphlet B. If either "treatment" were to be applied again, it would be applied to a person. In the second example, the high school health education class is the treatment unit; everyone in a specific class is given either curriculum A or curriculum B. If either treatment were to be applied again, it would be applied to a class. The treatment unit in the third example is the clinic. In the fourth example, the treatment unit is the whole community.

To make inferences from the results of an evaluation, the treatment units that are analyzed must correspond to those that are sampled. For example, when organizations or groups are randomly chosen to receive a given intervention, the sample size is the number of organizations or groups. Because individuals within each broader unit do not yield independent observations, they cannot be used as statistical units (a unit-level analysis is sketched at the end of this section). The consistency of the outcome measures or effects of a particular intervention during repetitions of it is critical in appraising the intervention. It is important to remember that repetitions of a treatment or intervention are counted as the number of treatment units to which the intervention is applied.

Planning the intervention program and planning its evaluation should go hand in hand. The evaluation plan is an appropriate part of the protocol, which should state how the measurement and analysis of intended outcomes will be conducted. The comparisons to be used to assess the program's results are also part of an evaluation plan.
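
To make the treatment-unit point concrete, the sketch below analyzes a hypothetical version of the third example, in which clinics rather than individual patients are the randomized units, so the comparison uses one summary value per clinic. The clinic counts, outcome values, and use of SciPy are illustrative assumptions, not part of the committee's example.

```python
# Minimal sketch: analyze at the level of the randomized unit (the clinic).
# All values are hypothetical; each number is the observed change in patient
# load for one clinic after the fee-schedule change.
from scipy import stats

new_fee_clinics = [12.0, -3.5, 8.2, 4.1, 9.7, 1.3]   # clinics assigned the new fee schedule
old_fee_clinics = [2.4, -1.8, 0.9, 3.3, -0.6, 1.1]   # clinics keeping the old schedule

# The effective sample size is the number of clinics (6 + 6), not the number
# of patients they serve; patients within a clinic are not independent.
result = stats.ttest_ind(new_fee_clinics, old_fee_clinics, equal_var=False)
print(f"clinic-level Welch t = {result.statistic:.2f}, p = {result.pvalue:.3f}")
```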

Program Implementation

Whatever results are judged to have occurred because of an intervention, they must relate to the program as it was actually carried out rather than to the program as it was planned or intended. Thus, it is necessary to determine what services were in fact delivered, to whom, and how.

Evaluations that address the actual execution of the program are usually called process evaluations or implementation analyses (that is, when the evaluator's role is passive); when the evaluator takes a more active role, these efforts are referred to as troubleshooting or formative evaluations. Process evaluation has three important purposes: (1) to verify that planned services are actually offered and received, (2) to determine how the quality or extent of a service varied, and (3) to develop ideas about how to improve the organization or delivery of services.

The best of such evaluations recognize that no policy is ever fully implemented and no service ever delivered exactly as planned. Evaluation that discovers, for instance, that a fraction of the condoms received by individuals remained unused, that many free needles were stolen or not received by the target group, or that "counseling" in some cases was based on inaccurate and naive information is important. There are many good examples of formative evaluations. For example, in a study of whether a special community-based program for the severely and persistently mentally ill worked better than conventional approaches, Brekke and Test (1987) undertook an approach that monitored a sample of clients regularly over three years to determine where treatment took place, the amount and nature of treatment, who received it, and how continuity of care was achieved. Similarly, in a multisite study of which of three regimens worked best to ameliorate the problems of patients afflicted with mental depression, Waskow (1984) and Elkin and colleagues (in press) ensured that therapists used the treatment variation that they had agreed to use and determined how they did so. Their work included the development of manuals that stipulated guidelines, the preparation of training regimens for therapists, and studies of interactions between clients and therapists; their evaluation data also included more conventional measures of the number, frequency, and length of therapy sessions.

Process evaluations can be designed in different ways to focus more squarely on the treatment target. Recent work conducted by the Center for AIDS Prevention Studies in San Francisco investigated three stages in the prevention process: (1) ensuring that individuals
who engage in high-risk behavior know they are doing so; (2) understanding how, why, and when commitments are made to reduce such behavior; and (3) understanding the factors that influence the way people seek out and act on alternative approaches to risk behavior reduction (Catania et al., 1988). The best of such process evaluations also direct attention to the families and partners of individuals at high risk in an attempt to understand the frequency and character of support from those sources and how that support may discourage or foster high-risk behavior.

Well-executed process evaluations also make plain the standards or criteria against which an organization's performance will be judged. Standardized criteria for assessing programs have been established in some substantive areas by different groups and organizations. The American Public Health Association has developed criteria for health education programs that have been used in the design of a university-based AIDS program that seeks to reduce risk-associated behavior (Valdiserri et al., 1987).

Defining and Measuring Outcomes

As noted earlier, there are two dimensions to outcome measures: relevance and feasibility. Often, there are many possible outcome measures inherent in the design of an intervention, and their relevance is obvious. For example, consider an instructional program aimed at persuading people to use condoms. Each of the following questions corresponds to at least one outcome measure:

• Did the subjects attend the program?
• Did they pay attention?
• Did they understand the message correctly?
• Did they believe and accept it?
• Did they thereafter use condoms?
• Did they benefit from that behavior?

The questions become progressively harder to answer; in addition, the answer to each succeeding question is more important than the one before it. The choice of an outcome measure or measures is largely a matter of balancing importance against feasibility. In the case of AIDS, the issue of time must also be considered. The extent and character of current condom use are important pieces of information, but the need to use condoms goes beyond the present. Moreover, the lengthy incubation period of HIV may mean that answers to the later questions on the list may only be acquired some
years in the future, when the effect of condom use on HIV prevalence can be seen.

Obstacles to the feasibility of outcome measurement, the second of the two dimensions, span a wide range of difficulty. An outcome may simply be impossible to measure. For example, the incidence of HIV infection in the general population is not known because such knowledge would require repeated serologic testing of a very large probability sample of the population, which is currently impossible. For an outcome like sexual behavior, which is effectively impossible to observe, one can use surrogates or verbal reports about the behavior. Other obstacles to measurement are illustrated in the following examples:

• the failure of a respondent to understand what is asked: in some STD clinics, there are concerns that clients may not understand the term vaginal intercourse;

• nonresponse, perhaps by withholding cooperation: Hull and colleagues (1988) reported that the 18 percent of their sample of patients from an STD clinic who refused antibody testing contained more seropositives than the 82 percent who were tested (a numerical sketch of this kind of bias appears below);

• the sheer difficulty of collating information from many sources for each subject: this has been a problem in measuring the costs associated with health care utilization;

• fearful and inaccurate responses from people who are worried about their vulnerable status or from those who are concerned about legal strictures that may threaten them;

• perplexity about how to measure complex concepts, such as perceived self-efficacy; and

• cultural and linguistic barriers that may lead to noncooperation and misunderstanding on the part of respondents, investigators, or both.

The extent and severity of such obstacles will necessarily influence which outcome measures are chosen for study. Although they may not be widely known, there are many instruments to measure such constructs as the quality of life, depression, perceived self-efficacy, and satisfaction with care, as well as instruments for measuring knowledge and comprehension (Mitchell, 1985).
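
The nonresponse example above can be made concrete with a small back-of-the-envelope calculation. The seroprevalence figures below are hypothetical and are meant only to show how refusals can bias an estimate; they do not reproduce the data of Hull and colleagues (1988).

```python
# Minimal sketch: how nonresponse can bias an estimated outcome measure.
# All prevalence figures are hypothetical.
tested_fraction = 0.82      # patients who agreed to antibody testing
refused_fraction = 0.18     # patients who withheld cooperation

prevalence_tested = 0.05    # observed seroprevalence among those tested
prevalence_refused = 0.12   # assumed (higher) seroprevalence among refusers

estimate_from_volunteers = prevalence_tested
full_population_prevalence = (tested_fraction * prevalence_tested
                              + refused_fraction * prevalence_refused)

print(f"estimate based only on tested patients: {estimate_from_volunteers:.3f}")
print(f"prevalence in the whole clinic population: {full_population_prevalence:.3f}")
# Relying only on those who cooperate understates the outcome here by
# the difference between the two figures.
```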

Once the outcome measure or measures have been chosen (perhaps following a pilot test, a procedure the committee endorses), there remains the task of actually doing the measuring. Often, a difficult measurement problem can be solved by using a surrogate. For example, the incidence of STDs can serve as a prompt, sensitive indicator of behavioral change to prevent sexually acquired HIV infection because STDs and HIV can be spread by the same behaviors. If a particular outcome is simply too difficult to measure accurately, a major component of that outcome may serve as a reasonable alternative. Thus, ascertaining the total costs of medical care for each of many AIDS patients can be daunting, but accurate figures for the number of days spent in the hospital, in the intensive care unit, and in nursing facilities may be readily available. Although these estimates ignore unit costs and omit the cost of such items as drugs and respiratory therapy, they may be better outcome measures than the conceptually complete but inaccurately measured total cost. Finding a good surrogate for outcomes that are difficult to measure and devising ways to cross-check those measures call for ingenuity and imagination.

Inferring Results: The Value of Controlled Experiments

Good evaluations can provide accurate descriptions of the intervention process and measure the outcomes. Yet there remains the problem of inferring the effects of the intervention. Describing the process and stating the results are not sufficient. A good evaluation should say something about the relationship between the intervention and the outcome. A patient who receives a worthless treatment for the common cold is still likely to get better within a week because that is what usually happens with a cold, with or without the benefit of treatment. To infer that a given intervention has produced a particular effect involves comparing what did happen with the intervention to what would have happened without it. Because it is not possible to make this comparison directly, inference strategies rely on proxies for what would have happened. Such proxies include the patient's past history and comparison groups of various sorts.

In some circumstances, extrapolating a trend from a patient's pretreatment history as a proxy for what would have happened is the best that can be done. Yet extrapolation is always problematic: past records may be incomplete, and it is impossible to control for all intervening factors. This type of approach should probably be used as a near-to-last resort.

Under certain conditions, comparison groups can be quite useful for inferring results, although defining and recruiting a suitably
similar control group can be difficult. For example, after selecting ethnically similar control and treatment groups, an investigator may find that one group is, on average, older, sicker, or more educated. There are evaluation strategies that attempt to adjust for the differences between the two groups, but making those adjustments is seldom easy. Three types of information or knowledge are required: (1) knowledge of intervening variables that also affect the outcome of the intervention and that consequently need adjustment to make the groups comparable; (2) measurements on all intervening variables for all subjects; and (3) knowledge of how to make the adjustments properly, which in turn requires an understanding of the functional relationship between the intervening variables and the outcome variables. Satisfying each of these information requirements is likely to be more difficult than attaining the primary goal of the activity, which, simply stated, answers the question, "Does this intervention produce beneficial effects?"

With differently constituted groups, inferences about results are hostage to uncertainty about the extent to which the observed outcome actually results from the intervention and is not an artifact of intergroup differences. Fortunately, there is a remedy: establish one, singly constituted group in which to assess treatment effects. To be included in the group, individuals must satisfy the inclusion and exclusion criteria for the study. A subset of this group is then randomly chosen to receive the intervention, thus forming two comparable groups. They are not identical, but because they are two random samples drawn from the same population, they are as similar as is possible. Moreover, they are not systematically different in any respect, which is important for all variables (those known and those as yet unidentified) that can influence the outcome. Dividing a singly constituted group into random and therefore comparable subgroups cuts through the tangle of causation and establishes a basis for the valid comparison of treated and untreated subjects (a minimal sketch of such an assignment appears below).

After establishing two or more comparable subgroups, a good evaluation must ensure that outcome measurement is performed symmetrically for all subjects. For example, if treated subjects are examined and tested at hospital A and untreated subjects are examined and tested at hospital B, it is impossible to determine whether observed differences are due to treatment or are merely artifacts of noncomparable outcome measurement. The foregoing ideas are central to randomized clinical trials and randomized field experiments, which are discussed in the next section.
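
The following sketch shows the partitioning idea in its simplest form: one pool of eligible subjects is randomly split into an intervention group and a comparison group. The subject identifiers, group sizes, and seed are hypothetical; a real trial would add stratification, consent procedures, and documentation of the assignment.

```python
# Minimal sketch: form comparable groups by randomly splitting a single,
# singly constituted pool of eligible subjects. Identifiers are hypothetical.
import random

eligible = [f"subject_{i:03d}" for i in range(1, 201)]  # pool meeting inclusion/exclusion criteria

rng = random.Random(1989)  # fixed seed so the assignment is reproducible and auditable
rng.shuffle(eligible)

midpoint = len(eligible) // 2
intervention_group = eligible[:midpoint]  # receives the program variant under study
comparison_group = eligible[midpoint:]    # receives the alternative (or no treatment, where ethical)

# Because both groups are random samples from the same pool, they are not
# systematically different on any variable, measured or unmeasured.
print(len(intervention_group), len(comparison_group))
```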

Although highly desirable, partitioning strategies of this type are not practical for every question. The nonparticipation of some individuals or high attrition rates among participants may cause an investigator to use methods that are less satisfactory for inferring results. This, unfortunately, has been the case with randomized trials of some drug treatment strategies: for example, many IV drug users prefer methadone maintenance over detoxification, making it difficult to recruit subjects for random assignment treatment studies.

Constraints on Evaluation

The preceding sections have described the basic characteristics of evaluation. Three additional considerations are discussed here. First, the size of the study, or the number of treatment units, is a function of several factors. Budget constraints may influence the size of the study, in all likelihood by setting its upper limits. Moreover, time limits may affect the capacity to coordinate a study and thus set limits on the number of units.

In addition to these pragmatic considerations, there are important statistical issues. The major one is determining how large a sample must be to reliably detect the impact of an intervention strategy with a stated (albeit hypothetical) degree of effectiveness. Analyses of statistical power help to avoid study designs that are not sufficiently sensitive to detect an intervention's effects. Analyses of power can dictate an increase in the size of the study to achieve the necessary sensitivity or, occasionally, suggest a reduction in size, thus saving time and other resources (a minimal sample-size calculation is sketched at the end of this section).

Finally, in planning programs and evaluations, consideration should be given to pilot tests. The committee strongly believes that every intervention program and every evaluation program should be tested in advance, on a small scale and in a realistic way, to identify problems before more substantial resources are expended. It is possible to avoid using funds or other resources on programs or evaluations that, with a small pilot test, can easily be seen to be infeasible.

A large number of AIDS intervention programs are currently being implemented. Good evaluations of these programs may be difficult to perform; they will almost certainly be expensive. To improve the likelihood that high-quality evidence of program effectiveness will be obtained, it would be justified to focus what are necessarily finite resources on the best-designed, best-implemented intervention programs.
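
The sketch below shows the kind of sample-size calculation that a power analysis involves, here for a comparison of two proportions (for example, the proportion of respondents reporting consistent condom use). The baseline rate, the hoped-for improvement, and the error rates are hypothetical inputs, and the closed-form normal approximation stands in for the fuller power analyses an evaluator would use.

```python
# Minimal sketch: approximate sample size per group needed to detect a
# stated (hypothetical) improvement in a behavioral outcome.
from math import ceil
from statistics import NormalDist

def n_per_group(p_control, p_treated, alpha=0.05, power=0.80):
    """Normal-approximation sample size for comparing two proportions."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # two-sided significance level
    z_beta = z.inv_cdf(power)            # desired power
    p_bar = (p_control + p_treated) / 2
    numerator = (z_alpha * (2 * p_bar * (1 - p_bar)) ** 0.5
                 + z_beta * (p_control * (1 - p_control)
                             + p_treated * (1 - p_treated)) ** 0.5) ** 2
    return ceil(numerator / (p_control - p_treated) ** 2)

# Hypothetical: 30 percent report consistent condom use without the program,
# and the evaluator wants to detect an increase to 45 percent with it.
print(n_per_group(0.30, 0.45))   # about 163 subjects per group under these assumptions
```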

It would be surprising if memory-related causes of distortion were any less influential in the recalling of drug-use or sexual activity. Finally, self-reported data may be subject to nonsystematic variation that can be regarded as random. Interviewers, for instance, may vary subtly in the way they elicit information from drug dealers; the dealers in turn may vary in the accuracy of their reports on the sale of clean needles.

The research literature on the accuracy of self-reported data on sensitive behavior is sparse because such research is extremely difficult. It is especially important to understand the relationship between self-reports and actual behavior, but it is rare to see research projects that can correlate self-reported data with direct, independent observation or with observations that are arguably more accurate than retrospective self-reports. This pattern argues for methodological research that:

• correlates self-reported data with direct observations of IV drug use and the shared use of needles;

• correlates retrospective self-reports of sexual behavior among prostitutes, IV drug users, and others at risk for HIV infection with monitoring (e.g., more frequent interviews or the use of diaries) that is more proximate to the time of condom use or the use of other protections against infection; and

• investigates the cognitive processes that individuals use to answer sensitive questions (e.g., memory flaws and distortion that is not deliberate).

Although to date such research has not been extensive, the organizational mechanisms to support it are in place. The National Laboratory for Collaborative Research in Cognition and Survey Measurement at the National Center for Health Statistics, and the grant mechanisms of the National Center for Health Services Research and the National Institute of Mental Health, are important resources for such activities, which are likely to take considerable time and consume other resources. Yet the problems engendered by inadequate knowledge of the quality of self-reported behavior will persist unless the matter is given serious attention. For example, self-reported data collected in surveys on drug use are generally thought to be reliable; that is, the reports remain constant for an individual from one time to the next. However, less is known about the validity of these data, that is, the extent to which self-reports reflect actual behavior.
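
As a toy illustration of the reliability/validity distinction drawn above, the sketch below correlates two waves of hypothetical self-reports (a test-retest reliability check) and then correlates one wave with an equally hypothetical, more proximate measure such as a weekly diary (a rough validity check). Every number is invented.

```python
# Minimal sketch: reliability versus validity of self-reported behavior.
# All values are invented; each position in a list is one respondent.
from statistics import correlation  # Pearson's r; Python 3.10+

# Self-reported needle-sharing episodes in the past month, asked twice.
report_wave_1 = [0, 2, 5, 1, 0, 8, 3, 2, 6, 1]
report_wave_2 = [0, 2, 4, 1, 1, 7, 3, 2, 6, 0]

# A more proximate (and still hypothetical) measure, e.g., weekly diaries.
diary_count = [0, 3, 8, 1, 1, 12, 4, 2, 9, 2]

# Close agreement between the two interview waves shows reliability
# (the reports are stable from one time to the next) ...
print("test-retest reliability:", round(correlation(report_wave_1, report_wave_2), 2))
# ... but only comparison with an independent measure speaks to validity.
print("self-report vs. diary:  ", round(correlation(report_wave_1, diary_count), 2))
```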

Clearly, memory decay can occur, but the impact of the passage of time on reported data depends on the salience of the events and the manner in which questions are asked, among other factors.

The litany of difficulties presented in this chapter is not intended to discourage evaluation activity. On the contrary: it is hoped that, by discussing the various methodological, legal, and ethical hurdles, each can, in some degree, be dealt with and made more tractable as experience is acquired. Indeed, each difficulty stands to benefit from the systematic approach and cumulation of knowledge that typify good programs of evaluation.

IMPLEMENTING GOOD EVALUATIONS

Evaluation tends to evoke sentiments on the part of those being evaluated that are similar to the responses inspired by a visit from an auditor or by interaction with a customs inspector. The feelings and fears are not surprising; nevertheless, they have impeded productive collaborative arrangements and severely diminished the returns that are otherwise attainable from good evaluation. Some of the impediments arise from program practitioners' lack of access to individuals with expertise in the area of evaluation. The committee recommends that evaluation support be provided to ensure collaboration between practitioners and evaluation researchers.

The challenge to leadership and management is to remove impediments to evaluation. An effective strategy should, inasmuch as possible, remove all basis for the fear and trepidation that has existed in the past. In essence, this means that no person or unit should be punished as a result of an evaluation. In making this strong statement, the committee recognizes that occasionally justified punishment will be withheld. Yet that may be a small cost to pay for fostering program evaluation, which is a valuable part of an organization's process of self-criticism.

Oversights and problems are inevitable. The occurrence of some mistakes, some errors, some imperfections is not generally cause for punitive action; properly viewed, they present opportunities. If something goes wrong, evaluation may allow an understanding of how and why the problem occurred. This understanding in turn can enable a change or adjustment that may forestall similar error in the future. An organization that adopts this ethos, and is recognized by staff to have done so, can learn much more effectively from its own experience.

People can learn from both successes and failures. Unfortunately, however, descriptions of failed interventions are less likely to be published than reports of successful ones, and they accrue little prestige for those who conduct them. Yet the publication of negative research results can forestall further unfruitful efforts. Because available resources are limited, it is important to try to repeat successful interventions and not to repeat the clear failures. In reviewing AIDS intervention programs, the committee found that the descriptive information typically published about an actual intervention does not provide sufficient detail to permit its replication. Therefore, the committee recommends to the research community that the results of well-conducted evaluations be published, regardless of the intervention's effectiveness. The committee further recommends that all evaluations publish detailed descriptive information on the nature and methods of intervention programs, along with evaluation data to support claims of relative effectiveness.

The resources required to perform evaluations include money, personnel with relevant expertise, and the time and attention of managers. All are essential components of the process, and some are not always readily available. It is evident that choices of emphasis and allocation must be made. Indeed, not every program should receive a full-blown evaluation, although every intervention should probably receive at least some minimal assessment, if only to know what was done and what actually occurred over the course of the program. Setting priorities for the use of evaluation resources appears to rest on several factors: the importance of the intervention, the extent of existing knowledge concerning it, the perceived value of additional information, and the estimated feasibility of the assessment. In order to use available evaluation resources most efficiently, the committee recommends that only the best-designed and best-implemented intervention programs be selected to receive those special resources that will be needed to conduct scientific evaluations.

In this chapter and those that precede it, we urge that interventions to prevent the spread of HIV be conducted in accordance with two principles: (1) planned variants of new interventions should be systematically used and should replace the "one-best-shot" approach; and (2) evaluations of new initiatives should be planned in advance and carefully executed. The committee believes that following these two principles will result in more effective programs of education and
behavioral change in a shorter time frame. This belief rests on the following propositions:

1. More information is developed and conserved when the evaluation of planned variants is carried out.
2. Good program ideas are more promptly recognized and accepted.
3. Less effective ideas are more promptly recognized and eliminated.
4. Agreement on the relative merits of alternatives is easier to reach and can be effected with more confidence when there are systematically acquired data concerning plausible alternatives.

When possible, for at least each major type of intervention and each major target population, a minimum of two intervention programs should be subjected to rigorous evaluations that are designed to produce research evidence of the highest possible quality. Variants of intervention programs should be developed for and tested in different populations and in different geographic areas using a random assignment strategy accompanied by careful evaluation. When ethically possible, one of the variants should be a nontreatment control. The committee recognizes that difficulties will attend the effort to adopt this strategy, difficulties that include not only the challenges of unfamiliarity and the extra work required to prepare several variants of a brochure, curriculum, radio message, or other intervention tool, but also the problems of actually performing evaluations. All such endeavors call for skills and additional resources that may be in short supply for agencies that are already heavily committed to coping with AIDS. Despite the difficulties, we believe the achievable benefits are too important to pass by.

The first steps to implement the ideas discussed above should be taken promptly. The committee recommends that CDC substantially increase efforts, with links to extramural scientific resources, to assist health departments and others in mounting evaluations. State and local health departments, as well as education departments, will likely require additional resources as they mount evaluation efforts. The committee sees two steps as sufficient to initiate this process. First, CDC (and any other agency that undertakes AIDS prevention programs) should assign to some administrative unit the responsibility for ensuring the use of planned variants of intervention programs and for overseeing a system of evaluating such programs. Second, there should be easy access to extramural resources to help with the
task of evaluation. These resources might be consultants, commercial research organizations, committees of outside experts, or some combination of these individuals and bodies.

BIBLIOGRAPHY

The entries below include references cited in the chapter and other publications of interest. Citations with an asterisk are material relevant to guidelines and standards of evidence for evaluation.

Ackerman, A. M., Froman, D., and Becker, D. (1987) The multiple risk factor intervention trial: Implications for nurses. Progress in Cardiovascular Nursing 2:92-99.
Alexander, J. F., and Parsons, B. V. (1973) Short-term behavioral intervention with delinquent families: Impact on family practices and recidivism. Journal of Abnormal Psychology 81:219-225.
American Psychological Association, Committee for the Protection of Human Participants in Research. (1982) Ethical Principles in the Conduct of Research with Human Participants. Washington, D.C.: American Psychological Association.
Barcikowski, R. S. (1981) Statistical power with group mean as the unit of analysis. Journal of Educational Statistics 6:267-285.
Barnes, B. A. (1977) Discarded operations: Surgical innovation in trial and error. In J. P. Bunker, B. A. Barnes, and F. Mosteller, eds., Costs, Risks, and Benefits of Surgery. New York: Oxford University Press.
Becker, M. H. (1985) Patient adherence to prescribed therapies. Medical Care 25:539-555.
Berk, R. A. (1986) Anticipating the Social Consequences of a Catastrophic AIDS Epidemic. Department of Sociology, University of California at Santa Barbara.
Berk, R. A., and Rauma, D. (1983) Capitalizing on nonrandom assignment to treatments: A regression discontinuity evaluation of a crime control program. Journal of the American Statistical Association 78:21-27.
Berk, R. A., et al. (1985) Social policy experimentation: A position paper. Evaluation Review 9:387-429.
*Bernstein, I. N., and Freeman, H. E. (1975) Academic and Entrepreneurial Research. New York: Russell Sage Foundation.
Betsey, C. L., Hollister, R. G., and Papageorgiou, M. R., eds. (1985) Youth Employment and Training Programs: The YEDPA Years. Washington, D.C.: National Academy Press.
Blythe, B. J., Gilchrist, L. D., and Schinke, S. (1981) Pregnancy prevention groups for adolescents. Social Work 26:503-504.
Boeckmann, M. (1981) Rethinking the results of a negative income tax experiment. In R. F. Boruch, P. M. Wortman, and D. S. Cordray, eds., Reanalyzing Program Evaluations. San Francisco: Jossey-Bass.
Boruch, R. F., and Cecil, J. S. (1979) Assuring Confidentiality of Social Research Data. Philadelphia: University of Pennsylvania Press.
Boruch, R. F., and Dennis, M. (1986) Understanding respondent cooperation: Field experiments versus surveys. Pp. 296-318 in Proceedings: Second Annual Research Conference. Washington, D.C.: U.S. Department of Commerce.
Boruch, R. F., Dennis, M., and Carter-Greer, K. (1988) Lessons from the Rockefeller Foundation's Experiments on the Minority Female Single Parent Program. Evaluation Review 12:396-426.

EVALUATING AIDS INTERVENTIONS ~ 351 Boruch, R. F., McSweeny, A. J., and Soderstrom, J. (1978) Bibliography: Illustrative randomized experiments for program planning, development, and evaluation. Evaluation Quarterly 4:655-696. *Boruch, R. F., and Pearson, R. (1988) Comparative evaluation of longitudinal surveys. Evaluation Review 12:3-58. Boruch, R. F., Reiss, A., Larntz, K., Address, A., and Friedman, L. (1988) Report of the Program Review Team: Spouse Assault Replication Project. National Institute of Justice, Washington, D.C. Breger, M. J. (1983) Randomized social experiments and the law. Pp. 97-144 in R. F. Boruch and J. S. Cecil, eds., Solutions to Ethical and Legal Problems in Social Research. New York: Academic Press. Brekke, J. S., and Test, M. A. (1987) An empirical analysis of services delivered in a model community treatment program. Psychosocial Rehabilitation Journal 10:51-61. Brownell, K. D., Marlatt, G. A., Lichtenstein, E., and Wilson, G. T. (1986) Under- standing and preventing relapse. American Psychologist 41:762-782. Bunker, J. P., Barnes, B. A., and Mosteller, F. (1977) Costs, Risks, and Benefits of Surgery. New York: Oxford University Press. Campbell, D. T., and Stanley, J. S. (1966) Experimental and Quasi-Experimental Designs for Research. Chicago: Rand-McNally. Catania, J. A., Kegeles, S. M., and Coates, T. J. (1988) Towards an Understanding of Risk Behavior: The CAPS' AIDS Risk Reduction Model (ARRM). Center for AIDS Prevention Studies, Department of Psychiatry and Department of Medicine, University of California at San Francisco. January. *Chalmers, T. C. (1981) A method for assessing quality of a randomized control trial. Controlled Clinical Deals 2:31-49. Chelimsky, E. (1988) Educating People at Risk of AIDS. Testimony before the U.S. Congress, Committee on Governmental Affairs. U.S. General Accounting Office, Program Evaluation and Methodology Division, Washington, D.C. June 8. Clausen, J. A., Seidentefeld, M. A., and Deasy, L. C. (1954) Parent attitudes toward participation of their children in polio vaccine trials. American Journal of Public Health 44:1526-1536. Coates, T. J., McKusick, L., Kuno, R., and Sites, D. P. (1987) Stress Management Training Reduces Number of Sexual Partners but Does Not Enhance Immune Function in Men Infected with Human Immunodeficiency Virus (HIV). University of California at San Francisco. Condiotte, M. M., and Lichtenstein, E. (1981) Self-e~cacy and relapse in smoking cessation programs. Journal of Consulting and Clinical Psychology 49:648-658. Conner, R. F. (1982) Random assignment of clients in social experimentation. In J. E. Sieber, ea., The Ethics of Social Research: Surveys and Experiments. New York: Springer-Verlag. Cook, T., and Campbell, D. T. (1979) Quasi-Experimentation. Boston: Houghton- Mifflin. *Cordray, D. S. (1982) An assessment of the utility of the ERS standards. New Directions for Program Evaluation 15:67-82. Darrow, W. W. (1988) Behavioral research and AIDS prevention. Science 239:1477. *Davis, H. R., Windle, C., and Sharfstein, S. S. (1977) Developing guidelines for program evaluation capability in community mental health centers. Evaluation 4:25-34. Deniston, O. L., and Rosenstock, I. M. (1972) The Validity of Designs for Evaluating Health Services. School of Public Health, University of Michigan.

OCR for page 316
352 ~ LIMITING THE SPREAD OF HIV Dennis, M. (1988) Implementing Randomized Field Experiments: An Analysis of Civil and Criminal Justice Research. Ph.D. dissertation, Department of Psychology, Northwestern University. Des Jarlais, D. C. (1987) Effectiveness of AIDS Educational Programs for Intravenous Drug Users. Background paper prepared for the Health Program, Office of Technology Assessment, U.S. Congress, Washington, D.C. Dondero, T. J., Pappaioanou, M., and Curran, J. W. (1988) Monitoring the levels and trends of HIV infection: The Public Health Service's HIV surveillance program. Public Health Reports 103:213-220. Elkin, I., Pilkonis, P. A., Docherty, J. P., and Sotsky, S. (In press) Conceptual and methodological issues in comparative studies of psychotherapy and pharmacother- apy. American Journal of Psychiatry. Farquhar, J. W., Maccoby, N., and Wood, P. D. (1985) Education and community studies. Chapter 12 in W. W. Holland, R. Detels, and G. Knox, eds., Oxford Textbook of Public Health. London: Oxford University Press. Farquhar, J. W., Wood, P. D., and Breitose, I. T. (1977) Community education for cardiovascular health. Lancet 1:1191-1195. Federal Judicial Center. (1981) Social Experimentation and the Law. Washington, D.C.: Federal Judicial Center. Ferber, R., Sheatsley, P., Turner, A., and Waksberg, J. (1980) What Is a Survey? Washington, D.C.: American Statistical Association. Fienberg, S. E., Martin, M. E., and Straf, M. L., eds. (1985) Sharing Research Data. Washington, D.C.: National Academy Press. Flay, B. R. (1986) Efficacy and effectiveness trials and other phases of research in the development of health promotion programs. Preventive Medicine 15:451-474. Fraker, T., and Maynard, R. (1985) The Use of Comparison Group Designs in Evaluations of Employment Related Programs. Princeton, N.J.: Mathematics Policy Research. Fraker, T., and Maynard, R. (1987) Evaluating comparison group designs with employment related programs. Journal of Human Resources 22:195-227. Freedman, D., Pisani, R., and Purves, R. (1978) Statistics. New York: W. W. Norton. Freedman, R., Takeshita, J. Y., et al. (1969) Family Planning in Taiwan: An Experiment in Social Change. Princeton, N.J.: Princeton University Press. Freeman, H. E., and Rossi, P. H. (1981) Social experiments. Milbank Memorial Fund Quarterly Health and Society 59:340-374. Friedman, L. M., Furberg, C. D., and DeMets, D. L. (1981) Fundamentals of Clinical l~als. Boston: John Wright. Friedman, L. M., Furberg, C. D., and DeMets, D. L. (1985) Fundamentals of Clinical Trials, 2d ed. Littleton, Mass.: PSG Publishing, Inc. Friedman, S. R., De Jong, W. M., and Des Jarlais, D. C. (1988) Problems and dynamics of organizing intravenous drug users for AIDS prevention. Health Education Research: Theory and Practice 3:49-58. Glaser, E. M., Coffey, H. S., et al. (1967) Utilization of Applicable Research and Demonstration Results. Los Angeles: Human Interaction Research Institute. *Gordon, G., and Morse, E. V. (1975) Evaluation research. In A. Inkeles, ea., Annual Review of Sociology, vol. 10. Palo Alto, Calif.: Annual Reviews, Inc. Gray, J. N., and Melton, G. B. (1985) The law and ethics of psychosocial research on AIDS. University of Nebraska Law Review 64:637-688. Harkin, A. M., and Hurley, M. (1988) National survey on public knowledge of AIDS in Ireland. Health Education Research 3:25-29. Hausman, J. A., and Wise, D. A. (1979) Attrition bias in experimental and panel data: The Gary (Indiana) income maintenancy experiment. Econometrica 47:455-473.

OCR for page 316
EVALUATING AIDS INTERVENTIONS ~ 353 Haynes, R. B., Taylor, D. W., Snow, J. C., Sackett, D. L., et al. (1979) Appendix 1: Annotated and indexed bibliography on compliance with therapeutic and preventive regimens. In R. B. Haynes, D. W. Taylor, and D. L. Sackett, eds., Compliance in Health Care. Baltimore, Md.: Johns Hopkins University Press. Hill, R., Stycos, J. M., and Back, K. W. (1959) The Family and Population Control: A Puerto Rican Experiment in Social Change. Chapel Hill: University of North Carolina Press. Holland, P. W., and Rubin, D. B. (1986) Research designs and causal inferences: On Lord's paradox. In R. Pearson and R. Boruch, eds., Survey Research Designs. Lecture Notes in Statistics No. 38. New York: Springer-Verlag. Hull, H. F., Bettinger, C. J., Galiaher, M. M., Keller, N. M., Wilson, J., and Mertz, G. J. (1988) Comparison of HIV-antibody prevalence in patients consenting to and declining HIV-antibody testing in an STD clinic. Journal of the American Medical Assocation 260:935-938. Institute of Medicine. (1985) Assessing Medical Technologies. Washington, D.C.: National Academy Press. Jaffe, H. W., Rice, D. T., Voight, R., Fowler, J., and St. John, R. (1979) Selective mass treatment in a venereal disease control program. American Journal of Public Health 69:1181-1182. Janz, N. K., and Becker, M. H. (1984) The health belief model: A decade later. Health Education Quarterly 11:1-47. Job, R. F. (1988) Effective and ineffective use of fear in health promotion campaigns. American Journal of Public Health 78:163-167. *Joint Committee on Standards. (1981) Standards for Evaluations of Programs, Projects and Materials. New York: McGraw-Hill. Kadane, J. (1986) Progress toward a more ethical method for clinical trials. Journal of Medicine and Philosophy 11:285-404. Klein, N. C., Alexander, J. F., and Parsons, B. V. (1977) Impact of family systems intervention on recidivism and sibling delinquency. Journal of Consulting and Clinical Psychology 45:469-474. LaLonde, R. J. (1986) Evaluating the econometric evaluations of training programs with experimental data. American Economic Review 76:604-619. Lashley, K. S., and Watson, J. B. (1922) A Psychological Study of Motion Pictures in Relation to Venereal Disease Campaigns. Washington, D.C.: U.S. Interdepart- mental Social Science Board. Leamer, E. E. (1978) Specification Searches: Ad Hoc Inference with Nonexperimental Data. New York: John Wiley & Sons. Levine, R. (1987) Community Consulting. Presented at the annual meetings of the American Psychological Association, Symposium on Clinical Trials in AIDS, New York, August 30. Leviton, L., Valdiserri, R. O., Lyter, D. W., Callahan, C. M., Kingsley, L. A., and Rinaldo, C. R. (1988) AIDS Prevention in Gay and Bisexual Men: Experimental Evaluation of Attitude Change from Two Risk Reduction Interventions. Presented at the annual meeting of the American Evaluation Assocation, New Orleans, La., October 19. Light, R., and Pillemer, D. (1984) Summing Up: The Science of Reviewing Research. Cambridge, Mass.: Harvard University Press. Magidson, J. (1988a) CHAID, LOGIT and log linear modeling. In Marketing Infor- mation Systems. New York: McGraw-Hill and Data Pro Research. Magidson, J. (1988b) Progression beyond regression. DMA (Direct Marketing Associ- ation) Research Council Newsletter 4:6-7.

OCR for page 316
354 ~ LIMITING THE SPREAD OF HIV Mathiowetz, N. (1987) Response error: Correlation between estimation and episodic recall tasks. Pp. 43~435 in Proceedings of the Survey Research Methods Sec- tion: American Statistical Association. Washington, D.C.: American Statistical Association. Mathiowetz, N., and Duncan, G. J. (1984) Temporal patterns of response errors on retrospective reports of unemployment and occupation. Pp. 652-657 in Proceed- ings of the Survey Research Methods Section: American Statistical Association. Washington, D.C.: American Statistical Association. McAuliffe, W. E., Doering, S., Breer, P., Silverman, H., Branson, B., and Williams, K. (1987) An Evaluation of Using Ex-Addict Outreach Workers to Educate Intra- venous Drug Users About AIDS Prevention. Presented at the Third International AIDS Conference, Washington, D.C., June 1-5. *McTavish, D. G., Cleary, J. D., Brent, E. C., Perman, L., and Knudsen, K. R. (1977) Assessing research methodology: The structure of professional assessments of methodology. Sociological Methods and Research 6:3-44. Meier, P. (1972) The biggest public health experiment ever: The 1954 field trial of the Salk poliomyelitis vaccine. In J. M. Tanur, F. Mosteller, W. Kruskal, R. F. Link, R. S. Peters, and G. Rising, eds., Statistics: A Guide to the Unknown. San Francisco: Holden-Day. Miller, W. R. (1986) Inpatient alcoholism treatment. American Psychologist 41:794- 805. Mitchell, J. V., Jr., ed. (1985) The Ninth Mental Measurements Yearbook, 2 vols. Lincoln: Buros Institute of Mental Measurements, University of Nebraska. Morisky, D. E., DeMuth, N. M., Field-Fass, M., Green, L. W., and Levine, D. M. (1985) Evaluation of family health education to build social support for long term control of high blood pressure. Health Education Quarterly 12:35-50. Moskowitz, J., Malvin, J. H., Schaeffer, G. A., and Schaps, E. (1983) Evaluation of a cooperative learning strategy. American Educational Research Journal 20:687-696. Moskowitz, J., Malvin, J. H., Schaeffer, G., and Schaps, E. (1984) An experimental evaluation of a drug education course. Journal of Drug Education 14:9-22. Moskowitz, J., Schaps, E., Malvin, J., Schaeffer, G., and Condon, J. (1981) An Evaluation of an Innovative Drug Education Program: Follow-Up Results. Report to the National Institute on Drug Abuse. Pacific Institute for Research and Evaluation, Napa, Calif. Moskowitz, J. M., Schaps, E., Malvin, J. H., and Schaeffer, G. A. (1984) The effects of drug education at follow-up. Journal of Alcohol and Drug Education 30:45~9. *Mosteller, F. M., Gilbert, S. P., and McPeek, B. (1980) Reporting standards and research strategies: Agenda for an editor. Controlled Clinical Trials 1:37-58. National Cancer Institute. (1988) Summary: Community Intervention Trial for Smoking Cessation. Bethesda, Md.: National Institutes of Health. National Research Council. (1975) Protecting Individual Privacy in Evaluation Re- search. Report of the Committee on Federal Agency Evaluation Research. Washington, D.C.: National Academy Press. National Research Council. (1982) An Analysis of Marijuana Policy. Report of the Committee on Substance Abuse and Habitual Behavior. Washington, D.C.: National Academy Press. Ostrom, T. M., Steel, C. M., Rosenblood, L. K., and Mirels, H. L. (1971) Modification of delinquent behavior. Journal of Applied Social Psychology 1:118-136. *Pear, R. (1984) Taking the measure, or mismeasure of it al1. New York Times, August 28.

OCR for page 316
EVALUATING AIDS INTERVENTIONS ~ 355 Public Health Service. (1988) Understanding AIDS: A Message from the Surgeon General. Rockville, Md.: Public Health Service. Reisner, L., David, J., and Turnbull, B. (1988) Evaluation of the Chapter I Tech- nical Assistance Centers (TAC). Report to the U.S. Department of Education. Washington, D.C.: Policy Studies Associates. Riecken, H. W., Boruch, R. F., Campbell, D. T., Caplan, N., Glennau, T. K., Pratt, J. W., Rees, A., and Williams, W. W. (1974) Social Experimentation: A Method for Planning and Evaluating Social Programs. New York: Academic Press. Rivlin, A. M., and Timpane, M.,- eds. (1971) Ethical and Legal Issues of Social Experimentation. Washington, D.C.: Brookings Institution. Robertson, L. S., Kelley, A. B., O'Neill, B., Wixom, L. W., Eiswirth, R. S., and Haddon, W. (1974) A controlled study of the effect of television messages on safety belt use. American Journal of Public Health 64:1071-1080. Rosenbaum, P. (1987) A nontechnical introduction to statistical power and control of bias. Pp. 174-185 in J. Steinberg and M. Silverman, eds., Preventing Mental Disorders. Rockville, Md.: National Institute of Mental Health. Rossi, P. (1987) Estimating the number of homeless in Chicago. Pp. 1-7 in Proceedings of the Section on Survey Research Methods: American Statistical Association. Washington, D.C.: American Statistical Association. Rossi, P. (1988) Homelessness in America: Social Research and Policy. New York: Twentieth Century Fund. Rubin, D. (1974) Estimating causal effects of treatments in randomized and nonran- domized studies. Journal of Educational Psychology 66:688-701. Rubin, D. (1977) Assignment to treatment group on the basis of a covariate. Journal of Educational Statistics 2:1-26. Ruffin, J. N., Grizzle, J. E., Hightower, N. C., McHardy, G., Shull, H., and Krisher, J. B. (1969) A cooperative double blind evaluation of gastric freezing in the treatment of duodenal ulcer. New England Journal of Medicine 281:16-19. Rutstein, D. C. (1969) The ethical design of human experiments. Daedalus 98:523-541. Schaps, E., et al. (1981) A review of 127 drug education prevention evaluations. Journal of Drug Issues 11:17-43. Schaps, E., Moskowitz, J., Condon, J., and Malvin, J. (1982) A process and outcome evaluation of a drug education course. Journal of Drug Education 12:353-364. Schinke, S., Blythe, B. J., and Gilchrist, L. (1981) Cognitive-behavioral prevention of adolescent pregnancy. Journal of Counseling Psychology 28:451-454. Schneider, A. L. (1980) Effects of status offender deinstitutionalization in Clark County, Washington. Presented at the annual convention of the American Society of Criminology, San Francisco, Calif., November 5-8. Severy, L. J., and Whitaker, J. M. (1982) Juvenile diversion: An experimental analysis of effectiveness. Evaluation Review 6:753-774. Silverstein, A. M. (1981) Pure Politics and Impure Science: The Swine Flu Affair. Baltimore, Md.: Johns Hopkins University Press. Smith, T. W. (1983) The hidden 25 percent: An analysis of nonresponse on the 1980 General Social Survey. Public Opinion Quarterly 47:386-404. Solomon, M. Z., and DeJong, W. (1986a) The impact of clinic based educational videotapes on male gonorrhea patients' knowledge and treatment behavior. Ap- pendix A in Final Report to the Center for Prevention Services, Centers for Disease Control: October 1, 1983-September 30, 1986. Educational Development Center, Newton, Mass. Solomon, M. Z., and DeJong, W. 
(1986b) Recent sexually transmitted disease prevention efforts and their implications for AIDS health education. Health Education Quarterly 13:301-316.

OCR for page 316
356 ~ LIMITING THE SPREAD OF HIV Solomon, M. Z., and DeJong, W. (1986c) STD prevention through condom use: Changing patients' knowledge, attitudes, and behavior. Appendix A in Final Report to the Center for Prevention Services, Centers for Disease Control: October 1, 1983-September 30, 1986. Educational Development Center, Newton, Mass. Solomon, M. Z., DeJong, W., and Jodrie, T. A. (1986) Improving drug-regimen adherence among patients with sexually transmitted disease. Appendix A in Final Report to the Center for Prevention Services, Centers for Disease Control: October 1, 1983-September 30, 1986. Educational Development Center, Newton, Mass. Strecher, V. J., DeVellis, B., Becker, M., Rosenstock, I. M. (1986) The role of self efficacy in achieving health behavior change. Health Education Quarterly 13:73-91. Teitlebaum, L. E. (1983) Spurious, tractible, and intractible legal problems: A positivist approach to law and social science. Pp. 11-48 in R. F. Boruch and J. S. Cecil, eds., Solutions to Ethical and Legal Problems in Social Research. New York: Academic Press. Thistlethwaite, D. L., and Campbell, D. T. (1969) Regression-discontinuity analy- sis: An alternative to the ex-post facto experiment. Journal of Experimental Psychology 51:309-317. Tobler, N. (1986) Meta-analysis of 143 adolescent drug prevention programs: Quantita- tive outcome results of program participants compared to a control or comparison group. Journal of Drug Issues 16:537-567. Trochim, W. (1984) Research Design for Program Evaluation: The Regression Dis- continuity Approach. Beverly Hills, Calif.: Sage. *U.S. Department of Education, National Institute of Education. (1977) The Joint Dissemination Review Panel Ideabook. Washington, D.C.: National Institute of Education. *U.S. General Accounting Office. (1975) Evaluation and Analysis to Support Decision- making. Washington, D.C.: U.S. General Accounting Office. *U.S. General Accounting Once. (1978) Assessing Social Program Impact Evaluations: A Checklist Approach. Washington, D.C.: U.S. General Accounting Office. U.S. General Accounting Once. (1986) Teenage Pregnancy: 500,000 Births but Few Tested Programs. PEMD-86-16 BR. Washington, D.C.: U.S. General Accounting Once. Valdiserri, R. O., Leviton, L., et al. (1986) A Randomized Trial Evaluating Two Interventions Promoting AIDS Risk Reduction (proposal submitted to CDC). University of Pittsburgh. Valdiserri, R. O., Lyter, D. W., Leviton, L. C., Stoner, K., and Silvestre, A. (1987) Applying the criteria for the development of health promotion and education to AIDS risk reduction programs for gay men. Journal of Community Health 12:199-212. Vincent, M., Clearie, A. F., and Schlucter, M. D. (1987) Reducing adolescent pregnancy through school and community-based education. Journal of the American Medical Association 257:3382-3386. Waskow, J. E. (1984) Specification of the treatment variable in the NIMH treatment of depression collaborative research program. In J. Williams and R. L. Spitzer, eds., Psychotherapy Research. New York: Guilford Press. Weinstein, N. D. (1987) Unrealistic optimism about susceptibility to health problems: Conclusions from a community-wide sample. Journal of Behavioral Medicine 10:481-500.