6 Randomized and Observational Approaches to Evaluating the Effectiveness of AIDS Prevention Programs
Pages 124-194



From page 124...
... (In addition, Appendix F presents a background paper for this chapter which provides a detailed treatment of an econometric technique known as selection modeling and its potential uses.) On January 12-13, 1990, the panel hosted a Conference on Nonexperimental Approaches to Evaluating AIDS Prevention Programs.
From page 125...
... (A variant of this problem can also arise in randomized experiments, where the attrition of respondents from experimental and control groups can introduce an analogous selection bias.) As explained in Chapter 1, selection bias can generally be controlled by the random assignment of individuals to one group or another.
From page 126...
... Thus, a simple comparison of the risk reduction behavior of participants and nonparticipants in a program can yield a misleading estimate of the true effect of the program. In addition to randomized experiments, six observational research designs will be discussed in this chapter.
From page 127...
... Specific methods include:
· analysis of covariance,
· structural equation modeling, and
· selection modeling.
From page 128...
... Research procedures that produce findings that are subject to considerable uncertainties or that provoke lengthy debates among scientists about the suitability of particular analytic models may, in the opinion of this panel, impede crucial decision making about the allocation of resources for effective AIDS prevention programs. These factors underlie the panel's preference for well-executed randomized experiments, where such experiments are feasible.
From page 129...
... In properly randomized experiments, statistical significance tests can indicate whether the observed differences in group outcomes are larger than can be explained by the random differences in the groups. By providing a statistically well-grounded basis for assessing the probability that observed differences in outcomes between groups are attributable to the treatment, well-executed randomized experiments reduce ambiguity in the interpretation of findings and provide the greatest opportunity for producing clear-cut results.
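The logic of such significance tests can be illustrated with a small randomization (permutation) test, which asks how often random reassignment alone would produce a group difference as large as the one observed. This is a minimal pure-Python sketch; the outcome scores and function name are hypothetical, not data from any study discussed here.

```python
import random

def permutation_test(treatment, control, n_permutations=5000, seed=0):
    """Estimate the probability that a group-mean difference at least as
    large as the observed one would arise from random assignment alone."""
    rng = random.Random(seed)
    observed = sum(treatment) / len(treatment) - sum(control) / len(control)
    pooled = list(treatment) + list(control)
    n_t = len(treatment)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # simulate one alternative random assignment
        diff = sum(pooled[:n_t]) / n_t - sum(pooled[n_t:]) / (len(pooled) - n_t)
        if abs(diff) >= abs(observed):
            extreme += 1
    return extreme / n_permutations

# Hypothetical risk-reduction scores for a treated and a control group.
p = permutation_test([8, 9, 7, 10, 9, 8], [5, 6, 7, 5, 6, 4])
```

A small p here says the observed difference is unlikely under chance assignment alone, which is exactly the "well-grounded basis" the text describes.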
From page 130...
... The Power of Experiments: An Example
History provides a number of examples of the interpretive difficulties that can attend observational studies (or compromised experiments) and the power of a well-executed randomized experiment to provide definitive results.
From page 131...
... Note that the first three of these are not solely problems for experiments; they can frustrate observational studies as well. The last, however, is a special risk of randomized experiments.
From page 132...
... The potential importance of tracking compliance is well illustrated by an example. Clofibrate, a drug intended to treat coronary heart disease, was compared to a placebo in a very large clinical trial, and no significant beneficial effect was found.14 Upon later analysis, however, it was observed that those patients assigned to Clofibrate who actually took at least 80 percent of their medication had a much lower five-year mortality than those in the Clofibrate group who took less than 80 percent.
13 On the other hand, participants may drop out because their transportation falls through or they move away, which, one might expect, would not introduce selection bias.
From page 133...
... Spillover
The diffusion of treatment effects throughout the population can also obscure evaluation results. A major threat to the internal validity of the randomized experiment is "spillover." This phenomenon—the communication of ideas, skills, or even outcomes from the intervention group to the control group—can result in the dilution of estimated program effects in a variety of ways.
From page 134...
... In fact, when thinking about AIDS interventions, it is apparent that many educational projects, such as the media campaign, are based on a diffusion theory that assumes that interpersonal contacts are made after media exposure. In such cases, organizational units are appropriate to study because spillover within units is desired.
From page 135...
... For this reason, it can be a wise idea to collect data for randomized experiments as if the experiment were going to fail in one of these ways.
From page 136...
... 136 EVALUATING AIDS PREVENTION PROGRAMS
In either case, only good compliers are allowed into the experiments.16
· Indoctrination involves instructing individuals when they enroll in a project about the expectations for the project. By supporting participants early in the project, when attrition rates are usually highest, investigators can often foster the understanding and trust needed to keep participation and compliance rates high and maintain a well-executed experiment.
From page 137...
... has endorsed randomized experiments for such evaluations. Under clearly defined conditions, the panel has also endorsed randomized field experiments to answer the question "Does the project work?"
From page 138...
... The cost of inconclusive nonexperimental studies during the infant blindness epidemic is illustrative. Because considerable uncertainties attend the interpretation of the results of nonrandomized experiments, firm inferences of causality may require additional labor on the part of investigators to rule out competing explanations, and widespread acceptance of the study's conclusions may be difficult to obtain.
From page 139...
... A commonly held viewpoint is that current allocations for evaluation consume money that could be used to run additional AIDS prevention projects, a perception that can foster resentment of evaluation efforts among practitioners, sponsors, and recipients. To avoid this kind of resentment and the pressure to forgo evaluation in the near term so that projects can be deployed, we believe that the responsibility for funding evaluation should be separated from the responsibility for running the programs.17 The panel recommends that the Office of the Assistant Secretary for Health allocate sufficient funds in the budget of each major AIDS prevention program, including future wide-scale programs, to implement the evaluation strategies recommended herein.
From page 140...
... The panel recognizes that evaluation research of any kind may be suspect in some communities. In such situations, communities may single out randomized experimental designs as particularly unattractive, for several reasons.
From page 141...
... A recent example of alternative treatments is provided by Valdiserri and colleagues' (1989) evaluation of the effects of two peer-led interventions to reduce risky sexual behaviors.
From page 142...
... " The pane] believes that broad and insurmountable ethical barriers to randomized experiments do not arise except when the use of no-treatment controls is considered; we note that any particular study, however, may raise idiosyncratic ethical questions that must be resolved before this recommendation can be implemented.
From page 143...
... Obviously, the feasibility of a randomized experiment is greatly diminished if a candidate population has already been exposed to the
22 The device used to randomize assignments should be tested, understood, and correctly used. For example, researchers must comprehend how to use published random number lists, or if a computerized random number generator is used, a new seed must be selected every time a new list of numbers is created.
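The footnote's advice about the randomizing device can be made concrete with a small sketch of a seeded assignment routine. The function name and seed below are illustrative; the point is that the generator is explicitly seeded with a fresh, recorded seed for each new assignment list, so the allocation is both reproducible and auditable.

```python
import random

def assign_groups(participant_ids, seed):
    """Randomly split participants into treatment and control groups.
    A new seed should be chosen for every new assignment list, and the
    seed recorded so the assignment can be reproduced and audited."""
    rng = random.Random(seed)       # dedicated generator; never reuse a seed
    ids = list(participant_ids)
    rng.shuffle(ids)                # random order under the recorded seed
    half = len(ids) // 2
    return ids[:half], ids[half:]   # (treatment, control)

# Illustrative use: 100 hypothetical participant IDs, one recorded seed.
treatment, control = assign_groups(range(100), seed=20240115)
```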
From page 144...
... (If records are good enough, however, another evaluation design may be possible, such as interrupted time series analysis, to be discussed in the next section on quasi-experiments.) It is because randomized experiments are not always feasible or appropriate that we look to alternative methodologies.
From page 145...
... discusses the conceptual foundations, assumptions, data needs, and possible inferences of time series and regression displacement/discontinuity designs. (Some of the time series and regression examples used here involve the analysis of "natural" events; natural experiments will be further discussed in a later section.)
From page 146...
... A more recent study used both single-site and multiple time series analysis to test the effects of anonymity on the number of gay men seeking HIV testing. Prior to December 1986, all public HIV testing in Oregon was done confidentially.
From page 147...
... Data Needs. Compared to randomized experiments, interrupted time series designs typically require more data to infer causation.
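One common way to use such data is a segmented (interrupted) regression that fits a common trend plus a level shift at the start of the program. The sketch below is a minimal pure-Python illustration with hypothetical monthly counts; a real interrupted time series analysis must also consider trend changes and serial correlation, which this toy model ignores.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X'X)b = X'y,
    solved with Gaussian elimination; fine for a handful of predictors."""
    k = len(X[0])
    A = [[sum(X[i][p] * X[i][q] for i in range(len(X))) for q in range(k)]
         for p in range(k)]
    b = [sum(X[i][p] * y[i] for i in range(len(X))) for p in range(k)]
    for p in range(k):                       # forward elimination, pivoting
        pivot = max(range(p, k), key=lambda r: abs(A[r][p]))
        A[p], A[pivot] = A[pivot], A[p]
        b[p], b[pivot] = b[pivot], b[p]
        for r in range(p + 1, k):
            f = A[r][p] / A[p][p]
            for c in range(p, k):
                A[r][c] -= f * A[p][c]
            b[r] -= f * b[p]
    coef = [0.0] * k
    for p in reversed(range(k)):             # back substitution
        coef[p] = (b[p] - sum(A[p][c] * coef[c] for c in range(p + 1, k))) / A[p][p]
    return coef

# Hypothetical monthly counts of HIV tests; the program starts at month 6.
series = [50, 52, 51, 53, 54, 55, 70, 72, 71, 73, 74, 76]
start = 6
# Model: count = intercept + trend*month + level_change*(month >= start)
X = [[1.0, t, 1.0 if t >= start else 0.0] for t in range(len(series))]
intercept, trend, level_change = ols(X, series)
```

The coefficient on the post-program indicator (`level_change`, about 14 tests per month in this fabricated series) is the estimated jump in the series when the intervention begins.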
From page 148...
... Using an interrupted time series design with "switching replications,"27 investigators discovered increased larceny rates following two separate local introductions of television in 1951 and 1955. Because of a freeze on new broadcasting licenses ordered by the Federal Communications Commission, television was introduced in this country on a staggered basis: some communities gained access to television before the freeze was initiated, and others had to wait until the freeze was lifted.
From page 149...
... In the context of AIDS research, the panel believes that multiple time series analysis may provide a useful method for evaluating the effects of community projects or the media campaign when randomized experiments are not feasible.
Regression Discontinuity or Regression Displacement
These quasi-experimental designs are similar in concept to the interrupted and multiple time series designs discussed above, but they do not require
29 In a regression equation, the error term represents the difference between the real value of the outcome and its predicted value.
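The core computation in a regression discontinuity design can be sketched as follows: fit separate regression lines on either side of the assignment cutoff and read the treatment effect off the gap between the two fitted lines at the cutoff. The assignment variable, cutoff, and outcomes below are hypothetical, loosely echoing the chapter's later example of allocating an intervention by HIV incidence.

```python
def linefit(xs, ys):
    """Least-squares (intercept, slope) for a single predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
            sum((x - mx) ** 2 for x in xs)
    return my - slope * mx, slope

def rd_jump(scores, outcomes, cutoff):
    """Estimate the treatment effect as the gap between the two fitted
    lines at the cutoff (units with score >= cutoff get the program)."""
    below = [(s, y) for s, y in zip(scores, outcomes) if s < cutoff]
    above = [(s, y) for s, y in zip(scores, outcomes) if s >= cutoff]
    b0, b1 = linefit([s for s, _ in below], [y for _, y in below])
    a0, a1 = linefit([s for s, _ in above], [y for _, y in above])
    return (a0 + a1 * cutoff) - (b0 + b1 * cutoff)

# Hypothetical: communities with incidence >= 3 per 1,000 get the program;
# the outcome is a later incidence measure.
scores   = [0, 1, 2, 2.5, 3, 3.5, 4, 5, 6]
outcomes = [1.0, 1.5, 2.0, 2.2, 0.8, 1.1, 1.3, 1.8, 2.3]
effect = rd_jump(scores, outcomes, cutoff=3)
```

A negative `effect` (about -1.6 in this fabricated example) would indicate that outcomes just above the cutoff fall below what the below-cutoff trend predicts, i.e., an apparent program benefit.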
From page 150...
... for a significance test for regression discontinuity.) It should be clear that many regression displacement and regression discontinuity designs completely confound their treatment (e.g., Medicaid)
From page 151...
... [Figure on page 151 is not recoverable from the machine-read text; running head: RANDOMIZED AND OBSERVATIONAL APPROACHES 151.]
From page 152...
... Using this method, the authors found that the group receiving the intervention had a significantly lower recidivism rate. To the best of the panel's knowledge, no evaluations using these designs have been made of AIDS prevention programs, so their value has not been established in this area.
From page 153...
... believes that quasi-experimental and nonexperimental designs may be useful in the event that randomized experiments are not feasible.
From page 154...
... Such current efforts to collect data on HIV infection and on the public's knowledge, attitudes, and beliefs about AIDS provide observations that might be used to support interrupted time series or regression displacement designs. The Neonatal Screening Survey.
From page 155...
... It may be possible to use data from the CDC/NIH newborn survey to evaluate such an intervention with a regression displacement design, for example. A decision point for which communities receive the intervention can be established on the basis of the incidence of HIV infection found from the newborn survey.
From page 156...
... [Figure axis: 0-6 Seropositive Pregnancies per 1,000, August 1990] FIGURE 6-2a Hypothetical example of regression displacement analysis of the effects of a contraception campaign aimed at women at high risk for HIV; they receive the intervention in August 1990.
From page 157...
... Thus, we believe that a healthy amount of caution should be exercised in accepting the plausibility of the assumptions of regression displacement designs.
The National Health Interview Survey
A second, perhaps more promising, source of data for quasi-experimentation is the National Health Interview Survey conducted by the National Center for Health Statistics.
From page 158...
... recommended using the Health Interview Survey to evaluate aggregate trends in knowledge and attitudes about AIDS following exposure to phases of CDC's media campaign, particularly its public service announcements. The evaluation design proposed was a time series analysis that would monitor trends in desired outcomes over the course of the campaign period.
From page 159...
... group comes about for a fortuitous reason that makes it unlikely that selection biases could operate.36 The panel has mentioned several evaluations that have used time series analysis or
36 Judgments as to what is truly "fortuitous" will be open to challenge. One can, however, imagine natural experiments that arise from situations that are indeed equivalent to a true randomized experiment.
From page 160...
... Still, design flaws such as these might be handled by keeping the alternative courses as parallel as possible in their contexts.
Identifying Natural Experiments
Finding a natural experiment or natural laboratory takes resourcefulness in identifying, defining, and recruiting the groups under analysis, as well as a bit of luck and patience.
From page 161...
... Data Needs of Natural Experiments
It will not be possible to confirm the assumption that groups are identical except for the intervention. It would help, however, to at least partially corroborate the assumption by examining data on pre-intervention differences between two or more sites.
From page 162...
... For example, it might be possible to identify four communities that are about to receive funding for CBO projects and then match them with four similar communities.38 In such designs both the treatment group and the comparison group are often given a pretest and a posttest. It is useful to recognize this design as a special case of the multiple interrupted time series design, where the series is constituted by only two points in time.
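With a pretest and a posttest in both the funded and the matched communities, the basic two-point estimate is the treated group's change minus the comparison group's change (a difference-in-differences). A minimal sketch with hypothetical community-level rates:

```python
def diff_in_diff(treated_pre, treated_post, matched_pre, matched_post):
    """Pretest/posttest comparison-group estimate: the treated group's
    average change minus the matched comparison group's average change."""
    def mean(xs):
        return sum(xs) / len(xs)
    return (mean(treated_post) - mean(treated_pre)) - \
           (mean(matched_post) - mean(matched_pre))

# Hypothetical condom-use rates (%) in four funded and four matched communities.
effect = diff_in_diff(
    treated_pre=[30, 28, 35, 31], treated_post=[44, 41, 50, 45],
    matched_pre=[29, 30, 33, 32], matched_post=[33, 35, 38, 36],
)
```

Here the funded communities improved by 14 points and the matched ones by 4.5, so the estimate attributes 9.5 points to the program, under the strong assumption that the matched communities reveal what would have happened without it.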
From page 163...
... , which were designed to track the natural history of HIV disease among gay and bisexual men, have offered a setting in which the effects of AIDS interventions have been evaluated using matching strategies. One example of retrospective matching is provided by Fox and colleagues (1987)
From page 164...
... Paul both have populations of over 2.3 million, but Baltimore had 20.3 persons with AIDS per 100,000 in 1989, and Minneapolis had 6.6. It can hardly be assumed that local needs for AIDS interventions will be the same for the two communities; likewise, the citizens of these communities should not be assumed to feel
From page 165...
... The panel repeats its belief that the state of the art in AIDS prevention research is underdeveloped with respect to predicting how people would behave in the absence of the program (e.g., changes in sexual behavior or needle use)
From page 166...
... believes that longitudinal cohort studies from which participants in an intervention may be matched with nonparticipants can be rich and useful sources of information for generating hypotheses about project effects. When cohorts are sampled in multiple sites, data collection can be enhanced by coordinating instrumentation across the locations, facilitating cross-site analyses.
From page 167...
... Sponsored by the National Institute on Drug Abuse, the TOPS study seeks to understand the natural history of drug users before, during, and after treatment (see, e.g., Hubbard et al., 1988). The ALIVE group in Baltimore is a cohort of drug users who are being followed to learn more about the natural history of HIV infection in this population.
From page 168...
... In this section the panel looks at three types of modeling used to eliminate selection bias and to create comparable groups: analysis of covariance, structural equation models, and selection models.
Analysis of Covariance
Analysis of covariance is sometimes used on data from nonrandomized (quasi-experimental)
From page 169...
... Finally, it assumes that no unspecified factors exist that affect both selection and outcome. Any observational method that attempts to control for selection bias must rely on assumptions that cannot be verified (or can only be imperfectly verified)
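The covariance adjustment itself is simple to state: the treatment effect is the coefficient on the treatment indicator in a regression of the outcome on treatment plus the covariate. The sketch below uses hypothetical pretest/posttest scores constructed so that participants start out ahead; the raw group difference (3 points) then overstates the covariance-adjusted effect (2 points).

```python
def ancova_effect(outcome, treated, covariate):
    """Treatment effect adjusted for one pretest covariate: the
    coefficient b1 in outcome = b0 + b1*treated + b2*covariate,
    obtained in closed form from the centered normal equations."""
    n = len(outcome)
    y = [v - sum(outcome) / n for v in outcome]
    d = [v - sum(treated) / n for v in treated]
    x = [v - sum(covariate) / n for v in covariate]
    sdd = sum(a * a for a in d)
    sxx = sum(a * a for a in x)
    sdx = sum(a * b for a, b in zip(d, x))
    sdy = sum(a * b for a, b in zip(d, y))
    sxy = sum(a * b for a, b in zip(x, y))
    det = sdd * sxx - sdx * sdx
    return (sdy * sxx - sdx * sxy) / det   # b1 by Cramer's rule

# Hypothetical: participants (grp=1) began with higher pretest scores.
pre  = [2, 3, 4, 5, 1, 2, 3, 4]
post = [5, 6, 7, 8, 2, 3, 4, 5]
grp  = [1, 1, 1, 1, 0, 0, 0, 0]
raw = sum(post[:4]) / 4 - sum(post[4:]) / 4   # 3.0, inflated by selection
adjusted = ancova_effect(post, grp, pre)      # 2.0 after adjustment
```

The adjustment is only as good as the assumptions the text goes on to discuss: here the pretest fully captures the selection, which is exactly what cannot be verified in practice.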
From page 170...
... reported small positive effects of the program on white children when he reanalyzed Barnow's data using a model that incorporated allowances for measurement error in the tests and postulated correlations between disturbance terms in a structural equation model (an evaluation method that will be discussed below)
From page 171...
... Without strong theory to guide this selection, equivocal answers may result; the approach of measuring as many things as possible and adjusting as best as possible cannot be relied upon to set things right.
Structural Equation and Selection Models
Structural Equation Models
Simply put, a structural equation model is a statistical equation or system of equations that represents a causal linkage among two or more variables.44 In the context of an evaluation study, this procedure may use complex models of behavior to explain the effects of an intervention on a desired outcome, as mediated through a series of intervening variables and as co-influenced by factors that are exogenous to the intervention.
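A minimal instance of such a system is a three-variable path model in which the intervention X affects an intervening variable M, which in turn affects the outcome Y, alongside a direct X-to-Y path. The sketch below estimates the paths by ordinary least squares on small hypothetical data constructed so the paths come out exactly; real structural equation modeling involves much more (latent variables, measurement error, overall fit assessment).

```python
def slope(x, y):
    """Simple-regression slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / \
           sum((a - mx) ** 2 for a in x)

def two_predictor(y, x1, x2):
    """Coefficients (b1, b2) in y = b0 + b1*x1 + b2*x2,
    from the centered normal equations."""
    def center(v):
        m = sum(v) / len(v)
        return [t - m for t in v]
    yy, u, w = center(y), center(x1), center(x2)
    s11 = sum(a * a for a in u)
    s22 = sum(a * a for a in w)
    s12 = sum(a * b for a, b in zip(u, w))
    s1y = sum(a * b for a, b in zip(u, yy))
    s2y = sum(a * b for a, b in zip(w, yy))
    det = s11 * s22 - s12 * s12
    return (s1y * s22 - s12 * s2y) / det, (s2y * s11 - s12 * s1y) / det

# Hypothetical path model: intervention X raises knowledge M, which raises
# behavior Y; X also has a direct path to Y.
X = [0, 0, 1, 1]
M = [-1, 1, 1, 3]
Y = [-3, 3, 4, 10]
a = slope(X, M)                        # X -> M path
c_direct, b = two_predictor(Y, X, M)   # direct X -> Y path, M -> Y path
indirect = a * b                       # effect transmitted through M
```

On these fabricated data the paths are exact (a = 2, b = 3, direct = 1), so the indirect effect through M is 6 and the total effect 7; with real data, every path estimate inherits the model's untestable causal assumptions.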
From page 172...
... It is nonetheless this panel's judgment that it is unlikely in the near term that structural equation modeling of nonrandomized studies could provide a firmer basis for evaluating the effectiveness of AIDS prevention programs than well-executed randomized experiments.
From page 173...
... Selection Models
Another approach to nonexperimental program evaluation comes under the heading of selection modeling. Here the problem of nonequivalent comparison groups is addressed by focusing explicitly on the determinants of project participation.
From page 174...
... The simplest example is one in which unmeasured differences between participants and nonparticipants are explained by preproject measures of the outcome variable. For example, if the difference between those who do and do not participate in a counseling and testing project arises solely because the participants practiced risk reduction behaviors more frequently at some point prior to their entry into a project, having information on their behavior at that point and controlling for it statistically might eliminate the selection bias.
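That claim can be checked in a small simulation (all numbers hypothetical): participation is made to depend only on a pre-program measure of the outcome, so the naive participant/nonparticipant comparison is biased upward, while differencing out the pre-program measure recovers the true effect.

```python
import random

rng = random.Random(1)
TRUE_EFFECT = 2.0   # the effect the simulation builds in

records = []
for _ in range(20000):
    pre = rng.gauss(5, 1)                 # pre-program risk-reduction score
    joins = pre > 5                       # self-selection on the pretest only
    post = pre + (TRUE_EFFECT if joins else 0.0) + rng.gauss(0, 1)
    records.append((pre, joins, post))

def mean(values):
    return sum(values) / len(values)

participants    = [r for r in records if r[1]]
nonparticipants = [r for r in records if not r[1]]

# Naive comparison confounds the program effect with who chose to join.
naive = mean([p for _, _, p in participants]) - \
        mean([p for _, _, p in nonparticipants])

# Controlling for the pre-program measure (gain scores) removes the bias.
adjusted = mean([p - b for b, _, p in participants]) - \
           mean([p - b for b, _, p in nonparticipants])
```

Here `naive` comes out near 3.6 while `adjusted` is close to the built-in 2.0; the catch, as the chapter stresses, is that real selection is rarely known to operate only through a measured pre-program variable.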
From page 175...
... At present there are divergent and strongly held opinions about the potential uses and misuses of these procedures, and no practical experience applying these procedures to the task of evaluating AIDS prevention programs. In the following pages, the panel briefly reviews some of the key issues that have emerged in the debate over selection models.
From page 176...
... Men Women
Two-Stage Econometric Models; not in earnings equation but included in participation equation: marital status, residency in an SMSA, 1976 employment status, number of children, women's AFDC status in 1975
Men Women: 1976 employment status, number of children
Men Women: no exclusion restrictions
Men Women: 886 (476)
From page 177...
... LaLonde, for example, makes side-by-side comparisons of the effects of a manpower training program estimated from a randomized experiment with estimates derived from several nonexperimental models.
From page 178...
... Variant 2 assumes that changes in the outcome variable will vary for people with different characteristics; the average of sample means is shown. SOURCE: Heckman and Hotz, 1990: Tables 3 and 4.
From page 179...
... How can it be known with certainty that, for example, neighborhood location is indeed independent of the unmeasured differences in the outcome variable between participants and nonparticipants? As discussed in reference to natural experiments, the plausibility of the assumption cannot be tested without gathering additional
From page 180...
... As in many areas of statistics, a tradeoff must be made between the potential reduction in bias and the potential increase in variance associated with a method that relies on weaker assumptions. The measures proposed for historical controls in selection modeling rely on multiple observations of the outcome variable at different points in time.
From page 181...
... The appropriateness of this view may change as experience accumulates in evaluating AIDS programs. In this regard the panel notes that selection models all aim at one desideratum: consistency.49 But a consistent estimate can have so much variability as to be of no practical value for samples of realistic size.
From page 182...
... believes that randomized controlled experiments ought to form the backbone of CDC's evaluation strategy, we understand that they cannot constitute the exclusive strategy. Under some conditions, randomization is precluded, and an evaluation cannot be conducted unless other methodologies are considered.
From page 183...
... (As discussed under the section on randomized experiments, a sufficient number of units is needed to detect differences in outcome variables, and the cost of delivering an intervention to the necessary number of units (e.g., communities) may be prohibitively expensive relative to the appropriate level of certainty about the answers that are obtained.
From page 184...
... For example, a time series analysis of trends in condom sales, visits to STD clinics, and sales of safe sex videos or books might be implemented. Note, however, that when multiple sites are involved, the panel
From page 185...
... As discussed in Chapters 4 and 5, random assignment to an intervention or to a control group fails to meet standards of ethical propriety if resources are in ample supply to provide the intervention, it is not otherwise available, and the beneficial effects of the intervention are assumed to outweigh any negative effects. HIV testing, for example, is believed to be an effective medical care procedure, thus making a randomized no-treatment control inappropriate for estimating the effect of CDC's counseling and testing program.54 In this case, it might be possible to use a time series design to examine the effectiveness of a new counseling and testing setting on the accessibility of services.
From page 186...
... Randomized Experiments
Assuming that randomized controlled trials are used, the assumptions underlying the inference of effects are generally easy to verify, which will facilitate acceptance of a study's interpretation. One still needs, however, to examine the data on project participants and the project itself, to ensure the internal validity of the experiment.
From page 187...
... Analysis of covariance, selection models, structural equation models, and other statistical techniques require assumptions that are generally expressed in formal statistical terms that are somewhat removed from everyday experience. Analysis of covariance, for instance, assumes that the relationship between outcome variables, covariates, and the treatment can be adequately and fully expressed in a particular form of (single-equation)
From page 188...
... A set of six criteria developed by Hill (1971) to assess observational studies in the field of medicine is of interest.
From page 189...
... At present, the panel believes the randomized experiment to be the most appropriate design for outcome evaluation, both in terms of clarity and dispatch of results, all else being equal. At the same time, we recognize that the strategy will not always be feasible or appropriate and, for these situations, other designs may have to be deployed until evidence accumulates to make their interpretation dependable or until a randomized experiment can be conducted.
From page 190...
... Presented at the NRC Conference on Nonexperimental Approaches to Evaluating AIDS Prevention Programs, Washington, D.C., January 12-13.
From page 191...
... (1973) Structural Equation Models in the Social Sciences.
From page 192...
... (1988) Effects of HIV antibody test knowledge on subsequent sexual behaviors in a cohort of homosexually active men.
From page 193...
... Presented at the NRC Conference on Nonexperimental Approaches to Evaluating AIDS Prevention Programs, Washington, D.C., January 12-13.
From page 194...
... Winkelstein, W., Samuel, M., Padian, N.S., Wiley, J.A., Lang, W., Anderson, R.

