4. Evaluation Methods and Issues
Pages 54-101

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 54...
... The third part of the chapter discusses several specific evaluation methodology issues in more detail: the reliability of nonexperimental evaluation methods, statistical power in nonexperimental methods, generalizability, process and qualitative research methods to complement formal evaluation analyses, and the importance of welfare dynamics for evaluation. The fourth part assesses the evaluation projects currently under way (discussed in Chapter 2)
From page 55...
... In nonexperimental methods, the outcomes are estimated by means of a comparison group, a group of individuals who are not randomly assigned to a comparison group, but who are considered to be similar to those who received the policy. The different types of policy alternatives of interest to different audiences all fit within the counterfactual conceptual framework.
From page 56...
... The experimental method is also not well positioned to estimate so-called "entry effects," effects that occur because a policy change affects the likelihood of becoming a welfare recipient in the first place. This problem may occur because most welfare experiments draw their experimental and control samples from welfare recipients and not from individuals who are not currently receiving welfare, but who may later do so.
From page 57...
... Nonexperimental methods are necessarily more passive: they can only estimate the effects of programs and policy changes that have actually been implemented, which may not be those of greatest interest.
From page 58...
... In the latter case, the data follow the individuals or families over time before and after a policy change to see how outcomes change. These are among the weakest nonexperimental methods because outcomes change over time for many reasons other than the policy change (for example, changes in the economy and in other policies)
From page 59...
... This difference can make it difficult to distinguish "true" effects of the legislation on exit rates (that is, whether it really does cause a given recipient to leave welfare sooner than she would have otherwise) from spurious "selection" effects, which arise if the exit rate in the second cohort differs from the first solely because of differences in the make-up of the caseloads. Another set of nonexperimental methods enjoying some popularity are "difference-in-difference" methods.
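The basic arithmetic of a difference-in-difference estimate can be sketched as follows. This is a minimal illustration only: the function name and the employment rates are hypothetical and are not taken from any study discussed in the chapter. The estimate is credible only if the comparison group's outcomes would have trended the same way as the affected group's in the absence of the policy change.

```python
def difference_in_difference(treat_before, treat_after, comp_before, comp_after):
    """Change in the affected group's mean outcome minus the change in the
    comparison group's mean outcome over the same period."""
    return (treat_after - treat_before) - (comp_after - comp_before)


# Hypothetical employment rates (illustrative values, not real data).
effect = difference_in_difference(
    treat_before=0.40, treat_after=0.52,  # group affected by the policy
    comp_before=0.45, comp_after=0.50,    # comparison group
)
print(round(effect, 2))
```

The affected group's raw change in this example is 12 percentage points, but 5 of those points also appear in the comparison group and are therefore attributed to factors other than the policy.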
From page 60...
... The major disadvantage of nonexperimental methods is that it is difficult to assess the degree of bias in the estimates of a policy's or a program's effects because of threats to internal validity from the choice of a comparison group. This problem has been given extensive attention in the research literature on nonexperimental evaluation methods.
From page 61...
... Nonexperimental methods are, therefore, a necessary part of welfare reform evaluation.
Process Analysis and Qualitative Methods
Implementation and process analyses collect information on the implementation of policy changes; how those changes are operationalized within agencies, often at the local level; what kinds of services actually get delivered and how they get delivered; and, sometimes, how clients perceive the services.
From page 62...
... To some extent, such data may simply provide a better measure of outcomes than data collected through formal survey or administrative data outcome measurements because they provide much more in-depth information on how individuals and families are affected. In principle, it is possible that formal evaluations of different programs could yield similar estimates of outcomes but that quite different outcomes would be found with the qualitative analysis.
From page 63...
... EVALUATION METHODS FOR THE QUESTIONS OF INTEREST
In Chapter 3 we delineated three formal evaluation questions of interest: What are the overall effects of structural welfare reform? What are the effects of individual, broad components of a welfare reform?
From page 64...
... In such a changed environment, neither experimental methods nor most traditional nonexperimental methods can provide reliable estimates of what would have happened to individuals and families in the absence of the reform having taken place. As noted previously in our discussion of the drawbacks to experimentation when cultural effects are part of the outcome, a control group in a randomized experiment that has been chosen just prior to the initiation of the reform will almost surely be affected by the broad effects created by the reform, thereby contaminating their outcomes as representing those that would occur in the absence of reform.
From page 65...
... The second method is the difference-in-difference method. The time series and cohort comparison methods would be used in combination with either aggregate data or individual data on outcomes before and after the reform and attribute the change in outcomes to the reform.
From page 66...
... In part, this is because the expected magnitude is usually large so that the biases in nonexperimental methods are outweighed by the magnitude of effects. If the estimates are interpreted as approximate effects rather than precise ones and if they are treated as having a possibly significant margin of error, they can be quite informative, particularly if large effects are detected.
From page 67...
... Their value also depends on the credibility of the comparison group, as well as whether there is a sufficiently long time series to provide a reasonably reliable indication that the outcomes of the comparison groups were not trending at different rates than those of single mothers. Aside from these two methods, there are other nonexperimental methods that could occasionally be used to evaluate overall effects, although all have disadvantages.
From page 68...
... Administrative data bases at the state level are often not available in usable individual form and sometimes do not go back far enough because some welfare agencies have not archived old records.12 Measures of the policy environment are particularly difficult to gather ...
Footnote 12: The need to track benefit receipt to enforce the limits will presumably force states to keep records longer.
From page 69...
... Estimating the Effects of Individual Broad Reform Components
The possibilities for evaluating the effects of individual broad reform components are greater than for evaluating overall effects. There are both traditional experimental and nonexperimental methods that can be used for this type of evaluation, albeit not without difficulties.
From page 70...
... As with the discussion of methods in the previous section, good data are important to strengthening the conclusions that can be drawn from the evaluation of the effects of individual broad components of welfare reforms. Data issues are typically more important for nonexperimental evaluation than for experimental ...
Footnote 13: Variations in the type of work requirements, and the type of time limit, are more common; we classify these as detailed strategies, which are discussed below.
From page 71...
... Estimating the Effects of Detailed Reform Strategies
The effects of detailed strategies, such as different types of work and employment strategies, different time limit structures, different sanctions rules, and other such variations are important parts of the welfare reform evaluation effort for certain audiences, as discussed in Chapter 3. For the evaluation of alternative detailed strategies, randomized experiments are generally the strongest evaluation methodology.
From page 72...
... Time-series and cohort comparison methods are unlikely to be useful for evaluating detailed strategies for the same reasons they were unlikely to be useful for evaluating broad components. They require areas where the detailed strategy is changed over time, leaving all other components of the welfare program unchanged.
From page 73...
... Conclusion 4.3 Experimental methods are a powerful tool for evaluating the effects of broad components and detailed strategies within a fixed overall reform environment and for evaluating incremental changes in welfare programs. However, experimental methods have limitations and should be complemented with nonexperimental analyses to obtain a complete picture of the effects of reform.
From page 74...
... Assessing the Reliability of Nonexperimental Evaluation Methods
Given the importance of nonexperimental methods for many of the evaluation questions surrounding welfare reform, it is desirable to have ways of assessing their reliability. The most important threat to the validity of nonexperimental methods is that the comparison group used is dissimilar in some respect to the group affected by the reform (i.e., that internal validity is weak)
From page 75...
... Nevertheless, specification tests are a valuable tool and can be informative for many nonexperimental methods and estimators. They are underused in welfare program evaluation, and they need to be refined and developed further for best use.
From page 76...
... Likewise, if no experiments have been conducted to estimate the effects of broad components of welfare reform, as we noted previously is the case, nonexperimental methods that aim to estimate those ...
Footnote 18: Manski (1995) has proposed that the arbitrariness inherent in sensitivity testing be replaced by construction of logical bounds within which the true effect must lie.
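One simple form that such logical bounds can take is the no-assumption (worst-case) bound on an average effect when the outcome is known to lie in a fixed range. The sketch below is an illustration under that assumption; whether it matches the specific construction the footnote has in mind is not stated in the text, and the function name and input values are hypothetical. The idea is that each group's unobserved counterfactual outcome is replaced by the smallest and largest values it could logically take.

```python
def worst_case_ate_bounds(mean_treated, mean_untreated, share_treated,
                          y_min=0.0, y_max=1.0):
    """No-assumption bounds on the average effect of a policy when the
    outcome is bounded between y_min and y_max.

    mean_treated:   observed mean outcome among those subject to the policy
    mean_untreated: observed mean outcome among those not subject to it
    share_treated:  fraction of the population subject to the policy
    """
    p = share_treated
    # Mean outcome if everyone were subject to the policy: observed for the
    # treated share, unknown (anywhere in [y_min, y_max]) for the rest.
    y1_low = mean_treated * p + y_min * (1 - p)
    y1_high = mean_treated * p + y_max * (1 - p)
    # Mean outcome if no one were subject to the policy, by the same logic.
    y0_low = mean_untreated * (1 - p) + y_min * p
    y0_high = mean_untreated * (1 - p) + y_max * p
    return y1_low - y0_high, y1_high - y0_low


# Hypothetical inputs: employment rates of 0.6 and 0.5, half the population treated.
lower, upper = worst_case_ate_bounds(0.6, 0.5, 0.5)
print(round(lower, 2), round(upper, 2))
```

For an outcome scaled from 0 to 1 these bounds always have width 1, so they cannot by themselves determine the sign of the effect; they become informative only when combined with further, explicitly stated assumptions.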
From page 77...
... ASPE has also shown interest in the general issue of choice of nonexperimental method, and has worked with an external group of experts to develop an approach. Much more needs to be done in this direction and more progress needs to be made given the importance of nonexperimental methods to welfare program evaluation.
From page 78...
... However, experimental methods are typically designed to consider sample size and statistical power issues up front in the design phase of the study. Nonexperimental methods for assessing the effects of PRWORA and broad components of reform must rely on existing national level surveys that are designed for more general purposes and not for the specific evaluation questions or for specific strata of the population identified here.
From page 79...
... The former used aggregate data to estimate the effect of pre-PRWORA waivers on AFDC caseloads and the latter used the CPS to estimate those effects for both AFDC participation and other outcomes. Moffitt estimated only the overall effect of welfare reform; the CEA also estimated the effect of individual broad components.
From page 80...
... [FIGURE 4-1 Power for Council of Economic Advisers analysis. Series shown: termination time limits, work requirement time limits, earnings disregard.] ... individual components (sanctions, time limits, etc.)
From page 81...
... Finally, Adams and Hotz consider the effect on power of doubling the CPS sample size. The power of detecting the overall effects of welfare waivers on AFDC participation rises, at Moffitt's estimated effect size, from 80 percent to more than 95 percent.
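The link between sample size and power can be illustrated with a textbook normal approximation. This is a sketch under stated assumptions, not a reproduction of the Adams and Hotz calculation: it assumes the effect estimate is approximately normally distributed, a two-sided test at the 5 percent level, and a standard error that shrinks with the square root of the sample size. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def power_two_sided(effect_over_se, alpha=0.05):
    """Probability of rejecting a zero effect in a two-sided z-test when the
    true effect equals effect_over_se standard errors."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return (STD_NORMAL.cdf(effect_over_se - z_crit)
            + STD_NORMAL.cdf(-effect_over_se - z_crit))


def signal_for_power(target_power, alpha=0.05):
    """Approximate effect-to-standard-error ratio that yields target_power
    (the negligible opposite-tail rejection probability is ignored)."""
    return STD_NORMAL.inv_cdf(1 - alpha / 2) + STD_NORMAL.inv_cdf(target_power)


baseline = signal_for_power(0.80)  # ratio that gives 80 percent power
doubled = baseline * sqrt(2)       # doubling n shrinks the SE by sqrt(2)
print(round(power_two_sided(baseline), 3))
print(round(power_two_sided(doubled), 3))
```

Under these simplified assumptions, a design with 80 percent power moves to somewhat above 95 percent power when the sample is doubled, which is in line with the direction and rough size of the gain quoted above. Survey design effects and the limited number of states with policy variation are not represented in this sketch.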
From page 82...
... Administrative data sets are larger than the CPS when pooled across states and therefore hold promise in this regard (see Chapter 5). Conclusions 4.5 Existing household surveys are of inadequate sample size to estimate all but the largest overall effects of welfare reform on individual outcomes using cross-state comparison methods.
From page 83...
... Nonexperimental methods are typically estimated on more comprehensive groups of the population and in more areas, and sometimes capture entry and macrocultural effects. However, they too are sometimes not fully representative or complete.
From page 84...
... In the end, of course, sensitivity testing is required, as it is in all microsimulation, to determine the range of uncertainty involved in the extrapolation. Likewise, microsimulation models have the capability to generalize results to new policies not previously tested.
From page 85...
... Pure time-series prediction of how outcomes will evolve in the future, for example, is a different role, and may not be the best for current problems. Nor is microsimulation an evaluation tool itself; experimental and nonexperimental methods are suitable for that function.
From page 86...
... These potential payoffs to process and qualitative studies lead the panel to conclude that both methods have important roles to play in evaluating welfare reform. Chapter 2 described several process studies that are under way in welfare policy research.
From page 87...
... Department of Health and Human Services sponsor process research in a number of service delivery areas to better understand how service delivery administrations have implemented new welfare programs and the benefits and services families and children are receiving under these new programs. Despite the growing potential for process studies to provide needed information about what is happening in program implementation, methodological improvements in how process studies are conducted are needed for them to be maximally effective.
From page 88...
... They have become increasingly popular as an aid to program evaluation and have augmented both nonexperimental and experimental welfare reform studies. They are also an important component of several of the current major studies to monitor and evaluate welfare reform (Urban Change, Three-City Study)
From page 89...
... Recommendation 4.4 Qualitative and ethnographic studies of the low-income population and its relevant subpopulations and of social service agencies that provide services to these populations are an important part of the overall welfare program evaluation framework. The panel recommends the further use of well-designed qualitative and ethnographic studies in evaluations of welfare programs to complement other evaluation methods.
From page 90...
... wage data to present the same sort of analysis conducted by Moffitt documenting the proportions of long-termers, short-termers, and cyclers but for multiple cohorts over time, thereby demonstrating how the composition of the caseload has been changing. The analysis by Ver Ploeg focuses on leaver studies by analyzing data on welfare leavers from Wisconsin to explore how leaver outcomes differ for long-termers, short-termers, and cyclers and how other aspects of a leaver analysis are affected by incorporating the caseload dynamics perspective.
From page 91...
... Thus, cyclers in his analysis seem to be the most disadvantaged of the three experience groups. Stevens used Maryland administrative data and decomposed the AFDC/TANF caseload from 1985 to 1998 into the three experience groups (using all AFDC cases opened and closed sometime during this period)
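A decomposition of a caseload into long-termers, short-termers, and cyclers requires a rule for classifying each case's history of benefit receipt. The excerpt does not give the definitions used in the studies, so the rule below is purely hypothetical: the thresholds (three or more separate spells for a cycler, 24 or more total months for a long-termer) and the function names are assumptions chosen only to show the mechanics.

```python
def count_spells(months_on):
    """Count separate spells of receipt in a sequence of monthly on/off flags."""
    spells = 0
    previously_on = False
    for on in months_on:
        if on and not previously_on:
            spells += 1  # a new spell starts when receipt switches from off to on
        previously_on = on
    return spells


def classify_case(months_on, cycler_spells=3, long_months=24):
    """Assign a case to one of three experience groups.

    The thresholds are hypothetical, not those used in the studies discussed.
    """
    if count_spells(months_on) >= cycler_spells:
        return "cycler"
    if sum(months_on) >= long_months:
        return "long-termer"
    return "short-termer"


# Illustrative monthly receipt histories (True = received benefits that month).
print(classify_case([True] * 30))                        # one long spell
print(classify_case([True] * 6 + [False] * 24))          # one short spell
print(classify_case(([True] * 3 + [False] * 3) * 4))     # four short spells
```

Results of such a decomposition can be sensitive to the thresholds and to the length of the observation window, so any comparison across studies would need to confirm that the same definitions were applied.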
From page 92...
... When looking at the differential wage and employment outcomes of leavers, she found, perhaps surprisingly, differences by the three welfare experience groups that were quite modest: virtually all three groups had employment rates of 55-65 percent and all had approximately the same level of earnings. However, there were much stronger and more marked differences in leaver outcomes by the level of past work experience.
From page 93...
... Even the first of these (leaver studies) is included only for discussion purposes, for most analysts agree that they are not intended as formal evaluations, at least as presently conducted.
Leaver Studies
The most common type of welfare reform study is the welfare leaver study, which examines the outcomes of a group of welfare recipients who have left the ...
Footnote 26: Some of these studies, like the Urban Change Study and Three City Study, and in certain uses, the National Survey of America's Families, have evaluation components.
From page 94...
... Most multiple cohort studies take any evidence of changing outcomes for leavers over time, such as lower employment rates, as an indication that more women with low skills are leaving the rolls over time. However, this interpretation ignores the original purpose of multiple cohort designs, which is to estimate the effect of a policy change on the outcomes that a given recipient or type of recipient would have.
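The distinction drawn above, between a change in who is leaving and a change in outcomes for a given type of recipient, can be made concrete with a simple shift-share decomposition. The sketch below is an illustration only: the two recipient types, their shares, and their employment rates are hypothetical, and in practice the relevant types may not be fully observable in the data.

```python
def decompose_change(before, after):
    """Split the change in a cohort's mean outcome into two parts.

    before, after: dicts mapping recipient type -> (share_of_leavers, mean_outcome)
    Returns (within, composition), where
      within      = change in outcomes for given types, weighted by later shares
      composition = change due to shifting shares, valued at earlier outcomes
    The two parts sum exactly to the change in the overall mean.
    """
    within = sum(after[t][0] * (after[t][1] - before[t][1]) for t in before)
    composition = sum((after[t][0] - before[t][0]) * before[t][1] for t in before)
    return within, composition


# Hypothetical leaver cohorts: employment rates by type are unchanged,
# but low-skill leavers make up a larger share of the later cohort.
example_before = {"low_skill": (0.3, 0.50), "high_skill": (0.7, 0.70)}
example_after = {"low_skill": (0.5, 0.50), "high_skill": (0.5, 0.70)}

within, composition = decompose_change(example_before, example_after)
print(round(within, 2), round(composition, 2))
```

In this example the overall employment rate of leavers falls by 4 percentage points even though no type of recipient does any worse, so the entire decline is compositional. A multiple cohort design aimed at policy effects is concerned with the within-type term.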
From page 95...
... Studies that compare current leavers to those who left welfare prior to welfare reform and studies of diverters, applicants, and nonapplicant eligibles need more emphasis. Recommendation 4.6 More methodological research is needed to assess and improve the credibility of the multiple cohort method of evaluating the overall effects of welfare reform.
From page 96...
... The experiments that have been undertaken over the past decade have generally been aimed at estimating the overall effects of a bundle of separate welfare reforms, including work requirements, sanctions, time limits, and other provisions, all enacted and tested simultaneously. With rare exceptions, there have been no experiments that have isolated individual broad components or detailed strategies, varying each while holding all the other features of welfare reform fixed.27 Although experiments of similar policy bundles have often been tested in more than one site, there has been no attempt to coordinate those bundles in a way that would permit isolation of broad components or detailed strategies (i.e., with two sites differing only in one respect)
From page 97...
... Caseload and Other Econometric Models
A number of caseload and other econometric models have been used in evaluating welfare reform, as described in Chapter 2. All of them aim to estimate the overall effects of welfare reform, and a few attempt to estimate the effects of individual broad components as well.
From page 98...
... Conclusion 4.8 Caseload and other econometric models have produced a mixed set of results, partly because of data limitations and partly because of an inherent lack of policy variability. They have done somewhat better at producing ballpark estimates of the overall effects of welfare reform than at producing estimates of the effects of individual broad components.
From page 99...
... Experimental tests of the overall effects of PRWORA have also been conducted but have many limitations. There have also been econometric estimates of the effect of individual broad components of welfare reform, but these have more serious problems than those estimating the overall effects.
From page 100...
... Recommendation 4.9 In its annual report to Congress, ASPE should review the existing landscape of evaluation methods, whether the appropriate balance of experimental and different nonexperimental methods is being achieved, and how evaluation methodology fits into its own research agenda. At the state level, the capacity to conduct evaluations, both experimental and nonexperimental, is very weak.
From page 101...
... Recommendation 4.11 The panel recommends that ASPE be the primary agency responsible for synthesizing findings from studies of the consequences of changes in welfare programs.


This material may be derived from roughly machine-read images, and so is provided only to facilitate research.