Evaluating Welfare Reform: A Framework and Review of Current Work

2
Framework, Principles, and Designs for Evaluation

As we stressed in Chapter 1, evaluating the effects of the new welfare legislation and related welfare reform initiatives at the state and local level around the country presents many challenges to data and evaluation methodology. In this chapter we present general principles for program evaluation and the major issues that any evaluation must confront, and we outline some of the choices and alternatives that are available in an evaluation. Although these general principles are quite well known to many welfare reform researchers, we review them to emphasize that our findings on individual state-level studies, discussed in Chapter 3, are based on and naturally follow from this set of general principles governing how evaluations should be conducted. In Chapter 3 we apply the principles to the 14 specific state studies we have assessed in detail, and also, more briefly, to other welfare reform examinations under way around the country.

We focus this discussion primarily on "impact" evaluations as opposed to "process" evaluations. Impact evaluations (sometimes called outcome evaluations) concern the outcomes of a program on recipients, such as the effects on individual employment, earnings, and family income. Process evaluations (sometimes called implementation evaluations) describe how the program services are actually provided and then assess how well the services provided match the intended purpose of a program. They also assess the degree to which a program was successfully implemented and thus aid in characterizing the policy "treatment" that the participants and potential participants actually received. Although we do not provide an extended analysis of process evaluation, we do provide a brief discussion of it at the end of the chapter, given its importance.

The report also focuses only on the effects of reform on individuals, rather than the effects of reform on government itself. One view of the purpose of the welfare reform legislation is that it was intended to change the nature of how government delivers assistance to the poor, away from a purely eligibility-oriented and check-writing function to a function of encouraging work, promoting self-sufficiency, and providing the right signals and incentives for those to occur. As Nathan and Gais (1999) have described, the reform is resulting in a major change in welfare bureaucracies. Although this is a legitimate issue, evaluating the effects of PRWORA on governments themselves requires different evaluation methods than the methods discussed in this report (although process studies, which we do discuss, are one component of such evaluations).

We organize our discussion of the general principles of impact evaluation in terms of four general issues that any impact study must address; we pose each in the form of a question:

What are the research and policy questions of interest, and what are the precise objectives of the study?
What are the study populations of interest, and what are the outcomes of interest on those populations?
What evaluation methodologies are appropriate for achieving the goals of the study?
What data sources are available to the study and how can they be used?

Having a solid understanding of these issues is not only important for the design of new welfare reform studies, but also for interpreting the results of those studies that are currently under way and will be issuing findings over the next few years. As Chapter 3 details, the current studies differ, often on critical dimensions, in the way in which each of the four issues listed above is addressed. Some answer different questions, many study different populations, they often use different methodologies, and they frequently use very different data.
Melding the results of such a diverse set of studies into a single coherent picture of the effects of the latest wave of welfare reform is a challenge that requires a clear understanding of the issues that we discuss in this chapter.

RESEARCH AND POLICY QUESTIONS AND STUDY OBJECTIVES

Broadly speaking, the question of interest in all welfare reform studies is the effect of reform on adults and children. The types of reforms that are of interest and the geographic level at which these effects are assessed are major issues in the research community. One key distinction, for example, is whether interest centers on the effect of an entire "bundle" of reforms (that is, a package containing provisions for work requirements, sanctions, time limits, a particular set of support services, and other features) or whether one is interested in the effects of each component separately, holding the others fixed. Most welfare reforms that have been enacted in the last 10 years are, indeed, bundles of different types of reforms, sometimes introduced by policy makers on the presumption that the collective effect of all the components together is greater than the effect of each of them separately. Policy makers often discuss the importance of changing the overall "culture" of welfare and of changing the expectations that recipients have for welfare. Changing multiple components at the same time makes such changes in culture more likely. PRWORA itself legislated multiple changes of the old AFDC system, and each state has added more components to those required by the federal law. Thus, a strong case can be made that it is the effect of the entire bundle that is of major policy interest.

Yet knowing the effect of an entire bundle of reforms does not provide a very good basis for future reforms or for determining which components work and which do not. Taken literally, knowing the effect of the bundle allows policy makers to decide only whether the entire bundle turned out to be a good policy or a bad policy, on the whole, and is informative only for the decision to either continue or end the whole bundle. However, it is likely that some components have favorable effects and others have unfavorable or no effects. Determining which components should be changed requires knowledge of the effects of each of them separately. Indeed, most observers expect that when PRWORA comes up for reauthorization in 2002, it is unlikely that a simple return to the old AFDC system and the old method of financing welfare will be an active option. Rather, it is far more likely that Congress and the President will be interested in modifying the current law to eliminate or change components of the law that have been judged to be ineffective or less effective than others.
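The distinction between the effect of a bundle and the effects of its components can be written out for the simplest case of two components. The notation below is a hypothetical illustration, not taken from the report: Y(a,b) stands for a mean outcome such as earnings, and the sketch assumes only two components, A and B, each either in place or not (typesetting it requires the amsmath package).

```latex
% Hypothetical notation, not taken from the report.
% Y(a,b): mean outcome when component A is on (a=1) or off (a=0)
% and component B is on (b=1) or off (b=0); Y(0,0) is the base program.
\begin{align*}
\Delta_A &= Y(1,0) - Y(0,0), \qquad \Delta_B = Y(0,1) - Y(0,0), \\
I &= Y(1,1) - Y(1,0) - Y(0,1) + Y(0,0), \\
\underbrace{Y(1,1) - Y(0,0)}_{\text{bundle effect}} &= \Delta_A + \Delta_B + I .
\end{align*}
```

When the interaction term I is positive, the bundle effect exceeds the sum of the separate effects, which is the presumption described above. A study that estimates only Y(1,1) - Y(0,0) cannot recover the three terms on the right-hand side separately.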
Determining the effects of each component separately will probably require choosing a "base" from which each component has changed. If it is indeed the case that the bundle of reforms has a greater effect than the sum of the effects of its components, then adding or subtracting any individual component will have a different effect if none of the other components is in place than if all the components are in place at the same time. If the basic structure of reforms enacted by PRWORA is taken as the base, for example, policy interest should center on the incremental effects of each policy component, holding that basic structure in place.

In addition to these issues of the inherent questions of interest, there are several practical questions about the feasibility of estimating the effects of individual components, rather than the bundle. We discuss these issues when we discuss alternative evaluation designs below.

Another issue that has assumed importance in recent welfare reform discussions is the relative importance of national-level estimates and state-specific estimates of the effects of reform. Many federal policy makers and members of Congress would like to know the total effect of reform in the country as a whole. PRWORA is, after all, a federal law and was intended to change the welfare system in the entire country. Yet, other analysts argue that an average estimate is not of great interest because the diversity of state reforms is so great that an average would not be a very good indicator of any particular reform package. This approach does not mean that national-level estimates are not desirable, for one may well be interested in the range of effects across states, not just the average. However, an acceptable evaluation strategy for this approach would only require estimates on a subset of the states, if that subset captured the range of different reform policies that have been tried in all the states. Yet this approach leads back to the question of what the policy of interest is, for obtaining a range of effects across a set of states leads inevitably to a search for why those effects differ. This question in turn leads to a need to determine which elements of the bundle of reforms explain the differences. Even if the aim is to obtain only a range of estimates across selected states, cross-state comparability is necessarily a major issue.

While these debates occur at the federal level, at the state level there is more interest in knowing the effect of a state's own specific reforms. Because evaluation, as well as operations, has shifted so heavily toward the states and away from Washington, welfare reform analysis in the current environment is much more state focused than it has previously been. State policy makers are often interested in comparing their state's policies to those of other states, but usually they are most interested in knowing the effects of their own policies first. This focus creates some difficulties in making national-level assessments of the effects of the policies and for determining what works and what does not, as discussed further below.
Yet another important issue concerning the nature of the research and policy questions that are, or should be, asked involves the distinction between evaluation studies and monitoring (or descriptive) studies. We discuss this issue below.

Evaluation Studies

The classic type of study enshrined in textbooks and in program project studies is the evaluation study, whose objective is to estimate the causal connection between a program or policy and its effect.1 Any study of this type must necessarily have what is known in the evaluation field as a "counterfactual": the program or policy that is being compared with the program or policy under study. By definition, when one speaks of the effect of a new welfare reform program or policy, one must say what that effect is relative to; the latter is the counterfactual. The most common counterfactual is simply the program that existed prior to the program under study, which in most cases for TANF or an AFDC waiver is the basic AFDC program in a state prior to the introduction of the state's waiver or PRWORA program. The effect of a program is generally taken to mean its effect relative to what existed before.2

1    Throughout this report, "evaluation" refers to such an assessment of the effects of the program or policy.

While this is indeed the generally accepted counterfactual in current discussions, it should be noted that there are other counterfactuals of interest. One is a program bundle that is the same except for one changed feature. As we noted above, this may be the most important knowledge for incremental reform, where the alternative policy is not a return to AFDC, but a modification of current policy. Studies that consider this type of comparison will necessarily have an evaluation design that permits the estimation of the impact of the counterfactual. A program that modifies one element of a bundle is one example of such a counterfactual. A counterfactual could also be another state's program or policy.

Monitoring and Descriptive Studies

Many of the welfare reform studies currently under way have less ambitious goals than evaluation. These studies typically characterize their goals as "description" or "monitoring." A descriptive study is one that simply describes the characteristics of a population group relevant for policy (such as welfare leavers, welfare applicants, welfare eligibles, or just low-income families) and focuses on their levels of well-being. A monitoring study is one that follows such a population group over time, periodically describing and measuring its well-being along general and specific dimensions. In both descriptive and monitoring studies, there is no attempt to isolate the precise cause of the individual and family outcomes. No attempt is made to determine how much of the change (in the case of a monitoring study) is the result of welfare reform and how much is the result of other, simultaneous forces, such as trends in the economic environment.
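The gap between what a monitoring study measures and what an evaluation tries to estimate can be stated as a simple decomposition. The symbols below are hypothetical notation introduced only for illustration; they assume a single outcome measured once before and once after the reform (typesetting requires the amsmath package).

```latex
% Hypothetical notation, not taken from the report.
% Y_0:     outcome measured before the reform.
% Y_1^{R}: outcome measured after the reform, with the reform in place.
% Y_1^{C}: outcome that would have occurred in the later period under the
%          counterfactual program (never observed directly).
\[
\underbrace{Y_1^{R} - Y_0}_{\text{observed change}}
  = \underbrace{\left(Y_1^{R} - Y_1^{C}\right)}_{\text{effect of the reform}}
  + \underbrace{\left(Y_1^{C} - Y_0\right)}_{\text{effect of other forces}}
\]
```

A monitoring study reports only the left-hand side. An evaluation needs a credible estimate of the unobserved term, the outcome under the counterfactual in the later period, in order to separate the two terms on the right.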
The monitoring approach is very closely related to a classic method known as a before-and-after, or pre-post, design, which we discuss below when we review alternative methodologies for conducting an evaluation. A before-and-after design uses roughly the same data strategy as a monitoring study, namely, the collection of data on outcomes before and after a policy change. However, in a before-and-after design the family and individual outcomes in the "after" phase are intended to be causally related to the policy. A design of this type can be distinguished from a monitoring study if it includes a strong analysis of the influence of alternative, simultaneously occurring forces, such as social and economic trends (e.g., changes in the unemployment rate) that may have been contributing to the trends in outcomes as well as policy. (Because this separation of policy effects and the effects of other forces is so difficult, before-and-after designs are one of the least desirable types of evaluation methodologies, as we discuss further below.) This kind of analysis is usually missing in a monitoring study. Also, some monitoring studies have no data from the period prior to the policy change and so are clearly distinguished from a before-and-after evaluation design.

2    To be precise, given time and changes in a state's economic and social environment, the counterfactual is usually defined not as AFDC at a prior time, but as what the effects of AFDC would be in the current environment had it continued.

Describing and monitoring the populations of interest are, arguably, the necessary first steps prior to conducting an evaluation. Descriptive and monitoring studies can have tremendous utility in situations in which relatively little information on the recipient population is available. Studies of this type can be informative both to program managers and to the general policy-making and research community because the information gathered can be an indicator of the well-being of the target population intended to be served by the program, and whether that well-being is going up or down, or remaining unchanged. For example, descriptive studies can determine how many welfare leavers are in economic distress and can identify the existence of particular barriers to employment, such as health status, transportation needs, and access to child care.

Ultimately, however, in order to learn the effects of a policy change, description and monitoring need to be followed by evaluation. Without evaluation, nothing can be firmly known about why the well-being of the population is changing the way it is. More important, if that well-being is deteriorating, even for only a minority of the population, a descriptive or monitoring study provides no guidance on how to reverse that trend and increase well-being, because nothing has been firmly learned about its causes. Thus, very little guidance can be given to policy makers regarding whether a policy should be modified.

A further potential danger of monitoring studies is that they are often misinterpreted as representing the results of a before-and-after design.
Even though a monitoring study may carefully note that it has not established any cause-and-effect conclusions, the results may nevertheless be incorrectly labeled by others as demonstrating the effect of policy changes. This often occurs because many monitoring studies do not explicitly state the purpose of the study as monitoring, making the results easily interpreted as the results of a before-and-after evaluation. Given the weaknesses of the before-and-after methodology, such misinterpretations pose risks to good policy conclusions.

Monitoring studies are sometimes justified as useful in establishing a baseline for the evaluation of future policy changes. For example, welfare reform in most states is, at this writing, still evolving: any data collection (or monitoring) effort under way can be viewed as establishing a baseline that can be compared with later outcomes.3 A yet more long-run view is that PRWORA will, most likely, be modified, even if in only minor ways, so current monitoring studies can be viewed as establishing a baseline prior to those modifications. These interpretations and justifications for monitoring studies lead to more general issues concerning the desirability of investments in data and in knowledge infrastructure as the basis for research and evaluation in the future, possibly as part of a general building of data infrastructure.

3    It is possible, however, that such a baseline may have already missed certain attitude and perception changes, such as an increase in the stigma associated with welfare receipt, that were the result of the national debate and media attention on welfare reform prior to the law's enactment.

STUDY POPULATIONS OF INTEREST

In the broadest sense, the population of interest in any welfare reform study is the low-income or poor population in the United States. However, most welfare reform studies have a narrower focus because most policies and programs are aimed at a particular target population, usually the families or individuals that are eligible for program services. There is some danger in focusing only on the eligible population, however, because who is eligible and who is not can change over time, resulting in a shifting population of interest. Another complicating factor is that families sometimes have the ability to alter their behavior in order to make themselves eligible: for example, by spending down their asset levels. Nevertheless, eligible families are the first population of interest.

Many welfare reform studies are even narrower in their focus, concentrating instead on the population of program participants, usually those who are receiving benefits at a particular time or at two or three different times. Such a focus comes naturally because participants are those actually receiving program benefits and services. Yet this focus runs the risk of missing important responses to a reform. Who is receiving benefits at any time may change, sometimes because of external changes in the socioeconomic or demographic environment and sometimes because of behavioral responses to the policy change itself.
In either case, the types of individuals who are program participants can change in ways that will affect the findings of the study or at least that will require a careful delineation of what the study shows and what it does not.

Studies of Recipients: Caseload Dynamics

Studying a population composed only of a sample of those on the rolls at a particular time is of great relevance and importance, yet it presents certain risks. Given its importance and the risks, an extended discussion of its different aspects is warranted.

A useful perspective on the determinants of who is a participant and who is not is furnished by the framework of caseload dynamics, which views the caseload in a program as a fluid, ever-changing mix of families and individuals who move in and out of the program, possibly at frequent intervals. First-time entry by a TANF recipient, for example, occurs when a family suffers a drop in income, a woman has a nonmarital birth or experiences a divorce or separation, or some combination of these or other factors. Thus, first entry begins a recipient's experience with the system. First exit occurs when the recipient finds a job or gets married, when her child ages out of the age range of eligibility, or when any of a host of other events leads the recipient to attempt self-sufficiency. Reentry occurs for those who are unsuccessful in obtaining self-sufficiency, even if temporarily, and who therefore return to the program for another period of benefit receipt, possibly because of the loss of a job, the dissolution of a marital or nonmarital union, or some other event. At any time, the caseload of a program is composed of families who are first-time entrants, as well as reentrants, and who have been on the rolls for varying lengths of time.

The caseload dynamics perspective distinguishes between short-termers, cyclers, and long-termers. Short-termers, the least disadvantaged of the three, have only a brief experience with the welfare system and are, for the most part, relatively independent of welfare over their lifetimes. In contrast, cyclers move on and off the welfare rolls periodically and end up, over time, with a long-term dependence on the system for repeated assistance, being unable to achieve self-sufficiency. Long-termers, the most disadvantaged of the three, have long spells on welfare uninterrupted by time off the rolls, and have the heaviest dependence on the welfare system for support.

These distinctions are important because research has shown, and intuition supports, that the different degrees of dependence on the welfare system are correlated with individual, family, and community characteristics. These characteristics include a recipient's level of education, work experience, physical and mental health status, history of drug abuse, past history of nonmarital childbearing, and family background and how well it has prepared the recipient for adulthood; the family, social, and community networks available to the recipient; the neighborhood environment from which the recipient comes; her exposure to others with social difficulties; and related factors.
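The mechanics of the three recipient types can be sketched with a small calculation. The Python example below is a hypothetical illustration only: the starting counts and monthly exit rates are invented numbers, not estimates from any study, and the sketch deliberately ignores first-time entry and reentry. It shows that when groups leave the rolls at different rates, the mix of families remaining on the rolls shifts toward the slowest leavers even though no family changes type.

```python
# Hypothetical illustration: all counts and exit rates below are made-up
# numbers chosen only to show the mechanism, not estimates from any study.

def simulate_exits(caseload, monthly_exit_rate, months):
    """Return the families of each type still on the rolls after `months`.

    Simplification: no first-time entry and no reentry, so "cyclers" here
    differ from the other groups only in how quickly they leave.
    """
    return {
        group: count * (1.0 - monthly_exit_rate[group]) ** months
        for group, count in caseload.items()
    }


def composition(caseload):
    """Return each group's share of the total caseload."""
    total = sum(caseload.values())
    return {group: count / total for group, count in caseload.items()}


# Assumed starting caseload and assumed monthly exit probabilities.
start = {"short-termers": 400.0, "cyclers": 300.0, "long-termers": 300.0}
exit_rate = {"short-termers": 0.20, "cyclers": 0.10, "long-termers": 0.02}

after_year = simulate_exits(start, exit_rate, months=12)

# Compare each group's share of the caseload at the start and after a year.
for group in start:
    print(f"{group:14s} share: {composition(start)[group]:.2f} -> "
          f"{composition(after_year)[group]:.2f}")
```

A fuller sketch would add entry and reentry flows; this one isolates the exit side to keep the arithmetic transparent.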
Among the types of recipients, short-termers are typically the best off, with relatively good educational and work backgrounds and a relative lack of severe health problems, and they come from better-off family and neighborhood backgrounds than other recipients. Long-termers are typically the worst off, with relatively poor educational and work backgrounds, often with a history of health problems and drug abuse, and with a history of unstable marital or other partner relationships. Cyclers are in the middle, ranked somewhere between the short-termers and long-termers in these respects; they may have some job market skills and some family or community support, for example, but not enough for permanent self-sufficiency.

Among all families in a low-income welfare-eligible population, participants are, on average, worse off than those who are eligible nonparticipants. More important, as the socioeconomic and policy environments change, families move into and out of participation depending on their characteristics and situations. As the economy improves, for example, as it has in recent years, recipients who are better off in general and have greater skill potential tend to leave the program, so the worst-off cases remain. Thus, the caseload becomes increasingly composed of long-termers who have the greatest number of difficulties (sometimes also called the hard-to-serve). Not only do the exit rates of the better-off families increase, but the first-time entry and reentry rates of such families also decline as individuals who have better income potential or networks of support are less likely to lose their jobs or supports and become participants. These changes reinforce the change in the composition of the caseload.

Similarly, when policies change (such as those enacted in recent welfare reform), better-off recipients are likely to leave the program as they find jobs or other supports, and they are less likely to enter the program for the same reason; both exit rates and entry rates are affected, changing the composition of recipients. Some policy reforms, such as work requirements, have the net effect of encouraging recipients to leave welfare and discouraging them from entering welfare. Other reforms, such as time limits and sanctions and diversion, literally push recipients out of programs or prevent them from entering. These latter reforms provide a possible exception to the rule that it is always the better-off families that tend to be the first to leave or to fail to enter: for example, in some states the evidence suggests that sanctioned families tend to be among the worst-off cases. In other words, families that are relatively better off will be more likely to voluntarily leave programs, while those who are relatively worse off will be more likely to involuntarily leave programs.

Implications

What are the implications of a caseload dynamics perspective for the study of welfare reform policy changes? The major implication is that, while it is easy for a study to define its population of interest as recipients at one particular time, the resulting estimates of the policy effects on that population may not generalize to any other time or any other place.
This limitation arises because the composition of the recipient population (e.g., among long-termers, short-termers, and cyclers) changes in response to the state of the economy, the prior policies in place, and the nature of the eligible population from which recipients are drawn.

This caseload dynamic is especially important for current studies of recipients or former recipients because there has been a significant decrease in the number of families receiving welfare since 1994. Between 1994 and December 1998, the number of families receiving AFDC/TANF declined from just over 5 million to 2.8 million (U.S. Department of Health and Human Services, 1998). A study of former recipients in 1998 is likely to show very different outcomes than a study of former recipients in 1994 because the caseload in 1994 was likely to have been composed of recipients with a greater mix of self-sufficiency levels (skills, education, and work experience) and of recipiency histories (long-termers, cyclers, and first-timers) than the caseload in 1998, which probably had less variation in self-sufficiency level and recipiency history and was likely composed of harder-to-serve recipients.

Equally important, the effects of policy reforms are likely to be different in any comparison where the caseload composition has changed. For example, the effect of imposing stricter work requirements would have been different if imposed in 1994 than in 1998, and, in turn, is likely to be different in a future time with a higher unemployment rate than in 1998. The effect of stricter work requirements in different states is also likely to be different if their unemployment rates are different or if their caseload compositions (short-termers, cyclers, and long-termers) are different for other reasons. The effects of time limits, sanctions, family caps, and other reforms are also likely to depend on caseload composition.

The lesson of a caseload dynamics perspective for studying welfare reform is, at a minimum, that the findings of any particular study of recipients must be carefully described as pertaining to the particular population at that particular time. A more proactive lesson is that a good welfare reform study should distinguish between different types of recipients in describing its results. This critical element is a first step toward comparability across studies in different states and localities and across studies in the same state or locality at different times. Thus, all results and findings should be stratified by whether the recipients were long-termers, cyclers, or short-termers and by the other individual and neighborhood dimensions mentioned above. Distinguishing between groups should take place when measuring outcomes, such as earnings and income, either among those still on the rolls or welfare leavers. Adequate stratification along various dimensions has clear implications for the types of data needed for the study as well, which we discuss further below.

Another implication of these principles concerns studies that examine only welfare leavers.
Given the importance of first entry and reentry in the response to welfare reform, a study that intends to capture the full effects of the reform on the eligible population has to move beyond the examination of only leavers to an examination of the decisions of eligible nonparticipants, including their entry decisions.

It is important to recognize that changes in welfare programs may affect the decisions of potentially eligible families before they apply or reapply for benefits. First, individuals who may be eligible for the program may not understand the new rules and, hence, may believe that they are no longer eligible to receive assistance. Second, some agencies have implemented formal diversion programs, which commonly offer a lump-sum payment or support services, such as job search support or transportation support, in exchange for not enrolling for the cash assistance program. Furthermore, some agencies are directly or indirectly sending signals to potential clients that the emphasis of welfare is now on employment and self-sufficiency and that more will be expected of them if they enroll in the cash assistance program, a sort of informal diversion program. In some cases, the names of the programs and agencies are the signals of a focus on employment and self-sufficiency. The leading signal to potential welfare recipients that agencies, politicians, and the media have been sending is that work effort is expected (Nathan and Gais, 1999).

Both formal and informal diversion programs aimed at reducing entry to welfare programs are important to understand in evaluating the entry effects of welfare reform. These diversion programs are also being implemented to stem reentry onto welfare for those who have voluntarily left or were sanctioned off welfare. Understanding how these diversion programs, in conjunction with sanction policies, act to permanently keep potential recipients off welfare is also important in assessing the effects of time limits on permanently removing or keeping people from assistance. Studies of the broad effects of welfare reform should also seek to understand the behavioral responses of individuals who make themselves eligible or ineligible for participation, for example, by changes in marital status.

Beyond these broad issues of the study population, there are of course many important subpopulations of interest in most welfare reform studies. Most welfare reform studies differentiate carefully between unemployed-parent and single-parent cases, teenage parent and older parent cases, child-only and non-child-only cases, and a variety of different programmatic categories (by age of children, for example). The subpopulations of interest in any particular study depend on the policy and program of interest and on which subpopulations are differentially treated by the policy.4

OUTCOMES AND TIME FRAMES

The outcomes of interest in welfare evaluations vary widely: a comprehensive list of all possible outcomes of interest would be quite long. From a programmatic perspective, the effect of the reform on caseloads or, at the family level, on participation rates and recipient rates is clearly of key interest. The implications of caseload changes for costs, including costs net of the expense of operating and implementing the policy change, are usually also of interest to administrators and legislators. The policy and research community is interested in overall trends in family well-being and in how the reforms affect overall trends in poverty.
The policy and research community is also often interested in the outcomes of those who begin or end participation because of a policy change. The typical outcomes considered are the employment and earnings of the mother or responsible adult in the case. Shared family or household income is also of interest, especially for former recipients who marry, or who, with their children, move in with or share supports with kin or friends, all of which are outcomes of interest themselves. The extent to which nonparticipant low-income families (because of program exit or failure to enter) rely on other programs or on families and friends for support is also an important question of interest. Dependence on other government programs (such as food stamps) implies that families are not

4 The PRWORA policy changes targeted at specific subpopulations are too numerous to explain in detail. However, for an example, a study might focus on the well-being of unmarried minor teenage parents and their children who, under the new rules, must live with an adult or in an adult-supervised setting and must participate in educational and training activities.

A significant difficulty with UI earnings data is that they do not cover the entire workforce: they exclude many government workers, domestic workers, informal and temporary workers, and individuals in the underground economy. They also pertain only to individuals and not to families, and unearned income in general is not available from the records. Yet another issue is that earnings are reported only quarterly, which can create some difficulties for matching to monthly or weekly welfare participation or other records. Furthermore, in order to track workers who live in one state but work in another, a state would have to obtain the UI records of its neighboring states. One new potential source that might be useful in tracking workers across states is the Expanded Federal Parent Locator Service (EFPLS), which contains the National Directory of New Hires. The National Directory of New Hires contains quarterly reports from all states on the wages and unemployment compensation of newly hired workers in a state. Many federal agencies, whose workers are not covered in UI reporting, will be reporting this information under EFPLS. Making the data available for research purposes could help analysts track employment outcomes of welfare recipients and potential recipients.

Data from income tax records are another potential source for conducting evaluations. State tax records are likely to have wider coverage of the workforce than UI earnings data and will also have wider coverage of unearned income and the earnings of spouses. There are, however, serious privacy and confidentiality barriers to obtaining the use of tax records.

Each of these data sources may fail to cover the entire population of interest, at least to some degree.
Some low-income individuals may not show up in any of the data sets, especially if they are not working in the formal economy and would not be covered under unemployment insurance or file tax returns. These individuals may be the worst-off cases in terms of formal labor market job skills, and missing them in an analysis could limit the generalizability of results.

Administrative data are usually quite weak on socioeconomic characteristics of the recipient because that information is not generally needed to determine eligibility for a benefit or to judge compliance with requirements. Consequently, information on education, occupation, marital status, and other basic characteristics is usually not available from administrative data. However, linking administrative data sets can improve the coverage of socioeconomic characteristics of individuals and families because different programs need different information about a potential recipient to judge eligibility or compliance. Nonprogrammatic sources of data can increase coverage of socioeconomic characteristics: for example, vital statistics birth records can be used to monitor and understand fertility decisions. However, linking individual administrative data sets can be difficult because there is often not a unique identifier for each case and because other identifying variables (names, Social Security numbers, or birth dates) can be incorrectly recorded. Significant strides in the area of probabilistic record matching, a technique that calculates a probability that two records with separate identifying information (name and birth date for example) are actually from the same person, have been made and can help address this problem.

Finally, administrative data present significant difficulties if attempts are made to compare them across states. In many instances (for example, the many welfare leaver studies conducted in different states; see Chapter 3), one may want to know if an outcome in one state is comparable to that in another (e.g., if a 50% employment rate among welfare leavers is really double the 25% rate in a different state). Unfortunately, data from administrative records are often not comparable because of variations in the definition of what a case is, what a program is, and how a case is tracked with administrative data. Different concepts are often used for variables with the same label, and the classification schemes used for recipients may be quite different. This variation has always existed, but it is growing with the devolution of program design to the states and the increased variety of types of programs across the country. This variation presents a serious challenge to making cross-state comparisons with administrative data.

Survey Data

Survey data have important advantages over administrative data. General household surveys contain information on family structure, family income, earnings in all sectors, hours of work, and all other major socioeconomic and demographic characteristics. Often, earnings and wages are available at relatively short time intervals. In addition, perhaps most importantly, general population surveys have information on individuals and families when they are not receiving welfare benefits, and thus can be used to assess well-being and to measure behavior during those periods.
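
To make the probabilistic record matching described above (in the discussion of linking administrative data sets) more concrete, the following minimal Python sketch scores a pair of records field by field and converts the evidence into a probability that both records belong to the same person. It is an illustration only, not the procedure used in any study reviewed here. The field names, the agreement probabilities, and the prior are hypothetical values chosen for the example; in practice they would have to be estimated from the data being linked.

```python
import math

# Hypothetical illustrative parameters, NOT estimates from any data set.
# For each field: m = P(field agrees | same person),
#                 u = P(field agrees | different people).
FIELD_PARAMS = {
    "last_name": (0.95, 0.01),
    "birth_date": (0.97, 0.003),
    "ssn": (0.90, 0.0001),
}


def match_probability(rec_a, rec_b, prior=0.001):
    """Return the probability that two records refer to the same person.

    `prior` is the assumed share of candidate pairs that are true matches.
    Each field that is present in both records shifts the log-odds up when
    it agrees and down when it disagrees.
    """
    log_odds = math.log(prior / (1.0 - prior))
    for field, (m, u) in FIELD_PARAMS.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # a missing value contributes no evidence either way
        if a == b:
            log_odds += math.log(m / u)
        else:
            log_odds += math.log((1.0 - m) / (1.0 - u))
    return 1.0 / (1.0 + math.exp(-log_odds))


# Example: a welfare case record and a UI wage record whose Social Security
# numbers differ by one digit, as might happen with a keying error.
welfare = {"last_name": "SMITH", "birth_date": "1970-01-02", "ssn": "123456789"}
ui_wage = {"last_name": "SMITH", "birth_date": "1970-01-02", "ssn": "123456780"}
print(round(match_probability(welfare, ui_wage), 3))
```

Under these assumed parameters, a pair that agrees on name and birth date but not on Social Security number receives an intermediate probability rather than being rejected outright, which is the practical advantage of this approach when identifiers are sometimes recorded incorrectly.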
One source of household survey data is the set of national-level surveys, such as the Survey of Program Dynamics (SPD), Survey of Income and Program Participation (SIPP), Current Population Survey (CPS), Panel Study of Income Dynamics (PSID), and National Longitudinal Survey of Youth (NLSY). As we noted above, these data sets have a potential role to play in obtaining national-level estimates of the impact of welfare reform.

Unfortunately, the usefulness of these surveys for the purpose of welfare program evaluation is significantly threatened by three factors. One is that most national surveys do not have very large sample sizes on the populations of interest in welfare reform. Even the CPS, the largest of the data sets, runs into potential sample size problems if an analysis is restricted to, say, less educated single mothers and conducted separately by race and ethnic group. A second drawback is that using national surveys to assess welfare reform requires that welfare rules be known for each state in a comparable form, and there have been, thus far, limits to the extent to which such information is collected and made available (see below).9 A third drawback is that most national household surveys collect only general socioeconomic information and do not obtain all the information from a respondent that welfare studies need, such as the respondent's history of receipt of welfare and other government program benefits, detailed accounts of sources of support, and the characteristics of the neighborhood in which the respondent lives.

From the point of view of state-level evaluations, new household surveys of the population are an option that can be considered. The major barrier to their use is their significant expense. Fielding a survey is a major operation and can be quite costly, particularly if interviews are conducted in person rather than over the telephone. Survey expenses are also quite high if the sample is generated by screening at the household door, because considerable effort is required to locate the target sample. A frequently used alternative in welfare evaluations is to gather administrative data from welfare or other programs to generate a sample of current or former welfare recipients. The major disadvantage to such list frames is their partial coverage of the population, because many families who are not receiving welfare benefits will not be included in such administrative data. An additional difficulty is that, although forming a sample from administrative data lowers screening costs, locating and tracking former recipients (e.g., obtaining current addresses or telephone numbers) can also be time-consuming and expensive.

In addition to the expense of household surveys, nonresponse10 and misreporting problems raise issues that can be difficult to address. Nonresponse in most household surveys is not random, and a low response rate in a survey leaves the potential for systematic bias due to nonresponse.
Nonresponse rates can be particularly high in telephone surveys of low-income populations. Nonresponse rates in telephone surveys in general have grown with the increase in telemarketing and other factors (e.g., extensive polling, use of answering machines and other call-screening devices). Furthermore, the fraction of the low-income population without telephones or with disconnected telephone service, and the fraction who change telephone numbers frequently, is relatively high, leaving the potential for considerable sampling frame bias. Yet telephone surveys are often used because they are less expensive than in-person surveys. Indeed, there is a serious tradeoff between obtaining high-quality, high response-rate survey data and the limited resources of many states for data collection. This tradeoff may lead to a need to conduct smaller-scale surveys in order to keep quality standards sufficiently high.

9 The lack of comparable information on policy variables across states is also a problem when comparing outcomes across states using administrative data, and hence it is not inherently a problem with survey data. However, state-level administrative data can be used for state-level evaluations and hence are of some use, at least for estimating the effect of the bundle of reforms.

10 Nonresponse can be by the respondent to a whole survey (unit nonresponse) or to one or more questions on a survey (item nonresponse).

Misreporting and underreporting of events and program participation are also a problem and are difficult to detect. One of the main ways of detecting response errors is, in fact, the use of the administrative data discussed above, through cross-checking information with survey data. For example, TANF receipt information from survey data can be cross-checked with administrative records. Administrative data can also be used to gather missing information from survey nonrespondents, for example, earnings from UI records. Such data can also help detect any nonresponse bias11 in the surveys. The use of administrative data for detecting nonresponse bias is limited by the coverage of administrative data sets (e.g., UI data only cover those who are employed in the formal economy). However, the potential use of administrative data for this purpose is worth serious consideration.

A more analytic difficulty with survey data is that they generally cannot be used to gather much retrospective information on earnings, employment, and welfare and other program participation while ensuring accurate answers. Consequently, historical information is difficult to obtain. This is a problem for welfare program evaluation, given that most evaluation methodologies require information on behavior and outcomes prior to the policy change as well as after the change. Most state-level surveys begin long after a new policy is in place, leaving the study without a pre-change, or baseline, measure. In contrast, with administrative data, the likelihood of the availability of at least some historical data is much greater.

Other difficulties with survey data result from attempts to reinterview respondents over periodic intervals and hence create a longitudinal, or panel, data set.
While the average cost of interviewing a family a second time is much less than the cost of locating and interviewing a family for the first time, a small fraction of families who move or who are difficult to locate at a later time can generate very high expenses for the data collectors. Nonresponse in a longitudinal context can be a problem as well, because the ability to locate and reinterview a family may be correlated with the values of the outcome variables of interest (employment, earnings, program participation, etc.) for assessing new welfare policies. Consequently, issues of nonresponse bias again appear. Another complication for panel data sets is following all family members when families split up. To track outcomes, especially for children, it is critically important to collect data on all members of the original family. For instance, one may want to evaluate the outcomes of children who have been separated from their families because of hardships. To do so would require following the children in the family as well as the adult member(s) of the family, but it is often difficult to do so after the family has split, and it can be costly.

11 Nonresponse bias is a systematic difference in the characteristics of respondents and non-respondents.

Despite this rather long list of disadvantages, survey data nevertheless have strong advantages and must be considered part of the data collection strategy of any welfare reform study that desires a reasonably comprehensive picture of families and individuals who are not participating in welfare programs.

Linking Administrative and Survey Data

Linking administrative data sets to survey data sets offers the potential to take advantage of the features of both types of data. Surveys can gather information on program participants when they are not receiving benefits and can also supplement administrative data in gathering information on demographic and background characteristics of the populations of interest. Surveys can also collect data on the entire household and on informal sources of support. Administrative data, in contrast, can provide reliable data on program participation, potentially for long periods of time, and information on how recipients are treated by the program. Linked administrative data can provide information on the services recipients receive while they are on welfare, such as work supports under Welfare to Work, job training, and job search services. Linked administrative data can also be used to track a recipient's or former recipient's dependency on other social welfare programs, such as public housing, food stamps, and others. The use of administrative data can also reduce the costs of collecting data that would otherwise be obtained with a survey, such as date of birth. Information on common items available in both sources can be used to check data quality. Administrative data (e.g., on UI earnings and employment) can be used to assess the seriousness of any bias from nonresponse in a survey.
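
As a rough illustration of this last point, the sketch below compares an administrative outcome for survey respondents and nonrespondents drawn from the same sample. It assumes that every sampled case has already been linked to an administrative earnings record; the field names `responded` and `ui_earnings` and the function name are hypothetical, and the approach inherits the coverage limits of UI data noted earlier.

```python
from statistics import mean


def nonresponse_gap(cases, outcome="ui_earnings"):
    """Summarize how respondents differ from the full sample on an
    administrative outcome that is known for every sampled case.

    `cases` is a list of dicts, each with a boolean `responded` flag and a
    numeric value under `outcome` (both field names are hypothetical).
    """
    resp = [c[outcome] for c in cases if c["responded"]]
    nonresp = [c[outcome] for c in cases if not c["responded"]]
    if not resp or not nonresp:
        raise ValueError("need at least one respondent and one nonrespondent")
    full = resp + nonresp
    return {
        "response_rate": len(resp) / len(full),
        "respondent_mean": mean(resp),
        "nonrespondent_mean": mean(nonresp),
        # How far a respondent-only mean would sit from the full-sample mean.
        "bias": mean(resp) - mean(full),
    }


# Made-up quarterly earnings for five sampled former recipients.
sample = [
    {"responded": True, "ui_earnings": 3000.0},
    {"responded": True, "ui_earnings": 2000.0},
    {"responded": True, "ui_earnings": 2500.0},
    {"responded": False, "ui_earnings": 1000.0},
    {"responded": False, "ui_earnings": 500.0},
]
print(nonresponse_gap(sample))
```

A large gap between the respondent mean and the full-sample mean on the administrative measure would suggest that survey-only estimates of related outcomes may be biased as well, although the check says nothing about outcomes or individuals that the administrative records do not cover.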
As we stated in the opening of this section, the appropriate data sources for an evaluation depend on the evaluation methodology chosen. A monitoring study would be more effective if based on a linked administrative-survey data set on families over time. Before-and-after studies require historical data at either the individual level or aggregated at the state (or relevant policy area) level in order to account for changes in the external environment that may change individual welfare recipiency, and linked administrative-survey data sets would also make this type of study more effective. The pure cross-section design, the combination of cross-section and before-and-after design, and cohort designs (at least across states) also require considerable knowledge of participation histories and demographic and socioeconomic characteristics, but in these cases, information is required across different states. A challenge to the use of administrative data for these types of designs is whether cross-state comparability of administrative data is sufficient to make these methods possible (see Hotz et al., 1998, for a discussion of such comparability).

Linking survey data to administrative data has thus far been done at the state level, probably because state agencies conducting welfare reform evaluations have easy access to administrative records. The extent to which survey and administrative data can be linked on a national-level basis with national household surveys remains to be seen. As we explain in Chapter 3, the Census Bureau, with support from ASPE, is looking into the feasibility of matching Social Security records to the SIPP and SPD data.

Privacy and confidentiality are significant concerns for the development and linkage of administrative data sets and for survey data sets linked to administrative data sets. These concerns may limit the access outside researchers have to the data. The issue is also of concern for survey data, but is typically addressed through informed consent agreements and data masking procedures. Techniques and protocols for ensuring confidentiality and privacy continue to develop and will need to be developed further if linked data are to be more widely accessible.

Data Providing Descriptions of Programs

A third type of data, less often discussed, is that describing the welfare reform itself. Although it is commonly assumed that such data must necessarily be available, lack of accurate information about program rules and provisions has developed into a problem in current welfare reform efforts, and it is therefore necessary to note that collection of descriptive program data requires an independent effort.

Prior to the wave of welfare reform that began in the early 1990s, all state AFDC programs had the same approximate structure, with a relatively similar set of rules governing eligibility and benefit computation. States had considerable leeway in setting benefit levels, but most other characteristics of the program were heavily regulated by the federal government, operating under the provisions of the Social Security Act, court interpretations of that act, and administrative decisions.
States were required to report to the federal government the provisions of their state AFDC plans, their benefit levels, and a wide variety of other information to ensure that they were in compliance. In addition, because the matching-grant structure of the federal financial support for the system required information on average benefit levels in the states, those had to be reported as well. Requirements for reporting program rules to the federal government have changed greatly under PRWORA. Federal regulations include a requirement that states must provide an annual report on the characteristics of their TANF program rules. However, how these characteristics are reported is not standardized, and the wide variation in policy across states makes standardized reporting more difficult. The reporting requirements are fairly open-ended, possibly diminishing the usefulness of the data provided in these reports. States can use varying definitions in reporting and are likely to report only what is strictly defined in the final regulations since there are no incentives to report any other information and no funds from the federal government to do so, as there were under AFDC.

Furthermore, it is not clear from the final regulations how states that have given counties authority to set their own program rules will report the program rules, though clearly the states and the federal government have an interest in knowing what these county rules are.

Many states have their own state programs for low-income populations. For such programs, states are only required to report the characteristics of programs that use federal maintenance-of-effort funds that are provided for under PRWORA;12 states do not have to report rules of separate state programs that are funded from other state sources. But to evaluate the effects of the PRWORA legislation, it would be necessary to understand how these separate state programs interact with the federal requirements of TANF. For example, Illinois is using its own funds to pay benefits to recipients in months when they are working at least 25 hours per week, but receiving these benefits does not count against the 5-year time limit on receiving benefits (Illinois Department of Human Services, 1999).

It is difficult to judge whether the requirements for states to report on program rules will be comprehensive and standardized enough for use in evaluations. A separate effort is being made along these lines by the Urban Institute, under contract to DHHS. The Urban Institute is collecting information on the TANF rules for all the states for 1996–1998 and is attempting to classify the rules in a typology that could allow state comparisons. A list of the summary categories of rules that will be collected in the project is shown in Box 2-1. This is an important effort that should be strongly encouraged and considerably broadened. The pace of the effort is discouragingly slow, given that PRWORA was passed in August 1996.
The work deserves support to produce information on a more timely basis, and a long-run institutional commitment is required to ensure that this information will be forthcoming on a regular basis in the future.

The current lack of information on state policies also poses a significant problem to any welfare reform evaluation that attempts to make cross-state comparisons. As discussed above, several of the major evaluation methodologies require such comparisons. Without reliable information on the programs enacted by the states and how they are changing over time, at a level of detail permitting accurate comparisons of how different states have approached the various major categories of reform policy (time limits, work requirements, sanctions, diversion, family caps, and so on), it is unlikely that credible cross-state comparisons will be possible. This would be an unfortunate outcome because the various policies adopted by the different states offer a valuable source of variation for estimating the effects of welfare policies.

12 States are not required to report very many details of the characteristics of these state programs.

PROCESS EVALUATIONS

As we note at the beginning of this chapter, process evaluation plays an important role in supplementing and complementing outcome evaluation. Documenting the written program rules in each state is the essential first step to understanding the policies that face program participants and potential program participants. A further step toward fully understanding the treatment is to document how the written rules are actually implemented. Such studies are generally referred to as process, or implementation, evaluations. Process evaluations describe how program rules are operationalized and how the services are actually delivered. Implementation information is gathered by visiting program offices (often across multiple service delivery areas), interviewing caseworkers, surveying administrators, directly observing client and caseworker interactions, or reviewing documentation of individual cases.

Process evaluations can be used for administrative purposes, such as assessing caseworker and administrator performance, determining whether the intended policies are actually being implemented, or as an example of how services are provided in one area. Process evaluations can also be used in conjunction with outcome evaluations by linking the exposure individuals had to the program to the effects of policies on individuals. This use of process evaluations is the most relevant for the purposes of this report.

Although it is always possible that a gap between the written policy and the implementation of the policy exists, process analyses are particularly important in the post-PRWORA policy setting because there is greater variation in program rules and because responsibility for program design and administration has devolved to state and local levels. There is now more room for differential implementation of policies across service delivery areas because local welfare offices have more control over service provision than in the AFDC program. AFDC was an entitlement program in which caseworkers were basically charged only with determining eligibility and benefit levels, and there were quality control measures taken to ensure that eligibility and benefit calculations were implemented consistently. Now, however, local welfare offices are increasingly becoming integrated with other social program offices so that caseworkers serve as gatekeepers to a variety of services (job training, job search, transportation benefits, and child care benefits, all in addition to cash assistance). Understanding how integrated these services are in each service delivery area and how clients are treated is an important component of assessing the treatment and, subsequently, in drawing conclusions about the effects of the treatment.

Consider the following example given in Corbett (1998). A new policy that many states have implemented is a diversion payment, a lump-sum payment given to cash assistance applicants in exchange for not enrolling in the continuing cash assistance program. One local agency may encourage applicants to take the diversion payment, while another agency may just mention the payment in passing. In order to evaluate the effect of the diversion payment on TANF participation (and in the gatekeeper setting, on other social program participation), an evaluation study would need to understand the degree to which clients were aware of and pushed toward taking the diversion payment.
Process evaluations may also be useful in understanding how other social welfare programs have been affected by the change in cash assistance rules. For example, some administrative offices may direct potential cash assistance applicants or current recipients to other programs, such as food stamps, while other administrative offices may discourage the receipt of any form of assistance. Both possible cases would have implications for participation in other social welfare programs.

While the necessity for conducting process evaluations is apparent, it is not always apparent how the results can be integrated with outcome evaluations. Studies that span many service delivery areas present especially difficult problems, because in order to link program implementations to individual case outcomes, specific information for each office from which cases in the sample receive services must be known. It is less difficult to link implementation results to outcome studies if the study sample covers only a few service delivery areas and implementations in only these areas must be assessed. Keeping up-to-date information on program implementations so that they are relevant to the study period is another challenge to effectively using process evaluations in conjunction with outcome evaluations. Efforts to address these challenges deserve further attention in the evaluation research community.

CONCLUSIONS

The study of welfare reform and the evaluation of its effects present many challenges. Examining the effect of complex bundles of individual reform programs, determining the influence of the composition of the welfare caseload on measured outcomes, developing a credible comparison group for those affected by welfare reform, and constructing an adequate database for measuring outcomes, as well as data describing policies across states, require thoughtful study designs as well as considerable resources.

We conclude that while nonexperimental methodologies for evaluation have become the dominant method of evaluation at the current time, experimental methodologies still have a role to play and should be kept on the table as one means of evaluation. We conclude that monitoring and descriptive studies of welfare reform are important, but that evaluation studies, which estimate the effect of a program reform, should be the ultimate goal of welfare reform research. We emphasize that there is a role both for national-level welfare reform evaluation, which yields a comprehensive assessment of the effects of reform in all the states around the country, and for purely state-level studies, which yield estimates for individual states.

Regarding data, the panel has found considerable weaknesses in the three elements of data infrastructure needed to evaluate welfare reform. Household survey data sets, which are rare at the state level, are more plentiful at the national level but suffer from small sample sizes, a lack of key variables, and the relative unavailability of comparable policy measures across states.
State-level administrative data sets, which have traditionally been used for management rather than research purposes, are still at an early stage of development and need much more work before they can fulfill their potential. Comprehensive data on state welfare policies across states and over time on a comparable basis have yet to be published, and there is no systematic plan for collecting such data on a long-run, permanent basis within the federal government.