9 Evidence of Program Effectiveness from National Data Bases

In addition to the program-specific evaluations of YEDPA effectiveness that were reviewed in Chapters 5 through 8, there are several evaluations that attempted to use large, representative national samples to derive estimates of the impact of all federally funded employment and training programs. The most prominently used data bases in these studies were the Continuous Longitudinal Manpower Survey (CLMS) and a special youth sample of the National Longitudinal Survey (NLS). Both of these data bases involve relatively large samples--more than 60,000 in the CLMS and more than 12,000 in the NLS--that are drawn in a manner designed to permit generalizations, for CLMS, to the universe of participants in CETA programs, and, for NLS, to all American youths. There was also a YEDPA attempt to collect data on the progress of its participants and activities. While the major charge of our committee was to focus on the YEDPA knowledge development activities, we also reviewed the findings from studies using these other data bases, and we evaluated the quality of YEDPA's Standardized Assessment System. The results of our review of this research are presented in detail in Appendices A and D. In this chapter we summarize our conclusions regarding this evidence.

THE CLMS AND NLS DATA BASES

The studies based on the CLMS and the NLS use data gathered in a different manner and have a somewhat different (and wider) focus than the program-specific evaluations, and so provide an important supplementary perspective on the substance and problems of the individual YEDPA evaluations we reviewed. Moreover, these studies use data derived from samples with high sample-coverage rates and low sample attrition, and consequently they can provide a more adequate evidentiary basis (at least in respect to sampling mechanics) than many of the individual program studies we reviewed.
Both the CLMS and the NLS are full probability samples whose sampling designs appear to have been well executed. Sample coverage appears high, and the available documentation shows considerable attention to important methodological details, such as adequacy of sampling frame, careful screening of respondents to ensure that they are within the universe being sampled, extensive follow-up to ensure a high response rate, and so forth.

For both the NLS and CLMS, comparison groups must be constructed. The basic goal in selecting a comparison group is to find a sample of individuals who closely resemble the participants in employment and training programs. Lacking an experimental design, in which individuals are randomly assigned to participant and control groups, a comparison group strategy is a next-best approach. (The problems inherent in this strategy are discussed below.)

There are, nonetheless, important limitations to these data bases. First, they are not targeted on specific programs, and so the estimates of aggregate program effects may lump together the effects of effective and ineffective programs. Second, the data bases (particularly CLMS) limit the extent to which one can take account of the effects of local labor market conditions. And third, the data were not derived from experiments in which subjects were randomly assigned to take part in a program; consequently, the estimates of program effectiveness require strong assumptions about the adequacy of model specification and matching procedures used to construct synthetic control groups. Finally, we should point out that we received the CLMS-based reports in draft form late in the course of our work, and thus our evaluation of them has not been as intensive as that of the individual YEDPA reports.

Findings From the CLMS

The data from the CLMS have been analyzed by researchers from Westat, Inc. (who concentrated mainly on adult participants in CETA), SRI International, and the Urban Institute. For youth participants in CETA programs, Westat (1984) reported that youth work-experience programs have statistically insignificant effects on employment and earnings for all cohorts and all postprogram years and did not report other specific youth-related findings.
The Urban Institute (Bassi et al., 1984:47), however, characterizes Westat's results from earlier reports as follows:

In looking at youth, Westat (1982) has found that for those youngsters 14 to 15 years old, CETA has had little overall impact. For other young workers net gains are found, being highest once again for OJT [on-the-job training], followed by PSE [public service employment] and classroom training, and being negligible for work experience. The results found for young workers also tend to persist in the second postprogram year. Westat (1981) also produced a technical paper focusing on youth in CETA in which net gains were broken down by sex. As with adults, net gains were greatest for young females, being negligible or insignificant for males. After classifying youth according to their attachment to the labor force, net earnings gains were found to be greatest among structurally unemployed or discouraged workers.
SRI's analysis (Dickinson et al., 1984) differs from Westat's in two key respects: the selection of the comparison group and the sampling frame. SRI's estimates of program effects were substantially lower than Westat's [as summarized by the Urban Institute (Bassi et al., 1984)], for both adults and youths, and the authors spend considerable time in identifying the sources of the differences. From their analyses, the SRI authors conclude that most of the differences could be attributed to choices made in the sampling frame and to an updating of 1979 Social Security earnings. SRI's findings for 1976 CETA enrollees were as follows:

· Participation in CETA results in significantly lower postprogram earnings for adult men (-$690) and young men (-$591) and statistically insignificant gains for adult women (+$13) and young women (+$185).
· All program activities have negative effects for men, while adult women benefit from Public Service Employment and young women from on-the-job training. Work experience has negative effects for all age and sex groups.
· Both male and female participants are more likely to be employed after CETA, but males are less likely to be in high-paying jobs or to work long hours.
· Length of stay in the program has a positive impact on postprogram earnings, with turning points for young men at 8 months and for young women at 1 month.
· Placement on leaving the program leads to positive earnings gains.

The Urban Institute (Bassi et al., 1984) report focuses separately on youths. The analysts used Westat's match groups from the Current Population Survey (CPS) and estimated net effects for six race/sex groups: male/female by white/black/Hispanic. Both random-effects estimators and fixed-effects estimators were used to identify net effects, but the emphasis was on fixed-effects models to control for selection bias. Net effects were estimated for two postprogram years, 1978 and 1979 (see Appendix D, Table D.2).
The Urban Institute found the following:

· Significant earnings losses for young men of all races and no significant effects for young women, with effects persisting into the second postprogram year.
· For Public Service Employment and on-the-job training, significant positive net effects for young women, particularly minorities.
· For work experience, significant negative or insignificant net effects for all groups.
· Among groups, the most negative findings were for white males, the most positive for minority females.
· Older youths (22-year-olds) and those who had worked less than quarter time had stronger gains or smaller losses than the younger group or those who had worked quarter time or more.
· Earnings gains resulted primarily from increased time in the labor force, time employed, and hours worked rather than from increased average hourly wages.

Findings From the NLS

Two studies have used the NLS data base to make estimates of the aggregate effects of government-sponsored employment and training programs on youths. One study (Moeller et al., 1983) was conducted by the Policy Research Group (PRG) of Washington, D.C.; the second study (Hahn and Lerman, 1983) was conducted by the Center for Employment and Income Studies (CEIS) of Brandeis University. Both studies evaluated the effects of CETA programs on youths, although the PRG study expanded its scope to include such schooling programs as vocational education. The estimates made by both studies indicate relatively modest effects of employment and training programs on the subsequent income, employment status, and educational attainments of the youths who participated in those programs. For CETA programs, both studies find negative overall effects of CETA on employment, although PRG reports some positive effects at 2 years after CETA completion. Reviewing the PRG results and their own findings, Hahn and Lerman (1983:84) note:

To conclude, both the PRG results and our own show negative and significant effects of CETA on employment variables. It is only after going out two years in time after CETA completion that the PRG report finds evidence of a positive, significant effect and that on only one variable, unsubsidized earnings. We cannot confirm this positive effect, but it would not be inconsistent with our results. It is difficult to claim this as an impressive success for CETA.

The substantive findings from these NLS analyses are generally consistent with the weak and generally negative findings from the CLMS analyses, and we therefore do not review them in great detail here.
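The fixed-effects models emphasized in the Urban Institute analysis control for selection on stable individual traits by contrasting earnings changes rather than earnings levels. The sketch below illustrates that logic with entirely invented numbers; the simulated dollar figures, sample sizes, and the assumed zero program effect are illustrative and are not drawn from the CLMS.

```python
import random

random.seed(0)

def simulate(n, fixed_mean, program_effect):
    """Two-period earnings for n individuals.

    earnings = individual fixed effect (ability, motivation, etc.)
             + common period-2 growth + program effect (period 2 only)
             + noise.  All dollar figures are invented for illustration.
    """
    people = []
    for _ in range(n):
        alpha = random.gauss(fixed_mean, 500)   # time-invariant fixed effect
        pre = alpha + random.gauss(0, 300)
        post = alpha + 400 + program_effect + random.gauss(0, 300)
        people.append((pre, post))
    return people

def mean(xs):
    xs = list(xs)
    return sum(xs) / len(xs)

# Participants are drawn from a lower-earning population (selection),
# and the true program effect is set to zero.
participants = simulate(2000, fixed_mean=3000, program_effect=0)
comparisons = simulate(2000, fixed_mean=4000, program_effect=0)

# Cross-sectional contrast of postprogram levels: contaminated by the
# $1,000 gap in fixed effects, so it wrongly suggests a large loss.
levels_est = mean(p[1] for p in participants) - mean(c[1] for c in comparisons)

# Fixed-effects (change-score) contrast: differencing each person's two
# periods removes the fixed effect, recovering the true (zero) effect.
fe_est = (mean(p[1] - p[0] for p in participants)
          - mean(c[1] - c[0] for c in comparisons))

print(f"levels estimate: {levels_est:+.0f}")
print(f"fixed-effects estimate: {fe_est:+.0f}")
```

Even this idealized setup recovers the truth only because the confounding trait is perfectly time-invariant; when participant-comparison differences also affect earnings growth, differencing no longer removes them.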
Limitations to the Findings: Bias in Estimates of Effectiveness

Across the three CLMS studies, there is a pattern of preponderantly negative net effects on youths, and the NLS studies show extremely weak effects of program participation. These results obviously invite the conclusion that federally funded employment and training programs have had (in the aggregate) either little effect or a deleterious effect on the future earnings and employment prospects of the youths who participated in the programs. There is, however, empirical evidence that suggests that these estimates may be biased.

The evidence indicates that despite various intensive efforts to select comparison groups that are similar to participants in youth programs and to control for selection bias through the use of fixed-effects estimators, there may still be persistent and systematic (but unmeasured) differences in the earnings profiles of comparison groups and true controls. Such earnings differences, for example, might be due to such unobserved factors as (perceived or actual) differences between program participants and a constructed comparison group in social attitudes, motivation, or ability.

A study by Mathematica (1984) provides important evidence on the potential for bias in the use of matching strategies such as those employed in the NLS and CLMS analyses reviewed above. The Mathematica study used data from a true experimental design that randomly assigned youths to be either program participants or controls (the Supported Work program). It then compared net-impact estimates derived using the experimental design with estimates derived using the same sample of program participants but substituting various "matched samples" constructed from the Current Population Survey. The comparison groups were constructed in a manner designed to simulate those used by the analysts working with the CLMS data. Using the true control group, Mathematica found in-program earnings gains and negligible postprogram effects for youths. Using the constructed matched samples, however, yielded either insignificant or significantly negative effects. Mathematica argues that biases in the estimates of program effectiveness are likely to exist in other studies that use similar comparison group strategies, which include the Westat, SRI, and Urban Institute studies using the CLMS and the studies based on the NLS. A further finding of the Mathematica review is the substantial variability in estimates made using different matching strategies on the same data.
Not only do the estimates derived from a true control group differ substantially from those derived from a constructed match sample, but the estimates of net impact derived using different matching strategies also differ substantially, from approximately +$122 to -$1,303 (see Appendix D). Given such a broad range of estimated effects and the sensitivity of estimated program effects to alternative assumptions, there must be cause for concern about the nature of the underlying data. While one may argue about the generalizability of the Mathematica demonstration of bias and variability in the matched sample methodology, the finding has a precedent in the analysis of the Salk polio vaccine trials (Meier, 1972).

The Mathematica study highlights two separate problems in net-impact estimations using a matched comparison group: (1) the extent to which employment and training programs recruit or attract participants who differ from eligible nonparticipants in ways that affect subsequent earnings, and (2) the extent to which such differences can be detected and controlled using available demographic or preprogram earnings data. Youths present a particularly difficult problem for any such matching strategy since preprogram earnings data either do not exist or are not reliable indicators of the uncontrolled variables that are of interest to program evaluators. Estimates of the magnitude and direction of the bias in matched-group evaluations are only available for the one youth program (Supported Work) whose experimental data were reanalyzed by Mathematica.
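The mechanism at issue, selection on traits that matching variables cannot capture, can be sketched with a toy simulation. Everything below is invented (the earnings model, the "motivation" variable, and the sample sizes); it is not a reconstruction of the Supported Work or CPS data, only an illustration of why a matched group can manufacture a spurious loss when the experimental answer is null.

```python
import random

random.seed(1)

def earnings(observed, motivation):
    """Postprogram earnings driven by one observed trait and one
    unobserved trait; the true program effect is zero.  Invented model."""
    return 2000 * observed + 1500 * motivation + random.gauss(0, 400)

# Program applicants self-select and have lower unobserved motivation
# (mean -0.5) than the general population (mean 0.0).
applicants = [(random.choice([0, 1]), random.gauss(-0.5, 1)) for _ in range(4000)]
population = [(random.choice([0, 1]), random.gauss(0.0, 1)) for _ in range(4000)]

def mean_earnings(group):
    ys = [earnings(o, m) for o, m in group]
    return sum(ys) / len(ys)

# Experimental design: randomize applicants into participant and control
# groups, so both share the same (unobserved) motivation distribution.
random.shuffle(applicants)
treat, control = applicants[:2000], applicants[2000:]
experimental_est = mean_earnings(treat) - mean_earnings(control)

# Matched design: for each participant, draw a population member with the
# same OBSERVED trait.  Motivation is unobserved, so it goes unmatched.
by_trait = {0: [p for p in population if p[0] == 0],
            1: [p for p in population if p[0] == 1]}
matched = [random.choice(by_trait[o]) for o, _ in treat]
matched_est = mean_earnings(treat) - mean_earnings(matched)

print(f"experimental estimate: {experimental_est:+.0f}")  # near zero
print(f"matched-group estimate: {matched_est:+.0f}")      # spuriously negative
```

The point parallels the Mathematica comparison: the same participant sample yields a near-zero effect against a randomized control group and a sizable apparent loss against a group matched only on observables.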
From this reanalysis we have an elegant demonstration of the fact that commonly used matched comparison group strategies have yielded an inappropriately negative evaluation when the experimental data indicate that the program had a null impact. There is a natural temptation on the basis of this one result to conclude that biases equal in magnitude and direction affect other comparison group studies. However, there is too little evidence to warrant such a generalization. All we know for certain is that the potential for substantial bias exists in studies that use matching techniques rather than random assignment and that when such biases do occur they can lead to serious errors of inference. (Of course, biases in either direction are theoretically possible.) Until further work is done, there will be considerable uncertainty as to the extent to which the Mathematica finding generalizes to other program evaluations and to different populations of youths. In order to obtain the requisite data, there will have to be a renewed commitment to randomized experiments so that estimates of the magnitude and direction of these biases can be made.

YEDPA STANDARDIZED ASSESSMENT SYSTEM

A national data base different in major respects from the CLMS and NLS was established by the Educational Testing Service under the auspices of the Office of Youth Programs. A key element of YEDPA's knowledge development strategy called for the establishment of a standardized system for the systematic collection of data on the progress of program participants and the services provided by YEDPA programs. The intent of YEDPA's data gathering was to provide a standardized data base with which to assess the performance of the various YEDPA demonstration projects. This data collection plan was called the Standardized Assessment System (SAS). It was intended to provide preprogram, postprogram, and follow-up (after 3 and 8 months) data for all youths enrolled in YEDPA demonstration programs.
The data collected by SAS included an intake interview, a reading test, and seven scales designed to measure occupational knowledge, attitudes, and related skills. In addition, process data were collected from program sites concerning the implementation of the programs and the services offered at those sites.

In order to investigate the characteristics of the SAS data base, we obtained a copy of the data base (minus individual identifiers). Appendix A presents in detail our assessment of its sampling adequacy, measurement reliability, and measurement validity. Overall, this analysis suggests that sample coverage was poor and subsequent attrition rates were extremely high. Using program operators' reports of enrollment at 166 sites to estimate the size of the target sample for those sites, we found that the majority of the target sample was missed entirely. This sample coverage problem was compounded by high attrition over time: at 3 months postprogram more than 40 percent of the initial sample had been lost. In addition, our examination of the attitude and knowledge measurements in the SAS data base indicated that those measures had low levels of stability over time and that they were only weakly correlated with subsequent success in the job market.

The problems evident in our examination of the SAS data collection effort invite the question of how this might be avoided in the future. In Chapter 1 we present a number of specific recommendations in this regard. There are, however, two more general lessons that should be learned from this experience. First, the scope of a research effort should match the resources available. In the case of SAS, it is questionable whether any research purpose required that data be gathered from all participants at all sites, but in any event, the available resources were inadequate for such a task. Well-collected data on a sample of participants or program sites would have been much better than the ambitious but poorly executed data-gathering strategy used by SAS.

The second, and related, lesson concerns the dangers of using program operators to collect research data. Collection of research data in a longitudinal study is a demanding task. Like all survey data collections, it requires vigorous follow-up efforts to obtain data from persons who initially refuse to be interviewed or who are hard to reach. It also requires continued contact with respondents over time so as to minimize attrition, together with careful efforts to trace persons who move. While it may seem economical to use program personnel for such tasks, the experience of SAS--and other efforts--suggests that it is a false economy.

Note: The estimate of target sample size is derived from reported enrollments for sites that provided process data for the SAS. Of the 458 sites that provided participant data, only 166 also provided such process data. Obviously it is not possible to tell whether sites that did not provide such process data had higher or lower rates of sample coverage than sites that did provide process data.
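The compounding of poor coverage and high attrition can be made concrete with back-of-the-envelope arithmetic. The coverage rate below is an assumption for illustration (the text reports only that the majority of the target sample was missed, implying coverage under 50 percent); the attrition rate is the reported lower bound of 40 percent at 3 months.

```python
# Illustrative arithmetic only: the 0.40 coverage rate is assumed,
# and the 3-month attrition rate is the reported lower bound.
coverage = 0.40
attrition_3mo = 0.40

# Fraction of the original target sample still observed at 3 months.
retained = coverage * (1 - attrition_3mo)
print(f"fraction of target sample observed at 3 months: {retained:.0%}")  # 24%
```

Under these assumptions, fewer than one in four members of the target sample contribute any 3-month follow-up data, before measurement problems are even considered.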