9
Evidence of Program Effectiveness
from National Data Bases
In addition to the program-specific evaluations of YEDPA effectiveness
that were reviewed in Chapters 5 through 8, there are several
evaluations that attempted to use large, representative national
samples to derive estimates of the impact of all federally funded
employment and training programs. The most prominently used data bases
in these studies were the Continuous Longitudinal Manpower Survey
(CLMS) and a special youth sample of the National Longitudinal Survey
(NLS). Both of these data bases involve relatively large samples--more
than 60,000 in the CLMS and more than 12,000 in the NLS--that are drawn
in a manner designed to permit generalizations, for CLMS, to the uni-
verse of participants in CETA programs, and, for NLS, to all American
youths. There was also a YEDPA attempt to collect data on the progress
of its participants and activities. While the major charge of our
committee was to focus on the YEDPA knowledge development activities,
we also reviewed the findings from studies using these other data
bases, and we evaluated the quality of YEDPA's Standardized Assessment
System.
The results of our review of this research are presented in detail
in Appendices A and D. In this chapter we summarize our conclusions
regarding this evidence.
THE CLMS AND NLS DATA BASES
The studies based on the CLMS and the NLS use data gathered in a
different manner and have a somewhat different (and wider) focus than
the program-specific evaluations, and so provide an important supple-
mentary perspective on the substance and problems of the individual
YEDPA evaluations we reviewed. Moreover, these studies use data
derived from samples with high sample-coverage rates and low sample
attrition, and consequently they can provide a more adequate evidentiary
basis (at least in respect to sampling mechanics) than many of the
individual program studies we reviewed.
Both the CLMS and the NLS are full probability samples whose
sampling designs appear to have been well executed. Sample coverage
appears high, and the available documentation shows considerable
attention to important methodological details, such as adequacy of
sampling frame, careful screening of respondents to ensure that they
are within the universe being sampled, extensive follow-up to ensure a
high response rate, and so forth.
For both the NLS and CLMS, comparison groups must be constructed.
The basic goal in selecting a comparison group is to find a sample of
individuals who closely resemble the participants in employment and
training programs. Lacking an experimental design, in which individuals
are randomly assigned to participant and control groups, a comparison
group strategy is a next-best approach. (The problems inherent in this
strategy are discussed below.)
There are, nonetheless, important limitations to these data bases.
First, they are not targeted on specific programs, and so the estimates
of aggregate program effects may lump together the effects of effective
and ineffective programs. Second, the data bases (particularly CLMS)
limit the extent to which one can take account of the effects of local
labor market conditions. And third, the data were not derived from
experiments in which subjects were randomly assigned to take part in a
program; consequently, the estimates of program effectiveness require
strong assumptions about the adequacy of model specification and
matching procedures used to construct synthetic control groups.
Finally, we should point out that we received the CLMS-based reports in
draft form late in the course of our work, and thus our evaluation of
them has not been as intensive as that of the individual YEDPA reports.
Findings From the CLMS
The data from the CLMS have been analyzed by researchers from
Westat, Inc. (who concentrated mainly on adult participants in CETA),
SRI International, and the Urban Institute. For youth participants in
CETA programs, Westat (1984) reported that youth work-experience
programs have statistically insignificant effects on employment and
earnings for all cohorts and all postprogram years and did not report
other specific youth-related findings. The Urban Institute (Bassi et
al., 1984:47), however, characterizes Westat's results from earlier
reports as follows:
In looking at youth, Westat (1982) has found that for those
youngsters 14 to 15 years old, CETA has had little overall
impact. For other young workers net gains are found, being
highest once again for OJT [on-the-job training], followed by
PSE [public service employment] and classroom training, and
being negligible for work experience. The results found for
young workers also tend to persist in the second postprogram
year. Westat (1981) also produced a technical paper focusing
on youth in CETA in which net gains were broken down by sex.
As with adults, net gains were greatest for young females,
being negligible or insignificant for males. After classifying
youth according to their attachment to the labor force, net
earnings gains were found to be greatest among structurally
unemployed or discouraged workers.
SRI's analysis (Dickinson et al., 1984) differs from Westat's in
two key respects: the selection of the comparison group and the
sampling frame. SRI's estimates of program effects were substantially
lower than Westat's [as summarized by the Urban Institute (Bassi et
al., 1984)], for both adults and youths, and the authors spend considerable
time in identifying the sources of the differences. From
their analyses, the SRI authors conclude that most of the differences
could be attributed to choices made in the sampling frame and to an
updating of 1979 Social Security earnings.
SRI's findings for 1976 CETA enrollees were as follows:
· Participation in CETA results in significantly lower postprogram
earnings for adult men (-$690) and young men (-$591) and
statistically insignificant gains for adult women (+$13) and young
women (+$185).
· All program activities have negative effects for men, while
adult women benefit from Public Service Employment and young women from
on-the-job training. Work experience has negative effects for all age
and sex groups.
· Both male and female participants are more likely to be
employed after CETA, but males are less likely to be in high-paying
jobs or to work long hours.
· Length of stay in the program has a positive impact on
postprogram earnings, with turning points for young men at 8 months and
for young women at 1 month.
· Placement on leaving the program leads to positive earnings
gains.
The Urban Institute (Bassi et al., 1984) report focuses separately
on youths. The analysts used Westat's match groups from the Current
Population Survey (CPS) and estimated net effects for six race/sex
groups: male/female by white/black/Hispanic. Both random-effects
estimators and fixed-effects estimators were used to identify net
effects, but the emphasis was on fixed-effects models to control for
selection bias. Net effects were estimated for two postprogram years,
1978 and 1979 (see Appendix D: Table D.2).
The Urban Institute found the following:
· Significant earnings losses for young men of all races and no
significant effects for young women, with effects persisting into the
second postprogram year.
· For Public Service Employment and on-the-job training,
significant positive net effects for young women, particularly
minorities.
· For work experience, significant negative or insignificant net
effects for all groups.
· Among groups, the most negative findings were for white males,
the most positive for minority females.
· Older youths (22-year-olds) and those who had worked less than
quarter time had stronger gains or smaller losses than the younger
group or those who had worked quarter time or more.
· Earnings gains resulted primarily from increased time in the
labor force, time employed, and hours worked rather than from increased
average hourly wages.
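
The fixed-effects idea mentioned above can be illustrated with a minimal two-period sketch. It is not the Urban Institute's actual model or data: the earnings figures, the helper `make_person`, and the assumed program effect of 100 are all hypothetical. The sketch shows only why differencing each person's earnings removes time-invariant differences between participants and a comparison group, which a simple postprogram comparison does not.

```python
from statistics import mean

def cross_section_estimate(people):
    """Naive estimate: postprogram mean earnings, participants minus comparison."""
    treated = [p["post"] for p in people if p["participant"]]
    comparison = [p["post"] for p in people if not p["participant"]]
    return mean(treated) - mean(comparison)

def fixed_effects_estimate(people):
    """Two-period fixed-effects (first-difference) estimate.

    Subtracting each person's preprogram earnings removes any personal
    component of earnings that does not change over time.
    """
    treated = [p["post"] - p["pre"] for p in people if p["participant"]]
    comparison = [p["post"] - p["pre"] for p in people if not p["participant"]]
    return mean(treated) - mean(comparison)

def make_person(fixed_component, participant, true_effect=100, growth=50):
    """Hypothetical record: base earnings + personal component + common growth."""
    pre = 1000 + fixed_component
    post = pre + growth + (true_effect if participant else 0)
    return {"pre": pre, "post": post, "participant": participant}

# Hypothetical data: participants start from lower earnings than the comparison group.
people = (
    [make_person(-300 + 10 * i, True) for i in range(5)]
    + [make_person(10 * i, False) for i in range(5)]
)

print(cross_section_estimate(people))   # negative despite a positive true effect
print(fixed_effects_estimate(people))   # recovers the assumed effect of 100
```

The differencing works here only because the hypothetical personal component is constant over time. If participants and comparison group members would have had different earnings growth even without the program, a fixed-effects estimate remains biased.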
Findings From the NLS
Two studies have used the NLS data base to make estimates of the
aggregate effects of government-sponsored employment and training
programs on youths. One study (Moeller et al., 1983) was conducted by
the Policy Research Group (PRG) of Washington, D.C.; the second study
(Hahn and Lerman, 1983) was conducted by the Center for Employment and
Income Studies (CEIS) of Brandeis University. Both studies evaluated
the effects of CETA programs on youths, although the PRG study expanded
its scope to include such schooling programs as vocational education.
The estimates made by both studies indicate relatively modest
effects of employment and training programs on the subsequent income,
employment status, and educational attainments of the youths who
participated in those programs. For CETA programs, both studies find
negative overall effects of CETA on employment, although PRG reports
some positive effects at 2 years after CETA completion. Reviewing the
PRG results and their own findings, Hahn and Lerman (1983:84) note:
To conclude, both the PRG results and our own show negative and
significant effects of CETA on employment variables. It is
only after going out two years in time after CETA completion
that the PRG report finds evidence of a positive, significant
effect and that on only one variable, unsubsidized earnings.
We cannot confirm this positive effect, but it would not be
inconsistent with our results. It is difficult to claim this
as an impressive success for CETA.
The substantive findings from these NLS analyses are generally
consistent with the weak and generally negative findings from the CLMS
analyses, and we therefore do not review them in great detail here.
Limitations to the Findings:
Bias in Estimates of Effectiveness
Across the three CLMS studies, there is a pattern of preponderantly
negative net effects on youths, and the NLS studies show extremely weak
effects of program participation. These results obviously invite the
conclusion that federally funded employment and training programs have
had (in the aggregate) either little effect or a deleterious effect on
the future earnings and employment prospects of the youths who par-
ticipated in the programs. There is, however, empirical evidence that
suggests that these estimates may be biased.
The evidence indicates that despite various intensive efforts to
select comparison groups that are similar to participants in youth
programs and to control for selection bias through the use of fixed
effects estimators, there may still be persistent and systematic (but
unmeasured) differences in the earnings profiles of comparison groups
and true controls. Such earnings differences, for example, might be
due to such unobserved factors as (perceived or actual) differences
between program participants and a constructed comparison group in
social attitudes, motivation, or ability.
A study by Mathematica (1984) provides important evidence on the
potential for bias in the use of matching strategies such as those
employed in the NLS and CLMS analyses reviewed above. The Mathematica
study used data from a true experimental design that randomly assigned
youths to be either program participants or controls (the Supported
Work program). It then compared net-impact estimates derived using the
experimental design with estimates derived using the same sample of
program participants but substituting various "matched samples" con-
structed from the Current Population Survey. The comparison groups
were constructed in a manner designed to simulate those used by the
analysts working with the CLMS data.
Using the true control group, Mathematica found in-program earnings
gains and negligible postprogram effects for youths. Using the
constructed matched samples, however, yielded either insignificant or
significantly negative effects. Mathematica argues that biases in the
estimates of program effectiveness are likely to exist in other studies
that use similar comparison group strategies, which include the Westat,
SRI, and Urban Institute studies using the CLMS and the studies based
on the NLS.
A further finding of the Mathematica review is the substantial
variability in estimates made using different matching strategies on
the same data. Not only do the estimates derived from a true control
group differ substantially from those derived from a constructed match
sample, but the estimates of net impact derived using different
matching strategies also differ substantially, from approximately +$122
to -$1,303 (see Appendix D). Given such a broad range of estimated
effects and the sensitivity of estimated program effects to alternative
assumptions, there must be cause for concern about the nature of the
underlying data.
While one may argue about the generalizability of the Mathematica
demonstration of bias and variability in the matched sample methodology,
the finding has a precedent in the analysis of the Salk polio vaccine
trials (Meier, 1972). The Mathematica study highlights two separate
problems in net-impact estimations using a matched comparison group:
(1) the extent to which employment and training programs recruit or
attract participants who differ from eligible nonparticipants in ways
that affect subsequent earnings, and (2) the extent to which such
differences can be detected and controlled using available demographic
or preprogram earnings data. Youths present a particularly difficult
problem for any such matching strategy since preprogram earnings data
either do not exist or are not reliable indicators of the uncontrolled
variables that are of interest to program evaluators.
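
The two problems can be made concrete with a small simulation. This is a sketch on synthetic data, not a reproduction of the Mathematica analysis. It assumes, purely for illustration, that an unobserved trait (labeled `motivation`) drives earnings, that only people with below-average values of it apply to the program, that the program's true effect is zero, and that the only observed matching variable (`group`) is unrelated to the trait. Selection in the opposite direction would bias the matched estimate upward instead.

```python
import random
from statistics import mean

TRUE_EFFECT = 0.0  # assumed null postprogram effect (illustrative, not an estimate)

def earnings(person, treated, rng):
    """Synthetic earnings: driven by the unobserved trait plus random noise."""
    bump = TRUE_EFFECT if treated else 0.0
    return 5000 + 1000 * person["motivation"] + bump + rng.gauss(0, 500)

def simulate(n=20000, seed=0):
    rng = random.Random(seed)
    population = [
        {"group": rng.choice("AB"), "motivation": rng.gauss(0, 1)}
        for _ in range(n)
    ]
    # Assumption: only people with below-average motivation apply.
    applicants = [p for p in population if p["motivation"] < 0]
    nonapplicants = [p for p in population if p["motivation"] >= 0]

    # Experimental design: random assignment among applicants.
    rng.shuffle(applicants)
    half = len(applicants) // 2
    participants, controls = applicants[:half], applicants[half:]
    y_part = [(p["group"], earnings(p, True, rng)) for p in participants]
    y_ctrl = [earnings(p, False, rng) for p in controls]
    experimental = mean(y for _, y in y_part) - mean(y_ctrl)

    # Matched design: compare participants with nonapplicants inside each
    # observed group, weighting each group by its share of participants.
    y_comp = [(p["group"], earnings(p, False, rng)) for p in nonapplicants]
    matched = 0.0
    for g in "AB":
        part_g = [y for grp, y in y_part if grp == g]
        comp_g = [y for grp, y in y_comp if grp == g]
        matched += (len(part_g) / len(y_part)) * (mean(part_g) - mean(comp_g))
    return experimental, matched

experimental_estimate, matched_estimate = simulate()
print(f"experimental estimate: {experimental_estimate:8.1f}")
print(f"matched estimate:      {matched_estimate:8.1f}")
```

By construction, the experimental estimate falls close to the assumed effect of zero, while the matched estimate is strongly negative because matching on the observed variable does nothing to balance the unobserved one.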
Estimates of the magnitude and direction of the bias in matched-
group evaluations are only available for the one youth program
(Supported Work) whose experimental data were reanalyzed by Mathematica.
From this reanalysis we have an elegant demonstration of the fact that
commonly used matched comparison group strategies have yielded an
inappropriately negative evaluation when the experimental data indicate
that the program had a null impact.
There is a natural temptation on the basis of this one result to
conclude that biases equal in magnitude and direction affect other
comparison group studies. However, there is too little evidence to
warrant such a generalization. All we know for certain is that the
potential for substantial bias exists in studies that use matching
techniques rather than random assignment and that when such biases do
occur they can lead to serious errors of inference. (Of course, biases
in either direction are theoretically possible.)
Until further work is done, there will be considerable uncertainty
as to the extent to which the Mathematica finding generalizes to other
program evaluations and to different populations of youths. In order
to obtain the requisite data, there will have to be a renewed
commitment to randomized experiments so that estimates of the magnitude
and direction of these biases can be made.
YEDPA STANDARDIZED ASSESSMENT SYSTEM
A national data base different in major respects from the CLMS and
NLS was established by the Educational Testing Service under the
auspices of the Office of Youth Programs. A key element of YEDPA's
knowledge development strategy called for the establishment of a
standardized system for the systematic collection of data on the
progress of program participants and the services provided by YEDPA
programs. The intent of YEDPA's data gathering was to provide a
standardized data base with which to assess the performance of the
various YEDPA demonstration projects.
This data collection plan was called the Standardized Assessment
System (SAS). It was intended to provide preprogram, postprogram, and
follow-up (after 3 and 8 months) data for all youths enrolled in YEDPA
demonstration programs. The data collected by SAS included an intake
interview, a reading test, and seven scales designed to measure occupa-
tional knowledge, attitudes, and related skills. In addition, process
data were collected from program sites concerning the implementation of
the programs and the services offered at those sites.
In order to investigate the characteristics of the SAS data base,
we obtained a copy of the data base (minus individual identifiers).
Appendix A presents in detail our assessment of its sampling adequacy,
measurement reliability, and measurement validity. Overall, this
analysis suggests that sample coverage was poor and subsequent attrition
rates were extremely high. Using program operators' reports of
enrollment at 166 sites to estimate the size of the target sample for
those sites, we found that the majority of the target sample was missed
entirely. This sample coverage problem was compounded by high
attrition over time: at 3 months postprogram more than 40 percent of
the initial sample had been lost. In addition, our examination of the
attitude and knowledge measurements in the SAS data base indicated that
those measures had low levels of stability over time and that they were
only weakly correlated with subsequent success in the job market.
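
The coverage and attrition measures discussed here reduce to simple ratios, sketched below. The function names are illustrative, and the respondent counts in the second example are hypothetical placeholders rather than SAS figures. The only numbers taken from this chapter are the 166 sites that supplied process data and the 458 sites that supplied participant data.

```python
def coverage_rate(observed, target):
    """Share of the target sample (or of target sites) actually captured."""
    if target <= 0:
        raise ValueError("target must be positive")
    return observed / target

def attrition_rate(initial, retained):
    """Share of the initial sample lost by a later follow-up wave."""
    if initial <= 0:
        raise ValueError("initial must be positive")
    return (initial - retained) / initial

# From the chapter: 166 of the 458 sites with participant data also
# supplied process data, roughly 36 percent.
site_share = coverage_rate(166, 458)
print(f"sites with process data: {site_share:.1%}")

# Hypothetical counts, for illustration only: 1,000 respondents at intake,
# 550 reinterviewed at the 3-month follow-up.
print(f"attrition at 3 months:   {attrition_rate(1000, 550):.1%}")
```

Because the two rates compound, a design that misses most of its target sample and then loses a large share of the remainder leaves only a small and possibly unrepresentative fraction of the intended population.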
The problems evident in our examination of the SAS data collection
effort invite the question of how such problems might be avoided in the future.
In Chapter 1 we present a number of specific recommendations in this
regard. There are, however, two more general lessons that should be
learned from this experience.
First, the scope of a research effort should match the resources
available. In the case of SAS, it is questionable whether any research
purpose required that data be gathered from all participants at all
sites, but in any event, the available resources were inadequate for
such a task. Well-collected data on a sample of participants or
program sites would have been much better than the ambitious but poorly
executed data-gathering strategy used by SAS.
The second, and related, lesson concerns the dangers of using
program operators to collect research data. Collection of research
data in a longitudinal study is a demanding task. Like all survey data
collections, it requires vigorous follow-up efforts to obtain data from
persons who initially refuse to be interviewed or who are hard to
reach. It also requires continued contact with respondents over time
so as to minimize attrition, together with careful efforts to trace
persons who move. While it may seem economical to use program
personnel for such tasks, the experience of SAS--and other
efforts--suggests that it is a false economy.
The estimate of the target sample size is derived from reported enrollments for sites that
provided process data for the SAS. Of the 458 sites that provided
participant data, only 166 also provided such process data. Obviously
it is not possible to tell whether sites that did not provide such
process data had higher or lower rates of sample coverage than sites
that did provide process data.