APPENDICES

APPENDIX A

Standardized Data Collection for Large-Scale Program Evaluation: An Assessment of the YEDPA-SAS Experience

Charles F. Turner

(Charles F. Turner was senior research associate with the committee.)

The Youth Employment and Demonstration Projects Act (YEDPA), as noted in Chapter 3, provided the Department of Labor (DOL) and its new Office of Youth Programs (OYP) with a mandate to test the relative efficacy of different methods of dealing with the employment problems of young Americans. The legislative concern with learning "what works for whom" was consistent with the frequently stated contention that decades of federal funding for similar programs had not yielded much in the way of reliable knowledge. And so, a key element of YEDPA's knowledge development strategy was the establishment of a standardized system for the systematic collection of data on the progress of program participants and the services provided by YEDPA programs.

STANDARDIZED ASSESSMENT SYSTEM

In order "to document administrative outcomes, to monitor performance, and to continually assess program impacts and lessons" from YEDPA programs, the Office of Youth Programs launched a large-scale data gathering operation in collaboration with the Educational Testing Service (ETS). The intent of the data gathering was to develop a standardized data base with which the performance of the various programs that YEDPA comprised could be assessed. This data gathering plan, called the Standardized Assessment System (SAS), was ambitious in its aim. SAS was intended to provide preprogram, postprogram, and follow-up data (3 and 8 months after program completion) for almost 50 percent of the youth served by these programs (Taggart, 1980). The SAS data base is an important component of the YEDPA knowledge development enterprise not only because it was a salient feature of the YEDPA effort, but also because it provided the basic data used in evaluating a large number of the YEDPA programs. The characteristics

of this data base are thus of concern to us in evaluating what was learned from the YEDPA experience. In the following pages we describe the SAS data collection procedures and evaluate the characteristics of the data obtained, e.g., the coverage of the sample and the reliability and validity of the measurements.

Data Collection Instruments

The SAS data collection instruments included an intake interview, called the Individual Participant Profile (IPP); a reading test (STEP); a battery of seven measures of occupational knowledge, attitudes, and related skills administered preprogram and postprogram; a program completion interview; interviews at 3 and 8 months postprogram; and evaluations by counselors (postprogram) and employers or work supervisors (postprogram and 3 and 8 months later). In addition, data were collected from program sites concerning the implementation of the program and the services offered, and data were also collected from "control" groups recruited by program operators to provide comparison samples for program evaluation. In this section each of the data collection instruments is briefly described. The descriptions of the instruments are taken from The Standardized Assessment System for Youth Demonstration Projects (Educational Testing Service, 1980). Where suitable we have used the ETS phrasing or paraphrased the descriptions without repeated citation of the source.

Individual Participant Profile

The Individual Participant Profile was used to record information on 49 participant characteristics as well as status while in the program and at termination. These data essentially duplicated the standard information gathered on each participant in all Comprehensive Employment and Training Act (CETA) programs. The first 29 items were largely demographic, covering such information as the individual's age, sex, race, and economic, educational, and labor-force status--all at time of entry into the youth program. The remaining 20 items were "program status" items, which indicated the status of the participant at the time of program completion or termination. These included such information as entry and termination dates, total hours spent in the program, whether the program provided the participant with academic credit, and specific forms of "positive" and "nonpositive" termination. (A set of definitions accompanying the IPP form defined each item in some detail and how it was to be completed by the youth program project personnel from their project records.)

STEP Reading Scale

The STEP reading scale was a short (10 to 15 minutes) measure of reading skill that was intended to cover the wide range of reading

levels found among the YEDPA enrollees (approximately fourth to ninth grade reading level by ETS's estimate). Twenty items were selected from the STEP locator tests covering fourth to ninth grade reading levels. Those locator tests are short reading-comprehension measures ordinarily used as screening devices for deciding which level of the full STEP achievement tests is suitable for administration.

Job Knowledge and Attitudes Battery

Measures chosen for incorporation in the Job Knowledge and Attitudes battery were intended to reflect YEDPA program objectives while still being compatible with the characteristics of the trainee population and the operational constraints of the youth projects. As a starting point, five behavioral areas thought to be affected by YEDPA program participation were defined by the Office of Youth Programs. These were considered to encompass the objectives of a vast majority of the YEDPA projects and were designated as (1) career decision making, awareness, and capability, (2) self-image, (3) work attitudes, (4) job search capability, and (5) occupational sex stereotyping.

Criticism of the design and administration of conventional paper-and-pencil tests used with similar youth led SAS designers to seek measures that were relatively short, presented orally, pictorial as well as verbal, and appropriate in level and style of language for adolescents or young adults of low reading skill. In addition the battery allowed the item responses to be marked directly in the test booklet. Examples of items from each of the measures in the Job Knowledge and Attitudes battery are shown in Figure A.1.

The designers of SAS chose two measures to assess what they termed career decision making, awareness, and capability performance. One measure dealt with the "vocational maturity" of adolescents in making appropriate career decisions, and the other with the youth's knowledge of what is required for carrying out different jobs.

Vocational Attitude Scale

This scale contained 30 verbal items, which were scorable as three 10-item subscales. Those scales were designated as "Decisiveness," "Involvement," and "Independence" in career decision making. The respondent indicated his or her agreement or disagreement with each of 30 statements about vocational careers and employment.

Job Knowledge Test

This 33-item scale dealt with the qualifications, requirements, and tasks involved in various jobs. The items, in multiple-choice format, required the respondent to indicate the correct response to questions about the specific occupations depicted.

Self-Esteem Scale

Youth programs often seek to enhance the participant's feelings of personal value, or self-worth, with the expectation that improved self-perception will stimulate more success-oriented social and vocational adjustment behaviors. The SAS

FIGURE A.1 Examples of items from the measures in the Job Knowledge and Attitudes battery. [The figure, which reproduces sample pictorial and verbal items, is not legible in the machine-read text.]

designers included one measure that attempted to define the level at which the program participant rated his or her personal value. The self-esteem scale was a 15-item scale containing pictorial and verbal material used to assess perceived self-worth in terms of expectations for acceptance or achievement in various social, vocational, and educational settings. The respondent indicated, on a three-point scale, the degree to which he or she would be successful or receive acceptance in the specific situation portrayed.

Work-Related Attitudes Inventory

This inventory was intended to measure the youth's views about jobs, the importance of working, appropriate ways of behaving in job settings, and general feelings about his or her capabilities for succeeding in a work situation. The inventory contained 16 items that provided both a total score and scores for three subscales defined as "Optimism," "Self-Confidence," and "Unsocialized Attitudes." The response to each of the attitudinal statements was based on a four-point scale of degree of agreement with, or applicability of, the statement.

Job Holding Skills Scale

This scale dealt with respondent awareness of appropriate on-the-job behaviors in situations involving interaction with supervisors and coworkers. This 11-item scale, containing pictorial and verbal material, required the respondent to indicate which one of three alternatives best defined what his or her response would be in the situation described. (Response alternatives were scaled in terms of "most" to "least" acceptable behaviors for maintaining employment.)

Job Seeking Skills Test

This test was intended to measure elementary skills essential for undertaking an employment search. This test had 17 items that sampled some of the skills needed to initiate an employment search, interpret information about prospective jobs (in newspaper want ads), and understand the information requirements for filling out a job application. The items, in a multiple-choice format, required selection of the one correct response to each question.

Sex Stereotyping of Adult Occupations Scale

This scale attempted to measure attitudinal perceptions of sex roles in occupational choice. This relatively short (21-item) verbal scale presented job titles along with a one-sentence description of each job and required the respondent to indicate "who should be a ______" (job title as given). A five-point response scale ranged from "only women" to "only men."

Project and Process Data

In addition to the range of information collected on program participants and controls, the SAS attempted to measure the types of

activities, the progress of program implementation, and the range of services being offered at each program site. This information was expected to be of potential use not only as contextual data for the analysis of program outcomes, but also as data for reports to managers and policymakers about the implementation of the various YEDPA programs. The Project and Process Information questionnaire contained six sets of questions that reported on key site-specific variables in quantitative terms. First, basic information was gathered about the site, setting, and sponsors of the project. Second, the project was described in terms of its services, activities, and goals. Third, the linkages involved in the project were described. Fourth, the staff involved in the project were profiled. Fifth, the project stability and the position of the project on the learning curve were assessed. Finally, the project costs were measured.

Outcome Measures

The outcomes of the programs were measured at program completion and 3 and 8 months after program departure. Two questionnaires were used for this purpose: the "Program Completion Survey" and the "Program Follow-up Survey." (The same instrument was used 3 and 8 months postprogram.)

Program Completion Survey

This questionnaire contained 48 items, most of which were phrased as questions to be presented to the youth at the time he or she had completed or was leaving the training program. They covered the participant's activities in the program, attitudes about the program, job and educational aspirations, and expectations and social-community adjustments. The questions were intended for oral presentation to the individual by an interviewer. (A parallel questionnaire containing similar material was designed for use with control group members and was designated the "Control Group Status Survey.")

Program Follow-up Survey

This 50-item questionnaire was designed to be administered orally to the individual by an interviewer, who also recorded the participant's responses. The survey was intended for use 3 months after the participant had left the training program and again at 8 months following program participation. Questions dealt with the former participant's posttraining experiences in areas of employment, education, social adjustments, and future plans. (A parallel version of the follow-up survey was used with control group members and was designated the "Control Group Follow-up Survey.") In addition, a five-item Employer Rating Form was to be completed by the present (or most recent) employer. (Permission to interview the employer had to be granted by the youth.)

Concerns about Instrument Reliability and Validity

In introducing the SAS measuring instruments, the designers at the Educational Testing Service warned that (Educational Testing Service, 1980)

     more careful testing of the instruments would have been preferable but it was necessary to develop these measures while implementing certain programs. The instruments . . . represent the best possible compromise between the many constraints at the time the system was implemented.

A particular concern expressed by the SAS designers involved the nature of the youth population from whom data were being collected. Given a population characterized as economically disadvantaged and largely products of inner-city school systems, they anticipated that the validity of any available paper-and-pencil test might be suspect. For this reason the documentation of the SAS instruments stressed (1) the use of measures that employ pictures as well as words, (2) the use of an administrator who would read items aloud so that the youth could follow along, and (3) the administration of the tests to small groups--so that literacy (or other) problems might be more easily detected.

Despite these precautions, it can never be assured in a data gathering operation such as SAS that measurements were made in the manner prescribed. The test administrators were not ETS employees, but rather program personnel assigned to fulfill YEDPA's "data reporting" requirement. While ETS did provide instruction to one person at each program site, that person was not necessarily the one who administered the measurements. Moreover, staff turnover may have put some people in the position of serving as test administrator with little or no (or wrong) instruction on how to administer the instruments. Since one of the canons of testing is that the manner of test administration can have important effects on measurement, it is natural that concerns about the reliability and validity of the SAS measurements were voiced by outsiders--as well as by ETS.

Almost all of the SAS scales used previously published tests, and there did exist a literature that documented the characteristics of the scales and estimated their reliability and predictive validity with various populations. These populations, however, were not identical to the YEDPA youth who would be tested with the SAS. Thus, it did not necessarily follow that the readings of test reliability and validity obtained from these groups could be generalized to the youth population targeted by YEDPA.

In its 1980 report on the Standardized Assessment System, ETS presented evidence for the reliability and validity of the SAS scales. Some of this evidence predates YEDPA and may have been used

in the decision making about which instruments to use in SAS.1 The evidence is derived from studies of small samples of youths participating in Neighborhood Youth Corps (NYC) and Opportunities Industrialization Center (OIC) training programs. For four of the SAS scales, Table A.1 presents the correlations found between scale scores and various criteria of "success" in these programs. Reported correlations range from .18 to .36. Two measures show significant correlations with success in finding employment after program completion--the Job Knowledge scale (r = .22 in the NYC sample) and the Job Search Skills scale (r = .36 in the NYC sample and .21 in the OIC sample). The other two scales, Job Holding and Self-Esteem, do not show significant associations with postprogram employment, but do show positive associations with evaluations given by guidance counselors and work training supervisors.

The 1980 report on SAS also provides early SAS data from samples of high school seniors participating in the Youth Career Development project (n = 1,666) and their control group (n = 1,590). Estimates of predictive validity using selected criterion measures (and Cronbach's alpha for the scales) are shown in Table A.2. The range of correlations for this sample is generally lower than those found in the earlier studies. In particular, only two scales (Vocational Attitudes and Work-Related Attitudes) show significant correlations with postprogram activity (coded 2 for full-time school or work, 1 for part-time school or work, and 0 otherwise). These correlations were very modest in size (r = .12 and .10). The scales did show somewhat higher correlations with level of present job and a negative correlation with amount of time required to find the present job.

Overall, however, the preliminary evidence presented by ETS suggests that (1) the seven scales are not powerful predictors of postprogram employment and (2) the measurement characteristics of these scales when administered in SAS may be different from those found elsewhere. (Whether the latter might be a function of the population tested, lack of standardization in administration, or some other cause, is difficult to say.)

1 ETS (1980) presents estimates of reliability and validity in cases where there are "significant" results (p less than .01 or p less than .05). Thus it is not possible in Tables A.1 and A.2 to report their estimates for all variables and for each criterion measure. In selecting ETS "validity" measures to reproduce in Table A.2 and in designing our own analyses (reported in Tables A.10 and A.12) we have focused on the prediction of future rather than concurrent outcomes where the outcome variables involved assessments by observers other than the subject (e.g., an employer's evaluation of the subject at 3 months postprogram) or involved reports of relatively objective statuses (e.g., Are you employed full time?). We believe that this procedure provides more appropriate information about the usefulness (for program evaluation) of the SAS assessment battery than procedures that depend exclusively on more subjective reports from the respondent (e.g., assessments of job satisfaction or adjustment).
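For reference, the internal-consistency coefficient reported in Table A.2 is Cronbach's alpha. Its standard definition, for a scale of k items with item-score variances sigma_i^2 and total-score variance sigma_X^2, is

    \alpha \;=\; \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_i^{2}}{\sigma_X^{2}}\right) .

Alpha indexes how consistently a scale's items covary within a single administration; it is distinct from the stability of scores across repeated administrations, which is examined in Table A.7 below.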

TABLE A.1 ETS Estimates of Predictive Validity of SAS Attitude and Knowledge Measurements

SAS Measurement       Criterion Predicted                     Sample (n)   r
Job knowledge         Work supervisor rating                  NYC (109)    .32
                      Counselor rating                        NYC (109)    .25
                      Counselor rating                        OIC (220)    .19
                      Vocational skills instructor rating     OIC (261)    .20
                      Posttraining employment                 NYC (104)    .22
Job holding skills    Counselor rating                        NYC (111)    .31
                      Work supervisor rating                  NYC (111)    .34
                      Vocational skills instructor rating     OIC (260)    .15
                      Remedial skills instructor rating       OIC (134)    .18
Job seeking skills    Counselor rating                        NYC (111)    .22
                      Work supervisor rating                  NYC (111)    .31
                      Posttraining employment                 NYC (104)    .36
                      Posttraining employment                 OIC (157)    .21
Self-esteem           Counselor rating                        NYC (111)    .34
                      Work supervisor rating                  NYC (111)    .24
                      Remedial skills instructor rating       OIC (134)    .18

SOURCE: Educational Testing Service (1980).

CHARACTERISTICS OF THE DATA BASE

Completeness of Initial Coverage

According to ETS, the Standardized Assessment System was designed to provide a complete enumeration of all participants (together with appropriate controls) in all YEDPA demonstration projects. In their words (Educational Testing Service, 1980):

     In a literal sense there is no "sampling" with respect to enrollees at a demonstration site since evaluation data are to be collected on the performance of all enrollees at a particular site. The control group at a particular site, however, does represent a sample from a hypothetical population that is, hopefully, similar to the enrollees with respect to important background and ability variables.

The difficult task of ensuring that data were collected in a standardized manner from all program participants was not, however, under the control of ETS. The Department of Labor had arranged for data to be collected by individual program operators; administration

TABLE A.2 ETS Estimates of Reliability and Predictive Validity of SAS Instruments

                           Internal         Predictive Validity
                           Consistency      Time to Find   Activity    Level of
SAS Measurement            (Alpha)          First Job      Statusa     Present Job
Vocational attitudes       .74              b              .12         .21
Job knowledge              .66              b              b           .23
Job holding skills         .56              -.16           b           .28
Work-related attitudes     .78              -.17           .10         .18
Job seeking skills         .66              -.16           b           .24
Sex stereotyping           .90              -.26           b           .16
Self-esteem                .60              -.17           b           .15

NOTE: Predictive validity estimates are for 3 months postprogram for YCD participants. Sample sizes range from 120 to 790 for validity estimates. Reliability estimates are averages of values reported for participants and controls (combined n = 3,256).
aActivity status coded 0 for not working or in school, 1 for part-time work or school, and 2 for full-time work or school. It is not clear from the text how both part-time work and part-time school would be coded.
bNot significant.
SOURCE: Educational Testing Service (1980).

and execution of the data collection were not ETS's responsibility. ETS contracted to process the data supplied by the program operators (and, in a number of cases, to analyze that data).2 Indeed, most ETS discussions of the SAS data base contain forceful disclaimers that "collection of all data with the Standardized Assessment System instruments remained the sole responsibility of the service delivery agents who were required to assign suitable staff at each project site for carrying out the data gathering tasks" (ETS, 1982:15, emphasis in original). As a result of this delegation of data gathering responsibility to the program operators, there was known to be quite incomplete reporting of data. Although the precise magnitude of the incompleteness of the initial coverage was not known, ETS has informally speculated that up to 50 percent of the program participants may have been missed.

2 ETS involvement in the data collection grew out of evaluation studies begun by N. Freeberg and D. Rock of Youth Career Development and Service-Mix Alternatives projects.
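Because each record in the data base carries flags indicating which instruments were completed, the extent of initial coverage and of later attrition can in principle be tabulated directly (compare Figure A.2 and Table A.11 below). The following is a minimal sketch of such a tabulation; the file name and column names are illustrative assumptions, not the documented layout of the ETS files.

    import pandas as pd

    # Assumed layout: one record per respondent, with 0/1 completion flags
    # for each SAS stage and a participant/control group code.
    records = pd.read_csv("sas_records.csv")        # illustrative file name

    stage_flags = ["pre_interview", "pre_tests", "post_tests",
                   "followup_3mo", "followup_8mo"]  # assumed flag columns

    # Respondents with data present at each stage, by group
    # (the quantities summarized in Figure A.2).
    coverage = records.groupby("group")[stage_flags].sum().T
    print(coverage)

    # Site-level follow-up rates of the kind reported in Table A.11:
    # share of wave-1 respondents with any 3-month interview data.
    rates_3mo = (records.groupby(["site_id", "group"])["followup_3mo"]
                 .mean().mul(100).round(1))
    print(rates_3mo)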

FIGURE A.2 Sample coverage and attrition (Standardized Assessment System). [Bar chart showing the number of participants and controls (vertical axis, 0 to 80,000) at each stage: target sample (est.), pre-interview, pre-tests, post-tests, 3-month interview, and 8-month interview.]

. . . employment.
Since these are relatively inexpensive data to collect, there is some reason to favor such a strategy--particularly if one suspects that the effects of training on employment may be unusually subtle or delayed in arriving. This strategy, of course, depends on the measures being adequate in the sense of being replicable so that repeated measurements are relatively stable and in their being

TABLE A.5 Social and Demographic Characteristics of SAS Samples

Respondent                                Entry        3-Month        8-Month
Characteristic             Sample         Sample %     Follow-up %    Follow-up %
Female                     Participant    51.9         54.3           54.6
                           Control        51.1         52.5           53.2
High school dropout        Participant    25.5         21.6           20.5
                           Control        25.4         24.5           20.4
Income 70% of standard     Participant    66.3         61.7           63.7
                           Control        56.7         56.4           59.8
Welfare recipient          Participant    42.7         45.2           47.6
                           Control        38.6         41.1           42.1
Race/ethnicity
  Black                    Participant    56.2         58.5           58.1
                           Control        53.4         55.2           54.5
  Hispanic                 Participant    21.9         22.4           23.1
                           Control        26.7         29.2           29.1
  White                    Participant    19.5         16.8           17.0
                           Control        17.7         14.3           15.2
Limited English            Participant     7.6          8.1            7.6
                           Control        10.2          9.4            8.9
Has children               Participant    11.3         10.6           11.7
                           Control         8.5          7.9            8.4
Criminal offender          Participant     8.2          6.7            7.1
                           Control        11.5         10.0           12.0
Previous CETA participant  Participant    30.0         30.8           30.7
                           Control        25.5         25.7           25.9

NOTE: Less than 1 percent of data records were inconsistent, e.g., the respondent was a "control" but the 3-month follow-up flag indicated the respondent had completed a "participant" follow-up survey. These records were excluded from this analysis.
SOURCE: Derived by tabulating data for every fifth record in ETS data base, i.e., 20 percent subsample of data base.

reasonable proxies for the more difficult to observe outcomes. The former condition is generally referred to under the rubric of reliability, the latter as validity (of one sort or another). Since the Standardized Assessment System was launched with some expressed trepidations about the suitability of such measures to the YEDPA population, it is important to seek evidence within the data base as to whether these conditions are met by the SAS measurements. SAS provides the opportunity for making (test-retest) reliability estimates

TABLE A.6 Job Knowledge and Attitudes and Other Pretest Scores (at entry) of Respondents Giving Interviews at Entry and 3 and 8 Months Postprogram

                                                                               Standard
                                         Stage                                 Deviation
SAS Measurement          Sample          Entry    3-Month       8-Month        of Scale
                                                  Follow-up     Follow-up
Vocational attitudes     Participant     20.5     20.5          20.4
                         Control         20.2     20.2          20.6           4.5
Job knowledge            Participant     21.6     21.8          21.6
                         Control         21.4     21.3          21.7
Job holding skills       Participant     30.4     30.6          30.5
                         Control         30.2     30.2          30.3           2.7
Work-related attitudes   Participant     40.8     48.2          48.1
                         Control         47.9     47.8          48.5           6.8
Job search skills        Participant     11.7     11.8          11.7
                         Control         11.5     11.3          11.7           3.2
Sex stereotyping         Participant     45.4     45.1          45.3
                         Control         45.0     44.6          45.0           8
Self-esteem              Participant     36.3     36.4          36.3
                         Control         35.9     35.9          36.2           3
Reading ability (STEP)   Participant     15.0     15.0          15.1
                         Control         14.5     14.6          14.9           4.6

NOTE: Standard deviation of scale is computed from data for all controls and participants.
SOURCE: Derived by tabulating data for every fifth record in ETS data base, i.e., 20 percent subsample.

for these scales, since the same battery was administered preprogram and postprogram to the untreated controls. Although one can expect true temporal change to affect the cross-temporal correlations between two measures of a trait such as self-esteem or work-related attitudes, one would expect a certain amount of stability in these traits. After all, if people varied widely from day to day on these traits it is not (easily) conceivable that the measure would be helpful in predicting relatively stable social behaviors, such as employment or other vocational behaviors. A series of analyses reported in Tables A.7 through A.10 examine some of the properties of these scales. In Table A.7, the zero-order correlation of each scale measured preprogram and postprogram is

TABLE A.7 Test-Retest Reliability for SAS Measurements (computed from 20 percent sample of ETS data base)

                            Zero-Order Correlation Over Timea
SAS Measurement             Controls         Participants
Vocational attitudes        .604             .602
Job knowledge               .527             .505
Job holding skills          .460             .386
Work-related attitudes      .604             .610
Job search skills           .572             .538
Sex stereotyping            .631             .643
Self-esteem                 .462             .388
(N)b                        (1,644)          (4,443)

NOTES: Test-retest reliability will be affected by "true change" in respondents. Since participants are enrolled in programs designed to change their attitudes and knowledge, reliability estimates for this group should be treated with caution.
aMeasurements made using identical instruments preprogram and postprogram.
bMinimum sizes of samples from which estimate was made.

TABLE A.8 Correlations Between Reading Scores and SAS Measurements of Job Attitudes and Knowledge (computed from 20 percent sample of ETS data base)

                            Correlation with STEP Reading Score
SAS Measurementa            Participant      Control
Vocational attitudes        .445             .509
Job knowledge               .447             .467
Job holding skill           .288             .354
Work-related attitudes      .446             .383
Job search skill            .569             .578
Sex stereotyping            .279             .241
(N)b                        (5,603)          (2,258)

aAll measurements made during pretest.
bMaximum sizes of samples upon which any reported correlation is based.
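The estimates in Tables A.7 and A.8 were computed from a 20 percent systematic subsample (every fifth record) of the ETS data base. A minimal sketch of that kind of computation follows; the file and column names are assumptions for illustration only and do not reflect the actual layout of the ETS files.

    import pandas as pd

    df = pd.read_csv("sas_records.csv")      # illustrative file name
    subsample = df.iloc[::5]                 # every fifth record (~20 percent)
    controls = subsample[subsample["group"] == "control"]

    scales = ["vocational_attitudes", "job_knowledge", "job_holding",
              "work_attitudes", "job_search", "sex_stereotyping", "self_esteem"]

    # Table A.7: test-retest reliability as the Pearson correlation between
    # the identical preprogram and postprogram administrations of a scale.
    for s in scales:
        r = controls[s + "_pre"].corr(controls[s + "_post"])
        print(s, "test-retest r =", round(r, 3))

    # Table A.8: correlation of each pretest scale with the STEP reading score.
    for s in scales:
        r = controls[s + "_pre"].corr(controls["step_reading"])
        print(s, "r with reading =", round(r, 3))

Because the correlations in Table A.7 reflect both measurement error and true change over the program period, the estimates for the untreated controls are the more interpretable reliability figures, as the table's note indicates.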

TABLE A.9 [Intercorrelations among the SAS job knowledge and attitude measurements; the table is not legible in the machine-read text.]
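As discussed on the following pages, the SAS measures in Table A.9 remain positively intercorrelated even when a simplistic attempt is made to account for the confounding effect of reading ability. One common form of such an adjustment is a first-order partial correlation that removes the linear association of each scale with the STEP reading score; the sketch below is illustrative only (the column names are assumed, and this is not necessarily the adjustment actually used for Table A.9).

    import pandas as pd

    def partial_corr(x: pd.Series, y: pd.Series, z: pd.Series) -> float:
        """First-order partial correlation of x and y, controlling for z."""
        rxy, rxz, ryz = x.corr(y), x.corr(z), y.corr(z)
        return (rxy - rxz * ryz) / (((1 - rxz**2) * (1 - ryz**2)) ** 0.5)

    df = pd.read_csv("sas_records.csv")             # illustrative file name
    pre = df[df["group"] == "control"]              # pretest scores, controls

    # Job knowledge vs. job search skills, before and after removing the
    # linear effect of STEP reading ability.
    raw = pre["job_knowledge_pre"].corr(pre["job_search_pre"])
    adj = partial_corr(pre["job_knowledge_pre"], pre["job_search_pre"],
                       pre["step_reading"])
    print("zero-order r =", round(raw, 3), "; partial r =", round(adj, 3))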

TABLE A.10 Estimates of Predictive Validity of SAS Attitude and Knowledge Measurements (computed from 20 percent sample of ETS data base)

                                       Correlation with Activity Statusa
SAS Measurement at                     3 Months           8 Months
Program Completion         Sex         Postprogram        Postprogram
Vocational attitudes       Male        (.048)             (.056)
                           Female      .078               (.042)
Job knowledge              Male        .085               (.042)
                           Female      .057               (.042)
Job holding skills         Male        .111               .075
                           Female      (.043)             (.031)
Job seeking skills         Male        .097               (.042)
                           Female      .060               .055
Work-related attitudes     Male        .068               (.057)
                           Female      .107               .082
Self-esteem                Male        .060               (-.003)
                           Female      .092               (.043)
Sex stereotyping           Male        -.054              (-.029)
                           Female      .050               (.013)
N                          Male        1,080              714
                           Female      1,326              935

NOTE: Estimated coefficients in parentheses are not reliably different from 0 (at p < .05). Correlations are derived from 20 percent subsample of YEDPA participants in ETS data base. Respondents were included only if they were coded as a participant in the IPP profile and if the data flag for the 3-month follow-up indicated they had completed a participant follow-up survey (not a control survey). In a small number of cases (281 of 50,182), those two indicators are inconsistent; those cases were excluded from this analysis.
aActivity status is coded 1 if respondent is in a full-time job or is a full-time student; status is coded 0 otherwise.

reported for program participants and controls.6 All of the test-retest correlations are within the range of approximately 0.4 to 0.6. While these correlations are not insubstantial, neither would

6 Obviously, the reliability estimates for the control groups are most relevant since the controls did not participate in YEDPA's programs that were designed to change participants' attitudes, knowledge, behavior, and so forth. However, as Table A.7 shows, reliability estimates for program participants are quite similar to those for controls.

they be thought to indicate an extremely robust measurement. Indeed, if one were to assume that measurement errors (both random and systematic) did not contaminate these data, these estimates would suggest a great deal of variation over time in young people's knowledge of and attitudes toward jobs, their self-esteem, and the extent to which they sex stereotype the occupational world. This could, of course, be the case. But it is also plausible that a relatively large component of measurement error may be distorting the measurements.

Overall, the self-esteem scale and the job-holding skill scale show relatively low cross-temporal correlations, while the sex stereotyping and vocational attitudes scales show correlations of 0.6 or better. (In the case of the sex stereotyping scale, one suspects that this relatively high estimate of reliability may derive, in part, from the fact that all items were presented in the same format and scored in the same direction.)

These scales also show a high correlation with reading ability. Table A.8 presents the correlations between each of these scales (measured at pretest) and the STEP reading scale scores. These correlations range from a low of .241 for sex stereotyping to a high of .578 for the job search skill scale. While one might be tempted to dismiss some of these high correlations with reading ability as "artifacts," for some purposes the correlation is as one would want. The ability to read a job advertisement is an essential component of "job search skills." It is not, however, the case that such a simple argument can be made to defend these correlations in every instance. There is no prima facie case to be made for a correlation between the attitude measures and reading--although there are more than enough plausible paths for indirect causation to account for this correlation. It is important to keep in mind that reading (and a myriad of other unmeasured traits) may play a role in accounting for the zero-order test-retest reliabilities. Potential correlated measurement errors also bedevil all attempts to understand test-retest reliabilities.

Some evidence of the construct validity of the various SAS measurements may be gleaned from Table A.9. As intended by the SAS designers, all of the measures are positively intercorrelated. This is true even when a simplistic attempt is made to account for confounding effects of reading ability on all of the scales. The strongest correlations found for the SAS measures are between scales that measure

NOTE: It should be realized that test-retest correlations such as those shown in Table A.7 are affected by both true change in the respondents and by measurement errors. If one wishes to use measures like the SAS assessment battery as proxies for (unmeasurable) long-term outcomes (e.g., lifetime earning potential and employability), however, instability, per se, may be an important consideration. If a characteristic like work-related attitudes, for example, naturally varies to such a degree that test-retest correlations approach zero over a short period of time (in the absence of measurement error), then even a perfectly reliable measurement of this characteristic would be of doubtful utility in most program evaluations.

similar or related traits, e.g., vocational attitudes and work-related attitudes, or job search skills and job knowledge. Conversely, correlations between the sex role stereotyping measures and job knowledge factors are low.

Predictive Validity

Given the aim of the YEDPA programs, a key validity test for any scale would be its ability to predict which YEDPA youth would stay in school or find full-time employment and which would not. Several skirmishes have been made with this analysis, and Table A.10 reports the simplest of them. (Its outcome, however, is little different from that of the more complicated analyses.) For all program participants (in the 20 percent sample) who provided the requisite data at 3 months (n = 2,406) and at 8 months (n = 1,649) postprogram, a score of 1 was assigned if (at follow-up) the respondent reported being either in school full time or working full time. A score of 0 was assigned otherwise. In the crudest analysis (reported in Table A.10) the zero-order correlation between this dichotomous "activity variable" and each of the scales from the SAS battery was calculated.7 This was done separately for males and females to allow the effects of potential differences in child-care responsibilities to appear.

It will be seen from Table A.10 that there were some "significant" correlations between job knowledge and attitude scores and whether a youth was "occupied" or "unoccupied"; however, the magnitude of these correlations was not substantial. The correlations for the SAS data base are considerably below those found for the NYC and OIC samples reported by ETS in their 1980 report on SAS (Educational Testing Service, 1980). They are even lower than the correlations (.10) reported by ETS from the Youth Career Development sample. The extremely low predictive validity of the SAS measures raises questions about the meaningfulness of program evaluations that rest their verdicts of program effectiveness on such measurements. As Chapters 4 through 8 have shown, such studies are not uncommon in the YEDPA literature.

Inter-site Variations

The shortcomings of the aggregate SAS data base invite the question: Is the data base uniformly riddled with such problems? It

7 This analysis is somewhat crude, but it illustrates the point in a straightforward manner (and it is analogous to analyses reported in ETS, 1980). It should be noted, however, that because the criterion variable is dichotomous, the obtained correlations will understate somewhat the extent of the relationship.
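The caution in footnote 7 can be made concrete. When a continuous criterion is reduced to a 0/1 indicator, the observed point-biserial correlation is smaller than the correlation that would be obtained with the underlying continuous variable. Under the bivariate-normal assumption of the classical biserial correlation (a textbook approximation, not a computation from the SAS data),

    r_{pb} \;=\; r_{b}\,\frac{\phi(z_p)}{\sqrt{p(1-p)}} ,

where p is the proportion scored 1, z_p is the normal deviate cutting off a proportion p, and phi is the standard normal density. The attenuation factor is at most about .80 (when p = .5) and shrinks as p approaches 0 or 1, so correlations such as those in Table A.10 understate the strength of the underlying relationships somewhat, as footnote 7 indicates.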

TABLE A.11 Follow-up Rates for 10 Randomly Selected Sites in SAS Data Base

                                        Number of Sites with Follow-up Rate of:      No Controls
Postprogram Follow-up Stagea        0-24%     25-49%     50-74%     75+%              in Sample
3 months     Participants           1         3          2          4                 --
             Controls               1         3          1          2                 3
8 months     Participants           5         2          2          1                 --
             Controls               4         2          1          0                 3

NOTE: Ten sites were selected using a random number table from among all sites in the ETS data base having an "n" of at least 25 (controls + participants) at Wave 1. N's for samples whose rates are shown above range from 24 to 167.
aFollow-up rate is a percentage of all respondents at a site for whom there is any 3-month (or 8-month) interview data (as indicated by "flags" set in the data base to indicate presence or absence of these data). Three sites had no control groups.

is possible in theory, of course, for an aggregate outcome such as the one reported here to be composed of some very fine data gathering operations and some very poor ones. While the aggregate result would not be impressive, it still might be possible to isolate a sizable subset of the data base upon which a convincing analysis could be performed. To assay this possibility, we selected 10 sites at random from the SAS data base and ascertained the distribution of attrition rates across sites. We restricted the universe of potential sites for this analysis to sites that had a minimum of 25 respondents (controls and participants) at the initial data collection. For each of these sites, we then computed the follow-up rates at 3 and 8 months postprogram. The distribution of follow-up rates across the 10 sites is shown in Table A.11. It will be seen that at 3 months postprogram four sites had follow-up rates for participants of 75 percent or higher. For the control samples, only two sites had such high follow-up rates. While the attrition analysis at 3 months is somewhat encouraging, the results at 8 months are quite disappointing. Only one site maintained a 75 percent follow-up rate for participants at 8 months, and no site attained this rate for its control samples.

In addition to the analysis of attrition rates, we also attempted to assay the distribution across sites of the predictive validity of the SAS attitude and knowledge measurements. Here again, we selected 10 sites at random from the SAS data base. This time, however, we restricted our analysis to sites that had a minimum of 100 program participants from whom data had been obtained at 3 months postprogram. This was done to provide an adequate sample size for calculating the correlation coefficients between the (immediate) postprogram SAS measurements and the participants' "activity status" at 3 months

TABLE A.12 [Site-by-site estimates of the predictive validity of the SAS attitude and knowledge measurements for 10 randomly selected sites; the table and its notes are not legible in the machine-read text.]

postprogram. Table A.12 presents the results of this analysis. (See the notes to Table A.12 for definitions of sample selection criteria and the activity status variable.) It will be seen from Table A.12 that no predictive validity for any measurement at any site exceeded 0.30. The vast majority of correlations (48 of the 60 that could be calculated) were in the range 0.0 to 0.20. Indeed, over half of the coefficients we calculated (36 of 60) were less than 0.10.

While it would be a mistake to overgeneralize based on data from such a small number of sites, these data on attrition and measurement validity do not encourage the belief that there exist a sizable number of sites in the SAS data base that gathered high-quality data (where quality is indicated by the attrition of the sample and the predictive validity of the measurements).

REFERENCES

Educational Testing Service
1980 The Standardized Assessment System for Youth Demonstration Projects. Youth Knowledge Development Report No. 1.6. Washington, D.C.: U.S. Department of Labor.
1982 Demonstration Programs for Youth Employment and Training: The Evaluation of Various Categories of YEDPA Program Sites. Princeton, N.J.: Educational Testing Service.

Office of Management and Budget
1977 Memorandum to heads of executive departments, February 17, 1977.

Sewell, W., and R. Hauser
1975 Education, Occupation and Earnings. New York: Academic Press.

Taggart, R.
1980 Youth Knowledge Development: The Process and the Product. Unpublished manuscript.