About this PDF file: This new digital representation of the original work has been recomposed from XML files created from the original paper book, not from the
original typesetting files. Page breaks are true to the original; line lengths, word breaks, heading styles, and other typesetting-specific formatting, however, cannot be
retained, and some typographic errors may have been accidentally inserted. Please use the print version of this publication as the authoritative version for attribution.
APPENDIX E
The Rating of DOT Worker Functions and
Worker Traits
PAMELA S. CAIN and BERT F. GREEN, JR.
In the course of producing the DOT, jobs and occupations were rated for a
variety of characteristics, called worker functions and worker traits. These
ratings and the procedures by which they were assigned are described in
chapter 6. Because of the widespread and varied use made of these ratings both
inside and outside the U.S. Employment Service, it is especially important that
they be accurate—that is, that they measure what they purport to measure.
The ratings assigned to DOT occupations, like all such ratings, are subject to
various influences, some of which are legitimate bases of variation and some of
which are not. An occupation might be rated differently on a given
characteristic not only because it actually requires different levels or amounts of
the characteristic in question but also because of the particular circumstances in
which the ratings were made, characteristics of the raters, specific features of
the occupation itself, etc. Such ratings invariably entail some measurement
error; they reflect, to some extent, characteristics other than those they are
supposed to measure.
There are several reasons to suspect that the ratings of DOT occupations for
worker functions and worker traits are subject to error. First, the factors that the
DOT scales purport to measure are vague and ambiguously defined. It is not
readily apparent what they are intended to measure, i.e., what the “true” scores
of the phenomenon being rated should be. Worker functions, for example, are
said to “express the total level of complexity of the job-worker situation” (U.S.
Department of Labor, 1972:5), but
“complexity” is never defined or specified further. Sidney Fine, who was
instrumental in developing worker functions, has also written that they reflect
skill estimates (Fine, 1968a:374) and worker autonomy, i.e., the extent to which
workers are engaged in “prescribed versus discretionary duties” (Fine, 1968b:7).
The reliability of the ratings is also called into question by the extremely
high correlations (of the order of .90) between some of them and measures of
the social status or prestige of occupations. This concern has been voiced about
general education development (GED) by several researchers, notably Siegel
(1971) and Duncan et al. (1972).1
Concern about the reliability of the DOT factors arises for other reasons as
well. Analysts reported difficulty in assigning scores on certain factors,
especially specific vocational preparation (SVP) and aptitudes. Reasons cited for
this were the ambiguity of the factors and the inadequacy of instructions
contained in the Handbook for Analyzing Jobs (U.S. Department of Labor,
1972). Furthermore, production of the fourth edition DOT was highly
decentralized. Analysts were spread across 10 field centers and 1 special
project, and there was reportedly little communication or coordination of effort
among them, nor were their activities closely supervised or standardized by the
national office.
In order to assess the impact of several potential sources of variation in
these ratings, we carried out an experimental study to (1) determine the overall
level of reliability for selected worker functions and traits and (2) identify
significant bases of variations in or influences on the ratings. In the latter regard
we investigated whether the ratings were influenced by (1) analysts' field center
affiliation, (2) the type of occupation being rated, i.e., whether in service or
manufacturing, (3) the general education development level of the occupation,
(4) the particular job description (one of two) of the occupation being rated, and
(5) the particular analyst making the rating. The interactions of these various
influences were also taken into account in the design and analysis of the study.
The specific effects, along with their labels and a brief description of each, are
given in Table E-1.
STUDY DESIGN
With the assistance of national office personnel we asked six experienced
job analysts at each field center with at least 6 months' training and experience
to rate one of two sets of job descriptions. If more than six
1 If an occupation's social standing is indeed dependent on its functional requirements,
as some theorists, notably Davis and Moore (1945), have argued, then it could be argued
alternatively that correlations of this magnitude are evidence of the validity of the worker
functions.
experienced analysts were available at a given center, we chose six at random to
participate in the study. Three centers with fewer than six experienced analysts
(Florida, Texas, and Utah) were eliminated from the analysis, although they did
participate in the actual ratings task. Analysts at the Arizona special project
participated in a pretest of the ratings task. Each set of job descriptions
represented 24 distinct DOT occupations. To select occupations and job
descriptions, we created two types of jobs: (1) “service,” which consisted of
base title occupations in the clerical and sales and service categories of the DOT,
and (2) “manufacturing,” which consisted of base title occupations in the DOT
categories of processing, machine trades, bench work, and structural
occupations. Preliminary analysis established that the variation in ratings over
all occupations is
TABLE E-1 Sources of Variation in Ratings of Occupational Characteristics
Source  Label      Description of Effect
1.      T          type of occupation
2.      G          level of general educational development (GED)
3.      TG         interaction of job type and GED
4.      J(TG)      jobs nested within the interaction of job type and GED
5.      C          center
6.      CT         interaction of center and job type
7.      CG         interaction of center and GED
8.      CTG        interaction of center with interaction of job type and GED
9.      CJ(TG)     interaction of center and jobs nested within interaction of job type and GED
10.     DJ(TG)     interaction of description and jobs nested within the interaction of job type and GED
11.     CDJ(TG)    interaction of center with interaction of description and jobs nested within interaction of job type and GED
12.     R(CD)      raters nested within the interaction of centers and description
13.     RT(CD)     interaction of raters and job types nested within interaction of centers and description
14.     RG(CD)     interaction of raters and GED nested within interaction of centers and description
15.     RTG(CD)    interaction of raters with interaction of job type and GED nested within interaction of centers and description
16.     RJ(TGCD)   residual
LEGEND: T, one of two types of occupation: service versus manufacturing; G, one of four
levels of GED; J, one of three DOT occupations within eight categories of job type by GED;
C, one of seven field centers; D, one of two job descriptions for given DOT occupation; R,
one of 42 individual occupational analysts.
approximately the same in these two categories (the standard deviation of GED
for service occupations is .784 versus .880 for manufacturing occupations; the
range of GED is 1–6). This equivalence offered some measure of confidence that
we could make valid comparisons between the reliabilities of the two categories.
Within these two broad categories of occupations, titles were stratified by
four levels of GED. A set of base title occupations was then selected at random
within each of the eight combinations of job type (2) by GED (4). The source
files of these occupations were inspected in order to locate titles with two
adequate job descriptions.2 Descriptions were judged adequate if items 4 (job
summary) and 15 (description of tasks) of the job analysis schedule had been
completed according to instructions in the Handbook for Analyzing Jobs (U.S.
Department of Labor, 1972). Thus the description had to contain information on
the purpose and nature of the job; the significant involvement of workers with
data, people, and things; the level of such involvement; and a detailed
description of job tasks with an indication as to the amount of time spent on
each. If fewer than two acceptable descriptions were available for an
occupation, we eliminated it and proceeded to the next randomly selected
occupation in the set. If more than two acceptable descriptions were available
for an occupation, two of the descriptions were chosen at random. In this way,
two job descriptions for each of three base title occupations were selected for
eight combinations of job type by GED. (It might be noted in passing that we had
to go through 92 DOT codes in order to obtain the necessary two descriptions for
each of 24 occupations, yet another indication of the poor quality of the DOT
source data.)
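The selection rule described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the procedure as the authors carried it out: the function name, the data layout (a randomly ordered list of candidate codes for one stratum and a mapping from each code to its adequate descriptions), and the example codes are all hypothetical.

```python
import random


def select_for_stratum(candidates, adequate, n_occupations=3, n_descriptions=2, rng=None):
    """Pick occupations and descriptions for one job-type-by-GED stratum.

    candidates: occupation codes already placed in random order (hypothetical input).
    adequate:   mapping from code to its list of adequate job descriptions.
    """
    rng = rng or random.Random()
    selected = {}
    for code in candidates:
        descriptions = adequate.get(code, [])
        if len(descriptions) < n_descriptions:
            # Fewer than two acceptable descriptions: drop this occupation
            # and move on to the next randomly selected one.
            continue
        # Two or more acceptable descriptions: choose two at random.
        selected[code] = rng.sample(descriptions, n_descriptions)
        if len(selected) == n_occupations:
            break
    return selected


# Made-up example: occupations "A" and "D" lack two adequate descriptions.
example = select_for_stratum(
    ["A", "B", "C", "D", "E"],
    {"A": ["a1"], "B": ["b1", "b2"], "C": ["c1", "c2", "c3"], "D": [], "E": ["e1", "e2"]},
    rng=random.Random(0),
)
print(sorted(example))
```

Running the rule once for each of the eight strata would give the 24 occupations and 48 descriptions used in the study.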
Fifteen occupations (16 percent of the total number of codes we inspected)
were eliminated because we could not match the code we had obtained from the
DOT summary tape (provided by the national office) to a code in the source data.
In most such cases one of the worker function codes on the tape was one point
lower than it was in the source data. The systematic nature of the discrepancy
resulted from some last-minute changes in occupational codes prior to
publication of the DOT that were apparently not incorporated on the summary
tape.
The results are based on the ratings of 42 analysts at 7 field centers. Each
analyst rated 24 job descriptions taken verbatim from job analysis schedules.
Each job description was rated with respect to worker functions (DATA, PEOPLE,
and THINGS); training times (the reasoning, math, and language components of
GED, plus SVP); all six physical capacities; and all
2 The source materials for the fourth edition DOT are housed at the North Carolina field
center. We wish to express our gratitude to the staff there for the assistance we received
in choosing job descriptions for our study.
seven environmental conditions. Each description was thus rated on 20 separate
factors. The ratings task and the rating form used closely approximated the
ratings made in the normal course of job analysis for the DOT, although analysts
were unable to observe the jobs directly, as they would usually do.
The rating task was administered to the 42 raters at their respective centers
on June 11, 1979, under controlled conditions. Analysts worked in conference
rooms rather than at their desks and were proctored by the field center
supervisor or a designated assistant. There was no time limit and analysts were
instructed to work at their normal pace. Analysts were also instructed not to
consult the DOT or one another while making the ratings. Ratings were assigned
according to procedures contained in the Handbook for Analyzing Jobs. Raters
were free to consult the Handbook for additional instruction or bench marks, if
needed.
Supervisors were not requested to keep track of the time required to
complete the ratings, but according to informal reports most analysts finished in
about 4 hours. On the last page of the questionnaire, analysts were invited to
comment on the ratings task. Eighteen of the 42 raters did so. Almost every
comment noted that the descriptions contained insufficient information to rate
jobs for physical capacities and environmental conditions. Some analysts noted
the same difficulty for SVP. Despite this difficulty, analysts completed almost all
of the ratings, and there were few missing data. Of the total of 20,160 ratings
(42 raters rating each of 24 jobs for 20 factors), only 21 were not made. For
these, missing data were replaced with sample means. The amount of missing
data is so small that this replacement procedure should have a negligible effect
on our estimates.
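The counts in the preceding paragraph can be verified with a short sketch. The text does not say which sample mean replaced each missing rating, so the helper below, which fills a gap with the mean of the observed values in the same list, is an assumption made for illustration; all names are hypothetical.

```python
N_RATERS, N_JOBS, N_FACTORS = 42, 24, 20
N_MISSING = 21

total_ratings = N_RATERS * N_JOBS * N_FACTORS
missing_share = N_MISSING / total_ratings


def fill_with_mean(ratings):
    """Replace None entries with the mean of the observed entries (assumed rule)."""
    observed = [r for r in ratings if r is not None]
    mean = sum(observed) / len(observed)
    return [mean if r is None else r for r in ratings]


print(total_ratings, f"{missing_share:.2%}")
print(fill_with_mean([3, None, 5, 4]))
```

With about one rating in a thousand missing, any reasonable choice of replacement value would leave the variance estimates essentially unchanged.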
RESULTS
An analysis of variance technique is used to calculate the reliability of the
ratings for the worker functions (DATA, PEOPLE, and THINGS), GED, SVP, STRENGTH,
and LOCATION factors. For a discussion of the rationale for and use of the
analysis of variance to calculate reliabilities, see, for example, Lindquist (1953).
Generally, the advantage of this method over other methods is that it enables the
user to disentangle the effects of separate influences on the ratings and hence to
estimate the amount of error due to each source. Complete results from the
analysis of variance are presented in Tables E-2, E-3, and E-4. (These tables are
not discussed but are provided for the interested reader.)
Table E-5 presents three estimates of the reliability of each rating, making
different assumptions for each about what constitutes “error
TABLE E-2 Complete Analysis of Variance for DATA
DF, degrees of freedom; MS, mean squares; VC, variance component.
Effect        DF   MS         Denominator               Denominator DF  F       Divisor  VC
1. T          1    381.349    8.103 (7.768+.897−.562)   16              47.06   504      .741
2. G          3    858.202    7.967 (7.768+.761−.562)   16              107.72  252      3.374
3. TG         3    54.791     8.094 (7.768+.888−.562)   16              6.80    126      .371
4. J(TG)      16   7.768      2.361 (2.212+.711−.562)   24              3.30    42       .129
5. C          6    2.199      —                         —               —       144      —
6. CT         6    1.526      —                         —               —       72       —
7. CG         18   .930       —                         —               —       36       —
8. CTG        18   .734       —                         —               —       18       —
9. CJ(TG)     96   .707       —                         —               —       6        —
10. DJ(TG)    24   2.212      .562                      448             3.90    21       .079
11. CDJ(TG)   144  .711       .562                      448             1.26    3        .050
12. R(CD)     28   1.493      .562                      448             2.65    24       .039
13. RT(CD)    28   .897       .562                      448             1.60    12       .028
14. RG(CD)    84   .761       .562                      448             1.35    6        .033
15. RTG(CD)   84   .888       .562                      448             1.88    3        .109
16. RJ(TGCD)  448  .562       —                         —               —       1        .562
TOTAL, 5.515; minimum, 4.615; medium, 4.694; maximum, 4.953; r(minimum)=4.615/5.515=.836; r(medium)=4.694/5.515=.851;
r(maximum)=4.953/5.515=.898.
TABLE E-3 Complete Analysis of Variance for PEOPLE
DF, degrees of freedom; MS, mean squares; VC, variance component.
Effect        DF   MS         Denominator               Denominator DF  F       Divisor  VC
1. T          1    218.215    8.868 (8.365+1.012−.509)  16              24.61   504      .415
2. G          3    1,108.305  8.563 (8.365+.707−.509)   16              129.43  252      4.364
3. TG         3    18.486     —                         16              —       126      —
4. J(TG)      16   8.365      8.55 (8.428+.631−.509)    24              .98     42       0.000
5. C          6    1.766      —                         —               —       144      —
6. CT         6    .789       —                         —               —       72       —
7. CG         18   1.843      —                         —               —       36       —
8. CTG        18   .760       —                         —               —       18       —
9. CJ(TG)     96   .581       —                         —               —       6        —
10. DJ(TG)    24   8.428      .509                      448             16.56   21       .377
11. CDJ(TG)   144  .631       .509                      448             1.24    3        .041
12. R(CD)     28   2.089      .509                      448             4.10    24       .066
13. RT(CD)    28   1.012      .509                      448             1.99    12       .042
14. RG(CD)    84   .707       .509                      448             1.39    6        .033
15. RTG(CD)   84   .787       .509                      448             1.55    3        .093
16. RJ(TGCD)  448  .509       —                         —               —       1        .509
TOTAL, 5.940; minimum, 4.779; medium, 5.156; maximum, 5.431; r(minimum)=4.779/5.940=.804; r(medium)=5.156/5.940=.868;
r(maximum)=5.431/5.940=.914.
TABLE E-4 Analysis of Variance Results: Degrees of Freedom and Mean Squares
DF, degrees of freedom; all other columns are mean squares.
Effect        DF   DATA     PEOPLE     THINGS   GED-REASON  GED-MATH  GED-LANGUAGE  SVP        STRENGTH  LOCATION
1. T          1    381.349  218.215    37.145   212.209     102.542   275.525       274.480    24.453    44.587
2. G          3    858.202  1,108.305  22.233   230.109     135.445   211.657       1,003.190  18.009    12.032
3. TG         3    54.791   18.486     183.072  1.845       10.805    5.496         5.199      7.223     12.534
4. J(TG)      16   7.768    8.365      43.953   4.411       3.270     4.130         18.391     10.894    7.357
5. C          6    2.199    1.766      37.312   2.753       5.794     1.455         11.609     2.626     .332
6. CT         6    1.526    .789       7.517    .950        .657      .446          3.440      .971      .282
7. CG         18   .930     1.843      4.973    .710        .861      .504          .862       .396      .217
8. CTG        18   .734     .760       1.725    .333        .345      .273          .783       .449      .244
9. CJ(TG)     96   .707     .581       2.201    .208        .279      .213          .788       .209      .195
10. DJ(TG)    24   2.212    8.428      28.199   2.627       .968      1.803         5.444      3.158     .266
11. CDJ(TG)   144  .711     .631       2.875    .328        .460      .382          1.029      .258      .134
12. R(CD)     28   1.493    2.089      8.201    .651        2.852     2.810         6.108      .906      .106
13. RT(CD)    28   .897     1.012      3.610    .688        .888      .689          1.313      .377      .122
14. RG(CD)    84   .761     .707       3.843    .317        .546      .478          1.000      .358      .113
15. RTG(CD)   84   .888     .787       2.783    .229        .236      .420          1.022      .318      .107
16. RJ(TGCD)  448  .562     .509       2.058    .214        .201      .204          .558       .192      .099
TABLE E-5 Variance Components for Significant Effects and Estimated Reliabilities^a
Effect        DATA   PEOPLE  THINGS  GED-REASON  GED-MATH  GED-LANGUAGE  SVP    STRENGTH  LOCATION
1. T          .741   .415    .000    .411        .196      .538          .507   .026      .074
2. G          3.374  4.364   .000    .895        .523      .822          3.906  .027      .018
3. TG         .371   —       1.098   —           —         —             —      —         —
4. J(TG)      .129   .000    .356    .040        .049      .051          .297   .185      .168
5. DJ(TG)     .079   .377    1.245   .115        .036      .076          .233   .141      .008
6. CDJ(TG)    .050   .041    .272    .038        .086      .059          .157   .022      .033
7. R(CD)      .039   .066    .256    .018        .110      .109          .231   .030      .000
8. RT(CD)     .028   .042    .092    .040        .057      .040          .063   .015      .002
9. RG(CD)     .033   .033    .298    .017        .057      .046          .074   .028      .002
10. RTG(CD)   .109   .093    .242    .005        .012      .072          .155   .042      .003
11. RJ(TGCD)  .562   .509    2.058   .214        .201      .204          .558   .192      .099
TOTAL         5.515  5.940   5.917   1.793       1.327     2.017         6.181  .708      .407
r(minimum)    .84    .80     .25     .75         .58       .70           .76    .34       .64
r(medium)     .85    .87     .46     .82         .61       .74           .80    .54       .66
r(maximum)    .90    .91     .65     .88         .85       .90           .92    .73       .76
^a Reliabilities are calculated under three different assumptions about sources of error. See text for explanation.
variance.” Reliabilities are calculated from variance components estimated
according to procedures in the work of Green and Tukey (1960). The variance
components shown in the body of Table E-5 are the proportion of variation in a
given characteristic due to particular effects. Variance components were
calculated only for effects that were statistically significant at the 1-percent
level of probability. Comparing all the analyses, we found that in most of them
a standard pattern emerged in which the effects related to analysts' field center
affiliation (effects C through CJ(TG)) were nonsignificant. Thus variance
components were not calculated for these effects.
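The arithmetic behind the variance components can be checked against a few rows of Table E-2. The sketch below assumes, as the table's columns suggest, that each F ratio is the mean square divided by its denominator and that each variance component is the mean square minus the denominator, divided by the divisor. Setting negative estimates to zero is a further assumption, suggested by the 0.000 reported for J(TG) in Table E-3. Function and variable names are illustrative only.

```python
# Selected rows of Table E-2 (DATA): effect -> (mean square, denominator, divisor).
# Composite denominators are written out as they appear in the table.
TABLE_E2 = {
    "T": (381.349, 7.768 + 0.897 - 0.562, 504),
    "G": (858.202, 7.768 + 0.761 - 0.562, 252),
    "TG": (54.791, 7.768 + 0.888 - 0.562, 126),
    "J(TG)": (7.768, 2.212 + 0.711 - 0.562, 42),
    "DJ(TG)": (2.212, 0.562, 21),
    "R(CD)": (1.493, 0.562, 24),
}


def f_ratio(mean_square, denominator):
    """F statistic: mean square over its error term (relation assumed from the table)."""
    return mean_square / denominator


def variance_component(mean_square, denominator, divisor):
    """Estimated variance component; negative estimates are set to zero (assumption)."""
    return max(0.0, (mean_square - denominator) / divisor)


for effect, (ms, denom, divisor) in TABLE_E2.items():
    vc = variance_component(ms, denom, divisor)
    print(f"{effect:7s} F = {f_ratio(ms, denom):7.2f}   VC = {vc:.3f}")
```

The divisor in each row equals the number of ratings that share a given level of the effect, for example 504 = 1,008 ratings divided by the two job types.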
The nonsignificance of field center effects is a substantively important
finding. It is also somewhat unanticipated, given the lack of coordination
among field centers. What it means is that ratings do not vary according to the
particular features of field centers.
Reliabilities are calculated across all 24 occupations. Each reliability
represents the proportion of total variation due to true sources. In all the
analyses the effects of occupation, type of job (manufacturing versus service),
and the general education development level of the job (T through J(TG)) are
considered to be true or valid sources of variation in the ratings. In all of
them, the residual (RJ(TGCD)) is assumed to be random or error variance. As noted,
however, we made alternative assumptions about what other effects constituted
error. In calculating the first set of reliability estimates (labeled “minimum”) we
considered variation due to the particular description being rated (DJ(TG)) and
variation due to the assorted rater effects (CDJ(TG) through RTG(CD)) to be
error, in addition to the residual. This set of reliabilities—the most stringent,
lower-bound estimate—gives us a sense of the reliabilities that would be
obtained if each occupation were rated by one rater on the basis of only one
description.
Under the second assumption, variation due to different descriptions is
considered to be true or valid, and only rater effects in addition to the residual
are considered to be error. These estimates of reliability (labeled “medium”) can
be interpreted as the reliabilities that would be obtained if each occupation were
rated by one rater on the basis of two job descriptions.
The third set of reliabilities (labeled “maximum”) relaxes the assumptions
about error even more. In these estimates, only the residual effect is considered
to be error; the differences between raters and field centers are taken as valid
sources of variation.
The difference between reliabilities in the first and second set of estimates
indicates the contribution of the job description effect per se to the total
variation in the ratings. Similarly, the difference between the
second and third reliability estimates indicates the contribution of the rater
effects per se.
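The three estimates differ only in which variance components are counted as true variance. A minimal sketch, using the DATA column of Table E-5 and grouping the effects as the text describes; the dictionary layout and function name are illustrative assumptions.

```python
# Variance components for DATA, taken from Table E-5.
DATA_VC = {
    "T": 0.741, "G": 3.374, "TG": 0.371, "J(TG)": 0.129,
    "DJ(TG)": 0.079,
    "CDJ(TG)": 0.050, "R(CD)": 0.039, "RT(CD)": 0.028,
    "RG(CD)": 0.033, "RTG(CD)": 0.109,
    "RJ(TGCD)": 0.562,
}

OCCUPATION_EFFECTS = ("T", "G", "TG", "J(TG)")   # always treated as true variance
DESCRIPTION_EFFECTS = ("DJ(TG)",)                # error only under "minimum"
RATER_EFFECTS = ("CDJ(TG)", "R(CD)", "RT(CD)", "RG(CD)", "RTG(CD)")  # true only under "maximum"


def reliabilities(vc):
    """Return r(minimum), r(medium), r(maximum) as shares of total variance."""
    total = sum(vc.values())
    occupation = sum(vc.get(name, 0.0) for name in OCCUPATION_EFFECTS)
    description = sum(vc.get(name, 0.0) for name in DESCRIPTION_EFFECTS)
    rater = sum(vc.get(name, 0.0) for name in RATER_EFFECTS)
    return {
        "minimum": occupation / total,
        "medium": (occupation + description) / total,
        "maximum": (occupation + description + rater) / total,
    }


r_data = reliabilities(DATA_VC)
print({name: round(value, 2) for name, value in r_data.items()})
```

The residual is the only component left out of the numerator under the "maximum" assumption, so that estimate equals one minus the residual's share of total variance.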
Turning to the results in Table E-5, because of the presence of significant,
sometimes relatively large, job description and rater effects, we note that the
three sets of estimates often differ considerably from one another. The impact
of the job description effect is best seen by comparing the first and second sets
of reliability estimates for each factor. While differences between the two sets
average .08, they range from .01 (DATA) to .21 (THINGS), an indication that the
ratings on some factors are more sensitive than others to particular features of
the job description. The effect of job description is relatively small for DATA,
GED-MATH and GED-LANGUAGE, SVP, and LOCATION. It has a larger impact on the
remaining ratings, especially those for PEOPLE, THINGS, GED-REASON, and
STRENGTH. Comparison between the second and third reliability estimates reveals
large rater effects on all the ratings. The effect is especially large for THINGS (a
difference of .19), GED-MATH (.24), GED-LANGUAGE (.16), and STRENGTH (.19).
Across characteristics the reliabilities also vary greatly. Under the most
stringent assumptions (r(minimum)), reliabilities range from a low of .25 for
THINGS to a high of .84 for DATA. The second set of estimates probably embodies
the most realistic assumptions about what constitutes error. These reliabilities
are not especially high, ranging from .46 for THINGS to .85 for DATA. Under the
most relaxed assumption, reliabilities (r(maximum)) are up to fairly acceptable
levels, in the high .80's and low .90's for all of the ratings except THINGS,
STRENGTH, and LOCATION. It should be kept in mind, however, that in these
estimates, rater variation is considered to be true variance, hardly a tenable
assumption. These estimates, in fact, are only useful insofar as they enable us to
calculate the magnitude of variation due to raters.
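The comparisons in the preceding paragraphs amount to differencing the rows of Table E-5. A short sketch using the rounded reliabilities from that table; the variable names are illustrative, and because the inputs are rounded to two places the differences carry the same rounding.

```python
FACTORS = ["DATA", "PEOPLE", "THINGS", "GED-REASON", "GED-MATH",
           "GED-LANGUAGE", "SVP", "STRENGTH", "LOCATION"]
R_MINIMUM = [0.84, 0.80, 0.25, 0.75, 0.58, 0.70, 0.76, 0.34, 0.64]
R_MEDIUM = [0.85, 0.87, 0.46, 0.82, 0.61, 0.74, 0.80, 0.54, 0.66]
R_MAXIMUM = [0.90, 0.91, 0.65, 0.88, 0.85, 0.90, 0.92, 0.73, 0.76]

# Job description effect: medium minus minimum.
description_effect = {
    name: round(medium - minimum, 2)
    for name, minimum, medium in zip(FACTORS, R_MINIMUM, R_MEDIUM)
}
# Rater effect: maximum minus medium.
rater_effect = {
    name: round(maximum - medium, 2)
    for name, medium, maximum in zip(FACTORS, R_MEDIUM, R_MAXIMUM)
}
mean_description_effect = sum(description_effect.values()) / len(FACTORS)

for name in FACTORS:
    print(f"{name:13s} description {description_effect[name]:.2f}   rater {rater_effect[name]:.2f}")
print(f"mean description effect {mean_description_effect:.2f}")
```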
The especially low reliabilities of the THINGS and STRENGTH scales may well
result from insufficient information in the description being rated. Of the 18
analysts who made comments at the end of the study, most noted that the
descriptions contained insufficient information to rate jobs for physical
capacities and environmental conditions. Although a similar difficulty was not
reported for the THINGS factor, the scale used to rate THINGS is almost completely
dominated by functions that deal with the relation of the worker to machines
(five of its eight levels). Thus the lower reliabilities on THINGS might be due to
the difficulty of assigning ratings to occupations with tasks in which machines
are unimportant.
Overall, the reliabilities are low enough to cause concern. The large effects
of job description (the difference between the medium and minimum estimates)
reveal that for each of the characteristics there is
considerable diversity in the description of jobs classified within an occupation.
Certainly there is more than would be assumed from a reading of the Definition
Writer's Manual (U.S. Department of Labor, 1974) or from the fact that,
typically, only a small number of jobs are analyzed for each occupation (see
chapter 7). Moreover, although there is no significant difference between ratings
across field centers, there are significant differences across analysts within field
centers. Thus ratings are substantially affected by the idiosyncrasies of
individual analysts.
The implications of these results are twofold. If a reliable rating is desired
of a given characteristic for a given occupation, it will be necessary both to use
more raters and more descriptions per occupation and to average the sets of
ratings thus obtained. The number of raters and descriptions needed to achieve a
desired level of reliability can be estimated from the results presented here
using the general Spearman-Brown formula (see, for example, Allen and Yen,
1979). Thus starting with an initial r(medium) of .80 (for example, SVP), a
reliability of .89 can be achieved by increasing the number of raters to two; if
three raters are used, a reliability of .92 can be obtained. Substituting jobs for
raters and using the same procedures, with r(minimum) as the base, we find
that by having the raters rate two job descriptions per occupation the reliability
of SVP will increase from .76 to .86; by having raters rate three job descriptions
a reliability of .90 can be obtained. Therefore for all of the factors, both the
number of raters and the number of jobs rated per occupation will need to be
increased somewhat in order to achieve satisfactory levels of reliability. The
increase needed will be relatively smaller for those factors with higher initial
reliability.
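The general Spearman-Brown formula gives the reliability of an average of k parallel measurements as kr / (1 + (k − 1)r), where r is the reliability of a single measurement. The sketch below applies it to the SVP figures quoted above and also inverts it to find how many raters or descriptions a target reliability would require. It treats raters (or descriptions) as interchangeable parallel measurements, which is the formula's own assumption; the function names are illustrative.

```python
import math


def spearman_brown(r, k):
    """Reliability of the mean of k parallel measurements, each with reliability r."""
    return k * r / (1 + (k - 1) * r)


def units_needed(r, target):
    """Smallest whole number of parallel measurements reaching the target reliability."""
    if not (0 < r < 1 and 0 < target < 1):
        raise ValueError("r and target must lie strictly between 0 and 1")
    k = target * (1 - r) / (r * (1 - target))
    # Small tolerance so that floating-point noise does not add a spurious unit.
    return max(1, math.ceil(k - 1e-9))


# SVP: one rater, two descriptions (r(medium) = .80), adding raters.
for raters in (2, 3):
    print(raters, "raters:", round(spearman_brown(0.80, raters), 3))

# SVP: one rater, one description (r(minimum) = .76), adding descriptions.
for descriptions in (2, 3):
    print(descriptions, "descriptions:", round(spearman_brown(0.76, descriptions), 3))

print("raters needed for .90 from .80:", units_needed(0.80, 0.90))
```

Because the formula is concave in k, each added rater or description buys less reliability than the one before, which is why factors with low initial reliability need disproportionately more of both.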
In a second analysis of these ratings we calculated reliabilities separately
for the two types of jobs—service and manufacturing—in order to see whether
the ratings were less reliable for the service category. We reasoned that they
might be because the scales were developed during a historical period in which
manufacturing jobs predominated. The scales might as a result be better suited
to the rating of manufacturing jobs. Furthermore, because most occupations
contained in the DOT are in manufacturing industries, analysts are presumably
more practiced in rating such occupations. The reliabilities by job type—service
versus manufacturing—are presented in Table E-6. These reliability estimates
were calculated using the same set of assumptions about error that were used in
the previous analysis. For all the characteristics, with only one exception
(STRENGTH), all three estimates of reliability are lower for the service
occupations than they are for manufacturing.
TABLE E-6 Estimated Reliabilities, by Type of Occupation (a)

Characteristic (b)   Estimate      Service   Manufacturing
DATA                 r(minimum)    .694      .880
                     r(medium)     .727      .889
                     r(maximum)    .798      .918
PEOPLE               r(minimum)    .666      .908
                     r(medium)     .795      .933
                     r(maximum)    .830      .972
THINGS               r(minimum)    .107      .186
                     r(medium)     .329      .406
                     r(maximum)    .632      .637
GED-REASON           r(minimum)    .652      .694
                     r(medium)     .717      .794
                     r(maximum)    .792      .888
GED-MATH             r(minimum)    .422      .629
                     r(medium)     .431      .682
                     r(maximum)    .771      .878
GED-LANGUAGE         r(minimum)    .552      .690
                     r(medium)     .609      .739
                     r(maximum)    .853      .862
SVP                  r(minimum)    .724      .768
                     r(medium)     .739      .834
                     r(maximum)    .873      .925
STRENGTH             r(minimum)    .435      .138
                     r(medium)     .594      .495
                     r(maximum)    .724      .705

(a) Reliabilities are calculated under three different assumptions about sources of error. See text for
explanation.
(b) Reliabilities for the LOCATION factor could not be calculated separately for service and
manufacturing occupations because there was no variation on this factor for the manufacturing
occupations.

These results suggest that particular attention should be paid to the rating
of service occupations in order to bring their reliabilities up to par
with those for manufacturing occupations. Although the addition of more
raters and descriptions would raise the reliabilities for service occupations, the
results of this analysis also suggest that other steps need to be taken. Additional
training and practice in the rating of service occupations may be needed, or
perhaps better guidelines and bench marks in the Handbook instructions. More
fundamentally, the scales used to rate occupations for these characteristics may
need to be adapted to the unique features of service jobs.
Analysis of the ratings of the remaining physical demands and
environmental conditions requires a different approach. These variables are
dichotomous and take on only one of two values, signifying either the presence
or the absence of a given characteristic. To assess the reliability or consistency
of ratings on these factors, two types of analyses were conducted. First, for each
characteristic the modal or most frequently occurring rating was determined for
each of the 24 DOT occupations. Consensus among raters was then calculated as
the proportion of raters giving the modal response. If all raters agreed that a
given characteristic was present, the proportion is 1.00, indicating perfect
consensus. Table E-7 presents estimates of consensus obtained in this way.
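
The consensus measure just described can be illustrated with a short Python sketch. The function name `modal_proportion` and the example data are hypothetical; the code only assumes that each rater's judgment is coded as one of two values (here 0 for absent and 1 for present).

```python
from collections import Counter


def modal_proportion(ratings):
    """Proportion of raters who gave the most frequent (modal) rating."""
    if not ratings:
        raise ValueError("need at least one rating")
    counts = Counter(ratings)
    return max(counts.values()) / len(ratings)


# Hypothetical example: 32 raters mark a characteristic absent, 10 mark it present.
example = [0] * 32 + [1] * 10
print(round(modal_proportion(example), 2))  # 32/42, printed as 0.76
```

With two possible values the measure cannot fall below .50, so a proportion near .50 indicates that the raters were split almost evenly.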
The average consensus across jobs (last row of the table) varies
considerably from scale to scale. Ratings are least consistent for TALK (.84) and
SEE (.68). Except for these ratings, however, the overall proportion of agreement
is quite high, at least .87 for NOISE, with a high of .96 for CLIMB.
A second feature of these results is that the poorest consensus among raters
(lowest proportions) occurs disproportionately for occupations in the service
category (top half of table). These results echo the finding that reliabilities are
lower for service than for manufacturing occupations. A proportion of less
than .80 (boldface in the table) occurs in 29 percent of the 144 rater-by-job
combinations for the service jobs but in only 17 percent of the 144
combinations for manufacturing jobs.
To assess the consistency of individual raters in rating each factor, we
calculated the correlation across all jobs between the rating of each rater and the
average rating of all other raters. Since half of the raters rated the first set of job
descriptions for the 24 occupations and half rated the second set, the two groups
of raters were analyzed separately. Table E-8 gives the correlations of each rater
with the average of the other 20 raters in his or her set. For raters who had no
variance on the characteristic in question across all jobs (that is, raters who
rated all jobs the same way on a given characteristic), this correlation could not
be calculated. These ratings are denoted by asterisks in the table.
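
As a rough sketch of this rater-versus-others calculation, the Python below correlates each rater's ratings across jobs with the mean rating of the remaining raters, and returns `None` where a rater (or the average of the others) shows no variance. The function names and the tiny data set are hypothetical, chosen only to show the mechanics.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation, or None when either variable has no variance."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # corresponds to an asterisk in the table
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / (sxx * syy) ** 0.5


def rater_vs_others(ratings):
    """ratings[r][j] is the 0/1 rating given by rater r to job j."""
    n_raters = len(ratings)
    n_jobs = len(ratings[0])
    results = []
    for r in range(n_raters):
        # Average rating of every other rater, job by job.
        others_avg = [
            mean(ratings[q][j] for q in range(n_raters) if q != r)
            for j in range(n_jobs)
        ]
        results.append(pearson(ratings[r], others_avg))
    return results


# Hypothetical data: three raters, four jobs; the third rater never varies.
demo = [[1, 0, 1, 0], [1, 0, 1, 1], [0, 0, 0, 0]]
print(rater_vs_others(demo))
```

Leaving the rater's own ratings out of the average keeps the correlation from being inflated by the rater agreeing with himself or herself.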
Results indicate that there is little problem with the consistency of
ratings for CLIMB, TALK, and HAZARDS, as witnessed by the predominance of
correlations of .80 and above. The low correlations for COLD, HEAT, WET, and
ATMOSPHR are a result of the infrequency of a positive rating and do not
necessarily reflect inconsistency. The low correlations for STOOP, REACH, SEE, and
NOISE, on the other hand, are indicative of inconsistency among the raters, since
these characteristics occur sufficiently often to compute a meaningful correlation.
Generally, these results suggest that in order to achieve a greater degree of
consistency among raters, given the amount of information available in the
description, ratings on all these dichotomous variables should be established by
pooling the judgment of at least three or four raters (see the technical note at the
end of this appendix). For the variables with the lowest degree of consistency, 8
or 10 raters would be needed to achieve stable and consistent responses. As
mentioned previously, however, many analysts felt that the descriptions
contained insufficient information with which to assign these particular ratings.
Perhaps if additional information were incorporated into the description, higher
levels of consistency would be achieved with the same or only a slightly larger
number of raters.
TECHNICAL NOTE
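More precise estimates of the number of raters needed to increase alpha
reliability to desired levels can be obtained using the following procedures.
Coefficient alpha (α), the reliability (homogeneity) of a sum or average of k
homogeneous items or raters, is given by

$$\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_i^2}{\sigma_T^2}\right),$$

where $\sigma_i^2$ is the variance of the ith item and $\sigma_T^2$ is the variance of the sum
of k items. If we let $c_\bullet$ be the average intercovariance, $c_\bullet$ equals
$\left(\sigma_T^2 - \sum_{i}\sigma_i^2\right) / [k(k-1)]$. If we
also let $v_\bullet$ be the average variance, then alpha can be written as

$$\alpha = \frac{k^2 c_\bullet}{\sigma_T^2},$$

where

$$\sigma_T^2 = k\,v_\bullet + k(k-1)\,c_\bullet .$$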
TABLE E-7 Rater Consensus by Occupation: Proportion of Modal Responses (a)

Characteristics rated: CLIMB, STOOP, REACH, TALK, SEE, COLD, HEAT, WET, NOISE, HAZARDS, ATMOSPHR
Entries in each row are grouped by value, not by characteristic.

Occupation  Below .80                             .80 or above
Service
 1.         .76 .62 .52                           1.00 .98 .98 .90 .90 .98 .88 1.00
 2.         .55 .57 .64                           .98 1.00 1.00 1.00 .90 .98 .86 1.00
 3.         .55 .74                               1.00 .98 .90 1.00 1.00 1.00 .81 1.00 .98
 4.         .74 .71                               1.00 1.00 .90 1.00 1.00 1.00 1.00 1.00 1.00
 5.         .52                                   1.00 1.00 .90 .86 1.00 1.00 1.00 .98 .98 1.00
 6.         .52 .64 .52                           .88 .98 .83 .95 .98 .90 .83 .98
 7.         .79 .71 .57                           1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
 8.         .67 .76 .55                           1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
 9.         .57 .79                               1.00 1.00 .88 .90 .98 .98 .98 .98 1.00
10.         .64 .74 .67 .57 .50 .60 .60           .95 .86 .86 .86
11.         .50 .57                               .98 1.00 .90 1.00 1.00 1.00 1.00 1.00 1.00
12.         .62 .69 .57 .79 .52 .52 .62 .79 .52   .90 .88
Manufacturing
13.         .55                                   1.00 1.00 .98 1.00 1.00 .90 .90 1.00 .95 .98
14.         .76 .62 .76                           .98 1.00 .86 .98 .81 .83 .83 .86
15.         .71 .62                               1.00 .95 1.00 .98 1.00 1.00 1.00 .83 .81
16.         .79                                   1.00 1.00 1.00 .90 1.00 1.00 1.00 .88 .95 1.00
17.         .55 .79                               1.00 .86 .98 1.00 .98 .88 .88 .86 .83
18.         .74                                   1.00 .98 1.00 .98 1.00 1.00 1.00 .98 .98 1.00
19.         .74 .79                               1.00 1.00 1.00 1.00 1.00 1.00 .93 1.00 .88
20.         .71 .60                               1.00 .98 1.00 1.00 1.00 .93 .90 .88 .98
21.         .60                                   1.00 .95 .88 .93 1.00 1.00 .83 .98 .93 .83
22.         .64                                   1.00 .86 .93 .83 .83 1.00 1.00 1.00 .81 .95
23.         .71 .71 .76 .76                       1.00 .98 .83 1.00 1.00 .95 .90
24.         (none)                                1.00 .88 .98 .86 .93 1.00 1.00 1.00 .88 .95 .90

Average proportion, by characteristic:
CLIMB .96, STOOP .88, REACH .88, TALK .84, SEE .68, COLD .98, HEAT .94, WET .90, NOISE .87, HAZARDS .90, ATMOSPHR .91

(a) Proportions less than .80 are in boldface for easy identification.
TABLE E-8 Correlation of Raters With the Average of All Other Raters, Across Occupations by Job Description Set (a)

Characteristics rated: CLIMB, STOOP, REACH, TALK, SEE, COLD, HEAT, WET, NOISE, HAZARDS, ATMOSPHR
Entries in each row are grouped by value, not by characteristic.

Rater   Below .80 or not computable (*)                 .80 or above
Set 1
 1.     .67 * .11 * .67 .66 * .52 .50                   .86 .84
 2.     .23 .15 .32 * * * * .50                         .83 .80 .81
 3.     * .76 .57 * * * .35 * .09 *                     .86
 4.     .55 .42 .72 .23 * * * * .69                     .87 .81
 5.     .72 * .65 * .76 .63 .55 .59                     .86 .84 .90
 6.     .52 .30 .12 * .58 .27 −.09                      .83 .84 .90 .81
 7.     .55 * * .16 .43 .49 .62                         .87 .83 .90 .87
 8.     .51 * * .76 .63 .37 .70 .72                     .86 .80 .81
 9.     * * * .63 * * .67 .15                           .87 .83 .90
10.     .72 .41 .52 .69 .76 .42 .58 .77 −.19            .87 .81
11.     .48 .52 .57 .21 * * * .64 .50                   .83 .84
12.     .23 .48 * * .57 * .67 .13                       .86 .83 .93
13.     * .61 .46 * .18 .49 .48 .72 *                   .86 .92
14.     .72 .11 * .16 .67 .19 .39                       .86 .84 .92 .90
15.     .06 .10 .16 * * * .34 *                         .86 .86 .90
16.     * .22 * .74 .43                                 .83 .81 .80 .87 .90 .81
17.     .22 * .62 * .66                                 .83 .84 .84 .87 .90 .81
18.     .28 .30 .44 .42 .27 * .58 .52 −.14              .84 .90
19.     .40 * .77 .54 * * * .39 .67 .48                 .83
20.     .51 .78 .60 .56 * .04 .20 .34 .03 .14           .86
21.     .58 * .69 * * .26 .44 .50 .78                   .86 .84
 α      .67 .62                                         .98 .86 .83 .97 .89 .89 .80 .92 .89
Set 2
 1.     .39 .53 .70 .54 .72 .72 .69 .73 .69             .89 .93
 2.     .51 .78 .76 .68 * .60 *                         .89 .98 .83 .93
 3.     .51 * .67 * .65 .56 *                           .83 .82 .93 .89
 4.     .40 .55 .45 * * * .44 .76 .09                   .83 .84
 5.     .51 .27 .29 * * .42 .58 * −.04                  .83 .93
 6.     .51 * .72 * * * .56 *                           .83 .98 .93
 7.     .51 * * * * .38 *                               .83 .84 .98 .93
 8.     .42 .17 * .68 * * .78 *                         .84 .93 .89
 9.     * * .58 .48 * .60                               .81 .98 .96 .93 .89
10.     * .31 .33 .75 .09 * * * * * *                   (none)
11.     .67 .66 .60 .76 .50 .58 .75 .21 .58 .68         .83
12.     .61 .44 .23 .41 .13 .37 .09 −.08 −.05           .83 .85
13.     * * .38 * * * * *                               .83 .89 .93
14.     .75 * .36 * .21 .78 .66                         .80 .84 .98 .83
15.     .42 .61 .36 * .78 .64 .70 .77                   .83 .84 .98
16.     .48 * * * * .27 * * −.03                        .83 .85
17.     .30 .67 .61 .02 .02 .71 .20 .34 −.07 −.10 −.07  (none)
18.     .53 .62 * * * .21 .36                           .85 .83 .98 .82
19.     * .53 .54 .50 .72 .78 *                         .80 .98 .93 .89
20.     .53 .48 .64 .65 * .78 .64 .51 .73               .89 .98
21.     .51 .78 .40 *                                   .89 .84 .98 .96 .86 .93 .89
 α      .47                                             .96 .86 .86 .96 .82 .95 .90 .84 .95 .89

(a) Correlations less than .80 are in boldface for easy identification.
* Correlation could not be computed because one variable had no variance.
It follows that

$$\alpha = \frac{k\,r_\bullet}{1 + (k-1)\,r_\bullet},$$

where $r_\bullet$ equals $c_\bullet / v_\bullet$ and is the so-called intraclass coefficient of
correlation (see Stanley, 1971:398). That is, the logic of alpha is exactly the
same as the logic of the Spearman-Brown formula, with $r_\bullet$, the average
interrater reliability, being stepped up, via Spearman-Brown, to alpha, the
reliability of the average of k raters.
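
The equivalence stated here can be checked numerically. The sketch below, in plain Python, computes alpha once from the item variances and the variance of the sum, and once by stepping up the ratio of average intercovariance to average variance with the Spearman-Brown formula. The function names and the small ratings matrix are hypothetical, used only for illustration.

```python
from statistics import mean


def covariance(x, y):
    """Sample covariance; covariance(x, x) is the sample variance."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def alpha_from_variances(raters):
    """Alpha from item variances and the variance of the summed ratings."""
    k = len(raters)
    totals = [sum(col) for col in zip(*raters)]
    sum_item_var = sum(covariance(r, r) for r in raters)
    total_var = covariance(totals, totals)
    return k / (k - 1) * (1 - sum_item_var / total_var)


def alpha_from_average_r(raters):
    """Alpha as the Spearman-Brown step-up of average covariance / average variance."""
    k = len(raters)
    v_bar = mean(covariance(r, r) for r in raters)
    c_bar = mean(
        covariance(raters[i], raters[j])
        for i in range(k) for j in range(k) if i != j
    )
    r_bar = c_bar / v_bar
    return k * r_bar / (1 + (k - 1) * r_bar)


# Hypothetical ratings: three raters (rows) rating the same five jobs (columns).
demo = [[1, 2, 3, 4, 5], [2, 2, 3, 5, 4], [1, 3, 3, 4, 4]]
print(alpha_from_variances(demo), alpha_from_average_r(demo))
```

The two functions agree up to floating-point rounding because the variance of the sum equals k times the average variance plus k(k-1) times the average intercovariance.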
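Thus to find $r_\bullet$ from alpha, we use formula 4.8 from Allen and Yen
(1979), with our notation:

$$r_\bullet = \frac{\alpha}{k - (k-1)\,\alpha}.$$

So, for example, if α=0.98 and k=21,

$$r_\bullet = \frac{0.98}{21 - 20(0.98)} = \frac{0.98}{1.40} = 0.70.$$

Then we use formula 4.9 from Allen and Yen, with our notation, to see
how many raters (k) we need to average in order to obtain a given reliability (α):

$$k = \frac{\alpha\,(1 - r_\bullet)}{r_\bullet\,(1 - \alpha)}.$$

For example, if $r_\bullet$=0.70, but we want a reliability of 0.80 (a modest want),

$$k = \frac{0.80\,(1 - 0.70)}{0.70\,(1 - 0.80)} = \frac{0.24}{0.14} \approx 1.71.$$

Thus two raters would be needed to obtain a reliability of 0.80 given the
initial value of $r_\bullet$.
As a final illustration, suppose that we obtained α=0.67 with 21 raters.
Then

$$r_\bullet = \frac{0.67}{21 - 20(0.67)} = \frac{0.67}{7.60} \approx 0.088.$$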
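The number of raters we would need to raise α from 0.67 to 0.80 is then

$$k = \frac{0.80\,(1 - r_\bullet)}{r_\bullet\,(1 - 0.80)} = \frac{0.80\,(1 - 0.088)}{0.088\,(0.20)} \approx 41.$$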
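
The two-step procedure of this note (recover the single-rater reliability from alpha, then solve for the number of raters) can be sketched in Python as follows. The function names are hypothetical, and the formulas are the inverted Spearman-Brown relations as understood here, not a transcription of Allen and Yen's text.

```python
import math


def single_rater_reliability(alpha: float, k: int) -> float:
    """Average interrater reliability implied by an alpha based on k raters."""
    return alpha / (k - (k - 1) * alpha)


def raters_needed(r_single: float, target: float) -> float:
    """Number of raters whose averaged rating would reach the target reliability."""
    return target * (1 - r_single) / (r_single * (1 - target))


# First worked example: alpha = 0.98 with 21 raters, target reliability 0.80.
r1 = single_rater_reliability(0.98, 21)
print(round(r1, 2), math.ceil(raters_needed(r1, 0.80)))

# Final illustration: alpha = 0.67 with 21 raters, target reliability 0.80.
r2 = single_rater_reliability(0.67, 21)
print(round(r2, 3), round(raters_needed(r2, 0.80), 1))
```

Since raters come in whole numbers, the computed value is rounded up when a target reliability must actually be met.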