4
Survey Design and Statistical Methodology
When considering the collection of earnings data by gender, race, and
national origin, the U.S. Equal Employment Opportunity Commission
(EEOC) confronts several key decisions in the realm of survey design and
statistical methodology. The decisions involve four closely associated
issues: collectability, quality (defined as fitness for use), utility for
statistical analysis, and response burden.
In this chapter we discuss the pros and cons of options for collecting
earnings data from employers by adding items to existing equal employment
opportunity (EEO) forms or developing a new collection instrument.
We consider the fitness for use of the data, which addresses the relevance
of the data to users’ needs. We illustrate a model-based approach to
identifying the utility of the categorical variables that would also be
collected if wage data are collected. We address the question of employer
burden and assess various options for minimizing the burden on reporting
units. The last issue is complicated by the fact that there is a differential
burden faced by employers of different sizes and with different levels of
sophistication in their human resource and payroll systems. In the case of
collection of earnings data by gender, race, and national origin, one
approach may not be appropriate for all respondents.
60 COLLECTING COMPENSATION DATA FROM EMPLOYERS
OPTIONS FOR DATA COLLECTION
Modify Current EEO Forms
The most direct solution to obtaining earnings information for EEOC
purposes would be to add earnings items to existing EEO reports. The
collection instrument that would make the most sense to modify for
this purpose is the EEO-1 form, for several reasons. First, it enjoys
substantial coverage. As discussed in Chapter 1, the mandatory EEO-1
reports in 2010 covered about 67,000 private-sector respondents, with about
55 million employees.
Second, the form is part of the everyday operations of the antidiscrimi-
nation agencies. The EEO-1 reports are used by the EEOC and the Office
of Federal Contract Compliance Programs (OFCCP) to trigger enforcement
and technical assistance based on the identification of potential EEO prob-
lems, which is determined from data provided by employers on the reports.
Third, it is expected that the necessary modifications to the EEO-1
form would be quite manageable for both EEOC and the respondents.
The addition of the earnings data could be accomplished in much the same
way that earnings data are collected on the EEO-4 form: that is, either
by adding another column to the form that requests the earnings data or
adding another row for each occupation, which would collect average pay
in addition to the current row that collects number of employees by race/
ethnicity group. An alternate collection design would be to simply duplicate
the existing EEO-1 form and have employers place in the cells of one table
the number of employees, as they now do, and in the second table enter the
pay corresponding to those employees.
Design a New Collection Instrument
A second option would be to design a new and, one hopes, a more
streamlined collection instrument that would collect both employment
and earnings information. The design of such a new instrument could be
informed by the current effort by OFCCP to develop a collection instru-
ment to replace the defunct Equal Opportunity Pilot Survey discussed in
Chapter 2. As this report was being prepared, the OFCCP had issued an Ad-
vance Notice of Proposed Rulemaking (ANPRM) that solicited comments
on several issues important for designing a new collection instrument. For
example, OFCCP asks whether expanded information should be collected
in order for OFCCP to assess whether further investigation into a
contractor’s compensation decisions and policies is warranted. To collect such
data as average starting or initial total compensation (including paid leave,
health and retirement benefits, etc.); average pay raises; average bonuses;
minimum and maximum salary; standard deviation or variance of salary;
the number of workers in each gender and race/ethnicity category; average
tenure; and average compensation data by job series (e.g., all engineers
within a particular department or all secretaries throughout the
establishment) would require a substantial redesign of the collection form.
Some of the items that might be useful in understanding the EEO en-
vironment in establishments would likely require open-ended questions,
such as on topics suggested in the OFCCP ANPRM pertaining to company
policies related to promotion decisions, bonuses, shift pay, and setting of
initial pay. This information is difficult to collect and to process efficiently
in a standardized manner.
FITNESS FOR USE
Types of Uses
Quality of information is generally defined in terms of its fitness for
use. This is a multidimensional concept embracing the relevance of the
information to users’ needs and the accuracy, timeliness, accessibility, in-
terpretability, and coherence that affect how the data can be used. There is
a considerable literature on statistical quality and the steps that should be
taken to make data useful for its intended purpose (see, e.g., Brackstone,
1999; U.S. Office of Management and Budget, 2002). The literature high-
lights the importance of clearly understanding the requirements for the data
before collection begins. It is important in this context to consider the need
of the EEOC for earnings information.
The major use of the EEO compensation data would be to aid enforce-
ment of pay discrimination statutes in two ways: targeting enforcement
actions and carrying out enforcement actions against an employer that has
been targeted. Targeting is primarily a matter of selecting among the com-
plaints the EEOC receives to identify those firms that are most likely to be
found to have discriminatory practices.
There are, however, secondary uses, such as analysis of overall trends in
pay discrimination and trends by industry and location, as well as research
on compensation trends. If such new compensation data become available,
they would be a powerful supplement to existing sources of compensation
data, such as those discussed in Chapters 2 and 3.
Because it is so important that the data from this survey be collected
correctly, it is incumbent on EEOC to identify the potential uses of
the data early in the design process so that the data items to be collected can
be identified and issues of data quality considered. Again, the requested
comments in the OFCCP ANPRM are instructive when paraphrased in
EEOC terms:
• Should the data be used to conduct industrywide compensation
trend analyses? If so, what type of compensation trend analyses
would be appropriate to conduct on an industrywide basis?
• For each type of analysis identified, identify the categories of data
that should be collected in order to compare compensation data
across employers in a particular industry and the job groupings
that should be used.
• Should the data be used to identify employers in specific industries
for industry-focused compensation reviews?
• What specific categories of data would be most useful for identify-
ing employers in specific industries for industry-focused compensa-
tion reviews?
• Should the data be collected by individual establishment for multi-
establishment employers? What specific categories of data would
be most useful for conducting compensation analyses across an
employer’s various establishments?
Utility of the Data Items for Statistical Analysis
In this section we consider how the EEOC could develop a statistical
model for use in screening individual employers for possible violations of
pay discrimination statutes. There are several key considerations here. First, the
data to be used in this model would, of course, be reported by each indi-
vidual employer. In addition to the information already requested for the
EEO-1 report (e.g., employment by occupation, sex, and race/ethnicity),
a form would collect pay (measured as discussed in Chapter 3) and pos-
sibly other information, such as employees’ years of service. Given these
data, one could conduct a multiple regression analysis of pay in relation to
demographic variables (e.g., the EEO-1’s 14 sex and race/ethnicity groups)
and other characteristics, usually called “control variables,” such as occu-
pational category and years of service. More complex models might include
controls for occupation or job categories or more elaborate controls for
education and labor force experience. Still more complex models might
include more detailed occupational or job categories and more elaborate
controls for previous experience and qualifications.
There are a large number of potential control variables that could be
included in such regression models, and, especially for employers with small
numbers of employees, there would be benefits from keeping the number of
covariates in such models relatively small. To do that, there are a variety of
statistics, including Mallows’ Cp, Akaike Information Criterion (AIC), and
Bayesian Information Criterion (BIC), that could be employed to remove
control variables that were not contributing substantially to the fit of the
model.
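As a rough illustration of how such criteria trade goodness of fit against model size, the sketch below computes AIC and BIC for least-squares models from their residual sums of squares, using the standard Gaussian forms up to an additive constant. The employer size, residual sums of squares, and parameter counts are hypothetical values chosen only for illustration; they are not estimates from this report.

```python
import math

def aic(rss, n, k):
    """AIC for a least-squares fit with k parameters (additive constant dropped)."""
    return n * math.log(rss / n) + 2 * k

def bic(rss, n, k):
    """BIC for a least-squares fit with k parameters (additive constant dropped)."""
    return n * math.log(rss / n) + k * math.log(n)

# Hypothetical employer with 150 employees; all numbers are illustrative.
n = 150
candidates = {
    "occupation only": {"rss": 52.0, "k": 23},
    "occupation + tenure": {"rss": 47.0, "k": 24},
    "occupation + tenure + education": {"rss": 45.5, "k": 39},
}

scores = {
    name: (aic(m["rss"], n, m["k"]), bic(m["rss"], n, m["k"]))
    for name, m in candidates.items()
}

# Lower values are better under both criteria.
best_aic = min(scores, key=lambda name: scores[name][0])
best_bic = min(scores, key=lambda name: scores[name][1])
print("Preferred by AIC:", best_aic)
print("Preferred by BIC:", best_bic)
```

With these made-up inputs, the 15 extra education indicators reduce the residual sum of squares only slightly, so both criteria favor the smaller model. BIC penalizes each added parameter by log(n) rather than 2, which makes it the stricter of the two for any employer with more than about 7 employees.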
While there is substantial disagreement over the most appropriate
models to use for establishing a reasonable claim of possible wage discrimi-
nation, or defending one, it is not necessary to have a definitive model to
assess the potential quality of certain basic statistical tests that might be
reasonably performed by EEOC. We undertake such an analysis here. We
emphasize that the regression model we describe below is intended, first and
foremost, as an illustrative example of a methodology for undertaking some
of these basic statistical tests. For this purpose, we need to provide enough
specifics to allow a clear and straightforward discussion of the general na-
ture of the issues that would arise in such an exercise.
The regression model we use is a general linear model of the form:
\[ y_i = b_0 + d_i b_1 + x_i b_2 + e_i \]
Here, yi is the logarithm of the wage measure for individual i, di is the
vector of design variables that indicate the EEO-1 categories occupied by
individual i, xi is a vector of control variables, ei is the statistical error, b0 is
the intercept, b1 is the vector of EEO-1 log wage differentials from a speci-
fied reference group (usually white, non-Hispanic males), b2 is the vector
of effects associated with the control variables, and i = 1,...,N, where N is
the total number of employees in the analysis.1
For an agency such as EEOC or OFCCP, the results from this kind of
regression analysis that will be of greatest concern will be the estimates of
the coefficients for gender and race/ethnicity: that is, the betas (the elements of b1), because the
estimates of these coefficients indicate the extent (if any) to which women
or nonwhites are paid less than men or whites who are the same in terms
of the other factors (the “control variables”) included in the analysis. It will
be particularly important to perform a test to determine if these coefficients
are statistically significantly different from zero (i.e., are unlikely to have
occurred simply as a result of random or chance factors).
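The significance test for a single coefficient can be sketched as follows. The function divides an estimate by its standard error and converts the ratio to a two-sided p-value. It uses a normal approximation to the t distribution, which is an assumption that is reasonable only when the residual degrees of freedom are large (as in the CPS models reported in Table 4-1); for a small employer the t distribution itself should be used. The function name is illustrative.

```python
from statistics import NormalDist

def t_and_p(estimate, std_error):
    """Return the t ratio and a two-sided p-value (normal approximation)."""
    t = estimate / std_error
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p

# Estimate and standard error reported in Table 4-1 for
# Native Hawaiian or Other Pacific Islander (only) males.
t, p = t_and_p(-0.15631, 0.072491)
print(f"t = {t:.2f}, p = {p:.4f}")
```

The result can be compared with the t value and Pr > |t| columns of Table 4-1 for the same row.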

1 The earliest analyses that used the logarithm of wages were Blinder (1973) and
Mincer (1974). Their work discussed specifications in the logarithm and levels.
Since the early 1970s the prevailing practice in economics has been to use the
logarithm of the rate of pay as the dependent variable. The regression model has
been selected because when analysis is expressed in logs, pay gaps can be
expressed in a comparable way (i.e., as percentages) even for dates that are wide
apart. This also means that estimated coefficients in log regressions can be
interpreted as showing the percentage change in y that occurs as a result of a
change in x; when x is an indicator for race or gender, it measures the
percentage difference in pay between the indicated group and a reference group.

Assuming that design vectors di and xi are statistically exogenous with
respect to ei and that ei has a normal distribution with zero mean, constant
variance, and independence over individuals, there is a well-known F-test
for the null hypothesis: b1 = 0. This statistic tests the hypothesis that all of
the EEO-1 log wage differentials are jointly zero versus the alternative that
at least one of the differentials is nonzero. The usual F-statistic is based on
the Type-III sum of squares for the model component associated with the
design vector di: that is, the conditional model sum of squares for di given
the other variables, xi, in the model. This statistic is invariant to the choice
of reference group.
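One way to see what the joint test measures is to compare the residual sum of squares of the model fitted with the EEO-1 indicators to that of the model fitted without them. The sketch below computes the resulting F statistic. It assumes a main-effects model, in which this comparison corresponds to the conditional sum of squares for di given xi; the function name and the sums of squares are hypothetical and serve only to show the arithmetic.

```python
def partial_f(rss_restricted, rss_full, q, df_error):
    """F statistic for jointly dropping q regressors from the full model.

    rss_restricted: residual sum of squares without the EEO-1 indicators
    rss_full:       residual sum of squares with them
    q:              number of restrictions (13 for the EEO-1 differentials)
    df_error:       residual degrees of freedom of the full model
    """
    numerator = (rss_restricted - rss_full) / q
    denominator = rss_full / df_error
    return numerator / denominator

# Hypothetical employer with 300 employees and 23 estimated parameters
# in the full model; the sums of squares are illustrative values only.
f_stat = partial_f(rss_restricted=98.0, rss_full=90.0, q=13, df_error=300 - 23)
print(f"F(13, {300 - 23}) = {f_stat:.3f}")
```

The statistic would then be referred to an F distribution with 13 and 277 degrees of freedom. Because only the fit of the two models enters the calculation, the result does not depend on which group is chosen as the reference category.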
An automated test of the hypothesis b1 = 0 could be conducted from
an enhanced EEO-1 report that included appropriate wage data. The suit-
ability of such a test depends on how likely it is that the test would detect a
departure from b1 = 0 for realistic configurations of employer data and with
appropriate controls. We approach this question by attempting to measure
the power of the standard F-test for b1 = 0 in scenarios that resemble best-
case outcomes for such an automated procedure.
The power of a test is the probability that it will reject the null hypoth-
esis when that hypothesis is false. In other words, the power of a test is the
probability that it will actually find a sex or race/ethnicity difference when
such a difference exists. In colloquial terms, one might say that the power
of a test is the probability that it will detect a potentially discriminating
(“guilty”) party. The power depends on the magnitude of the departure
from the null hypothesis (how big the differentials are) and the precision
with which those differentials can be estimated. In turn, the precision of the
estimate(s) depends critically on the number of data points used in forming
the estimates.
In the present context, it is crucial to note that the power of the statisti-
cal model for screening employers will be sensitive to the number of data
points used in its construction. It is simple common sense that, other things
being equal, a poll of 1,000 people is likely to be much more precise (will
have much greater power) than a poll of 100 people; similarly, regression
estimates of sex or race/ethnicity pay differences that are based on many
data points will have greater power than estimates based on only a few data
points. Finally, note that the number of data points in an analysis of a par-
ticular employer will depend on the size of the employer’s work force: the
greater the number of employees, the greater the number of data points, and
the greater the power of the statistical model used in screening employers.
Thus, when the number of employees is small, any screening model that
EEOC might develop will have very low power, and when the number of
employees is large, the screening model will have high power. The important
question is thus obvious: How many data points must there be—how large
does the employer’s work force have to be—to yield “enough” power?
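The dependence of precision and power on the number of data points can be illustrated with a deliberately simplified calculation. The sketch below is not the F-test power analysis used in this chapter: it substitutes a normal-approximation margin of error for a polled proportion and the power of a two-sided z-test comparing two group means. The 0.10 log-point gap, the residual standard deviation of 0.5, and the group sizes are hypothetical.

```python
from statistics import NormalDist

def margin_of_error(n, p=0.5, confidence=0.95):
    """Half-width of a normal-approximation interval for a proportion."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * (p * (1 - p) / n) ** 0.5

def z_test_power(diff, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two group means."""
    se = sd * (2 / n_per_group) ** 0.5          # standard error of the difference
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    shift = abs(diff) / se
    return NormalDist().cdf(shift - z_crit) + NormalDist().cdf(-shift - z_crit)

# A poll of 1,000 people versus a poll of 100 people.
for n in (100, 1000):
    print(f"poll of {n}: margin of error {margin_of_error(n):.3f}")

# Hypothetical pay gap of 0.10 log points, residual standard deviation 0.5.
for n in (25, 100, 400):
    print(f"{n} employees per group: power {z_test_power(0.10, 0.5, n):.2f}")
```

In this simplified setting the margin of error shrinks with the square root of the sample size, and the same pay gap that would usually go undetected with 25 employees per group is detected most of the time with 400.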
For general linear models, there is standard software to assist with this
power assessment. The inputs consist of estimates of the magnitude of the
likely discrepancy and summary measures of the estimation precision. We
next describe how we estimated those components.
We considered an employer-size power analysis that is based on the
predictions and estimation precision of models fit on the March 2010 Cur-
rent Population Survey (CPS) Annual Social and Economic Supplement.
Essentially, we are asking: “How many employees must a respondent firm
have in order for the F-test to have the specified power to detect log wage
differentials as big as the ones in the overall economy, as measured in
March 2010?” This is a “best-case” scenario for two reasons. First, the dif-
ferentials in the overall economy are larger than those typically found at a
single employer because the heterogeneity in job types between employers is
much greater than the heterogeneity of job types for a given employer. Sec-
ond, because the overall workforce is more heterogeneous than the work-
force of a given employer, most effects are estimated more precisely in the
March CPS than they would be in a sample drawn from a single employer.
Because the CPS data are more heterogeneous than microdata from a
single employer, they permit estimation of models that strongly resemble the
ones that might be used by EEOC to screen EEO-1 reports that included
wage data developed according to either of the two pilots recommended in
this report (see Chapter 6). And because they allow a plausibly “best-case”
power analysis, it is reasonable to consider them before investing heavily
in data that might permit a more precise answer.
To minimize the effects of different definitions of the wage rate, we
selected previous-year wage and salary earners only. The selected individu-
als were full-time employees (at least 35 hours/week) for at least 50 weeks
in 2009 (the reference year for the March 2010 CPS supplement) and were
between the ages of 16 and 75. We coded these individuals into the
appropriate gender and race/ethnicity categories corresponding to the EEO-1
form. With 14 gender and race/ethnicity groups, one of which serves as the
reference group, the design of these log wage differentials has 13 degrees of freedom.
We used the major occupation codes (a taxonomy of 10 occupation groups)
and the detailed occupations (a taxonomy of about 500 categories).2 The
use of 10 major occupation code categories is a reasonable proxy for the
EEO-1 occupations for the purposes of these power studies.
In addition to occupation categories, we also used 16 educational cat-
egories. These were entered as control variables in some analyses and used
in combination with age to create a measure of time since leaving school,
which is called “potential experience.”

2 We chose this approach because a standardized recoding of the CPS occupational
codes to EEO-1 categories would have involved about as much measurement error as
the error associated with the coding to major and detailed occupations in the
first place.

Analyses based on the public-use CPS data are necessarily between-employer
estimates, rather than within-employer estimates, as any analysis
of EEO-1 wage data would be. We included a control for major industry
(13 categories) to allow the power analyses to be closer to those that a full
pilot might produce. Model 1 controls for occupation only; Model 2 controls
for occupation and covariates; Model 3 controls for detailed occupations
and covariates. Figure 4-1 compares the estimates of the three models.
Model 1, shown in Table 4-1 below, estimates the EEO-1 differentials
within major occupational categories. It corresponds to the test b1 = 0
conditioning on main effects only for the major occupational group. Not
surprisingly, relative to the base group of white non-Hispanic males, all of
the estimated differentials are large. Jointly, the F-test rejects b1 = 0 with
a P-value of less than 0.0001, and individually all of the differentials are
statistically significant at the 0.05 level or better. The R2 for this equation
is 0.25, and the residual variance is 0.37. These two statistics are also used
in the power analysis.
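Because the dependent variable is the logarithm of the wage, each estimated differential is expressed in log points. A minimal sketch of the exact conversion to a percentage difference, 100 × (exp(b) − 1), is shown below using two of the Model 1 estimates from Table 4-1. The function name is illustrative.

```python
import math

def log_diff_to_percent(b):
    """Convert a log wage differential to an exact percentage difference."""
    return 100 * (math.exp(b) - 1)

# Two Model 1 estimates from Table 4-1
# (reference group: white non-Hispanic males).
estimates = {
    "Asian (only) male": -0.03405,
    "White (only) non-Hispanic female": -0.35903,
}

for group, b in estimates.items():
    print(f"{group}: {b:+.5f} log points = {log_diff_to_percent(b):+.1f} percent")
```

For small coefficients the log differential and the percentage difference nearly coincide, but for larger ones they diverge: a differential of about −0.36 log points corresponds to pay roughly 30 percent below the reference group, not 36 percent.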
The first power analysis asks what the minimum employer size would
be in order to detect differentials as large as those in Model 1 and with
employer-specific data that had the same design and explanatory power.
The line labeled “Controls EEO-1 Occupation Only” answers this ques-
tion. All power analyses assume that the basic F-test has size 0.05 at b1 =
0: that is, the probability of rejecting a true null hypothesis is fixed at 0.05
throughout.
A regression analysis of an employer with approximately 99 employees
has power of 0.50: it is equally likely to accept or reject the null hypothesis
b1 = 0 for wage differentials of the magnitude of those in Model 1.
Employment of 200 is needed to boost the power to 0.90, a value that is often used
as the standard for acceptable power.3
Model 2, shown in Table 4-2, computes the EEO-1 log wage
differentials with controls for main effects of the major occupation category as
well as main effects of education, major industry, and a quartic in potential
experience. The estimated log wage differentials are much smaller than in
Model 1, although still quite substantial in magnitude. The F-statistic for
joint significance is 238.41, with a P-value less than 0.0001. The R2 for this
equation is 0.39, and the residual variance is 0.30. As can be seen in Figure
4-1, an analysis based on 155 employees delivers power of 0.50 in this case,
and an analysis of an employer of size 318 is required for power of 0.90.

3 All model estimation was conducted in SAS (statistical analysis software)
version 9.3 using PROC GLM. All power analysis was conducted in SAS version 9.3
PROC GLMPOWER. The design matrices, estimated subgroup means, and regression
summary statistics used in the power analysis were computed from the March CPS
data in the statistical summaries shown in all three of our models.

Model 3 is shown in Table 4-3 below. In this estimation, we control
for detailed occupation in addition to the covariates that were included
in Model 2. The F-statistic falls to 138.38 but with a P-value that is still
less than 0.0001. Estimated differentials also fall substantially. The R2 for
FIGURE 4-1 Comparisons of analytic power and employer size for selected EEO-1 wage reports, three models.
[Figure not reproduced. Vertical axis: minimum employer size (full-time equivalent employees), 0 to 700. Horizontal axis: power (probability of correctly rejecting joint “no differences” hypothesis when the alternative is March 2010 CPS differentials), 0.475 to 1.000. Lines: Controls EEO-1 Occupation Only; Controls EEO-1 Occupation and Covariates; Controls Detailed Occupation and Covariates.]
NOTE: See Tables 4-1, 4-2, and 4-3 and text discussion of these models.
SOURCE: Analysis by panel using Current Population Survey data.
TABLE 4-1 Base Model for Estimating EEO-1 Log Wage Differentials
(Current Population Survey, March Supplement 2010)
Model 1
Parameter                                                         Estimate      Standard Error  t Value   Pr > |t|
Intercept [base is white (only) non-Hispanic male]                10.57427      0.010841        975.37    <.0001
Hispanic male                                                     –0.31926      0.009651        –33.08    <.0001
Hispanic female                                                   –0.53986      0.011632        –46.41    <.0001
White (only) non-Hispanic female                                  –0.35903      0.006372        –56.35    <.0001
Black or African American (only) non-Hispanic male                –0.24208      0.011809        –20.5     <.0001
Black or African American (only) non-Hispanic female              –0.46951      0.011104        –42.28    <.0001
Native Hawaiian Islander or Other Pacific Islander (only) male    –0.15631      0.072491        –2.16     0.0311
Native Hawaiian Islander or Other Pacific Islander (only) female  –0.36209      0.07278         –4.98     <.0001
Asian (only) male                                                 –0.03405      0.015258        –2.23     0.0257
Asian (only) female                                               –0.23185      0.017217        –13.47    <.0001
American Indian or Alaska Native (only) male                      –0.18747      0.04766         –3.93     <.0001
American Indian or Alaska Native (only) female                    –0.61771      0.046857        –13.18    <.0001
Two or more races male                                            –0.13671      0.034945        –3.91     <.0001
Two or more races female                                          –0.38639      0.037565        –10.29    <.0001

                        DF Model    DF Error    F Value    Pr > F
EEO-1 differentials     13          62001       410.19     <.0001
NOTE: Controls for major occupation only (10 categories).
SOURCE: Analysis by panel using Current Population Survey data.
TABLE 4-2 Model for Estimating EEO-1 Log Wage Differentials
Controlling for Education, Major Industry, and Potential Experience
(Current Population Survey, March Supplement 2010)
Model 2
Parameter                                                         Estimate      Standard Error  t Value   Pr > |t|
Intercept [base is white (only) non-Hispanic male]                10.76643118   0.02673094      402.77    <.0001
Hispanic male                                                     –0.14657715   0.00912375      –16.07    <.0001
Hispanic female                                                   –0.35794272   0.01073877      –33.33    <.0001
White (only) non-Hispanic female                                  –0.27917604   0.00593934      –47       <.0001
Black or African American (only) non-Hispanic male                –0.18823432   0.01062293      –17.72    <.0001
Black or African American (only) non-Hispanic female              –0.36062882   0.01018799      –35.4     <.0001
Native Hawaiian Islander or Other Pacific Islander (only) male    –0.08203944   0.06499172      –1.26     0.2068
Native Hawaiian Islander or Other Pacific Islander (only) female  –0.30479654   0.06524491      –4.67     <.0001
Asian (only) male                                                 –0.08434892   0.01375719      –6.13     <.0001
Asian (only) female                                               –0.20779203   0.01551098      –13.4     <.0001
American Indian or Alaska Native (only) male                      –0.12243174   0.04276004      –2.86     0.0042
American Indian or Alaska Native (only) female                    –0.4567789    0.04207129      –10.86    <.0001
Two or more races male                                            –0.0878382    0.03133437      –2.8      0.0051
Two or more races female                                          –0.27586943   0.03371475      –8.18     <.0001

                        DF Model    DF Error    F Value    Pr > F
EEO-1 differentials     13          61970       238.41     <.0001
NOTE: Controls for major occupation (10 categories), education (16 categories), major
industry (13 categories), and potential experience (quartic).
SOURCE: Analysis by panel using Current Population Survey data.
TABLE 4-3 Model for Estimating Detailed Occupational Log Wage
Differentials Controlling for Education, Major Industry, and Potential
Experience (Current Population Survey, March Supplement 2010)
Model 3
Parameter                                                         Estimate      Standard Error  t Value   Pr > |t|
Intercept [base is white (only) non-Hispanic male]                10.86089272   0.10171189      106.78    <.0001
Hispanic male                                                     –0.10250055   0.00874789      –11.72    <.0001
Hispanic female                                                   –0.26943452   0.0104887       –25.69    <.0001
White (only) non-Hispanic female                                  –0.22408845   0.00603451      –37.13    <.0001
Black or African American (only) non-Hispanic male                –0.12759486   0.01016768      –12.55    <.0001
Black or African American (only) non-Hispanic female              –0.27720803   0.00998411      –27.76    <.0001
Native Hawaiian Islander or Other Pacific Islander (only) male    –0.06154783   0.06152922      –1        0.3172
Native Hawaiian Islander or Other Pacific Islander (only) female  –0.22666618   0.06175392      –3.67     0.0002
Asian (only) male                                                 –0.07824855   0.01319245      –5.93     <.0001
Asian (only) female                                               –0.17077671   0.01501089      –11.38    <.0001
American Indian or Alaska Native (only) male                      –0.10341199   0.04060558      –2.55     0.0109
American Indian or Alaska Native (only) female                    –0.37084016   0.0398942       –9.3      <.0001
Two or more races male                                            –0.08577552   0.02967712      –2.89     0.0039
Two or more races female                                          –0.22016021   0.03197085      –6.89     <.0001

                        DF Model    DF Error    F Value    Pr > F
EEO-1 differentials     13          61483       138.38     <.0001
NOTE: Controls for detailed occupation (497 categories), education (16 categories), major
industry (13 categories), and potential experience (quartic).
SOURCE: Analysis by panel using Current Population Survey data.
this equation is 0.47, and the residual variance is 0.26. As can be seen in
Figure 4-1, 545 employees are required for a power of 0.50 in this case,
while about the same sample size (551 employees) yields a power of 0.90.
The power curve for this model is flat because there are 496 degrees of
freedom for the detailed occupation controls. Once there are adequate data
to fit this model, about 50 additional observations are needed to achieve
the target power for the EEO race and gender test.
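The flatness of the Model 3 power curve can be checked with simple degrees-of-freedom bookkeeping. The sketch below counts the parameters in Model 3 from the category counts given in the text and table notes, assuming that each categorical control uses one degree of freedom fewer than its number of categories. It then compares the count with the error degrees of freedom reported in Tables 4-1 and 4-3. The variable names are illustrative.

```python
# Number of categories for each control, from the text and table notes.
levels = {
    "EEO-1 gender and race/ethnicity groups": 14,
    "detailed occupation": 497,
    "education": 16,
    "major industry": 13,
}
quartic_terms = 4  # potential experience enters as a quartic
intercept = 1

# Assumption: each categorical control uses (categories - 1) degrees of freedom.
df_by_control = {name: k - 1 for name, k in levels.items()}
n_parameters = intercept + quartic_terms + sum(df_by_control.values())

# Sample size implied by Table 4-1: error df plus the Model 1 parameters
# (intercept, 13 EEO-1 differentials, 9 major-occupation indicators).
n_cps = 62001 + (1 + 13 + 9)
implied_error_df = n_cps - n_parameters

print("Model 3 parameters:", n_parameters)
print("Implied error df:  ", implied_error_df)
```

Under these assumptions Model 3 has 541 parameters, and the implied error degrees of freedom (61,483) agree with the DF Error reported in Table 4-3. An employer must therefore have more employees than parameters before the model can be fitted at all, which is consistent with the 545 and 551 employees cited above lying so close together.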
MINIMIZATION OF REPORTING BURDEN
Estimation of Burden
One reason for the outcry on the part of the business community when
the Paycheck Fairness Act was under consideration in Congress was the
perception that the legislation would impose a significant new reporting
burden on employers, particularly on small employers. The Paperwork
Reduction Act of 1995 specifically requires agencies to demonstrate the
practical utility of the information that they propose to collect and to bal-
ance this against the burden imposed on the public.
EEOC currently calculates the cost and burden of its data collections
in its submissions of Information Collection Requests to the U.S. Office of
Management and Budget (OMB). The number of respondents (including
multi-establishment respondents), responses (usually at the establishment
level), estimated burden hours, costs, and mode of collection for the four
major EEO data collections in the most recent reports of EEOC to OMB
are shown in Table 4-4.
The estimates of burden costs and hours in Table 4-4 are based on the
EEOC’s best estimates of the amount of time it takes for clerks to retrieve
and enter the data on paper records. However, because fewer than one-fourth
of employers who report now file paper records, the burden estimates may
be overstated.
Options for Minimizing Response Burden
To the extent that the current burden data are representative, the
addition of earnings data to the existing EEOC data collection forms that
do not now collect the data, in much the same manner in which earnings
data are collected in the EEO-4 form, could be expected to nearly double
the current burden on employers. In the case of the largest collection, the
current average of 3.5 hours per EEO-1 form might increase to somewhere
near the average of 6.6 hours now reported for the EEO-4 form. This is not
an inconsequential increase in response burden. It would behoove EEOC to
consider taking steps to reduce the increase in response burden.
TABLE 4-4 Estimated Cost and Burden of EEOC Data Collections
Form   Frequency  Respondents  Responses  Estimated Burden Hours  Estimated Cost  Percent Electronic Reported
EEO-1  Annual     45,000       170,000    599,000                 $11,400,000     80
EEO-3  Biennial   1,399        1,399      2,098                   85,000          79
EEO-4  Biennial   6,018        6,018      40,000                  700,000         76
EEO-5  Biennial   1,135        1,135      10,000                  190,000         58
SOURCE: Data from EEOC Form 83-I submissions to OMB.
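The per-form averages cited in the text follow directly from Table 4-4. The sketch below derives hours per response for each collection and then projects the total EEO-1 burden under one illustrative assumption: that adding earnings items would raise the EEO-1 average to the current EEO-4 average. That assumption is a simplification for the sake of the arithmetic, not an EEOC estimate.

```python
# Responses and estimated burden hours from Table 4-4.
collections = {
    "EEO-1": {"responses": 170_000, "burden_hours": 599_000},
    "EEO-3": {"responses": 1_399, "burden_hours": 2_098},
    "EEO-4": {"responses": 6_018, "burden_hours": 40_000},
    "EEO-5": {"responses": 1_135, "burden_hours": 10_000},
}

hours_per_response = {
    form: row["burden_hours"] / row["responses"]
    for form, row in collections.items()
}

# Illustrative assumption: the EEO-1 average rises to the EEO-4 average.
projected_eeo1_hours = (
    collections["EEO-1"]["responses"] * hours_per_response["EEO-4"]
)
added_hours = projected_eeo1_hours - collections["EEO-1"]["burden_hours"]

for form, hours in hours_per_response.items():
    print(f"{form}: {hours:.1f} hours per response")
print(f"Projected EEO-1 burden: {projected_eeo1_hours:,.0f} hours")
print(f"Increase over current:  {added_hours:,.0f} hours")
```

Under this assumption the total EEO-1 burden would rise from 599,000 hours to somewhat more than 1.1 million hours, which is the near doubling described in the text.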
Several options are available for reducing the burden on reporters.
Three are discussed in this section—less frequent data collection, use of a
rotating scheme for certain employer size classes, and raising the size cutoff
so that fewer employers would be in the scope of the collection.
Less Frequent Collection
The EEO-1 report is now collected annually, while the other forms are
collected on a biennial basis. The main issue is with the EEO-1 form. The
law does not require the annual collection of EEO-1 data. The timing of
collection is an administratively imposed requirement. By administratively
reducing the frequency of data collection, the burden might also be reduced,
though the extent to which it might be reduced is not entirely clear.
On the negative side, the less frequent availability of the reports would
mean that the information that supports EEOC enforcement functions
would be less current, by a year or so. This lag could be an important issue
during economic turning points, when hiring or layoffs could significantly
influence the employment and earnings profiles of covered firms. The time
lag for EEOC’s investigations of potential discrimination would increase,
and the ability of the agency to be responsive to complaints in a timely
manner would be negatively affected.
Rotating Sample
It might be possible to continue to collect data annually but from
only a part of the current reporting population and to permit firms with
certain characteristics, such as not meeting a threshold size or in a selected
industry group, to report less frequently. The selection of annual versus
biennial reporters could, for example, be based on an analysis by EEOC
of the probability of discrimination based, in turn, on the experience of the
agency with enforcement. This tailored approach to selection of those firms
that could report less frequently, however, would be hard to administer and
could well be difficult to implement fairly in practice.
Moreover, this nuanced approach might actually complicate matters
for employers. Because so many firms have automated their reporting, annual
filing is now a routine matter, and rotating the reporting requirement might
actually increase their administrative burden. Employers would need to
determine when they were required to report, and the task of developing a
database to capture reports arriving on different schedules might be much
more burdensome for EEOC.
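To make the scheduling question concrete, here is a minimal sketch of one possible rotation rule, in which firms at or above a size threshold report every year and smaller firms report every other year, split into two panels so that roughly half report in any given year. The threshold of 500 employees, the use of an employer ID's parity to assign panels, and all names are assumptions for illustration only; the text does not specify any particular scheme.

```python
from dataclasses import dataclass

ANNUAL_THRESHOLD = 500  # hypothetical size cutoff for annual reporting


@dataclass(frozen=True)
class Employer:
    employer_id: int
    employees: int


def must_report(employer: Employer, year: int) -> bool:
    """Return True if the employer is due to report in the given year."""
    if employer.employees >= ANNUAL_THRESHOLD:
        return True  # large firms stay on the annual schedule
    # Smaller firms alternate: even IDs report in even years, odd IDs in odd years.
    return employer.employer_id % 2 == year % 2


firms = [Employer(101, 1200), Employer(102, 150), Employer(103, 150)]
for year in (2012, 2013):
    due = [f.employer_id for f in firms if must_report(f, year)]
    print(year, due)
```

Even a rule this simple requires each employer to know its panel and each year's schedule, which is the kind of added administrative step the text cautions about.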
Raising the Size Cutoff
The current employment cutoff for the annual requirement to submit
an EEO-1 form is 100 employees (50 employees if the firm is a federal
government contractor). This cutoff limits the overall potential response
burden significantly. Raising the size cutoff to, say, 200 employees (based
on the statistical power analysis presented above) would reduce by half the
number of firms that would have to report earnings, while reducing
employment coverage by less than 10 percent (see Table 1-1, in
Chapter 1). One consequence of raising the cutoff size would be a relative
reduction in coverage of the earnings of females and minorities. The firms
in the size classes for which the reporting requirement would be eliminated
are those in which women and minorities are more heavily represented.
Experiments with different cutoff sizes to better determine the tradeoffs
between burden and coverage could be useful to include in the pilot study
that the panel recommends (see Chapter 6).
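The tradeoff between burden and coverage can be explored with a simple calculation over size classes. The sketch below assumes a table of firm counts and employment by size class; the numbers are invented placeholders chosen to loosely echo the proportions described in the text, not figures from Table 1-1, and the function name is hypothetical.

```python
# Hypothetical size classes: (lower bound of class, firms, employees).
# These counts are invented placeholders, NOT figures from Table 1-1.
SIZE_CLASSES = [
    (100, 20_000, 2_800_000),
    (200, 12_000, 3_600_000),
    (500, 4_000, 2_800_000),
    (1_000, 4_000, 20_000_000),
]


def coverage_after_cutoff(cutoff: int) -> tuple[float, float]:
    """Share of firms and share of employment still covered at a cutoff."""
    total_firms = sum(firms for _, firms, _ in SIZE_CLASSES)
    total_emp = sum(emp for _, _, emp in SIZE_CLASSES)
    kept = [(firms, emp) for low, firms, emp in SIZE_CLASSES if low >= cutoff]
    kept_firms = sum(firms for firms, _ in kept)
    kept_emp = sum(emp for _, emp in kept)
    return kept_firms / total_firms, kept_emp / total_emp


for cutoff in (100, 200, 500):
    firm_share, emp_share = coverage_after_cutoff(cutoff)
    print(f"cutoff {cutoff}: {firm_share:.0%} of firms, "
          f"{emp_share:.0%} of employment")
```

Adding columns for female and minority employment to each size class would allow the same calculation to show how coverage of those groups changes with the cutoff.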
HUMAN RESOURCE AND PAYROLL SYSTEMS
Most companies of the size covered by EEO regulations have at least
somewhat automated payroll and human resource management systems.
Today, larger companies are more able to comply with a potential require-
ment for compensation data by gender, race, and national origin because
they can gather compensation information from automated payroll systems
and demographic data from automated human resource systems.
The panel reviewed the state of automation of company payroll systems
from the perspective of three service providers—a large payroll-providing
service firm, a firm that specializes in the emerging software-as-a-service
(SaaS) market, and a firm that specializes in using companies’ own internal
data to analyze EEO status and prepare Affirmative Action Plans for those
companies. In summary, we found that automated systems are expanding
rapidly among U.S. employers, but that the extent to which these
applications have been implemented differs by size of firm.
Currently, larger firms are likely to have human resource and payroll
management systems, and they are likely to have an easier time in comply-
ing with a new requirement to provide compensation data by demographic
characteristics than would smaller firms. Over time, one would expect that
the use of such systems will grow and spread among smaller firms. In the
long term, these automated systems may well serve as the basis for EEOC
employment and wage data collection. As discussed in Chapter 6, the panel
recommends a pilot test to collect information on the extent of penetration
of these automated human resource and payroll systems (see Appendix C).
Payroll and Human Resource Providers
The industry of payroll and human resource providers is character-
ized by a growth in services beyond the usual provision of timekeeping
and payroll functions. Most recently, the industry has expanded to include
human resource management. As a result, one provider can bring together
information on hours, earnings, and the demographics and work histories
of the workforce. These data are captured directly from a client’s data sys-
tems, often without client intervention.
The panel interviewed a large payroll-providing company to determine
the influence of the growth of this sector on the reporting of earnings data
to EEOC. This company lists 600,000 clients, representing, in the com-
pany’s estimation, one of every six U.S. employees. The clients employ as
few as 1 and as many as 1 million employees.
The company has a line of business that focuses on smaller employers—
those with fewer than 100 employees—to provide a total source of payroll
and human resource services. The company estimates that about 40 percent
of these smaller employers use human resource services as well as payroll
services. For clients that use human resource services and are subject to an
OFCCP or EEOC reporting requirement, one of the company's products
prepares EEO-1 reports.
Growth of Software-as-a-Service Applications
The workshop presentation by Karen Minicozzi of Workday, Inc., an
enterprise software and services provider, highlighted the company's
unified human capital management solutions. Workday is one of a growing
number of
firms that provide turnkey payroll and human resource management solu-
tions to businesses under the general label of SaaS. The solutions provide a
new, global core system of record to replace legacy systems that have been
maintained by the establishments themselves. These service providers use
a multitenant architecture: that is, one version of the application, with
common hardware, networking, and operating systems, is used for all
customers (“tenants”). The applications are often supported in the
“cloud,” that is, through Internet connectivity. The fact that
these new service approaches have so much in common allows the genera-
tion of common reports (such as EEO reports) across the system, drawing
on data from both the human resource and payroll functions of the serviced
companies. Most of the companies that use this service are mid-size, large,
and very large companies. Workday, Inc. has 246 customers.
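The idea that a shared application can produce a common report for every customer can be sketched as follows. This is an illustrative toy, not a description of any vendor's product or API: the tenant names, record layouts, and the `pay_summary` function are all hypothetical. It shows one piece of report code, run unchanged for each tenant, joining human resource records (demographics) with payroll records (pay) by employee identifier.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-tenant stores; a real system would more likely use a
# shared database in which every row is keyed by tenant.
HR_RECORDS = {
    "tenant_a": {
        1: {"gender": "F", "job_group": "Professionals"},
        2: {"gender": "M", "job_group": "Professionals"},
        3: {"gender": "F", "job_group": "Professionals"},
    },
    "tenant_b": {
        1: {"gender": "M", "job_group": "Technicians"},
    },
}
PAYROLL_RECORDS = {
    "tenant_a": {1: 62_000, 2: 70_000, 3: 58_000},
    "tenant_b": {1: 45_000},
}


def pay_summary(tenant: str) -> dict:
    """Average annual pay by (job group, gender) for a single tenant."""
    pay_by_group = defaultdict(list)
    for employee_id, hr in HR_RECORDS[tenant].items():
        pay = PAYROLL_RECORDS[tenant].get(employee_id)
        if pay is not None:  # skip employees with no payroll record
            pay_by_group[(hr["job_group"], hr["gender"])].append(pay)
    return {group: mean(values) for group, values in pay_by_group.items()}


# The same report code runs unchanged for every tenant.
for tenant in HR_RECORDS:
    print(tenant, pay_summary(tenant))
```

Because each tenant's data share one layout, a report defined once is available to all customers without per-customer development.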
These SaaS providers have been enjoying remarkable growth. An annual
survey of employing establishments by the consulting firm CedarCrestone,
to ascertain the penetration of human resource applications in business,
found them to be widespread and forecast that SaaS as a deployment option
will likely continue to grow as organizations move from licensed
on-premise solutions to the cloud. The source of this information is the
CedarCrestone 2010–2011 HR Systems Survey. The survey is based on
1,289 responses, representing employers of over 20 million employees
(CedarCrestone, 2011). The survey also found that there were measurable
differences in the penetration of these administrative applications by size
of firm. In the most recent survey, 94 percent of employers with 10,000 or
more employees had such systems, compared with 87 percent for employers
with 250 to 2,499 employees. The CedarCrestone survey found that most
of the applications were still licensed software, but the subscription-based
SaaS applications and outsourcing solutions were growing in use.
Analysis of Salary and Related Data for Pay Equity Purposes
In order to ensure that their firms are in compliance with the Equal
Pay Act, Title VII, and Executive Order 11246 provisions, many employers
use firms that perform compensation analysis and, in many cases, actually
prepare automated Affirmative Action Plans. Other firms use software to
support this analysis internally.
The panel heard testimony from Liz Balconi and Michele Whitehead,
representatives of Berkshire Associates, a company that is very active in the
compensation analysis business. This company obtains the following infor-
mation from its client firms: employee identifier; job code; race; gender; date
of hire; annualized base salary or hourly rate; grade, band, or classification (if
applicable); time in current position, or date of last title change; date of last
degree earned, or date of birth; full time or part time status; exempt or non-
exempt status; title; employee location; years of relevant experience (or date
of birth); factors that may legitimately impact pay in an organization, such as
performance rating; education; date in grade; professional certifications; divi-
sion; job group; starting salary; and annualized total compensation (including
bonuses, commissions, cost of living allowances, and overtime).
The firm uses these data (which are generally available from its
clients) to conduct two kinds of analyses: cohort analysis, which is a
nonstatistical comparison of similarly situated incumbents within a group
based on factors such as time in the company, educational background, and
performance assessment; and statistical (regression) analysis to study the
combined effect of factors on pay between comparator groups. Although
not all of these data elements may be necessary to identify potentially
discriminatory practices, prudent employers can be expected to have these
types of data available and to use them to evaluate their own practices,
using algorithms developed by specialty firms such as Berkshire Associates.
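As an illustration of the regression approach described above, the following minimal sketch fits salary on years of tenure and a comparator-group indicator by ordinary least squares. It is not Berkshire Associates' algorithm: the data are synthetic and noise-free, the variable names are hypothetical, and a real analysis would include more of the listed factors and would assess statistical significance. The coefficient on the group indicator estimates the pay difference between groups after controlling for tenure.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    c = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        c[col], c[pivot] = c[pivot], c[col]
        for r in range(col + 1, k):
            factor = A[r][col] / A[col][col]
            for j in range(col, k):
                A[r][j] -= factor * A[col][j]
            c[r] -= factor * c[col]
    # Back substitution.
    b = [0.0] * k
    for i in reversed(range(k)):
        s = sum(A[i][j] * b[j] for j in range(i + 1, k))
        b[i] = (c[i] - s) / A[i][i]
    return b


# Synthetic records: (years of tenure, 1 if in the comparator group else 0).
employees = [(1, 0), (3, 0), (5, 0), (8, 0), (2, 1), (4, 1), (6, 1), (9, 1)]
# Salaries are constructed with a built-in gap of -3,000 for the group.
salaries = [40_000 + 2_000 * t - 3_000 * g for t, g in employees]

X = [[1.0, float(t), float(g)] for t, g in employees]
intercept, per_year, group_gap = ols(X, [float(s) for s in salaries])
print(f"intercept={intercept:.0f}, per year={per_year:.0f}, gap={group_gap:.0f}")
```

A raw comparison of group averages would mix the pay gap with differences in tenure; the regression separates the two, which is why it is used alongside cohort analysis.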