4 Survey Design and Statistical Methodology

When considering the collection of earnings data by gender, race, and national origin, the U.S. Equal Employment Opportunity Commission (EEOC) confronts several key decisions in the realm of survey design and statistical methodology. The decisions involve four closely associated issues: collectability, quality (defined as fitness for use), utility for statistical analysis, and response burden.

In this chapter we discuss the pros and cons of options for collecting earnings data from employers by adding items to existing equal employment opportunity (EEO) forms or developing a new collection instrument. We consider the fitness for use of the data, which addresses the relevance of the data to users' needs. We illustrate a model-based approach to identifying the utility of the categorical variables that would also be collected if wage data are collected. We address the question of employer burden and assess various options for minimizing the burden on reporting units. The last issue is complicated by the fact that there is a differential burden faced by employers of different sizes and with different levels of sophistication in their human resource and payroll systems. In the case of collection of earnings data by gender, race, and national origin, one approach may not be appropriate for all respondents.
COLLECTING COMPENSATION DATA FROM EMPLOYERS

OPTIONS FOR DATA COLLECTION

Modify Current EEO Forms

The most direct solution to obtaining earnings information for EEOC purposes would be to add earnings items to existing EEO reports. The instrument that would make the most sense to modify for this purpose is the EEO-1 form, for several reasons. First, it enjoys substantial coverage. As discussed in Chapter 1, the mandatory EEO-1 reports in 2010 covered about 67,000 private-sector respondents, with about 55 million employees.

Second, the form is part of the everyday operations of the antidiscrimination agencies. The EEO-1 reports are used by the EEOC and the Office of Federal Contract Compliance Programs (OFCCP) to trigger enforcement and technical assistance based on the identification of potential EEO problems, which is determined from data provided by employers on the reports.

Third, it is expected that the necessary modifications to the EEO-1 form would be quite manageable for both EEOC and the respondents. The addition of the earnings data could be accomplished in much the same way that earnings data are collected on the EEO-4 form: that is, either by adding another column to the form that requests the earnings data or by adding another row for each occupation, which would collect average pay in addition to the current row that collects the number of employees by race/ethnicity group. An alternate collection design would be to simply duplicate the existing EEO-1 form and have employers enter in the cells of one table the number of employees, as they now do, and in the cells of the second table the pay corresponding to those employees.

Design a New Collection Instrument

A second option would be to design a new and, one hopes, more streamlined collection instrument that would collect both employment and earnings information.
The design of such a new instrument could be informed by the current effort by OFCCP to develop a collection instrument to replace the defunct Equal Opportunity Pilot Survey discussed in Chapter 2. As this report was being prepared, the OFCCP had issued an Advance Notice of Proposed Rulemaking (ANPRM) that solicited comments on several issues important for designing a new collection instrument. For example, OFCCP asks whether expanded information should be collected in order for OFCCP to assess whether further investigation into a contractor's compensation decisions and policies is warranted. To collect such data as average starting or initial total compensation (including paid leave, health and retirement benefits, etc.); average pay raises; average bonuses;
minimum and maximum salary; standard deviation or variance of salary; the number of workers in each gender and race/ethnicity category; average tenure; and average compensation data by job series (e.g., all engineers within a particular department or all secretaries throughout the establishment) would require a substantial redesign of the collection form.

Some of the items that might be useful in understanding the EEO environment in establishments would likely require open-ended questions, such as on topics suggested in the OFCCP ANPRM pertaining to company policies related to promotion decisions, bonuses, shift pay, and setting of initial pay. This information is difficult to collect and to process efficiently in a standardized manner.

FITNESS FOR USE

Types of Uses

Quality of information is generally defined in terms of its fitness for use. This is a multidimensional concept embracing the relevance of the information to users' needs and the accuracy, timeliness, accessibility, interpretability, and coherence that affect how the data can be used. There is a considerable literature on statistical quality and the steps that should be taken to make data useful for their intended purpose (see, e.g., Brackstone, 1999; U.S. Office of Management and Budget, 2002). The literature highlights the importance of clearly understanding the requirements for the data before collection begins. It is important in this context to consider the need of the EEOC for earnings information.

The major use of the EEO compensation data would be to aid enforcement of pay discrimination statutes in two ways: targeting enforcement actions and carrying out enforcement actions against an employer that has been targeted. Targeting is primarily a matter of selecting among the complaints the EEOC receives to identify those firms that are most likely to be found to have discriminatory practices.
There are, however, secondary uses, such as analysis of overall trends in pay discrimination and trends by industry and location, as well as research on compensation trends. If such new compensation data become available, they would be a powerful supplement to existing sources of compensation data, such as those discussed in Chapters 2 and 3.

Because it is so important that these data be collected correctly, it is incumbent on EEOC to identify the potential uses of the data early in the design process so that the data items to be collected can be identified and issues of data quality considered. Again, the requested comments in the OFCCP ANPRM are instructive when paraphrased in EEOC terms:
• Should the data be used to conduct industrywide compensation trend analyses? If so, what type of compensation trend analyses would be appropriate to conduct on an industrywide basis?
• For each type of analysis identified, identify the categories of data that should be collected in order to compare compensation data across employers in a particular industry and the job groupings that should be used.
• Should the data be used to identify employers in specific industries for industry-focused compensation reviews?
• What specific categories of data would be most useful for identifying employers in specific industries for industry-focused compensation reviews?
• Should the data be collected by individual establishment for multi-establishment employers? What specific categories of data would be most useful for conducting compensation analyses across an employer's various establishments?

Utility of the Data Items for Statistical Analysis

In this section we consider how the EEOC could develop a statistical model for use in screening individual employers for possible pay discrimination. There are several key considerations here. First, the data to be used in this model would, of course, be reported by each individual employer. In addition to the information already requested for the EEO-1 report (e.g., employment by occupation, sex, and race/ethnicity), a form would collect pay (measured as discussed in Chapter 3) and possibly other information, such as employees' years of service. Given these data, one could conduct a multiple regression analysis of pay in relation to demographic variables (e.g., the EEO-1's 14 sex and race/ethnicity groups) and other characteristics, usually called "control variables," such as occupational category and years of service.
More complex models might include controls for occupation or job categories or more elaborate controls for education and labor force experience. Still more complex models might include more detailed occupational or job categories and more elaborate controls for previous experience and qualifications. There are a large number of potential control variables that could be included in such regression models, and, especially for employers with small numbers of employees, there would be benefits from keeping the number of covariates in such models relatively small. To do that, there are a variety of statistics, including Mallows' Cp, the Akaike Information Criterion (AIC), and the Bayesian Information Criterion (BIC), that could be employed to remove control variables that were not contributing substantially to the fit of the model.
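As a minimal sketch of how such criteria trade fit against model size, the following computes AIC, BIC, and Mallows' Cp from residual sums of squares. The residual sums of squares, the employer size of 150, and the candidate model names are hypothetical values chosen for illustration; they are not results from this chapter. The AIC and BIC formulas are the usual Gaussian forms, stated up to an additive constant.

```python
import math


def aic(rss: float, n: int, k: int) -> float:
    """Gaussian AIC up to an additive constant: n*ln(RSS/n) + 2k."""
    return n * math.log(rss / n) + 2 * k


def bic(rss: float, n: int, k: int) -> float:
    """Gaussian BIC up to an additive constant: n*ln(RSS/n) + k*ln(n)."""
    return n * math.log(rss / n) + k * math.log(n)


def mallows_cp(rss_sub: float, rss_full: float, n: int, k_sub: int, k_full: int) -> float:
    """Mallows' Cp = RSS_sub / s2_full - n + 2*k_sub, where s2_full = RSS_full / (n - k_full)."""
    s2_full = rss_full / (n - k_full)
    return rss_sub / s2_full - n + 2 * k_sub


# Hypothetical employer with n = 150 employees (assumed, for illustration only).
# k counts all estimated coefficients: intercept + 13 EEO-1 indicators + controls.
N_EMPLOYEES = 150
CANDIDATES = {
    "occupation only": {"rss": 55.0, "k": 23},
    "occupation + tenure": {"rss": 48.0, "k": 24},
    "occupation + tenure + education": {"rss": 46.5, "k": 39},
}


def best_by(criterion) -> str:
    """Return the candidate model name with the smallest criterion value."""
    return min(
        CANDIDATES,
        key=lambda name: criterion(CANDIDATES[name]["rss"], N_EMPLOYEES, CANDIDATES[name]["k"]),
    )


if __name__ == "__main__":
    for name, m in CANDIDATES.items():
        print(name, round(aic(m["rss"], N_EMPLOYEES, m["k"]), 2),
              round(bic(m["rss"], N_EMPLOYEES, m["k"]), 2))
    print("AIC prefers:", best_by(aic))
    print("BIC prefers:", best_by(bic))
```

With these made-up inputs, adding 15 education indicators lowers the residual sum of squares only slightly, so both criteria penalize the larger model. This is the sense in which small employers benefit from a short list of covariates.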
While there is substantial disagreement over the most appropriate models to use for establishing a reasonable claim of possible wage discrimination, or defending one, it is not necessary to have a definitive model to assess the potential quality of certain basic statistical tests that might be reasonably performed by EEOC. We undertake such an analysis here.

We emphasize that the regression model we describe below is intended, first and foremost, as an illustrative example of a methodology for undertaking some of these basic statistical tests. For this purpose, we need to provide enough specifics to allow a clear and straightforward discussion of the general nature of the issues that would arise in such an exercise. The regression model we use is a general linear model of the form:

$$y_i = b_0 + d_i b_1 + x_i b_2 + e_i$$

Here, $y_i$ is the logarithm of the wage measure for individual $i$, $d_i$ is the vector of design variables that indicate the EEO-1 categories occupied by individual $i$, $x_i$ is a vector of control variables, $e_i$ is the statistical error, $b_0$ is the intercept, $b_1$ is the vector of EEO-1 log wage differentials from a specified reference group (usually white, non-Hispanic males), $b_2$ is the vector of effects associated with the control variables, and $i = 1, \ldots, N$, where $N$ is the total number of employees in the analysis.1

For an agency such as EEOC or OFCCP, the results from this kind of regression analysis that will be of greatest concern will be the estimates of the coefficients for gender and race/ethnicity: that is, the betas (the elements of $b_1$), because the estimates of these coefficients indicate the extent (if any) to which women or nonwhites are paid less than men or whites who are the same in terms of the other factors (the "control variables") included in the analysis.
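Because the dependent variable is a logarithm, each element of $b_1$ is a difference in log wages rather than a percentage. A small sketch of the standard conversion, 100 × (exp(b) − 1), follows. The three coefficients are copied from Table 4-1 (Model 1); the function name is an illustrative choice, not something defined in the report.

```python
import math


def log_diff_to_percent(b: float) -> float:
    """Convert a log wage differential b into a percentage pay gap
    relative to the reference group: 100 * (exp(b) - 1)."""
    return 100.0 * (math.exp(b) - 1.0)


# Estimated log wage differentials taken from Table 4-1 (Model 1).
MODEL_1_DIFFERENTIALS = {
    "Hispanic male": -0.31926,
    "White (only) non-Hispanic female": -0.35903,
    "Asian (only) male": -0.03405,
}

if __name__ == "__main__":
    for group, b in MODEL_1_DIFFERENTIALS.items():
        print(f"{group}: log differential {b:+.5f}, "
              f"about {log_diff_to_percent(b):+.1f} percent")
```

For small coefficients the log differential and the percentage gap nearly coincide, while for larger ones (for example, −0.32) the exact percentage gap is noticeably smaller in magnitude than 100 times the coefficient.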
It will be particularly important to perform a test to determine if these coefficients are statistically significantly different from zero (i.e., are unlikely to have occurred simply as a result of random or chance factors). Assuming that the design vectors $d_i$ and $x_i$ are statistically exogenous with respect to $e_i$ and that $e_i$ has a normal distribution with zero mean, constant variance, and independence over individuals, there is a well-known F-test for the null hypothesis $b_1 = 0$. This statistic tests the hypothesis that all of the EEO-1 log wage differentials are jointly zero versus the alternative that at least one of the differentials is nonzero. The usual F-statistic is based on the Type III sum of squares for the model component associated with the design vector $d_i$: that is, the conditional model sum of squares for $d_i$ given the other variables, $x_i$, in the model. This statistic is invariant to the choice of reference group.

1 The earliest analyses that used the logarithm of wages were Blinder (1973) and Mincer (1974). Their work discussed specifications in logarithms and in levels. Since the early 1970s the prevailing practice in economics has been to use the logarithm of the rate of pay as the dependent variable. The regression model has been selected because when analysis is expressed in logs, pay gaps can be expressed in a comparable way (i.e., as percentages) even for dates that are wide apart. This also means that estimated coefficients in log regressions can be interpreted as showing approximately the percentage change in y that occurs as a result of a change in x, and when x is an indicator for race or gender, the coefficient measures the approximate percentage difference in pay between the indicated group and a reference group.

An automated test of the hypothesis $b_1 = 0$ could be conducted from an enhanced EEO-1 report that included appropriate wage data. The suitability of such a test depends on how likely it is that the test would detect a departure from $b_1 = 0$ for realistic configurations of employer data and with appropriate controls. We approach this question by attempting to measure the power of the standard F-test for $b_1 = 0$ in scenarios that resemble best-case outcomes for such an automated procedure.

The power of a test is the probability that it will reject the null hypothesis when that hypothesis is false. In other words, the power of a test is the probability that it will actually find a sex or race/ethnicity difference when such a difference exists. In colloquial terms, one might say that the power of a test is the probability that it will detect a potentially discriminating ("guilty") party. The power depends on the magnitude of the departure from the null hypothesis (how big the differentials are) and the precision with which those differentials can be estimated. In turn, the precision of the estimate(s) depends critically on the number of data points used in forming the estimates.

In the present context, it is crucial to note that the power of the statistical model for screening employers will be sensitive to the number of data points used in its construction.
It is simple common sense that, other things being equal, a poll of 1,000 people is likely to be much more precise (will have much greater power) than a poll of 100 people; similarly, regression estimates of sex or race/ethnicity pay differences that are based on many data points will have greater power than estimates based on only a few data points.

Finally, note that the number of data points in an analysis of a particular employer will depend on the size of the employer's work force: the greater the number of employees, the greater the number of data points, and the greater the power of the statistical model used in screening employers. Thus, when the number of employees is small, any screening model that EEOC might develop will have very low power, and when the number of employees is large, the screening model will have high power. The important question is thus obvious: How many data points must there be, that is, how large does the employer's work force have to be, to yield "enough" power?

For general linear models, there is standard software to assist with this power assessment. The inputs consist of estimates of the magnitude of the likely discrepancy and summary measures of the estimation precision. We next describe how we estimated those components.
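For readers who want to see how those two inputs combine, the following is a standard textbook formulation of the F-statistic and its power under the normal-error assumptions stated earlier. It is offered as a sketch and is not necessarily the exact computation performed by any particular software package. The symbols $q$, $p$, $\sigma^2$, $\lambda$, $D$, $M_X$, $\mathrm{RSS}_0$, and $\mathrm{RSS}_1$ are introduced here for illustration and do not appear in the original notation.

```latex
% Assumed notation (not from the original text):
%   q       = number of EEO-1 differentials tested (13 in this chapter)
%   p       = total number of estimated coefficients, including the intercept
%   RSS_0   = residual sum of squares with b_1 = 0 imposed
%   RSS_1   = residual sum of squares of the unrestricted model
%   D       = N x q matrix whose rows are the design vectors d_i
%   M_X     = I - X (X^T X)^{-1} X^T, with X holding the intercept and controls x_i
%   sigma^2 = variance of the error e_i
\[
F = \frac{(\mathrm{RSS}_0 - \mathrm{RSS}_1)/q}{\mathrm{RSS}_1/(N - p)}
\;\sim\; F_{q,\,N-p}(\lambda),
\qquad
\lambda = \frac{b_1^{\top} D^{\top} M_X D \, b_1}{\sigma^2},
\]
\[
\text{power} = \Pr\!\left[\, F_{q,\,N-p}(\lambda) > F_{1-\alpha;\; q,\, N-p} \,\right].
\]
```

The noncentrality parameter $\lambda$ is zero under the null hypothesis and grows with the size of the differentials in $b_1$, shrinks as the error variance $\sigma^2$ rises, and, because $D^{\top} M_X D$ is a sum over employees, increases roughly in proportion to $N$. That last property is why employer size drives the power of the screening test.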
We considered an employer-size power analysis that is based on the predictions and estimation precision of models fit on the March 2010 Current Population Survey (CPS) Annual Social and Economic Supplement. Essentially, we are asking: "How many employees must a respondent firm have in order for the F-test to have the specified power to detect log wage differentials as big as the ones in the overall economy, as measured in March 2010?" This is a "best-case" scenario for two reasons. First, the differentials in the overall economy are larger than those typically found at a single employer because the heterogeneity in job types between employers is much greater than the heterogeneity of job types for a given employer. Second, because the overall workforce is more heterogeneous than the workforce of a given employer, most effects are estimated more precisely in the March CPS than they would be in a sample drawn from a single employer.

Because the CPS data are more heterogeneous than microdata from a single employer, they permit estimation of models that strongly resemble the ones that might be used by EEOC to screen EEO-1 reports that included wage data developed according to either of the two pilots recommended in this report (see Chapter 6). And because they allow a plausibly "best-case" power analysis, it is reasonable to consider them before investing heavily in data that might permit a more precise answer.

To minimize the effects of different definitions of the wage rate, we selected previous-year wage and salary earners only. The selected individuals were full-time employees (at least 35 hours/week) for at least 50 weeks in 2009 (the reference year for the March 2010 CPS supplement) and were between the ages of 16 and 75. We coded these individuals into the appropriate gender and race/ethnicity categories corresponding to the EEO-1 form. The design of these log wage differentials has 13 degrees of freedom.
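The selection rules and the 13-degree-of-freedom design can be sketched as follows. The record layout (`Person` and its field names) is hypothetical and is not the CPS file layout; the group labels follow the tables in this chapter, with white (only) non-Hispanic males as the omitted reference group. The wage-and-salary-earner restriction is not modeled here.

```python
from dataclasses import dataclass
from typing import List

# The 14 EEO-1 sex-by-race/ethnicity groups; the first entry is the reference group.
EEO1_GROUPS = [
    "White (only) non-Hispanic male",
    "Hispanic male",
    "Hispanic female",
    "White (only) non-Hispanic female",
    "Black or African American (only) non-Hispanic male",
    "Black or African American (only) non-Hispanic female",
    "Native Hawaiian Islander or Other Pacific Islander (only) male",
    "Native Hawaiian Islander or Other Pacific Islander (only) female",
    "Asian (only) male",
    "Asian (only) female",
    "American Indian or Alaska Native (only) male",
    "American Indian or Alaska Native (only) female",
    "Two or more races male",
    "Two or more races female",
]


@dataclass
class Person:
    """Hypothetical person record; field names are assumptions for illustration."""
    age: int
    usual_hours_per_week: int
    weeks_worked_last_year: int
    group: str


def in_analysis_sample(p: Person) -> bool:
    """Full-time (>= 35 hours/week), >= 50 weeks worked, ages 16 to 75."""
    return (
        p.usual_hours_per_week >= 35
        and p.weeks_worked_last_year >= 50
        and 16 <= p.age <= 75
    )


def design_row(group: str) -> List[int]:
    """Return the 13 indicator variables d_i for one person.
    The reference group maps to all zeros, so 14 groups use 13 degrees of freedom."""
    if group not in EEO1_GROUPS:
        raise ValueError(f"unknown EEO-1 group: {group}")
    return [1 if group == g else 0 for g in EEO1_GROUPS[1:]]


if __name__ == "__main__":
    worker = Person(age=40, usual_hours_per_week=40, weeks_worked_last_year=52,
                    group="Hispanic female")
    if in_analysis_sample(worker):
        print(design_row(worker.group))
```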
We used the major occupation codes (a taxonomy of 10 occupation groups) and the detailed occupations (a taxonomy of about 500 categories).2 The use of 10 major occupation code categories is a reasonable proxy for the EEO-1 occupations for the purposes of these power studies.

2 We chose this approach because a standardized recoding of the CPS occupational codes to EEO-1 categories would have involved about as much measurement error as the error associated with the coding to major and detailed occupations in the first place.

In addition to occupation categories, we also used 16 educational categories. These were entered as control variables in some analyses and used in combination with age to create a measure of time since leaving school, which is called "potential experience."

Analyses based on the public-use CPS data are necessarily between-employer estimates, rather than within-employer estimates, as any analysis of EEO-1 wage data would be. We included a control for major industry (13 categories) to allow the power analyses to be closer to those that a full pilot might produce. Model 1 controls for occupation only; Model 2 controls for occupation and covariates; Model 3 controls for detailed occupations and covariates. Figure 4-1 compares the three models in terms of power and minimum employer size.

Model 1, shown in Table 4-1, estimates the EEO-1 differentials within major occupational categories. It corresponds to the test $b_1 = 0$ conditioning on main effects only for the major occupational group. Not surprisingly, relative to the base group of white non-Hispanic males, all of the estimated differentials are large. Jointly, the F-test rejects $b_1 = 0$ with a P-value of less than 0.0001, and individually all of the differentials are statistically significant at the 0.05 level or better. The $R^2$ for this equation is 0.25, and the residual variance is 0.37. These two statistics are also used in the power analysis.

The first power analysis asks what the minimum employer size would be in order to detect differentials as large as those in Model 1 and with employer-specific data that had the same design and explanatory power. The line in Figure 4-1 labeled "Controls EEO-1 Occupation Only" answers this question. All power analyses assume that the basic F-test has size 0.05 at $b_1 = 0$: that is, the probability of rejecting a true null hypothesis is fixed at 0.05 throughout.

A regression analysis of an employer with approximately 99 employees has power of 0.50: it is equally likely to accept or reject the null hypothesis $b_1 = 0$ for wage differentials of the magnitude of those in Model 1. Employment of 200 is needed to boost the power to 0.90, a value that is often used as the standard for acceptable power.3

Model 2, shown in Table 4-2, computes the EEO-1 log wage differentials with controls for main effects of the major occupation category as well as main effects of education, major industry, and a quartic in potential experience.
The estimated log wage differentials are much smaller than in Model 1, although still quite substantial in magnitude. The F-statistic for joint significance is 238.41, with a P-value less than 0.0001. The $R^2$ for this equation is 0.39, and the residual variance is 0.30. As can be seen in Figure 4-1, an analysis based on 155 employees delivers power of 0.50 in this case, and an analysis of an employer of size 318 is required for power of 0.90.

Model 3 is shown in Table 4-3. In this estimation, we control for detailed occupation in addition to the covariates that were included in Model 2. The F-statistic falls to 138.38 but with a P-value that is still less than 0.0001. Estimated differentials also fall substantially. The $R^2$ for this equation is 0.47, and the residual variance is 0.26. As can be seen in Figure 4-1, 545 employees are required for a power of 0.50 in this case, while about the same sample size (551 employees) yields a power of 0.90. The power curve for this model is flat because there are 496 degrees of freedom for the detailed occupation controls. Once there are adequate data to fit this model, about 50 additional observations are needed to achieve the target power for the EEO race and gender test.

3 All model estimation was conducted in SAS (statistical analysis software) version 9.3 using PROC GLM. All power analysis was conducted in SAS version 9.3 PROC GLMPOWER. The design matrices, estimated subgroup means, and regression summary statistics used in the power analysis were computed from the March CPS data in the statistical summaries shown in all three of our models.

FIGURE 4-1 Comparisons of analytic power and employer size for selected EEO-1 wage reports, three models.
[Figure not reproduced. Vertical axis: minimum employer size (full-time equivalent employees), 0 to 700. Horizontal axis: power (probability of correctly rejecting the joint "no differences" hypothesis when the alternative is the March 2010 CPS differentials), 0.475 to 1.000. Lines: Controls EEO-1 Occupation Only; Controls EEO-1 Occupation and Covariates; Controls Detailed Occupation and Covariates.]
NOTE: See Tables 4-1, 4-2, and 4-3 and text discussion of these models.
SOURCE: Analysis by panel using Current Population Survey data.

TABLE 4-1 Base Model for Estimating EEO-1 Log Wage Differentials (Current Population Survey, March Supplement 2010), Model 1

Parameter                                                         Estimate      Standard Error  t Value   Pr > |t|
Intercept [base is white (only) non-Hispanic male]                10.57427      0.010841        975.37    <.0001
Hispanic male                                                     -0.31926      0.009651        -33.08    <.0001
Hispanic female                                                   -0.53986      0.011632        -46.41    <.0001
White (only) non-Hispanic female                                  -0.35903      0.006372        -56.35    <.0001
Black or African American (only) non-Hispanic male                -0.24208      0.011809        -20.5     <.0001
Black or African American (only) non-Hispanic female              -0.46951      0.011104        -42.28    <.0001
Native Hawaiian Islander or Other Pacific Islander (only) male    -0.15631      0.072491        -2.16     0.0311
Native Hawaiian Islander or Other Pacific Islander (only) female  -0.36209      0.07278         -4.98     <.0001
Asian (only) male                                                 -0.03405      0.015258        -2.23     0.0257
Asian (only) female                                               -0.23185      0.017217        -13.47    <.0001
American Indian or Alaska Native (only) male                      -0.18747      0.04766         -3.93     <.0001
American Indian or Alaska Native (only) female                    -0.61771      0.046857        -13.18    <.0001
Two or more races male                                            -0.13671      0.034945        -3.91     <.0001
Two or more races female                                          -0.38639      0.037565        -10.29    <.0001

Joint test            DF Model  DF Error  F Value  Pr > F
EEO-1 differentials   13        62001     410.19   <.0001

NOTE: Controls for major occupation only (10 categories).
SOURCE: Analysis by panel using Current Population Survey data.

TABLE 4-2 Model for Estimating EEO-1 Log Wage Differentials Controlling for Education, Major Industry, and Potential Experience (Current Population Survey, March Supplement 2010), Model 2

Parameter                                                         Estimate      Standard Error  t Value   Pr > |t|
Intercept [base is white (only) non-Hispanic male]                10.76643118   0.02673094      402.77    <.0001
Hispanic male                                                     -0.14657715   0.00912375      -16.07    <.0001
Hispanic female                                                   -0.35794272   0.01073877      -33.33    <.0001
White (only) non-Hispanic female                                  -0.27917604   0.00593934      -47       <.0001
Black or African American (only) non-Hispanic male                -0.18823432   0.01062293      -17.72    <.0001
Black or African American (only) non-Hispanic female              -0.36062882   0.01018799      -35.4     <.0001
Native Hawaiian Islander or Other Pacific Islander (only) male    -0.08203944   0.06499172      -1.26     0.2068
Native Hawaiian Islander or Other Pacific Islander (only) female  -0.30479654   0.06524491      -4.67     <.0001
Asian (only) male                                                 -0.08434892   0.01375719      -6.13     <.0001
Asian (only) female                                               -0.20779203   0.01551098      -13.4     <.0001
American Indian or Alaska Native (only) male                      -0.12243174   0.04276004      -2.86     0.0042
American Indian or Alaska Native (only) female                    -0.4567789    0.04207129      -10.86    <.0001
Two or more races male                                            -0.0878382    0.03133437      -2.8      0.0051
Two or more races female                                          -0.27586943   0.03371475      -8.18     <.0001

Joint test            DF Model  DF Error  F Value  Pr > F
EEO-1 differentials   13        61970     238.41   <.0001

NOTE: Controls for major occupation (10 categories), education (16 categories), major industry (13 categories), and potential experience (quartic).
SOURCE: Analysis by panel using Current Population Survey data.

TABLE 4-3 Model for Estimating Detailed Occupational Log Wage Differentials Controlling for Education, Major Industry, and Potential Experience (Current Population Survey, March Supplement 2010), Model 3

Parameter                                                         Estimate      Standard Error  t Value   Pr > |t|
Intercept [base is white (only) non-Hispanic male]                10.86089272   0.10171189      106.78    <.0001
Hispanic male                                                     -0.10250055   0.00874789      -11.72    <.0001
Hispanic female                                                   -0.26943452   0.0104887       -25.69    <.0001
White (only) non-Hispanic female                                  -0.22408845   0.00603451      -37.13    <.0001
Black or African American (only) non-Hispanic male                -0.12759486   0.01016768      -12.55    <.0001
Black or African American (only) non-Hispanic female              -0.27720803   0.00998411      -27.76    <.0001
Native Hawaiian Islander or Other Pacific Islander (only) male    -0.06154783   0.06152922      -1        0.3172
Native Hawaiian Islander or Other Pacific Islander (only) female  -0.22666618   0.06175392      -3.67     0.0002
Asian (only) male                                                 -0.07824855   0.01319245      -5.93     <.0001
Asian (only) female                                               -0.17077671   0.01501089      -11.38    <.0001
American Indian or Alaska Native (only) male                      -0.10341199   0.04060558      -2.55     0.0109
American Indian or Alaska Native (only) female                    -0.37084016   0.0398942       -9.3      <.0001
Two or more races male                                            -0.08577552   0.02967712      -2.89     0.0039
Two or more races female                                          -0.22016021   0.03197085      -6.89     <.0001

Joint test            DF Model  DF Error  F Value  Pr > F
EEO-1 differentials   13        61483     138.38   <.0001

NOTE: Controls for detailed occupation (497 categories), education (16 categories), major industry (13 categories), and potential experience (quartic).
SOURCE: Analysis by panel using Current Population Survey data.

MINIMIZATION OF REPORTING BURDEN

Estimation of Burden

One reason for the outcry on the part of the business community when the Paycheck Fairness Act was under consideration in Congress was the perception that the legislation would impose a significant new reporting burden on employers, particularly on small employers. The Paperwork Reduction Act of 1995 specifically requires agencies to demonstrate the practical utility of the information that they propose to collect and to balance this against the burden imposed on the public.

EEOC currently calculates the cost and burden of its data collections in its submissions of Information Collection Requests to the U.S. Office of Management and Budget (OMB). The number of respondents (including multi-establishment respondents), responses (usually at the establishment level), estimated burden hours, costs, and mode of collection for the four major EEO data collections in the most recent reports of EEOC to OMB are shown in Table 4-4.

The estimates of burden costs and hours in Table 4-4 are based on the EEOC's best estimates of the amount of time it takes for clerks to retrieve the data and enter them on paper records. However, because fewer than one-fourth of employers who report now file paper records, the burden estimates may be overstated.
Options for Minimizing Response Burden

To the extent that the current burden data are representative, the addition of earnings data to the existing EEOC data collection forms that do not now collect them, in much the same manner in which earnings data are collected in the EEO-4 form, could be expected to nearly double the current burden on employers. In the case of the largest collection, the current average of 3.5 hours per EEO-1 form might increase to somewhere near the average of 6.6 hours now reported for the EEO-4 form. This is not an inconsequential increase in response burden. It would behoove EEOC to consider taking steps to reduce the increase in response burden.
TABLE 4-4 Estimated Cost and Burden of EEOC Data Collections

Form   Frequency  Respondents  Responses  Estimated Burden Hours  Estimated Cost ($)  Percent Reported Electronically
EEO-1  Annual     45,000       170,000    599,000                 11,400,000          80
EEO-3  Biennial   1,399        1,399      2,098                   85,000              79
EEO-4  Biennial   6,018        6,018      40,000                  700,000             76
EEO-5  Biennial   1,135        1,135      10,000                  190,000             58

SOURCE: Data from EEOC Form 83-I submissions to OMB.
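The per-form averages quoted in the text (about 3.5 hours for the EEO-1 and 6.6 hours for the EEO-4) can be recovered from Table 4-4 by dividing estimated burden hours by the number of responses. The sketch below does that arithmetic; the dictionary layout and function names are illustrative choices, while the figures are copied from the table.

```python
# Figures copied from Table 4-4 (EEOC Form 83-I submissions to OMB).
TABLE_4_4 = {
    "EEO-1": {"responses": 170_000, "burden_hours": 599_000, "cost": 11_400_000},
    "EEO-3": {"responses": 1_399, "burden_hours": 2_098, "cost": 85_000},
    "EEO-4": {"responses": 6_018, "burden_hours": 40_000, "cost": 700_000},
    "EEO-5": {"responses": 1_135, "burden_hours": 10_000, "cost": 190_000},
}


def hours_per_response(form: str) -> float:
    """Average estimated burden hours per filed form."""
    row = TABLE_4_4[form]
    return row["burden_hours"] / row["responses"]


def cost_per_hour(form: str) -> float:
    """Implied dollar cost per burden hour."""
    row = TABLE_4_4[form]
    return row["cost"] / row["burden_hours"]


if __name__ == "__main__":
    for form in TABLE_4_4:
        print(f"{form}: {hours_per_response(form):.1f} hours per response, "
              f"${cost_per_hour(form):.2f} per burden hour")
    ratio = hours_per_response("EEO-4") / hours_per_response("EEO-1")
    print(f"EEO-4 burden per form is about {ratio:.2f} times the EEO-1 burden")
```

The ratio of the EEO-4 to the EEO-1 per-form burden is a little under 2, which is the basis for the statement that adding earnings items could nearly double the burden.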
Several options are available for reducing the burden on reporters. Three are discussed in this section: less frequent data collection, use of a rotating scheme for certain employer size classes, and raising the size cutoff so that fewer employers would be in the scope of the collection.

Less Frequent Collection

The EEO-1 report is now collected annually, while the other forms are collected on a biennial basis. The main issue is with the EEO-1 form. The law does not require the annual collection of EEO-1 data. The timing of collection is an administratively imposed requirement. By administratively reducing the frequency of data collection, the burden might also be reduced, though the extent to which it might be reduced is not entirely clear.

On the negative side, the less frequent availability of the reports would mean that the information that supports EEOC enforcement functions would be less current, by a year or so. This lag could be an important issue during economic turning points, when hiring or layoffs could significantly influence the employment and earnings profiles of covered firms. The time lag for EEOC's investigations of potential discrimination would increase, and the ability of the agency to be responsive to complaints in a timely manner would be negatively affected.

Rotating Sample

It might be possible to continue to collect data annually but from only a part of the current reporting population and to permit firms with certain characteristics, such as not meeting a threshold size or being in a selected industry group, to report less frequently. The selection of annual versus biennial reporters could, for example, be based on an analysis by EEOC of the probability of discrimination based, in turn, on the experience of the agency with enforcement.
This tailored approach to selection of those firms that could report less frequently, however, would be hard to administer and could well be difficult to implement fairly in practice. Moreover, this nuanced approach might actually complicate matters for employers. Because so many firms automate their reporting, it is now a routine matter, and rotating the reporting requirement might actually increase the administrative burdens. Employers would need to figure out when they needed to report, and the task of developing a database to capture the reports might be much more burdensome for EEOC.
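To make the administrative question concrete, the following is a minimal sketch of one possible rotation rule. Everything in it is an assumption for illustration: the 500-employee threshold, the two-year cycle, and the use of a firm identifier to assign rotation groups are hypothetical and are not proposed in this report.

```python
def must_report(firm_id: int, employees: int, year: int,
                annual_threshold: int = 500, cycle_years: int = 2) -> bool:
    """Hypothetical rotation rule.

    Firms at or above `annual_threshold` employees report every year.
    Smaller firms are split into `cycle_years` rotation groups by firm_id
    and report only in the year that matches their group.
    """
    if employees >= annual_threshold:
        return True
    rotation_group = firm_id % cycle_years
    return year % cycle_years == rotation_group


def reporters_in_year(firms, year):
    """Return the ids of firms that owe a report in `year`.

    `firms` is an iterable of (firm_id, employees) pairs.
    """
    return [fid for fid, n in firms if must_report(fid, n, year)]


if __name__ == "__main__":
    # Invented example firms: (firm_id, number of employees).
    firms = [(1, 1200), (2, 150), (3, 150), (4, 90_000)]
    for year in (2012, 2013):
        print(year, reporters_in_year(firms, year))
```

Even this simple rule requires each smaller employer to know its rotation group and the current cycle year, which is the kind of added bookkeeping the text cautions about.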
Raising the Size Cutoff

The current employment cutoff for the annual requirement to submit an EEO-1 form is 100 employees (50 employees if the firm is a federal government contractor). This cutoff limits the overall potential response burden significantly. By raising the size cutoff to, say, 200 employees (based on the statistical power analysis presented above), the number of firms that would have to report earnings would be reduced by half, but the employment coverage would be reduced by less than 10 percent (see Table 1-1, in Chapter 1).

One consequence of raising the cutoff size would be a relative reduction in coverage of the earnings of females and minorities. The firms in the size classes for which the reporting requirement would be eliminated are those in which women and minorities are more heavily represented. Experiments with different cutoff sizes to better determine the tradeoffs between burden and coverage could be useful to include in the pilot study that the panel recommends (see Chapter 6).

HUMAN RESOURCE AND PAYROLL SYSTEMS

Most companies of the size covered by EEO regulations have at least somewhat automated payroll and human resource management systems. Today, larger companies are more able to comply with a potential requirement for compensation data by gender, race, and national origin because they can gather compensation information from automated payroll systems and demographic data from automated human resource systems.

The panel reviewed the state of automation of company payroll systems from the perspective of three service providers: a large payroll-providing service firm, a firm that specializes in the emerging software-as-a-service (SaaS) market, and a firm that specializes in using companies' own internal data to analyze EEO status and prepare Affirmative Action Plans for those companies. In summary, we found that automated systems were expanding rapidly among U.S.
employers, but that there are differences in the extent of implementing these applications by size of firm.

Currently, larger firms are likely to have human resource and payroll management systems, and they are likely to have an easier time in complying with a new requirement to provide compensation data by demographic characteristics than would smaller firms. Over time, one would expect that the use of such systems will grow and spread among smaller firms. In the long term, these automated systems may well serve as the basis for EEOC employment and wage data collection. As discussed in Chapter 6, the panel recommends a pilot test to collect information on the extent of penetration of these human resource and automated systems: see Appendix C.
Payroll and Human Resource Providers

The industry of payroll and human resource providers is characterized by a growth in services beyond the usual provision of timekeeping and payroll functions. Most recently, the industry has expanded to include human resource management. As a result, one provider can bring together information on hours, earnings, and the demographics and work histories of the workforce. These data are captured directly from a client's data systems, often without client intervention.

The panel interviewed a large payroll-providing company to determine the influence of the growth of this sector on the reporting of earnings data to EEOC. This company lists 600,000 clients, representing, in the company's estimation, one of every six U.S. employees. The clients employ as few as 1 and as many as 1 million employees.

The company has a line of business that focuses on smaller employers (those with fewer than 100 employees) to provide a total source of payroll and human resource services. The company estimates that about 40 percent of these smaller employers use human resource services as well as payroll services. One product offered to clients who use human resource services and who have an OFCCP or EEOC requirement is the production of EEO-1 reports.

Growth of Software-as-a-Service Applications

The workshop presentation by Karen Minicozzi of Workday, Inc., an enterprise software and services provider, highlighted the company's unified human capital management solutions. The company is one of a growing number of firms that provide turnkey payroll and human resource management solutions to businesses under the general label of SaaS. The solutions provide a new, global core system of record to replace legacy systems that have been maintained by the establishments themselves.
These service providers use a multitenant architecture: that is, one version of the application with common hardware, networking, and operating systems is used for all customers ("tenants"). The applications are often supported in the "cloud," that is, through Internet connectivity. Because these new service approaches have so much in common, they allow the generation of common reports (such as EEO reports) across the system, drawing on data from both the human resource and payroll functions of the serviced companies. Most of the companies that use this service are mid-size, large, and very large companies. Workday, Inc. has 246 customers.

These SaaS providers have been enjoying remarkable growth. An annual survey of employing establishments by the consulting firm CedarCrestone,
to ascertain the penetration of human resource applications in business, found them to be widespread, and it forecast that SaaS as a deployment option would likely continue that growth as organizations move from licensed on-premise solutions to the cloud. The source of this information is the CedarCrestone 2010-2011 HR Systems Survey. The survey is based on 1,289 responses, representing employers of over 20 million employees (CedarCrestone, 2011).

The survey also found that there were measurable differences in the penetration of these administrative applications by size of firm. In the most recent survey, 94 percent of employers with 10,000 or more employees had such systems, compared with 87 percent for employers with 250 to 2,499 employees. The CedarCrestone survey found that most of the applications were still licensed software, but the subscription-based SaaS applications and outsourcing solutions were growing in use.

Analysis of Salary and Related Data for Pay Equity Purposes

In order to ensure that their firms are in compliance with the Equal Pay Act, Title VII, and Executive Order 11246 provisions, many employers use firms that perform compensation analysis and, in many cases, actually prepare automated Affirmative Action Plans. Other firms use software to support this analysis internally.

The panel heard testimony from Liz Balconi and Michele Whitehead, representatives of Berkshire Associates, a company that is very active in the compensation analysis business.
This company obtains the following information from its client firms: employee identifier; job code; race; gender; date of hire; annualized base salary or hourly rate; grade, band, or classification (if applicable); time in current position, or date of last title change; date of last degree earned, or date of birth; full time or part time status; exempt or nonexempt status; title; employee location; years of relevant experience (or date of birth); factors that may legitimately impact pay in an organization, such as performance rating; education; date in grade; professional certifications; division; job group; starting salary; and annualized total compensation (including bonuses, commissions, cost of living allowances, and overtime).

The firm uses these data (which are generally available from their clients) to conduct two kinds of analyses: cohort analysis, which is a nonstatistical comparison of similarly situated incumbents within a group based on factors such as time in the company, educational background, and performance assessment; and statistical (regression) analysis to study the combined effect of factors on pay between comparator groups.

Although not all of these data elements may be necessary to identify potentially discriminatory practices, prudent employers can be expected to have these types of data available and to use them to evaluate their own practices, using algorithms developed by specialty firms such as Berkshire Associates.
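The difference between a raw pay comparison and a regression-adjusted comparison can be illustrated with a small sketch. This is not the algorithm used by Berkshire Associates or any other firm; it is a generic ordinary least squares fit, written with hypothetical names, on invented, noise-free records in which salary was constructed as 40,000 plus 2,000 per year of experience minus 3,000 for membership in group 1.

```python
# Synthetic records invented for illustration only:
# (years_experience, group_indicator, annual_salary).
# Salary was constructed as 40000 + 2000 * experience - 3000 * group.
RECORDS = [
    (1, 0, 42000), (3, 0, 46000), (5, 0, 50000), (8, 0, 56000),
    (2, 1, 41000), (4, 1, 45000), (6, 1, 49000), (9, 1, 55000),
]


def mean(values):
    return sum(values) / len(values)


def raw_gap(records):
    """Unadjusted difference in mean salary: group 1 minus group 0."""
    g1 = [s for _, g, s in records if g == 1]
    g0 = [s for _, g, s in records if g == 0]
    return mean(g1) - mean(g0)


def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def adjusted_gap(records):
    """Group coefficient from regressing salary on experience and group."""
    X = [[1.0, float(e), float(g)] for e, g, _ in records]
    y = [float(s) for _, _, s in records]
    _intercept, _per_year, group_effect = ols(X, y)
    return group_effect


if __name__ == "__main__":
    print("raw gap:", raw_gap(RECORDS))
    print("adjusted gap:", round(adjusted_gap(RECORDS), 2))
```

In these invented data, group 1 has one more year of experience on average, so the raw gap in mean pay (1,000 lower) understates the gap at equal experience (3,000 lower). A real analysis would involve many more factors, noisy data, and significance testing.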