
5

Measurement Quality


INTRODUCTION

This chapter addresses the key question of whether important measurement quality issues exist for current Component 2¹ data that EEOC would benefit from addressing before the next round of data collection. The chapter focuses on sources of measurement error, including item nonresponse (INR) and other errors arising from respondent reports, data capture, and data processing. Quality indicators that quantify the extent of item missingness, data implausibility and inconsistency, and response variance or unreliability are analyzed. A post-hoc strategy for improving data quality for analyses of 2017 and 2018 Component 2 data is also proposed, which focuses on removing erroneous and implausible observations from the analysis dataset. The chapter concludes with recommendations for further quality analyses and for strengthening future data-collection efforts, to reduce measurement errors and improve fitness for use.

___________________

1 In 2018 and 2019, EEO-1 data collections occurred in two components. The customary EEO-1 instrument containing composition data (Component 1) was collected in 2018 for reporting year 2017, and in 2019 for reporting year 2018. In 2019 and 2020, pay data (Component 2) were collected for reporting years 2017 and 2018. In this report, we refer to historical EEO-1 data collections as EEO-1, and the 2017 and 2018 EEO-1 Component 1 and Component 2 instruments as “Component 1” and “Component 2,” respectively. All information collected by the Component 1 instrument is also collected by the Component 2 instrument.

To briefly summarize the key findings, measurement quality was assessed both before and after filtering the data (i.e., removing a small percentage of establishments whose data were deemed implausible). Before filtering, about 10 percent of SROs had either zero employees while hours worked were non-zero, or vice versa. A comparison of Component 1 and Component 2 data for number of employees determined that about 16 percent of SROs were zero in one component and non-zero in the other, which was evidence of response error. In addition, there were very large discrepancies in number of employees by SRO between Component 1 and Component 2 data. These discrepancies had a considerable impact on response reliability, which is estimated primarily from differences between Component 1 and 2 data. Filtering the data on number of employees per SRO improved reliability for that variable but did not affect the inconsistency in zero reports, nor did it improve the reliability of hours worked per employee. Additional filtering for hours worked per employee is needed but out of scope for this report. These findings suggest that missing data further erode the degree to which Component 2 data represent employers eligible for the data collection. In addition, substantial editing may be required to further improve data usability.

Chapter Overview

EEOC and its contractor, the National Opinion Research Center (NORC), performed certain edits to the Component 1 and 2 data before those data were received by the panel. The panel understands that EEOC may have access to alternative versions of the data files with additional edits, which were unavailable to the panel. The edits made by NORC include combining multiple files and file formats into employer-level and establishment-level data files for each year, verifying whether an employer’s data were captured by another employer, replacing multiple file uploads per employer with the largest and latest filing, verifying whether the correct filing was used, assigning user IDs to previously unrecorded employers submitted through a professional employer organization, and a wide variety of other edits as documented in NORC’s Data Cleaning and Post-Processing report to EEOC (EEOC, 2020h). However, not all issues identified by NORC were resolved; in many cases the data were simply accepted as certified and, where warranted, flagged. The edit checks performed by the panel were applied to the NORC-edited data.

Prior to the panel’s analysis, a number of editing and coding steps were performed by NORC to confirm that the data were consistent with EEOC’s quality standards. The overall approach to data editing, in accordance with EEOC’s guidance, was to accept all data elements as submitted by respondents and certified by the data-collection process but to flag inconsistent data or data that failed other data-integrity checks. These checks included a variety of edits to resolve: (1) data quality issues noted and tracked by the employer help desk; (2) issues arising during and after data integration and file preparation; (3) issues in the remarks and questions that filers entered on the forms; (4) issues identified from process data generated as filers entered data online; (5) outright filer errors; and (6) other basic data-integrity checks.

As an example, if a given sex-race/ethnicity-occupation-pay band (SROP) cell contains a non-zero number of employees, then the corresponding number of hours worked for the cell should also be non-zero, and vice versa. Violations of this rule were flagged but otherwise left unaltered by the editing process. Other quality checks included two employer-size checks: one that compared reported information between Components 1 and 2 data for the same reporting year, and another that compared Component 2 data for 2017 and 2018. Implausibly large differences were flagged but not omitted from the data file. Additional information regarding the editing and coding process is available in the contractor’s data-processing methodology (EEOC, 2020h).

The overarching question of this chapter is whether important measurement quality issues exist for current Component 2 data that EEOC could address before the next round of data collection, through improvements to survey design or administration. One facet of this question involves whether measurement issues identified for Component 2 data can be resolved to improve the overall quality and usability of current data. This can involve applying criteria to either flag or delete questionable or ostensibly erroneous data. Another facet of this question is to identify lessons learned from the 2017–2018 Component 2 data-collection experience that could improve data quality in future data collections (see Chapter 8).

The data available to resolve Component 2 data quality issues are quite limited. Direct evidence of measurement error for a given employee SROP observation requires knowledge of the true value (or construct) underlying the observation. For example, an accurate estimate of the error in the number of employees for a given SROP cell requires knowledge of the true number of employees in the cell. While these data are known by establishments, they are not readily available for this analysis. Moreover, if collected retrospectively, data may still be inaccurate due to recall error and the dynamic nature of employment. Likewise, at any level of aggregation, a direct estimate of the bias in the total number of employees (or their corresponding average number of hours worked) requires knowledge of the true value of the aggregate, which is also unavailable for this analysis. In some cases, Component 1 data can serve as a check on Component 2 data; however, Component 1 data are also subject to many of the same errors as Component 2 data, so caution is needed to properly interpret these comparisons. In addition, the Component 1 data collection used an instrument quite different from that of the Component 2 data collection, so the constructs may not align perfectly. Thus, when the two components disagree, it may be impossible to determine which component is in error, or even if an error occurred.

Nevertheless, it is still possible to develop reasonable criteria for identifying and resolving data quality issues using indirect methods. The methods employed in this section to assess Component 2 measurement quality include the following:

  • An analysis of extreme values, inconsistent reports such as zero employees with non-zero hours worked or vice versa, and other data anomalies identified using only Component 2 data for a given year.
  • Comparisons between Component 1 and Component 2 data to identify:
    • inconsistencies between Component 1 and Component 2 data for the number of employees in data cells defined by SRO; and
    • reliability estimates for number of employees at the SRO level, treating Component 1 and Component 2 data as parallel measures.
  • Comparisons between 2017 and 2018 Component 2 data, including
    • inconsistencies between years of Component 2 data for the number of employees and hours worked in SROP cells, and
    • index of inconsistency estimates (defined below) based upon comparisons of 2017 and 2018 Component 2 data for the same establishments and SROP cells.

These methods were used to produce quality indicators that may imply the existence of errors in the data under specified conditions. For example, a large discrepancy between Components 1 and 2 data in the number of employees for an establishment may imply an error in either or both components. In some situations, it is possible to determine which component is likely in error based upon the relative and absolute magnitudes of the differences and other plausibility arguments. Likewise, a large disagreement between years of Component 2 data for the number of employees in some SROP cells could suggest an error in either or both years of data. However, because of the time lag between reports, small discrepancies between 2017 and 2018 data are expected due to real change and should not necessarily be interpreted as response error.

Note that the data analyzed in this chapter omit firms with total sizes larger than the largest possible firm in the target population. Type 6 establishments (that is, establishments for which the filer exercised the option to only report the number of employees, and therefore not also provide the sex, race/ethnicity, occupation, and pay of those employees) are also excluded.


Notation

This section will establish basic notation used throughout the chapter. Nontechnical readers who wish to skip this section need to understand three points. First, the subsequent analysis will address errors in both employee counts and hours worked, both at the SRO and SROP levels. Second, to evaluate the biasing effects of measurement error, some measure of “truth” is needed which, unfortunately, is unavailable. However, later in the chapter, Component 1 data will be regarded as closer to “truth” for the purposes of eliminating highly implausible Component 2 data values. Finally, the notation below establishes a convenient way of expressing some of the technical aspects of the methodology employed in the analysis and clarifies units of analysis (i.e., SRO, SROP, establishment, or firm).

Let $y_{tik}$ denote a response (either number of employees or number of hours worked) for the $i$th establishment in the $k$th cell defined by the cross-classification of SROP within the establishment for year $t = 2017, 2018$. (Let $j$ denote the cell defined by the cross-classification of SRO, used later in this chapter.) Let $y_{tik}^{(E)}$ denote the number of employees and $y_{tik}^{(H)}$ the number of hours worked within the cell $(t, i, k)$. Let $\mu_{tik}$ denote the corresponding (unobserved) true value for the cell, with $\mu_{tik}^{(E)}$ and $\mu_{tik}^{(H)}$ corresponding to $y_{tik}^{(E)}$ and $y_{tik}^{(H)}$, respectively. Superscripts and the subscript $t$ will be dropped when it is not necessary to distinguish between number of employees and number of hours worked or between survey years.

Dropping the superscripts and the subscript $t$, let $e_{ik} = y_{ik} - \mu_{ik}$ denote the (measurement) error in the observation $y_{ik}$. Much of the analysis in this section focuses on the measurement error $e_{ik}$ and its various aggregates. Note that, if $\mu_{ik}$ were known, then $e_{ik}$ would also be known. However, because $\mu_{ik}$ is unknown, the analysis will involve the assessment of various quality indicators designed to be predictive of $e_{ik}$.

Let $Y_i = \sum_k y_{ik}$ denote the total number of employees observed for the $i$th establishment, where the sum is over the SROP cells reported by the establishment. Let $M_i = \sum_k \mu_{ik}$ and $E_i = \sum_k e_{ik}$, the latter referred to as the measurement bias, and note that $E_i = Y_i - M_i$.

INR may be represented as a type of measurement error where, at the unit level, $y_{ik} = 0$ (assuming erroneous blanks are converted to zeros) when $\mu_{ik} \neq 0$. Thus, in the absence of any adjustment for INR, the bias in the total $Y_i$ due to INR is $E_{\mathrm{INR},i} = \sum_{\{k:\, y_{ik}=0,\ \mu_{ik}\neq 0\}} \mu_{ik}$, where the sum is over all $k$ for which $y_{ik} = 0$ but $\mu_{ik}$ is (believed to be) non-zero.

Some analyses will require using the firm as the unit of analysis. A firm-level total is the sum of the totals for all establishments with the same employer identification number (EIN). For example, the firm-level total for employees is $\sum_{i \in U_{\mathrm{EIN}}} \sum_k y_{ik}^{(E)}$, where $U_{\mathrm{EIN}}$ is the set of all establishments with the same EIN (i.e., within the same parent firm).
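To make the units of analysis concrete, the following minimal sketch (in Python, using pandas) rolls hypothetical SROP-level records up to establishment totals $Y_i$ and to firm-level totals by EIN. The DataFrame and its column names (establishment_id, ein, n_employees, hours_worked) are illustrative assumptions, not the actual Component 2 file layout.

```python
import pandas as pd

# Hypothetical SROP-level records: one row per (establishment, sex, race/ethnicity,
# occupation, pay band) cell. Column names are illustrative, not the actual file layout.
comp2 = pd.DataFrame({
    "establishment_id": ["A1", "A1", "A2", "B1"],
    "ein":              ["11-111", "11-111", "11-111", "22-222"],
    "n_employees":      [15, 4, 30, 7],
    "hours_worked":     [31200, 8320, 62400, 14560],
})

# Establishment totals Y_i: sum of y_ik over the SROP cells k reported by establishment i.
estab_totals = comp2.groupby("establishment_id")["n_employees"].sum()

# Firm-level totals: sum over all establishments sharing the same EIN (the set U_EIN).
firm_totals = comp2.groupby("ein")["n_employees"].sum()

print(estab_totals)
print(firm_totals)
```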


INTERNAL INCONSISTENCY AND EXTREME VALUES

Internal Inconsistencies

This section considers inconsistencies in the Component 2 data within the same survey year. Two types of inconsistency are of interest: (1) a zero employee count for an SROP cell when hours worked for that cell are non-zero (i.e., $y_{ik}^{(E)} = 0$ and $y_{ik}^{(H)} \neq 0$); and (2) zero hours worked for an SROP cell when the employee count for the cell is non-zero (i.e., $y_{ik}^{(E)} \neq 0$ and $y_{ik}^{(H)} = 0$). For example, Establishment ABC entered 15 employees in an SROP cell in the “Number of Employees” section of the Component 2 instrument. However, for the same SROP cell, nothing was entered in the “Total Number of Hours Worked” section. The same issue arises for an SROP cell that has, say, 10,000 hours entered in the “Total Number of Hours Worked” section and nothing entered for the cell in the “Number of Employees” section.

Both (1) and (2) errors can arise in two ways. One possibility is that the missing component constitutes INR (i.e., the zero cell should have a non-zero entry, but the respondent failed to provide that value). A second possibility is that the non-zero cell is in error (i.e., it should also be zero) and the zero cell is correct. To simplify the discussion, both (1) and (2) will be referred to as “missingness” and, due to insufficient information, no attempt will be made to distinguish between the two possible causes.
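A check for these two inconsistency types is straightforward to express in code. The sketch below assumes a hypothetical pandas DataFrame with columns n_employees and hours_worked for each SROP cell; it is illustrative only and is not the panel’s or NORC’s actual edit code.

```python
import numpy as np
import pandas as pd

def classify_missingness(df: pd.DataFrame) -> pd.Series:
    """Label each SROP cell with the inconsistency types (1) and (2) described above.

    Assumes hypothetical columns n_employees and hours_worked, with erroneous
    blanks already converted to zeros.
    """
    emp = df["n_employees"].fillna(0)
    hrs = df["hours_worked"].fillna(0)
    labels = np.select(
        [
            (emp == 0) & (hrs != 0),  # type (1): zero employees with non-zero hours
            (emp != 0) & (hrs == 0),  # type (2): non-zero employees with zero hours
        ],
        ["missing_employees", "missing_hours"],
        default="consistent",
    )
    return pd.Series(labels, index=df.index)

# Example: missingness rates by administration mode, analogous to Table 5-1
# (assumes a hypothetical "mode" column).
# comp2["flag"] = classify_missingness(comp2)
# print(comp2.groupby("mode")["flag"].value_counts(normalize=True))
```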

Table 5-1 provides the results of a tabulation of (1) and (2) for the two years of Component 2 data for the primary strata used for the analysis: administration mode, establishment size, and occupation. To summarize:

  • Overall, missing hours-worked data occur more than twice as frequently as missing numbers of employees (i.e., missing hours-worked data account for about 70% of missing values).
  • Essentially all of the missingness for the online-entry mode is due to missing hours worked, while for the data-upload mode missing hours worked accounts for about 63 percent of missingness.
  • When employees and hours worked are considered jointly, 10 percent of SROPs for uploaded responses are essentially unusable.
  • Percent missing is quite small for the online-entry mode (less than 1% of all SROPs) while about 10 percent of SROPs are missing either hours worked or number of employees for the data-upload mode. Nevertheless, since about 72 percent of data are submitted using the online-entry mode, about 20 percent of all missingness arises due to online submissions.
  • Establishment size appears to be unrelated to item missingness.
  • The executive EEO job category stands out as having much greater missingness. One possible explanation is that executives are Fair Labor Standards Act-exempt and their hours worked are not tracked; however, executives also have a higher-than-average amount of missing data on employee counts, which would not be explained by their exempt status. Interestingly, professionals and managers, who also might be expected to be exempt, do not stand out as having high missingness, which suggests that the problem may not be an inability to report hours worked for exempt workers.

TABLE 5-1 Percent of SROP Cells with Missing Data on Hours Worked or Number of Employees

Firm, Establishment, and Employee Characteristic   2017: Missing Hours Worked (Have Employee Counts)   2017: Missing Employees (Have Hours Worked)   2018: Missing Hours Worked (Have Employee Counts)   2018: Missing Employees (Have Hours Worked)
Administration Mode
Online-Entry 0.9 0.0 0.8 0.0
Data-Upload 6.3 3.7 5.9 3.3
Size Distribution
Fewer than 100 2.1 0.9 2.0 0.8
100–249 2.8 1.4 2.6 1.2
250–499 2.5 1.0 2.2 0.9
500–999 2.2 0.7 2.1 0.7
1,000 or More 2.5 0.6 2.4 0.5
Job Category
Executive 5.9 3.0 5.5 2.6
First/Midlevel 2.3 1.1 2.2 1.0
Professionals 2.3 1.0 2.1 0.9
Technicians 2.2 1.1 2.2 1.0
Sales Workers 1.7 0.7 1.6 0.6
Administrative Support 2.4 1.1 2.3 1.0
Craft Workers 2.6 1.2 2.5 1.1
Operatives 2.4 0.9 2.2 0.8
Laborers and Helpers 3.1 1.3 2.8 1.1
Service Workers 2.2 0.7 2.0 0.6
Overall 2.4 1.0 2.2 0.9

SOURCE: Panel generated Component 2 employer, establishment, and employee files for 2017 and 2018.

NOTE: Excludes firms reporting more than 1.4 million employees, establishments covered in Type 6 reports, and SROP cells with missing values.


These results suggest that missingness (or more generally, inconsistent zero and non-zero values) at the SROP level is an important issue for both years of Component 2 data because it renders about 10 percent of (employee count, hours worked) pairs unusable. It is also evident that missing hours worked is a much bigger problem for Component 2 data than is missing employee numbers, for both survey years. Moreover, missingness essentially appears to be a problem for the data-upload mode (rather than the online-entry mode) because the percentage of missing data for the online-entry mode is rather trivial. These findings suggest that missing data constitute an important data quality issue for current Component 2 data from uploaded forms—one that would be advantageous to address when planning future data collections.

Extreme Values

Implausible or extreme data values are another indicator of measurement error. An implausible value for number of employees is defined at the firm level as a reported firm total that exceeds the size of the largest U.S. employer. This criterion is quite conservative in that even firms with numbers of employees somewhat less than this extreme might still be considered implausible. However, the panel’s intent in using this criterion was to err on the side of inclusion, to avoid erroneously eliminating extreme but still valid firms. Extreme firms, defined as firms with employee counts exceeding 1.4 million (that of the largest firm in the U.S.), were removed from the Component 2 data for both years and were not included in subsequent analyses.

In addition to eliminating implausible firm-level values for numbers of employees, we also flagged implausible values for number of hours worked per employee for further analysis. An implausible SROP value for average hours worked per employee is defined as exceeding 16 hours per day for 365 days per year, that is, 5,840 or more hours per year. Such SROPs were assigned red flags. SROPs with average numbers of hours worked more than three standard deviations from the mean were assigned orange flags. All other SROPs were assigned green flags. The three-standard-deviation rule is somewhat arbitrary; however, by Chebyshev’s inequality, at least 89 percent of data should be within these so-called “3 sigma” bounds. Additional red, orange, and green flags were defined for establishments and firms as shown in Box 5-2. These flags were applied to both years of Component 2 data after removing firms whose reported total number of employees exceeded 1.4 million (that of the largest U.S. employer). 2017 results are reported in Table 5-2; results for 2018 (not shown) were very similar.
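The flagging rules for hours worked per employee can be sketched as follows. This is a minimal illustration, assuming hypothetical columns n_employees and hours_worked; it applies the 5,840-hour red-flag cutoff and the three-standard-deviation orange-flag rule described above and is not the panel’s actual implementation.

```python
import pandas as pd

MAX_PLAUSIBLE_HOURS = 16 * 365  # 5,840 hours per employee per year (red-flag cutoff)

def flag_hours_per_employee(df: pd.DataFrame) -> pd.Series:
    """Assign red/orange/green flags to SROP cells based on hours worked per employee.

    Cells with zero employees or zero hours are left green here; they are handled
    separately by the missingness checks described earlier in the chapter.
    """
    valid = (df["n_employees"] > 0) & (df["hours_worked"] > 0)
    hpe = df.loc[valid, "hours_worked"] / df.loc[valid, "n_employees"]

    flags = pd.Series("green", index=df.index)
    red = hpe[hpe >= MAX_PLAUSIBLE_HOURS].index
    flags.loc[red] = "red"

    # Orange: more than three standard deviations from the mean, computed after
    # setting aside the red-flagged cells (interpreted here as a two-sided rule).
    rest = hpe.drop(red)
    z = (rest - rest.mean()).abs() / rest.std()
    flags.loc[z[z > 3].index] = "orange"
    return flags
```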

Key findings from Table 5-2 can be summarized as follows:

  • About 0.3 percent of SROPs had highly unlikely values (red flags). After eliminating these, a trivial number of SROPs had values beyond three standard deviations of the mean (orange flags).
  • At the establishment level, data uploads and large establishments tended to be most problematic. For the categories considered, no other organizational traits stood out as particularly problematic. However, almost 17 percent of establishments that uploaded Component 2 data had orange flags, versus less than 3 percent for the online-entry mode.

Thus, it appears from this analysis that very few SROPs had extreme or highly unlikely data. However, a substantial number of establishments had SROPs with at least one highly unlikely value and/or a fairly high number of extreme values. Furthermore, very few firms had red flags, but a substantial number had orange flags.

COMPARISONS BETWEEN COMPONENTS 1 AND 2 DATA

Because the Component 1 and Component 2 data collections both gathered data on number of employees from many of the same establishments for the same year, comparisons between the two data collections could provide insights regarding Component 2 data quality. The reporting periods for the two components are slightly different, as described in Chapter 2, but their reports for total number of employees should still agree closely.

This analysis required the linking of establishments participating in both Component 1 and Component 2 data collections, both for 2017 and 2018 (see Appendix 5-3). To summarize, matching establishments across the two components proved somewhat problematic, in that only about 67 and 70 percent, respectively, of the 2017 and 2018 Component 2 data could be linked with high confidence to Component 1 data. As a result, analyses reported for these linked datasets may not be representative of Component 2 target populations. For example, an unmatchable Component 2 data unit may have greater measurement error than the matched units, which is not uncommon in linked data (e.g., see Herzog et al., 2007). If so, then the results reported in this section could somewhat overstate the quality of Component 2 data.

The absence of pay bands in Component 1 data requires that Component 2 SROP data be aggregated across pay bands for comparison. Number of hours worked is not collected in the Component 1 data collection, so comparisons are not possible for that variable. Despite these differences, it is still possible and useful to compare the number of employees by SRO by collapsing the Component 2 counts over pay bands. Even allowing for potential differences in reporting periods, the numbers of employees in the same SRO cell for both components should closely agree. This is especially true for larger establishments, in which employee counts are expected to be more stable over time. Thus, inconsistent zeros or large discrepancies in reported numbers of employees between the two components can serve as a quality indicator (i.e., an indicator of the risk of measurement error for either or both data-collection components).
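The collapsing and linking steps can be sketched as follows. This is an illustrative outline only: the key columns (establishment_id, sex, race_ethnicity, occupation, n_employees) are assumed names, and the real linkage described in Appendix 5-3 is considerably more involved than a simple key merge.

```python
import pandas as pd

# Hypothetical key columns identifying an SRO cell within an establishment.
SRO_KEYS = ["establishment_id", "sex", "race_ethnicity", "occupation"]

def link_components(comp2_srop: pd.DataFrame, comp1_sro: pd.DataFrame) -> pd.DataFrame:
    """Collapse Component 2 SROP cells over pay bands and link them to Component 1 SRO cells.

    comp2_srop is assumed to hold one row per SROP cell and comp1_sro one row per
    SRO cell, each with an n_employees column; column names are illustrative.
    """
    comp2_sro = (
        comp2_srop.groupby(SRO_KEYS, as_index=False)["n_employees"].sum()
        .rename(columns={"n_employees": "emp_c2"})
    )
    comp1 = comp1_sro.rename(columns={"n_employees": "emp_c1"})

    # Outer merge so that a cell reported in only one component appears with a zero
    # in the other, matching the "inconsistent zero" comparisons in the text.
    linked = comp2_sro.merge(comp1, on=SRO_KEYS, how="outer")
    return linked.fillna({"emp_c1": 0, "emp_c2": 0})
```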


TABLE 5-2 Percentage of Firms, Establishments, and Cells by Hours-Worked-per-Employee Flag Status, Administration Mode, and Size for 2017, Component 2

Firm Characteristic   Firms: Red, Orange, Green   Establishments: Red, Orange, Green   Cells: Red, Orange, Yellow, Green
Administration Mode
Online-Entry 0.3 12.5 87.1 0.3 2.4 97.3 0.3 # 0.9 98.9
Data-Upload 0.6 21.6 77.8 0.4 16.7 82.9 0.3 # 6.3 93.4
Size Distributiona
Fewer than 100 0.0 2.2 97.8 0.3 4.8 94.9 0.2 # 2.1 97.6
100–249 0.5 12.7 86.8 0.4 8.1 91.5 0.2 # 2.8 97.0
250–499 0.5 18.5 81.0 0.5 8.4 91.1 0.3 # 2.5 97.2
500–999 0.5 23.6 75.9 0.3 8.8 90.8 0.3 # 2.2 97.6
1,000 or More 0.6 34.6 64.8 0.4 15.1 84.5 0.4 # 2.5 97.1
Missing or Invalid 0.0 0.0 100.0 0.0 0.0 0.0 100.0
Overall 0.5 18.2 81.3 0.3 5.4 94.2 0.3 # 2.4 97.3

SOURCE: Panel generated from Component 2 employer, establishment, and employee files for 2017 and 2018.

NOTE: Excludes firms reporting more than 1.4 million employees, establishments covered in Type 6 reports, and SROP cells with missing values. # indicates rounds to zero.

a Firm size for first set of three columns; establishment size for the remainder.


Let $y_{1ij}$ and $y_{2ij}$ denote the numbers of employees reported by the $i$th establishment in the $j$th SRO cell for the Component 1 and Component 2 data, respectively. Three types of inconsistencies are of interest in this analysis: (1) $y_{1ij} \neq 0$ and $y_{2ij} = 0$ (i.e., the Component 1 cell is non-zero while the Component 2 cell is zero); (2) $y_{1ij} = 0$ and $y_{2ij} \neq 0$ (i.e., the Component 1 cell is zero while the Component 2 cell is non-zero); and (3) $|y_{1ij} - y_{2ij}| \gg 0$ when $y_{1ij} > 0$ and $y_{2ij} > 0$ (i.e., there is a very large difference between two non-zero cells in Components 1 and 2, respectively). As an (imperfect) example, in 2017 one establishment reported 561 employees for an SRO in the Component 1 data collection while reporting 0 employees for the same SRO in the Component 2 data collection. It is highly unlikely that both components are correct, given that the two data collections were conducted at approximately the same time.

Table 5-3 presents the results for inconsistencies of type (1) and (2). Some key findings from this table are the following:

  • Almost 16 percent of all SROs are inconsistent with regard to zero employees between Component 1 and Component 2 data, which is substantial.
  • Data from the online-entry mode are slightly less consistent than data from the data-upload mode.
  • Smaller establishments are slightly less consistent than larger establishments.

Note that the data-upload mode was organized differently than the online-entry mode. The data-upload mode had one row for each combination of sex, race/ethnicity, occupation, and pay band, with the numbers of employees and hours worked next to each other, while the online-entry mode had 20 separate grids, with 10 grids (one per job category) collecting data on the number of employees and 10 collecting data on the number of hours worked (EEOC, 2020e). Thus, differences in data reliability could reflect differences in the actual mode (online-entry mode versus data-upload mode) or differences in the instrument layout. In particular, the side-by-side placement of the numbers of employees and hours worked in the data-upload mode may have helped to prevent inconsistencies between the data on hours worked and the data on employee counts.

For Type (3) inconsistencies, we compared the SRO cell values for the two components using the relative difference (RD) metric, defined as

$$ RD_i = \frac{2\,(y_{1ij} - y_{2ij})}{y_{1ij} + y_{2ij}} $$

as well as the absolute value of RDi denoted by ARDi. These results are presented in Table 5-4 and Figure 5-1, respectively.
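A sketch of the RD and ARD computation on the linked data appears below; it assumes the hypothetical emp_c1 and emp_c2 columns from the linking sketch earlier and is restricted, as in the text, to cells that are non-zero in both components. The same function can be reused unchanged for the 2017-versus-2018 comparisons later in the chapter.

```python
import pandas as pd

def relative_difference(y1: pd.Series, y2: pd.Series) -> pd.Series:
    """RD = 2*(y1 - y2) / (y1 + y2), defined only where both counts are positive."""
    rd = 2.0 * (y1 - y2) / (y1 + y2)
    return rd.where((y1 > 0) & (y2 > 0))  # other cells are left as NaN

# Example use on the hypothetical linked SRO frame from the earlier sketch:
# linked["rd"] = relative_difference(linked["emp_c1"], linked["emp_c2"])
# linked["ard"] = linked["rd"].abs()
# print(linked["rd"].mean(), linked["ard"].quantile([0.8, 0.9, 0.95]))
```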


TABLE 5-3 Percent of SROs with Inconsistent Zero Numbers of Employees When Comparing 2018 Component 1 and Component 2

Firm Characteristica Only Component 1 Non-Zero Only Component 2 Non-Zero Both Components Non-Zero
Administration Mode
Online-Entry 9.1 10.1 80.7
Data-Upload 7.0 7.2 85.8
Size Distribution
Fewer than 100 9.2 8.4 82.4
100–249 6.6 7.7 85.7
250–499 4.9 6.8 88.3
500–999 4.3 7.7 88.0
1,000 or More 3.5 8.7 87.8
Overall 7.6 8.0 84.4

SOURCE: Panel generated from Component 2 employer, establishment, and employee files for 2018.

NOTE: Excludes firms reporting more than 1.4 million employees, establishments covered in Type 6 reports, and SROP cells with missing values.

a Size strata based upon establishment sizes from Component 1 data.

Table 5-4 suggests that, overall, the mean relative difference (RD) for number of employees is small for both years (−0.66 and −1.62%, respectively), although it is more than twice as large in magnitude in 2018 as in 2017. The magnitude of RD is quite large for the largest establishments (almost 35 and 38 percent, respectively) and at least double for online-entry submissions relative to data uploads. Therefore, this may be a larger issue either for establishments that respond online or, more generally, for the online-entry methodology.

Figure 5-1 shows that almost 20 percent of the largest establishments have numbers of employees in SROs that differ from Component 1 data by almost 100 percent in 2018 and more than 60 percent in 2017. It is also evident that all strata considered have a substantial percentage of Component 2 SROs that are quite different from their counterparts in Component 1 data.

These results suggest that the data from the two components do not align as closely as expected for a substantial percentage of the Component 2 universe and, furthermore, the likely cause is measurement error in the data of one or both components. See “Filtering Data to Improve Usability,” below, for an approach for addressing these issues and improving data usability.


TABLE 5-4 Mean Intercomponent RD for Number of Employees in an SRO Cell by Administration Mode, Size, and Year

Firm Characteristica 2017 RD (%) 2018 RD (%)
Administration Mode
Online-Entry –1.89 –2.64
Data-Upload –0.30 –1.30
Establishment Size
Fewer than 100 0.68 –0.34
100–249 –2.91 –3.74
250–499 –6.52 –6.84
500–999 –11.18 –11.66
1,000 or More –34.43 –37.58
Overall –0.66 –1.62

SOURCE: Panel generated from Component 2 employer, establishment, and employee files for 2017 and 2018.

NOTE: Excludes firms reporting more than 1.4 million employees, establishments covered in Type 6 reports, and SROP cells with missing values.

aSize strata based upon establishment sizes in Component 1 data.


A final quality indicator considered in the analysis is the index of inconsistency (Biemer, 2011), which can be computed for Component 1 and 2 intercomponent SRO pairs for the same year, as well as for Component 2 interannual pairs. For Component 1 and 2 SRO pairs, let i = 1,…,m denote the establishment, where m is the number of establishments in the target population (or stratum); n is the observed number of establishments; and let j denote an SRO cell within the ith establishment. The index of inconsistency for the ith establishment is defined over all cells for which both y1ij > 0 and y2ij > 0 as

$$ I_i = \frac{2 \sum_{j=1}^{n_i} (y_{1ij} - y_{2ij})^2}{V_i} $$

where

$$ V_i = \sum_{j}(y_{1ij} - \bar{y}_{1i})^2 + \sum_{j}(y_{2ij} - \bar{y}_{2i})^2 + \sum_{j}(y_{1ij} - \bar{y}_{2i})^2 + \sum_{j}(y_{2ij} - \bar{y}_{1i})^2, \qquad \bar{y}_{1i} = \sum_{j=1}^{n_i} y_{1ij}/n_i, \quad \bar{y}_{2i} = \sum_{j=1}^{n_i} y_{2ij}/n_i, $$

and where $n_i$ is the number of SRO cells in establishment $i$ for which both $y_{1ij} > 0$ and $y_{2ij} > 0$. The equation’s numerator is quite intuitive in that it is simply the squared difference between the Component 1 and Component 2 observations. However, the denominator is less intuitive. In the language of analysis of variance (ANOVA), the numerator is the between-treatment (i.e., Component 1 versus Component 2 data) sum of squares, while the denominator is the total (both within and between) sum of squares for the ANOVA.

FIGURE 5-1 Top three quintiles for intercomponent average ARD for number of employees in an SRO cell by year and firm characteristic (excludes inconsistent zero cells).
SOURCE: Panel generated from Components 1 and 2 employer, establishment, and employee files for 2017 and 2018.
NOTE: Excludes firms reporting more than 1.4 million employees, establishments covered in Type 6 reports, and SROP cells with missing values.

As discussed in Biemer (2011), $I_i$ is the proportion of total variance that is error variance. In the current application, $I_i$ is the proportion of the total variance for number of employees across all SROs within the $i$th establishment that is attributable to measurement error. Because it is a proportion, $I_i$ will vary across establishments in the unit interval (i.e., [0, 1]). Values near one denote high inconsistency; values near zero denote low inconsistency (or high consistency). Furthermore, if data from Components 1 and 2 can be regarded as parallel measures (i.e., $y_{1ij}$ and $y_{2ij}$ have the same true scores, and their errors are independent and identically distributed), then $1 - I_i$ may be interpreted as an establishment-level measure of test-retest reliability. In the survey literature (e.g., Biemer, 2011), a rule of thumb for interpreting the index of inconsistency for parallel measures is as follows: $0 \le I_i \le 0.20$ is good, $0.21 \le I_i \le 0.50$ is moderate, and $I_i \ge 0.51$ is poor.

Rather than evaluating the reliability of individual establishments, a summary measure of reliability is needed. Two methods for averaging the values of Ii across establishments are of interest in this analysis and defined as follows:

$$ \bar{I}_1 = \frac{1}{m}\sum_{i=1}^{m} I_i \ \ \text{(simple average)}; \qquad \bar{I}_2 = \frac{2\sum_{i=1}^{m}\sum_{j}(y_{1ij} - y_{2ij})^2}{\sum_{i=1}^{m} V_i} = \frac{\sum_{i=1}^{m} V_i I_i}{\sum_{i=1}^{m} V_i} \ \ \text{(weighted average)}. $$

Note that Ī1 is an unweighted average because all establishments are given the same weight (i.e., 1/m) in the calculation, while Ī2 is a weighted average of the Ii, for which the weight for the ith establishment is $V_i / \sum_{i=1}^{m} V_i$ (i.e., the proportion of the total variance across establishments contributed by the ith establishment). Thus, Ī2 is weighted toward larger establishments with highly varying SRO sizes, which may better reflect overall data quality than Ī1, which does not account for an establishment’s contribution to total variance. Other weighting methods are possible and could be explored in a more extended analysis of reliability. Note that the rules of thumb for Ii also apply to both Ī1 and Ī2, which are likewise confined to the unit interval.
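The index and its two averages can be computed directly from the formulas above. The sketch below assumes the hypothetical linked frame (columns establishment_id, emp_c1, emp_c2) from the earlier sketches, restricted to SRO cells that are non-zero in both components; it is an illustration of the formulas, not the panel’s production code.

```python
import numpy as np
import pandas as pd

def inconsistency_parts(y1: np.ndarray, y2: np.ndarray) -> tuple[float, float]:
    """Return the numerator 2*sum((y1 - y2)**2) and the denominator V_i for one establishment."""
    num = 2.0 * np.sum((y1 - y2) ** 2)
    m1, m2 = y1.mean(), y2.mean()
    v = (np.sum((y1 - m1) ** 2) + np.sum((y2 - m2) ** 2)
         + np.sum((y1 - m2) ** 2) + np.sum((y2 - m1) ** 2))
    return float(num), float(v)

def average_indexes(linked: pd.DataFrame) -> tuple[float, float]:
    """Unweighted average (I-bar-1) and variance-weighted average (I-bar-2) across establishments."""
    parts = [
        inconsistency_parts(g["emp_c1"].to_numpy(float), g["emp_c2"].to_numpy(float))
        for _, g in linked.groupby("establishment_id")
    ]
    i_bar_1 = float(np.mean([n / v for n, v in parts if v > 0]))   # simple average of I_i
    i_bar_2 = sum(n for n, _ in parts) / sum(v for _, v in parts)  # weights proportional to V_i
    return i_bar_1, i_bar_2
```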

Table 5-5 shows the values of Ī1 and Ī2 for comparisons of 2017 and 2018 Component 1 and 2 data. Overall, the unweighted version Ī1 of the index is in the “good” range for both years. However, this is not true for the weighted version Ī2, which is in the poor range for many strata. One explanation for the difference between the two versions of the index is the lack of consistency for larger establishments which, as previously noted, receive larger weights in the calculation of Ī2 but not Ī1. In the penultimate section of this chapter, it is shown that a small group of establishments (less than 1% of all establishments) with extremely large error variances is the major cause of the high values of Ī2 in Table 5-5. To correct this, a data-filtering strategy is proposed that removes these establishments from the dataset and thus substantially reduces Ī2 to acceptable levels. Ideally, these extreme variance issues will be eliminated in future Component 2 data collections; otherwise, filtering will continue to be necessary to improve reliability.


TABLE 5-5 Average Indexes of Inconsistency When Comparing Component 1 and Component 2 Responses, by Administration Mode, Size, and Year

Firm Characteristica   2017: Ī1, Ī2   2018: Ī1, Ī2
Administration Mode
Online-Entry 0.14 1.00 0.12 1.00
Data-Upload 0.13 0.87 0.11 0.95
Establishment Size
Fewer than 100 0.15 0.47 0.12 0.41
100–249 0.09 0.19 0.08 0.15
250–499 0.08 0.10 0.08 0.09
500–999 0.09 0.10 0.09 0.08
1,000 or More 0.19 1.00 0.20 0.99
Overall 0.14 1.00 0.11 0.99

SOURCE: Panel generated from Components 1 and 2 employer, establishment, and employee files for 2017 and 2018.

NOTE: Excludes firms reporting more than 1.4 million employees, establishments covered in Type 6 reports, and SROP cells with missing values.

a Size strata based upon establishment sizes in Component 1 data.

COMPARISONS BETWEEN 2017 AND 2018 COMPONENT 2 DATA

Additional insights on measurement error in Component 2 data can be gleaned from comparisons of 2017 and 2018 data, both for number of employees and average number of hours worked per employee. For this analysis, 2017 and 2018 Component 2 datasets were linked and merged as described in Appendix 5-3 (Liao, 2021b). Although the match rate improved somewhat for this merge, the linkage was still problematic, with only about 70 percent of Component 2 data matching. As noted, the failure to match 100 percent of the data means that the results reported in this section may not represent the full Component 2 data. Although the degree to which the data are not representative is unknown, it is possible that unmatched entities would be subject to greater data issues and, if so, the results reported here may understate measurement-error issues.

One problem with interannual analysis of Component 2 data is the one-year time lag between the two Component 2 data collections. This lag means that inconsistencies between the two years may result from both measurement error and actual year-to-year change, and there is no way to distinguish between the two. Given the available data, perhaps the best that can be done is to use the magnitude of the change as an indicator of measurement-error risk. For example, an establishment reporting 100 or more employees in an SROP in 2017 and 0 employees for the same SROP in 2018 might be regarded as measurement error. Likewise, the index of inconsistency may be biased upward due to real change; thus, $1 - I$ systematically underestimates Component 2 data reliability.

However, there are several advantages to the interannual analysis. For example, unlike the analyses of the prior section, which were confined to the SRO level and only for number of employees, interannual analyses can be conducted at the SROP level for both numbers of employees and average number of hours worked. In addition, the instruments were very similar for the two years of the Component 2 data collection, whereas the Component 1 instrument was quite different from the Component 2 instrument, which could introduce extraneous measurement error. In addition, for comparisons of the numbers of inconsistent zeros, the corresponding results from the Component 1 and 2 data analysis could help with the interpretation of both interannual and intercomponent comparisons. Thus, the analysis in Table 5-6 considers inconsistencies in the reporting of zero employees in an SRO cell for comparability with the Component 1 versus 2 data analyses. While comparisons for the SROP level across survey years are also possible, they may be difficult to interpret for the reasons previously discussed.

Let $y_{2017,ij}^{(E)}$ and $y_{2018,ij}^{(E)}$ denote the 2017 and 2018 Component 2 numbers of employees, respectively, for the $i$th establishment and $j$th SRO cell (i.e., collapsing SROP cells over pay bands within the $i$th establishment). Table 5-6 shows the proportions of SROs for which $y_{2017,ij}^{(E)} > 0$ and $y_{2018,ij}^{(E)} = 0$ (i.e., non-zero only in 2017), $y_{2017,ij}^{(E)} = 0$ and $y_{2018,ij}^{(E)} > 0$ (i.e., non-zero only in 2018), as well as both $y_{2017,ij}^{(E)} > 0$ and $y_{2018,ij}^{(E)} > 0$ (non-zero in both years).

Comparing the results in Table 5-6 with those in Table 5-3, the proportion of SRO cells that are non-zero in both years of the Component 2 data collection is about 12 percentage points lower than the proportion of SRO cells that are non-zero in both Components 1 and 2. Some increase in inconsistency is expected, given the one-year gap between reporting periods in Table 5-6 as compared with the essentially identical reporting periods in Table 5-3. The difference is greater for the data-upload mode than for the online-entry mode, as well as for smaller establishments. For establishments of 1,000 employees or more, the percentage non-zero in both data collections is only 2.8 points lower than the corresponding figure in Table 5-3. Because of the confounding effect of real change in employment over the one-year interval between data-collection periods and inadvertent item missingness, it is unclear to what extent the inconsistencies in Table 5-6 can be attributed to reporting error versus true change in occupational structure and demographic employment composition.


TABLE 5-6 Percent of Component 2 SROs Having Non-Zero Employees in One or Both Years by Administration Mode, Size, and Year

Firm Characteristic Only Non-Zero in 2017 Only Non-Zero in 2018 Non-Zero in Both Years
Administration Mode
Online-Entry 12.2 13.9 74.0
Data-Upload 13.6 14.4 72.0
Establishment Size
Fewer than 100 15.2 16.4 68.4
100–249 11.5 12.2 76.4
250–499 10.4 10.8 78.9
500–999 8.9 9.5 81.7
1,000 or More 7.4 7.6 85.0
Overall 13.3 14.2 72.5

SOURCE: Panel generated Component 2 employer, establishment, and employee files for 2017 and 2018.

NOTE: Excludes firms reporting more than 1.4 million employees, establishments covered in Type 6 reports, and SROP cells with missing values. This table omits cells that were zero in both 2017 and 2018 data.


The next analysis revisits the RD for the $i$th establishment, defined this time at the SROP level for the difference between 2017 and 2018 Component 2 data. For all non-zero SROPs (i.e., where $y_{2017,ik}^{(E)} > 0$, $y_{2018,ik}^{(E)} > 0$, $y_{2017,ik}^{(H)} > 0$, and $y_{2018,ik}^{(H)} > 0$), define hours worked per employee as $y_{tik}^{(H/E)} = y_{tik}^{(H)} / y_{tik}^{(E)}$ for $t = 2017$ and 2018. Define the RD for number of employees as

$$ RD_i^{(E)} = \frac{2\left(y_{2017,ik}^{(E)} - y_{2018,ik}^{(E)}\right)}{y_{2017,ik}^{(E)} + y_{2018,ik}^{(E)}} $$

with a similar definition, $RD_i^{(H/E)}$, for average number of hours worked. Table 5-7 summarizes these results.

Focusing on Column 2 of Table 5-7, the overall RD for the number of employees (–0.75%) is similar to that reported in Table 5-4 for 2017 (i.e., –0.66%) but considerably smaller in absolute value than that reported for 2018 in Table 5-4 (i.e., –1.62%). Nevertheless, an overall RD of less than one or two percent could be considered small for these data. For the online-entry mode and the larger size categories (100 employees or more), the magnitudes of the values of RD(E) in Table 5-7 are substantially lower than their counterparts in Table 5-4, both for 2017 and 2018. This is somewhat counterintuitive, given that the Table 5-4 comparison is for the same years while the Table 5-7 comparison is for data collections reporting about one year apart. This further emphasizes the findings for Table 5-4: that the differences in numbers of employees for Components 1 and 2 data are larger than expected for some strata. Methodological effects are one possible explanation, given that the Component 1 and 2 data collections use somewhat different instruments and data-collection methodologies. In addition, the two tables consider somewhat different populations, because of the nonmatches from the merge operation described in Appendix 5-3.


TABLE 5-7 Average Component 2 Relative Differences When Comparing 2017 and 2018 for Number of Employees (RD(E)) and Average Hours Worked (RD(H/E)) by Administration Mode and Size

Firm Characteristic RD(E) (%) RD(H/E) (%)
Administration Mode
Online-Entry –1.79 –2.20
Data-Upload –0.49 –3.19
Establishment Size
Fewer than 100 –1.12 –3.32
100–249 0.32 –1.64
250–499 1.57 –0.83
500–999 1.85 –1.24
1,000 or More 9.18 –4.44
Overall –0.75 –3.08

SOURCE: Panel generated from Component 2 employer, establishment, and employee files for 2017 and 2018.

NOTE: Excludes firms reporting more than 1.4 million employees, establishments covered in Type 6 reports, and SROP cells with missing values.


From the rightmost column of Table 5-7, it is clear that the magnitude of the RD for average hours worked is generally larger than the RD for number of employees, with an overall RD(H/E) of –3.08 percent versus an overall RD(E) of –0.75 percent; that is, the average number of hours worked for 2018 exceeds that of 2017 by about 3 percent, which may be considered small given the aforementioned time lag. Interestingly, 2018 average hours worked is larger than 2017 average hours worked across all strata. Again, one can only speculate as to why, but given that average hours worked increased as unemployment decreased in 2018, real change, rather than error, is a strong possibility.

Next, the distribution of the absolute value of the RDs, denoted ARD, for the two variables by strata (Figures 5-2 and 5-3) is considered. As in Figure 5-1, only the upper quintiles are shown in these figures. The vertical axes show ARD, expressed as a percentage. For number of employees in Figure 5-2, the 80th percentile of the ARD distribution is about 28 percent overall, which implies that about 20 percent of Component 2 data exceeds an ARD of 28 percent. However, across size categories of 100 or more employees, the 80th percentile is only 15–19 percent. The results for average hours worked (Figure 5-3) are very similar.


FIGURE 5-2 Quintiles of Component 2 average ARD when comparing 2017 and 2018 for number of employees, by administration mode and establishment size.
SOURCE: Panel generated from Component 2 employer, establishment, and employee files for 2017 and 2018.
NOTE: Excludes firms reporting more than 1.4 million employees, establishments covered in Type 6 reports, and SROP cells with missing values. The ARD is a single statistic reflecting the difference between 2017 and 2018.
FIGURE 5-3 Quintiles for Component 2 average ARD when comparing 2017 and 2018 for hours worked per employee, by administration mode and establishment size.
SOURCE: Panel generated from Component 2 employer, establishment, and employee files for 2017 and 2018.
NOTE: Excludes firms reporting more than 1.4 million employees, establishments covered in Type 6 reports, and SROP cells with missing values. The ARD is a single statistic reflecting the difference between 2017 and 2018.

Finally, Table 5-8 repeats the analysis of Table 5-5 for both versions of the index of inconsistency. Now, however, $y_{1ij}$ and $y_{2ij}$ are replaced by $y_{2017,ik}^{(E)}$ and $y_{2018,ik}^{(E)}$, where $k$ denotes the $k$th SROP within the $i$th establishment. In this table, the asterisk superscript on the index of inconsistency denotes that it is based upon interannual differences in Component 2 data rather than differences between Component 1 and 2 data for the same year. As in Table 5-5, the unweighted version of the index is uniformly smaller than the weighted version, as a result of the larger establishments, which tend to exhibit the greatest inconsistencies, being weighted up in the equation for Ī2*. For nearly all strata, Ī2* is in the poor range. However, as previously noted, because of the lack of parallelism between $y_{2017,ik}$ and $y_{2018,ik}$, 1 − Ī2* cannot be interpreted as an estimate of Component 2 data reliability. Nevertheless, values of Ī2* as extreme as those shown in Table 5-8, coupled with the results of Table 5-5, suggest that poor reliability is indeed an important issue for number of employees reported in Component 2 data. The results for average hours worked are similar (not shown).

FILTERING DATA TO IMPROVE USABILITY

The analyses presented in the previous three sections demonstrate that Component 2 data are subject to problematic non-sampling errors from several sources, including missing data, response inconsistency, and poor response reliability. This section will demonstrate how data quality and usability can be substantially improved by filtering out the approximately 0.5 percent of establishments that contribute the most to poor measurement quality.


TABLE 5-8 Component 2 Average Indexes of Inconsistency When Comparing 2017 and 2018 for Number of Employees per SROP, by Administration Mode and Establishment Size

Firm Characteristic   Ī1*   Ī2*
Administration Mode
Online-Entry 0.11 0.58
Data-Upload 0.14 0.33
Size Distribution
Fewer than 100 0.15 1.00
100–249 0.04 1.00
250–499 0.03 0.95
500–999 0.03 0.94
1,000 or More 0.05 0.55
Overall 0.13 0.58

SOURCE: Panel generated from Component 2 employer, establishment, and employee files for 2017 and 2018.

NOTE: Excludes firms reporting more than 1.4 million employees, establishments covered in Type 6 reports, and SROP cells with missing values.

Ideally, data issues would be resolved or corrected rather than filtered out, and some types of errors seemed potentially identifiable and fixable. For example, some responses appeared to report the number of hours worked as the number of employees, and some responses appeared to include misplaced decimal points. Filtering, however, can be performed relatively quickly and with relatively few resources, which makes it advantageous, although the best solution depends on the planned uses of the data. For developing general models or for enforcement activities, a small amount of filtering may present little risk. On the other hand, for developing population estimates or considering enforcement activities for establishments with problematic data, filtering poses greater risks.

Determining an Appropriate Filter

Figure 5-4 plots the number of employees for linked establishments reporting in 2017 for both the Component 1 and Component 2 data collections (i.e., $y_{1i^+}^{(E)}$ plotted against $y_{2i^+}^{(E)}$, where $i^+$ indicates linked establishments and the straight line represents $y_{1i^+}^{(E)} = y_{2i^+}^{(E)}$). The graph for 2018 is essentially the same (not shown).

FIGURE 5-4 Comparison of 2017 establishment sizes in Components 1 and 2 data prior to filtering to remove outliers.
SOURCE: Panel generated from Components 1 and 2 employer, establishment, and employee files for 2017 and 2018.
NOTE: Excludes firms reporting more than 1.4 million employees, establishments covered in Type 6 reports, establishments with 0 employees, and SROP cells with missing values.

In Figure 5-4, establishments with zero employees in either component were deleted, so the small numbers in the graph are all non-zero counts. This figure reveals a major source of inconsistency and unreliability. Note that, although they are a small percentage of all establishments, a substantial (in absolute terms) number of small establishments reported considerably more employees in Component 2 data than in Component 1 data. In addition, the maximum number of employees reported by any establishment in Component 1 data was about 60,000, while in Component 2 data a substantial number of establishments reported a much higher employee count, some reporting more than 100,000 employees.

External data (see Chapter 4) confirm that the maximum establishment size was indeed around 60,000, which is also consistent with Component 1 data. Therefore, establishments in Component 2 data reporting 60,000 or more employees were misreported for reasons unknown. In addition, establishments that reported several times more employees in Component 2 data than they reported in Component 1 data were suspected to also be erroneous, given that both data collections reflected approximately the same reporting periods. Although it may be possible for establishments with a relatively small number of employees (i.e., less than 100) to double or triple in size within a few months, that outcome is less plausible for much larger establishments (i.e., those showing increases of at least 400 employees). Therefore, in the absence of additional data to correct the obvious reporting errors in Figure 5-4, erroneous data were removed from the dataset.

This section considers a simple approach for improving overall Component 2 data quality while increasing usability. The approach involves removing (or filtering out) establishments whose reported size in Component 2 data is implausible relative to their Component 1 size. Candidate filters were required to satisfy several properties deemed desirable for preserving data integrity. First, after filtering Component 2 data, there should be good agreement between Component 1 and Component 2 data in the distribution of the number of establishments and employees across administration mode and size strata. Second, the percentage of establishments deleted by the filtering process should be quite small (e.g., no more than 1 percent of all establishments and preferably much less). Third, the quality indicators established in the previous sections of this chapter should show dramatic improvement after filtering, particularly those reflecting response consistency. A wide range of data filters satisfying these properties was considered. Ultimately, the following filter was selected (an illustrative sketch of its implementation follows the list):

  1. Remove establishments for which $y_{2i}^{(E)} \geq 60{,}000$ (i.e., the number of employees in Component 2 data is 60,000 or more); or
  2. For establishments in Component 2 data that can be linked to Component 1 data, remove establishments satisfying both of the following:
    1. $|y_{1i}^{(E)} - y_{2i}^{(E)}| \geq 400$ (i.e., the difference between the establishment reports is at least 400 employees); and
    2. $K_{21i}^{(E)} = y_{2i}^{(E)} / y_{1i}^{(E)} \geq 9$ (i.e., the ratio of the number of employees in Component 2 data to Component 1 data is 9 or more).
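Expressed in code, the selected filter amounts to two boolean rules applied to an establishment-level file. The sketch below is illustrative only: it assumes a pandas DataFrame with one row per establishment and hypothetical column names (emp_c2 for the Component 2 count, emp_c1 for the linked Component 1 count, missing when no link exists), and it is not the program actually used to produce this chapter's tables.

```python
import pandas as pd

def apply_component2_filter(df: pd.DataFrame) -> pd.DataFrame:
    """Drop establishments with implausible Component 2 employee counts.

    Assumes hypothetical columns:
      emp_c2 -- number of employees reported in Component 2
      emp_c1 -- number of employees reported in Component 1 (NaN if not linked)
    """
    # Rule 1: Component 2 employee count of 60,000 or more.
    rule1 = df["emp_c2"] >= 60_000

    # Rule 2 (linked establishments only): an absolute difference of at least
    # 400 employees AND a Component 2 to Component 1 ratio of 9 or more.
    linked = df["emp_c1"].notna() & (df["emp_c1"] > 0)
    big_diff = (df["emp_c1"] - df["emp_c2"]).abs() >= 400
    big_ratio = (df["emp_c2"] / df["emp_c1"]) >= 9
    rule2 = linked & big_diff & big_ratio

    return df.loc[~(rule1 | rule2)].copy()
```

In this form the two thresholds (a 400-employee difference and a ratio of 9) are simple parameters, which is what makes the refinements discussed later in this section easy to explore.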

Table 5-9 compares the distributions of total number of establishments and number of employees by size (based upon establishment size in Component 1 data) for Component 1 data, Component 2 data (with no filtering), and Component 2 data (after filtering) for 2017. Results for 2018 are very similar (not shown). Note that the number of establishments by size changed very little after filtering. However, agreement between components for number of employees was much improved after filtering. Table 5-9 suggests that the proposed filter achieves the first desirable property noted above while deleting only about 0.5 percent of all establishments, which satisfies the second desirable property.
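The first two properties can be checked mechanically once a candidate filter has been applied. The sketch below, which reuses the hypothetical column names from the previous sketch plus a size_stratum label derived from Component 1 establishment size, tabulates the Table 5-9 quantities and the share of establishments removed; it is a sketch under those assumptions, not the panel's code.

```python
import pandas as pd

def distribution_by_stratum(df: pd.DataFrame) -> pd.DataFrame:
    """Establishment counts and employee totals by size stratum (hypothetical columns)."""
    return df.groupby("size_stratum").agg(
        establishments=("emp_c2", "size"),
        employees_c1=("emp_c1", "sum"),
        employees_c2=("emp_c2", "sum"),
    )

def compare_before_after(unfiltered: pd.DataFrame, filtered: pd.DataFrame) -> pd.DataFrame:
    """Table 5-9-style comparison plus the share of establishments removed."""
    share_removed = 1 - len(filtered) / len(unfiltered)
    print(f"Share of establishments removed: {share_removed:.2%}")  # target: well under 1%
    return distribution_by_stratum(unfiltered).join(
        distribution_by_stratum(filtered), lsuffix="_before", rsuffix="_after"
    )
```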

Figure 5-5 shows that the proposed filter eliminated most of the large, discrepant establishments apparent in Figure 5-4. This figure also suggests that the proposed filter satisfies the third desirable property of improved quality indicators. This is examined in more detail in the next section.

Refinements to this filter are conceivable and no claim is made that it is optimal. Nevertheless, filtering results indicate that data quality can be dramatically improved with minimal editing of Component 2 data.

TABLE 5-9 Comparison of 2017 Component 1 to Component 2 Distributions for Number of Establishments and Total Number of Employees Before and After Filtering to Remove Outliers, by Size

| Strata | No Filtering: Component 1 | No Filtering: Component 2 | After Filtering: Component 1 | After Filtering: Component 2 |
|---|---|---|---|---|
| Number of Establishments | | | | |
| Total | 797,618 | 897,683 | 796,360 | 896,425 |
| Fewer than 100 | 650,441 | 758,098 | 649,429 | 757,086 |
| 100–249 | 100,847 | 94,759 | 100,665 | 94,577 |
| 250–499 | 29,527 | 28,281 | 29,476 | 28,230 |
| 500–999 | 10,604 | 10,232 | 10,595 | 10,223 |
| 1,000 or more | 6,199 | 6,313 | 6,195 | 6,309 |
| Number of Employees | | | | |
| Total | 61,642,747 | 112,210,602 | 61,557,330 | 72,226,054 |
| Fewer than 100 | 15,461,354 | 32,328,123 | 15,430,845 | 16,381,112 |
| 100–249 | 15,355,737 | 28,438,560 | 15,328,302 | 14,387,568 |
| 250–499 | 10,068,697 | 14,504,817 | 10,051,756 | 9,449,091 |
| 500–999 | 7,239,061 | 10,373,946 | 7,232,925 | 6,776,986 |
| 1,000 or more | 13,517,898 | 26,565,156 | 13,513,502 | 25,231,297 |

SOURCE: Panel generated from Components 1 and 2 employer, establishment, and employee files for 2017.

NOTE: Excludes firms reporting more than 1.4 million employees, establishments covered in Type 6 reports, and SROP cells with missing values. Filtered data also exclude Component 2 data that show either: (1) employee counts above 60,000; or (2) data that are simultaneously at least nine times the values reported in Component 1 data and show differences of at least 400 employees.

FIGURE 5-5 Comparison of 2017 Component 1 and 2 establishment sizes after filtering to remove outliers.
SOURCE: Panel generated from Components 1 and 2 employer, establishment, and employee files for 2017.
NOTE: Excludes firms reporting more than 1.4 million employees, establishments covered in Type 6 reports, establishments with 0 employees, and SROP cells with missing values. Filtered data also exclude Component 2 data that show either: (1) employee counts above 60,000, or (2) data that are simultaneously at least nine times the values reported in Component 1 data and show differences of at least 400 employees.

Quality Indicators After Filtering

This section reexamines some of the quality indicators discussed previously to assess the improvement in data quality after filtering. Because missing data are not affected by the filtering process, Tables 5-3 and 5-6 remain essentially the same. As is evident from Figure 5-5, many extreme values of the number of employees (relative to Component 1 data) were eliminated by the filter, which removed only about 0.5 percent of establishments. The RD, ARD, and Ī indicators computed from the merged Component 1 to Component 2 data show a dramatic improvement in the filtered data.

Table 5-10 presents RDs (relative to Component 1 data) by administration mode and size for both years of Component 2 data, expressed as percentages. Comparing these results to those in Table 5-4 reveals that the magnitudes of the RD between Components 1 and 2 data were substantially reduced after filtering. Overall, the mean RD changed from –0.66 and –1.62


TABLE 5-10 Mean RD When Comparing Components 1 and 2 for Number of Employees in an SRO Cell Before and After Filtering to Remove Outliers, by Administration Mode, Size, and Year

| Firm Characteristicᵃ | Before Filtering: 2017 RD (%) | Before Filtering: 2018 RD (%) | After Filtering: 2017 RD (%) | After Filtering: 2018 RD (%) |
|---|---|---|---|---|
| Administration Mode | | | | |
| Online-Entry | –1.89 | –2.64 | –0.81 | –1.55 |
| Data-Upload | –0.30 | –1.30 | –0.09 | –1.08 |
| Size Distribution | | | | |
| Fewer than 100 | 0.68 | –0.34 | 0.68 | –0.34 |
| 100–249 | –2.91 | –3.74 | –2.91 | –3.74 |
| 250–499 | –6.52 | –6.84 | –6.06 | –6.34 |
| 500–999 | –11.18 | –11.66 | –5.81 | –6.18 |
| 1,000 or More | –34.43 | –37.58 | –6.05 | –7.07 |
| Overall | –0.66 | –1.62 | –0.25 | –1.19 |

SOURCE: Panel generated from Components 1 and 2 employer, establishment, and employee files for 2017 and 2018.

NOTE: Excludes firms reporting more than 1.4 million employees, establishments covered in Type 6 reports, and SROP cells with missing values. Filtered data also exclude Component 2 data that show either: (1) employee counts above 60,000, or (2) data that are simultaneously at least nine times the values reported in Component 1 data and show differences of at least 400 employees.

a Size strata based upon establishment sizes in Component 1 data.

percent to –0.25 and –1.19 percent for 2017 and 2018, respectively. For the largest size stratum, the mean RD improved from roughly –34 to –38 percent to about –6 to –7 percent in both years, a considerable improvement that is more consistent with true change in employment.
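For readers reproducing these indicators, the mean RD by stratum can be computed along the following lines. The formula shown is only one plausible symmetric definition whose sign convention matches the text (negative when the Component 2 count exceeds the Component 1 count); the definition given earlier in this chapter should be substituted, and the column names are hypothetical.

```python
import pandas as pd

def mean_rd_by_group(df: pd.DataFrame, group: str) -> pd.Series:
    """Mean relative difference (percent) between Component 1 and 2 SRO-cell counts.

    Assumes one row per SRO cell with hypothetical columns emp_c1 and emp_c2.
    The RD formula below is an assumption; use the chapter's definition in practice.
    """
    d = df[(df["emp_c1"] > 0) | (df["emp_c2"] > 0)].copy()
    d["rd"] = 100 * (d["emp_c1"] - d["emp_c2"]) / ((d["emp_c1"] + d["emp_c2"]) / 2)
    return d.groupby(group)["rd"].mean()

# Example: mean_rd_by_group(sro_cells, "size_stratum") yields rows in the style of Table 5-10.
```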

Table 5-11 reports the indicators Ī1 and Ī2 computed from the Component 1 to Component 2 matched data after filtering. There is substantial improvement in response consistency, particularly for Ī2, the weighted index. For example, for both years, Ī2 was reduced from about 1 to about 0.05 overall. For establishments with 250 or more employees, both average indexes almost always improved, some considerably. Unfortunately, there is still evidence of unreliability for establishments with fewer than 100 employees, with Ī2 values in the high "moderate" range, approaching "poor." However, other size categories have Ī2 values in the "good" range. Of course, it is among the smallest establishments that rapid relative change is most likely.


TABLE 5-11 Average Intercomponent Indexes of Inconsistency Before and After Filtering, by Administration Mode, Size, and Year

| Firm Characteristicᵃ | Before, 2017: Ī1 | Before, 2017: Ī2 | Before, 2018: Ī1 | Before, 2018: Ī2 | After, 2017: Ī1 | After, 2017: Ī2 | After, 2018: Ī1 | After, 2018: Ī2 |
|---|---|---|---|---|---|---|---|---|
| Administration Mode | | | | | | | | |
| Online-Entry | 0.14 | 1.00 | 0.12 | 1.00 | 0.14 | 0.12 | 0.12 | 0.10 |
| Data-Upload | 0.13 | 0.87 | 0.11 | 0.95 | 0.13 | 0.05 | 0.11 | 0.04 |
| Size Distribution | | | | | | | | |
| Fewer than 100 | 0.15 | 0.47 | 0.12 | 0.41 | 0.15 | 0.47 | 0.12 | 0.41 |
| 100–249 | 0.09 | 0.19 | 0.08 | 0.15 | 0.09 | 0.19 | 0.08 | 0.15 |
| 250–499 | 0.08 | 0.10 | 0.08 | 0.09 | 0.08 | 0.09 | 0.07 | 0.08 |
| 500–999 | 0.09 | 0.10 | 0.09 | 0.08 | 0.06 | 0.08 | 0.06 | 0.06 |
| 1,000 or more | 0.19 | 1.00 | 0.20 | 0.99 | 0.05 | 0.04 | 0.05 | 0.04 |
| Overall | 0.14 | 1.00 | 0.11 | 0.99 | 0.13 | 0.06 | 0.11 | 0.05 |

SOURCE: Panel generated from Components 1 and 2 employer, establishment, and employee files for 2017 and 2018.

NOTE: Excludes firms reporting more than 1.4 million employees, establishments covered in Type 6 reports, and SROP cells with missing values. Filtered data also exclude Component 2 data that show either: (1) employee counts above 60,000; or (2) data that are simultaneously at least nine times the values reported in Component 1 data and show differences of at least 400 employees.

a Size strata based upon establishment sizes in Component 1 data.

With regard to comparisons between 2017 and 2018 Component 2 data, filtering improved many quality indicators related to the number of employees. However, as previously noted, interannual comparisons of Component 2 data are confounded by true change, so moderate RDs in establishment sizes and hours worked by SROP are expected and should not necessarily be attributed to measurement error. For example, Figures 5-6 and 5-7 show the relationship of establishment size in 2017 compared to 2018 before and after filtering, respectively.

As in comparisons with unfiltered Component 1 data, a substantial number of establishments with relatively small sizes in one year have implausibly large sizes in the other year. After removing establishments that either (1) showed employee counts of 60,000 or more or (2) had Component 2 to Component 1 size ratios of nine or more together with differences of at least 400 employees, agreement was much improved, as is evident in Figure 5-7.

FIGURE 5-6 Comparison of establishment sizes in 2017 to 2018 Component 2 data prior to filtering to remove outliers.
SOURCE: Panel generated from Component 2 employer, establishment, and employee files for 2017 and 2018.
NOTE: Excludes firms reporting more than 1.4 million employees, establishments covered in Type 6 reports, establishments with 0 employees, and SROP cells with missing values.

This improvement is quantified in Table 5-12. Particularly for Ī2*, consistency between 2017 and 2018 data was dramatically improved after applying the filter to both datasets.

Table 5-13 presents the RD, Ī1*, and Ī2* indicators for hours worked per employee when comparing the two years of Component 2 data. Results suggest only small changes in the indicators before and after applying the number-of-employees filter. High inconsistency is still present in the dataset, as shown in Figure 5-8. The right side plots hours worked per employee per SROP for 2017 against 2018 before filtering, and the left side is the same plot after filtering. There are no discernible differences.

FIGURE 5-7 Comparison of establishment sizes in 2017 to 2018 Component 2 data after filtering to remove outliers.
SOURCE: Panel generated from Component 2 employer, establishment, and employee files for 2017 and 2018.
NOTE: Excludes firms reporting more than 1.4 million employees, establishments covered in Type 6 reports, establishments with no employees, and SROP cells with missing values. Filtered data also exclude Component 2 data that show either: (1) employee counts above 60,000 or (2) data that are simultaneously at least nine times the values reported in Component 1 data and show differences of at least 400 employees.

To understand why, note that the number-of-employees filter is designed to eliminate large discrepancies between Component 1 and Component 2 data for the number of employees, not for hours worked or hours per employee. Because hours worked are not collected by the Component 1 instrument, there is no possibility of replicating the filter for that variable. Nevertheless, it is possible to construct a filter that removes implausible interannual RDs for hours per employee from the Component 2 data. Of course, unlike the filter based on comparing Component 1 data with Component 2 data, imposing an interannual filter risks removing establishments with seemingly implausible yet real changes in hours worked per employee. Nevertheless, improvements in the consistency and reliability of Component 2 data for this variable would likely be similar to those achieved by the number-of-employees filter. No attempt was made to develop this type of filter (a purely illustrative sketch of such a screen appears below), so the inconsistencies shown in Table 5-13 remain in the 2017 and 2018 Component 2 data. However, see Chapter 6 for additional edits that may have been applied to these data for that chapter's analyses.
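Purely as an illustration of such a screen (no hours filter was implemented for this report), the sketch below flags establishments whose hours worked per employee change implausibly between the two years. The column names (hpe_2017, hpe_2018) and both thresholds are placeholders that would need the kind of tuning described above; they are not values recommended by the panel.

```python
import pandas as pd

def flag_implausible_hours_change(
    df: pd.DataFrame, max_ratio: float = 3.0, min_abs_diff: float = 500.0
) -> pd.Series:
    """Flag implausible year-to-year changes in hours worked per employee.

    Assumes hypothetical columns hpe_2017 and hpe_2018 (annual hours worked per
    employee for an establishment). Thresholds are placeholders only.
    """
    hi = df[["hpe_2017", "hpe_2018"]].max(axis=1)
    lo = df[["hpe_2017", "hpe_2018"]].min(axis=1)
    big_ratio = hi >= max_ratio * lo          # written this way to avoid dividing by zero
    big_diff = (hi - lo) >= min_abs_diff      # change of at least 500 hours per employee
    return big_ratio & big_diff
```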


TABLE 5-12 Component 2 Average Relative Differences and Indexes of Inconsistency When Comparing 2017 and 2018 for Number of Employees by Administration Mode and Size Before and After Filtering to Remove Outliers

| Firm Characteristic | Before Filtering: RD | Before Filtering: Ī1* | Before Filtering: Ī2* | After Filtering: RD | After Filtering: Ī1* | After Filtering: Ī2* |
|---|---|---|---|---|---|---|
| Administration Mode | | | | | | |
| Online-Entry | –1.79 | 0.11 | 0.58 | –1.69 | 0.11 | 0.11 |
| Data-Upload | –0.49 | 0.14 | 0.33 | –0.49 | 0.14 | 0.06 |
| Size Distribution | | | | | | |
| Fewer than 100 | –1.12 | 0.15 | 1.00 | –1.03 | 0.15 | 0.12 |
| 100–249 | 0.32 | 0.04 | 1.00 | 0.46 | 0.04 | 0.07 |
| 250–499 | 1.57 | 0.03 | 0.95 | 1.50 | 0.03 | 0.06 |
| 500–999 | 1.85 | 0.03 | 0.94 | 1.02 | 0.03 | 0.05 |
| 1,000 or More | 9.18 | 0.05 | 0.55 | 0.93 | 0.02 | 0.07 |
| Overall | –0.75 | 0.13 | 0.58 | –0.73 | 0.13 | 0.07 |

SOURCE: Panel generated from Component 2 employer, establishment, and employee files for 2017 and 2018.

NOTE: Excludes firms reporting more than 1.4 million employees, establishments covered in Type 6 reports, and SROP cells with missing values. Filtered data also exclude Component 2 data that show either: (1) employee counts above 60,000 or (2) data that are simultaneously at least nine times the values reported in Component 1 data and show differences of at least 400 employees.

SUMMARY

This chapter sought to determine whether important measurement quality issues exist for the current Component 2 data collection that EEOC would benefit from addressing before the next round of data collection. Analyses presented in this chapter clearly identified important quality issues for both years of Component 2 data, including missing data, response inconsistencies, implausible extreme values, and measurement unreliability. Filtering the data on number of employees addressed some but not all of these issues, and additional filtering based upon hours worked would be beneficial. Some of the issues identified for the 2017–2018 Component 2 data are likely to persist in future years unless they are addressed by a redesign of the data-collection methodology, as described in Chapter 8.

Before filtering the data, the analysis of internal inconsistencies and extreme values suggested that data missingness (more specifically, inconsistent zeros in SROP cells) is an important issue for the data collection


TABLE 5-13 Component 2 Average Relative Differences and Indexes of Inconsistency When Comparing 2017 and 2018 Data for Hours Worked per Employee Before and After Filtering to Remove Outliers, by Administration Mode and Size

| Firm Characteristic | Before Filtering: RD(H/E) | Before Filtering: Ī1* | Before Filtering: Ī2* | After Filtering: RD(H/E) | After Filtering: Ī1* | After Filtering: Ī2* |
|---|---|---|---|---|---|---|
| Administration Mode | | | | | | |
| Online-Entry | –2.20 | 0.21 | 0.99 | –2.28 | 0.21 | 0.99 |
| Data-Upload | –3.19 | 0.24 | 1.00 | –3.19 | 0.24 | 1.00 |
| Size Distribution | | | | | | |
| Fewer than 100 | –3.32 | 0.26 | 1.00 | –3.35 | 0.26 | 1.00 |
| 100–249 | –1.64 | 0.11 | 0.99 | –1.65 | 0.11 | 0.99 |
| 250–499 | –0.83 | 0.10 | 1.00 | –0.84 | 0.10 | 1.00 |
| 500–999 | –1.24 | 0.09 | 0.99 | –1.16 | 0.09 | 0.99 |
| 1,000 or More | –4.44 | 0.09 | 1.00 | –1.40 | 0.07 | 1.00 |
| Overall | –3.08 | 0.23 | 1.00 | –3.09 | 0.23 | 1.00 |

SOURCE: Panel generated from Component 2 employer, establishment, and employee files for 2017 and 2018.

NOTE: Excludes firms reporting more than 1.4 million employees, establishments covered in Type 6 reports, and SROP cells with missing values. Filtered data also exclude Component 2 data that show either: (1) employee counts above 60,000 or (2) data that are simultaneously at least nine times the values reported in Component 1 and show differences of at least 400 employees.

because it essentially renders about 10 percent of the employee-count-by-hours-worked SROP pairs unusable. Missing hours worked appeared to be a much bigger problem than missing employee counts in both data-collection years. In addition, missingness appeared to be a problem mainly for uploaded submissions because the percentage of missing data for online-entry submissions was trivial by comparison.

Analyses performed on the Component 1 to Component 2 linked establishment files revealed large discrepancies in the number of employees between the two components. For example, more than 15 percent of the total-number-of-employees SRO cells with at least one non-zero entry were zero in one component and non-zero in the other. Online responses and smaller establishments exhibited the greatest inconsistency. Although the non-zero counts may be small in many cases, the inconsistencies still suggest important instrument usability issues, particularly for the online instrument. In addition, there were very large discrepancies in the numbers of employees for a substantial percentage of the establishment universe. For example, for establishments with 500–999 employees, the average relative deviation is about –12 percent, indicating that Component 2 employee counts were, on average, 12 percent greater than those of Component 1. For establishments with 1,000 or more employees, the average relative deviation is even greater: about –36 percent. Moreover, about 20 percent of SROs in this size stratum differed in numbers of employees by more than 100 percent. Because the two components were collected for essentially the same reporting period, these inconsistencies and discrepancies must be attributed to measurement errors in one or both components. Evaluation of measurement reliability (via the index of inconsistency) also found extremely large error variances, particularly within the smallest and largest establishments. Plotting the establishment sizes from Component 1 and 2 data against one another revealed that less than 1 percent of all establishments in Component 2 data reported highly implausible numbers. For example, a few Component 2 establishments reported sizes of more than 60,000 employees, which exceeds the maximum size of any establishment in the Component 1 universe. There were also many smaller, yet still implausible, discrepancies that increased measurement error variance; however, available data were insufficient to determine which component was the primary source of the errors.

FIGURE 5-8 Comparison of 2017 to 2018 Component 2 total employee hours worked before and after filtering to remove outliers.
SOURCE: Panel generated from Component 2 employer, establishment, and employee files for 2017 and 2018.
NOTE: Excludes firms reporting more than 1.4 million employees, establishments covered in Type 6 reports, establishments with 0 employees, and SROP cells with missing values. Filtered data also exclude Component 2 data that show either: (1) employee counts above 60,000 or (2) data that are simultaneously at least nine times the values reported in Component 1 data and show differences of at least 400 employees.

Analysis of merged 2017 and 2018 Component 2 data yielded somewhat ambiguous results because of the difficulty distinguishing between error and true change, given the approximate one-year gap between reporting periods. Nevertheless, analysis showed somewhat greater consistency between the two years than might be inferred from the merged Component 1 and 2 data. Reports for numbers of employees tended to be more consistent than reports for hours worked per employee. Interestingly, 2017 data tended to report greater numbers of employees than 2018 data, while 2018 data seemed to report greater hours worked than 2017 data. As was the case for the intercomponent indexes of inconsistency, the interannual indexes also suggested poor reliability for the number-of-employees data. In addition, the interannual indexes for hours worked per employee also implied poor reliability for this variable in some strata.

Based upon these results, data filtering was applied to remove a relatively small number of establishments that provided implausible numbers of employees in Component 2 data compared with Component 1 data. Filtering eliminated establishments reporting 60,000 or more employees, as well as establishments for which the difference between the Component 1 and Component 2 counts was at least 400 employees and the Component 2 to Component 1 ratio of the number of employees was nine or more. In total, a mere 0.5 percent of all establishments were removed by this process. However, the consistency and reliability of Component 2 data for both years substantially improved for the number of employees at both the SRO and SROP levels. Missing data, however, were unaffected by this filtering process and thus remain a concern for the filtered data (including the analyses presented in Chapters 6, 7, and 8 of this report).

In addition, removing establishments with implausible values for the number of employees did not fix the problem of implausible values for hours worked per employee. For this variable, consistency and reliability remain poor in some strata. Scatter plots of the 2017 and 2018 hours-worked-per-employee data suggest that, as for the number-of-employees variable, removing a small percentage of establishments with highly implausible values could markedly improve reliability and consistency. An hours-worked-per-employee filter was not developed; however, the filtering process would be very similar to that implemented for the number of employees. In addition, both filters could be fine-tuned and optimized to achieve acceptable reliability levels while minimizing the number of establishments eliminated.

While reliability was not directly evaluated for pay data, the results for both the number of employees and hours worked have important implications for assigning hourly pay rates to employees. Erroneous employee counts and hours worked in an SROP cell produce erroneous estimates of hourly pay calculated from these data, so the reliability of hourly pay will be poor if the reliability of the number of employees or of hours worked per employee is poor. Consequently, filtering the latter two variables will also improve the reliability of the hourly pay data.

Finally, a few limitations of this chapter’s analyses should be noted. First, as shown in Appendix 5-3, not all establishments could be linked between Component 1 and Component 2 data, so that analyses using linked data pertain to a subset (65–75%) of the Component 2 universe. Second, the Component 1 and Component 2 data collections may not generate parallel measures because they differ methodologically. To that extent, reliability calculations using these data reflect average reliability of both Component 1 and Component 2 data, not simply Component 2 data reliability. Finally, because there are no “gold standard” measurements upon which to base direct measures of error, the analyses of measurement error use indirect methods. Thus, inferences regarding the measurement quality of the Component 2 data collection are subject to the limitations inherent in indirect methods.

CONCLUSION AND RECOMMENDATIONS

The chapter's key conclusion and recommendations based on its findings appear below.


CONCLUSION 5-1: Important data-quality issues exist in the 2017–2018 Component 2 data, including missing data, response inconsistencies, implausible extreme values, and measurement unreliability. These errors are large and, if not addressed, could generate misleading results. Filtering the data on number of employees by removing a small amount of data can address some, but not all, issues.

RECOMMENDATION 5-1: Before 2017–2018 Component 2 data are used to assist initial investigations of charges, for employer self-assessment, or for research on pay differences more generally, the data should be carefully reviewed and cleaned. Filtering on employee counts and on hours worked would be beneficial, but some issues would be best addressed by modifying the basic data-collection methodology.

RECOMMENDATION 5-2: Before future collection of Component 2 data, EEOC should conduct a field test to investigate issues of burden, data availability, and instrument design. The field test should examine the sources of errors in the hours-worked and employee count data, and should assess the functioning of new survey questions. Solutions to be tested may include placing employee-count and hours-worked data side-by-side, as in the data-upload mode. Cognitive interviews may inform EEOC of employers’ interpretations of survey questions, difficulties faced in answering, and strategies used to obtain the reported data.


CHAPTER APPENDIXES

APPENDIX 5-1

Percent of Data Present for Hours Worked and Employment in SROP Cells

| Firm, Establishment, and Employee Characteristic | 2017: >0 Employees, >0 Hours Worked | 2017: >0 Employees, 0 or Missing Hours Worked | 2017: 0 or Missing Employees, >0 Hours Worked | 2018: >0 Employees, >0 Hours Worked | 2018: >0 Employees, 0 or Missing Hours Worked | 2018: 0 or Missing Employees, >0 Hours Worked |
|---|---|---|---|---|---|---|
| Administration Mode | | | | | | |
| Online-Entry | 74.0 | 26.5 | # | 73.8 | 26.5 | # |
| Data-Upload | 26.0 | 73.5 | 100.0 | 26.2 | 73.5 | 100.0 |
| Establishment Size | | | | | | |
| Less than 100 | 45.8 | 41.0 | 40.4 | 45.8 | 40.6 | 41.2 |
| 100–249 | 27.1 | 31.8 | 36.5 | 27.0 | 32.2 | 36.0 |
| 250–499 | 13.3 | 13.9 | 13.6 | 13.4 | 13.4 | 13.4 |
| 500–999 | 6.9 | 6.2 | 5.0 | 7.0 | 6.5 | 5.1 |
| 1,000 or More | 6.9 | 7.1 | 4.1 | 6.9 | 7.3 | 3.8 |
| Missing or Invalid | 0.0 | 0.0 | 0.4 | 0.0 | 0.0 | 0.4 |
| Job Category | | | | | | |
| Executive | 2.6 | 6.9 | 8.0 | 2.6 | 6.7 | 7.9 |
| First-/Mid level | 15.9 | 15.2 | 17.6 | 15.7 | 15.2 | 17.4 |
| Professionals | 17.5 | 16.9 | 16.5 | 17.7 | 16.9 | 17.1 |
| Technicians | 6.8 | 6.3 | 7.0 | 6.7 | 6.7 | 7.2 |
| Sales Workers | 13.1 | 9.3 | 8.2 | 12.9 | 9.1 | 8.2 |
| Administrative Support | 14.7 | 15.0 | 15.2 | 14.6 | 15.3 | 15.3 |
| Craft Workers | 4.9 | 5.3 | 5.8 | 4.9 | 5.4 | 5.7 |
| Operatives | 7.7 | 7.6 | 6.9 | 7.7 | 7.5 | 6.6 |
| Laborers and Helpers | 5.4 | 7.0 | 6.7 | 5.4 | 6.8 | 6.7 |
| Service Workers | 11.5 | 10.5 | 8.1 | 11.7 | 10.4 | 7.9 |
| Establishment Quality Flag | | | | | | |
| Red | 0.3 | 0.6 | 2.2 | 0.2 | 0.5 | 2.2 |
| Orange | 5.1 | 77.1 | 56.0 | 4.7 | 75.8 | 51.9 |
| Green | 94.5 | 22.4 | 41.8 | 95.1 | 23.7 | 45.9 |
| Overall | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 |

SOURCE: Panel generated from Component 2 employer, establishment, and employee files for 2017 and 2018.

NOTE: # Indicates rounds to zero.


APPENDIX 5-2

Percentage of Firms, Establishments, and Cells with Each Flag Status (2018)

| Firm, Establishment, and Employee Characteristic | Firms: Red | Firms: Orange | Firms: Green | Establishments Within Firms: Red | Establishments Within Firms: Orange | Establishments Within Firms: Green | Cells Within Establishments: Red | Cells Within Establishments: Orange | Cells Within Establishments: Yellow | Cells Within Establishments: Green |
|---|---|---|---|---|---|---|---|---|---|---|
| Administration Mode | | | | | | | | | | |
| Online-Entry | 23.2 | 26.6 | 40.6 | 68.6 | 34.4 | 80.9 | 58.0 | 71.9 | 26.5 | 73.1 |
| Data-Upload | 76.8 | 73.4 | 59.4 | 31.4 | 65.6 | 19.1 | 42.0 | 28.1 | 73.5 | 26.9 |
| Establishment Size | | | | | | | | | | |
| Less than 100 | 0.0 | # | 0.1 | 77.2 | 75.0 | 84.9 | 39.8 | 23.8 | 40.6 | 45.8 |
| 100–249 | 50.0 | 36.0 | 55.3 | 15.6 | 15.8 | 10.2 | 28.5 | 20.1 | 32.2 | 27.1 |
| 250–499 | 23.9 | 22.3 | 22.7 | 4.9 | 4.7 | 3.1 | 15.1 | 17.3 | 13.4 | 13.4 |
| 500–999 | 14.6 | 15.1 | 11.0 | 1.2 | 2.0 | 1.1 | 7.0 | 10.9 | 6.5 | 6.9 |
| 1,000 or More | 11.4 | 26.6 | 10.9 | 1.2 | 2.5 | 0.7 | 9.6 | 27.9 | 7.3 | 6.9 |
| Missing or Invalid | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | # | 0.0 | 0.0 | 0.0 | # |
| Job Category | | | | | | | | | | |
| Executive | | | | | | | 2.7 | 2.6 | 6.7 | 2.7 |
| First/Midlevel | | | | | | | 13.4 | 5.7 | 15.2 | 15.8 |
| Professionals | | | | | | | 17.8 | 16.6 | 16.9 | 17.7 |
| Technicians | | | | | | | 6.3 | 22.3 | 6.7 | 6.7 |
| Sales Workers | | | | | | | 7.6 | 11.9 | 9.1 | 12.9 |
| Administrative Support | | | | | | | 9.9 | 13.4 | 15.3 | 14.6 |
| Craft Workers | | | | | | | 6.4 | 11.4 | 5.4 | 4.9 |
| Operatives | | | | | | | 13.3 | 1.5 | 7.5 | 7.7 |
| Laborers and Helpers | | | | | | | 6.3 | 6.2 | 6.8 | 5.4 |
| Service Workers | | | | | | | 16.3 | 8.4 | 10.4 | 11.7 |
| Establishment Quality | | | | | | | | | | |
| Red | 100.0 | 0.0 | 0.0 | | | | 57.2 | 4.3 | 0.5 | 0.1 |
| Orange | 0.0 | 100.0 | 0.0 | | | | 42.8 | 46.6 | 75.8 | 5.1 |
| Green | 0.0 | 0.0 | 100.0 | | | | 0.0 | 49.0 | 23.7 | 94.8 |
| Overall | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 |

SOURCE: Panel generated from Component 2 employer, establishment, and employee files for 2017 and 2018.

NOTE: # Indicates rounds to zero.


APPENDIX 5-3

Technical Memorandum Describing Merging Datasets for Data Quality Assessment

To: Jennifer Park
From: Dan Liao
CC: Bradford Chaney, Sahar Zangeneh
Date: December 6, 2021
Subject: NASEM-EEOC Panel Study Technical
Memo: Merging Datasets for Data Quality Assessment
Attachments: SAS program files for the data merging are provided separately.

1. Introduction

This memorandum documents the technical details of how the datasets were merged for the data quality assessment of the Component 2 data. There are four distinct data merging activities:

  1. merging the Component 2 data files across 2017 and 2018;
  2. merging the Component 1 data file with the Component 2 data file for 2017;
  3. merging the Component 1 data file with the Component 2 data file for 2018; and
  4. merging the combined Component 1&2 data files across 2017 and 2018.

In each procedure, the datasets are merged at the establishment level.

The following sections present 1) the decision rules for how the datasets were merged and 2) the match rates indicating how successfully the two datasets could be matched based on the ID variables and additional information (e.g., addresses, NAICS code, and ZIP code). Table 1 summarizes these two aspects for the four data merging analyses. The decision rules were developed based on RTI's consultation with NORC's data collection team and were further refined with additional suggestions provided by the panel.

Table 1. Data Merging Variables and Match Rates for Data Merging across Component 1 and 2 Data Files, 2017 and 2018

1) Merging the Component 2 data files across 2017 and 2018
   Match rate:ᵃ 79.7% (Denominator 1); 76.5% (Denominator 2)
   Numerator: the number of establishments that can be matched across the two years
   Denominator 1: the number of establishments that appeared in the 2017 Component 2 data
   Denominator 2: the number of establishments that appeared in the 2018 Component 2 data
   Data merging variables: EIN_FIRM, UNITADDRESS, UNITADDRESS2, ZIPCODE, and NAICS code (all six digits)

2) Merging the Component 1 and Component 2 data files for 2017
   Match rate:ᵃ 66.5%
   Numerator: the number of establishments that can be matched across the two components
   Denominator: the number of establishments that appeared in the 2017 Component 2 data
   Data merging variables: HDQ_NBR, UNIT_NBR, standardized address, standardized ZIP code, and NAICS code (all six digits)

3) Merging the Component 1 and Component 2 data files for 2018
   Match rate:ᵃ 69.5%
   Numerator: the number of establishments that can be matched across the two components
   Denominator: the number of establishments that appeared in the 2018 Component 2 data
   Data merging variables: HDQ_NBR, UNIT_NBR, standardized address, standardized ZIP code, and NAICS code (all six digits)

4) Merging the combined Component 1 and 2 data files for 2017 and 2018
   Match rate:ᵃ 58.7% (Denominator 1); 56.4% (Denominator 2)
   Numerator: the number of establishments that can be matched across the two years
   Denominator 1: the number of establishments that appeared in the 2017 Component 2 data
   Denominator 2: the number of establishments that appeared in the 2018 Component 2 data
   Data merging variables: HDQ_NBR and UNIT_NBR in the Component 1 data

a The Type 6 reports and establishments in firms that failed the Walmart rule (i.e., with a size larger than Walmart) are excluded from the calculation of the match rates.


2. Merging the Component 2 Data Files across 2017 and 2018

Merging Algorithm

Because the Component 2 data were collected simultaneously through the same system for both reporting years (2017 and 2018), a relatively high match rate can be achieved using only the ID and address information on the data files. The procedure for merging the 2017 and 2018 Component 2 data files proceeds through the following steps in sequence (an illustrative sketch follows the list):

  1. Merge based on unique matches of EIN_FIRM + UNITADDRESS + UNITADDRESS2: if an establishment has a unique one-to-one match across the two years based on the combination of EIN_FIRM, UNITADDRESS, and UNITADDRESS2, then its data for the two years are merged.
  2. Merge based on unique matches of EIN_FIRM + ZIPCODE + NAICS: for establishments in either year that could not be matched in Step 1, unique matches are sought based on the combination of EIN_FIRM, ZIPCODE, and NAICS code (all six digits) across the two years. Establishments with unique matches across the two years are then merged.
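A minimal sketch of this two-step unique-match merge, assuming pandas DataFrames c2_2017 and c2_2018 that contain the columns named above, is shown below. It illustrates the logic only and is not the SAS implementation provided in the attachments; in practice the columns of the two steps would also need to be harmonized before stacking.

```python
import pandas as pd

def unique_merge(left, right, keys, suffixes=("_2017", "_2018")):
    """Merge only rows whose key combination occurs exactly once on each side."""
    left_u = left[~left.duplicated(subset=keys, keep=False)]
    right_u = right[~right.duplicated(subset=keys, keep=False)]
    return left_u.merge(right_u, on=keys, suffixes=suffixes)

def merge_component2_years(c2_2017: pd.DataFrame, c2_2018: pd.DataFrame) -> pd.DataFrame:
    """Two-step establishment-level merge of the 2017 and 2018 Component 2 files."""
    # Step 1: unique one-to-one matches on EIN_FIRM + UNITADDRESS + UNITADDRESS2.
    keys1 = ["EIN_FIRM", "UNITADDRESS", "UNITADDRESS2"]
    step1 = unique_merge(c2_2017, c2_2018, keys1)

    # Step 2: among establishments not matched in Step 1, look for unique
    # matches on EIN_FIRM + ZIPCODE + NAICS (all six digits).
    def unmatched(year_file):
        flagged = year_file.merge(step1[keys1], on=keys1, how="left", indicator=True)
        return flagged[flagged["_merge"] == "left_only"].drop(columns="_merge")

    step2 = unique_merge(unmatched(c2_2017), unmatched(c2_2018),
                         ["EIN_FIRM", "ZIPCODE", "NAICS"])

    return pd.concat([step1, step2], ignore_index=True)
```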

Match Rate

Match rate among establishments, after excluding outliers (based on the Walmart rule) and the Type 6 reports

There are 897,770 establishments in the 2017 Component 2 data file and 935,610 in the 2018 file, after excluding 1) establishments in firms that failed the Walmart rule (i.e., firms with a size larger than Walmart) and 2) the small establishments that submitted Type 6 reports.

  • 881,938 establishments (98.2% of the 897,770 establishments in 2017 and 94.3% of the 935,610 establishments in 2018) can be matched across the two years at the firm level based on EIN_FIRM.
  • 715,874 (81.2%) out of these 881,938 establishments can be matched at the establishment level based on UNITADDRESS, UNITADDRESS2, ZIPCODE and NAICS codes.

The overall match rate for 2017 is 79.7 percent, which equals 715,874 divided by 897,770.


The overall match rate for 2018 is 76.5 percent, which equals 715,874 divided by 935,610.

3. Merging the Component 1 and the Component 2 Data Files within Each Year of 2017 and 2018

Merging Algorithm

The addresses for the same establishments can differ across the two components. To improve the match rate, RTI's GIS team first used geocoding software to standardize the address information in both components for both years. The procedure to merge the Component 1 and Component 2 data files was then carried out for each year, proceeding through the following steps in sequence (an illustrative sketch of the key construction follows the list):

  1. Merge based on unique matches of HDQ_NBR and UNIT_NBR: if an establishment has a unique one-to-one match based on the combination of HDQ_NBR and UNIT_NBR across the two components, then its data from the two components are merged.
  2. Merge based on unique matches of HDQ_NBR and the first six characters of UNIT_NBR: UNIT_NBR was found to have different lengths across the data files, so UNIT_NBR is truncated to its first six characters for merging.
  3. Merge based on unique matches of HDQ_NBR and the first six characters of UNIT_NBR padded with leading zeros: the UNIT_NBR values in Component 2 are padded with zeros so that they align more closely with the UNIT_NBR values in Component 1.
  4. Merge based on unique matches of HDQ_NBR and the full standardized address: establishments that could not be matched in Steps 1–3 (based on UNIT_NBR) are matched based on the full standardized addresses generated by the geocoding software.
  5. Merge based on unique matches of HDQ_NBR + ZIP code + NAICS: establishments that could not be matched in the steps above are matched based on the ZIP code from the standardized addresses and the NAICS code (all six digits).
  6. Merge based on unique matches of HDQ_NBR + first part of address + NAICS: finally, remaining establishments are matched based on the first part of the address (i.e., the street number and street name, dropping city, state, and ZIP code) plus the NAICS code (all six digits).
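The cascade can be made concrete by deriving the normalized keys once and then applying the same unique-match-and-remove logic from the previous sketch to each key set in turn. The sketch below assumes hypothetical column names for the geocoded fields (STD_ADDRESS, STD_ZIP) and an assumed padding width; it is illustrative only.

```python
import pandas as pd

def add_merge_keys(df: pd.DataFrame) -> pd.DataFrame:
    """Derive the normalized keys used in the cascading Component 1/2 merge."""
    out = df.copy()
    unit = out["UNIT_NBR"].astype(str).str.strip()
    out["UNIT6"] = unit.str[:6]                   # Step 2: first six characters
    out["UNIT6_PAD"] = out["UNIT6"].str.zfill(6)  # Step 3: leading zeros (width assumed)
    # Steps 4-6 rely on geocoded address fields; these column names are assumptions.
    out["ADDR_FULL"] = out["STD_ADDRESS"]
    out["ADDR_STREET"] = out["STD_ADDRESS"].str.split(",").str[0]
    return out

# Key sets tried in order; establishments matched at an earlier step are removed
# from both files before the next key set is attempted (as in the previous sketch).
KEY_CASCADE = [
    ["HDQ_NBR", "UNIT_NBR"],              # 1. full unit number
    ["HDQ_NBR", "UNIT6"],                 # 2. truncated unit number
    ["HDQ_NBR", "UNIT6_PAD"],             # 3. zero-padded unit number
    ["HDQ_NBR", "ADDR_FULL"],             # 4. full standardized address
    ["HDQ_NBR", "STD_ZIP", "NAICS"],      # 5. standardized ZIP code + NAICS
    ["HDQ_NBR", "ADDR_STREET", "NAICS"],  # 6. first part of address + NAICS
]
```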

Match Rate

Match rate in 2017 among establishments, after excluding large outliers (based on the Walmart rule) and the Type 6 reports²

There are 897,770 establishments in the 2017 Component 2 data (the same total as in Table 3-6, Chapter 3), after excluding the establishments in firms that failed the Walmart rule (i.e., firms with a size larger than Walmart) and the small establishments that submitted Type 6 reports.

  • 855,256 (95.3%) out of these 897,770 establishments can be matched at the firm level based on HDQ_NBR.
  • 597,361 (69.8%) out of these 855,256 establishments can be matched at the establishment level based on UNIT_NBR, addresses, zip code and NAICS code.

The overall match rate is 66.5 percent, which equals 597,361 divided by 897,770.

Match rate in 2018 among establishments, after excluding large outliers (based on the Walmart rule) and the Type 6 Reports

There are 935,610 establishments in the 2018 Component 2 data (the same total as in Table 3-6, Chapter 3), after excluding the establishments in firms that failed the Walmart rule (i.e., firms with a size larger than Walmart) and the small establishments that submitted Type 6 reports.

  • 906,565 (96.9%) out of these 935,610 establishments can be matched at the firm level based on HDQ_NBR.
  • 650,528 (71.76%) out of these 906,565 establishments can be matched at the establishment level based on UNIT_NBR, addresses, zip code and NAICS code.

The overall match rate is 69.5 percent, which equals 650,528 divided by 935,610.

4. Merging the Combined Component 1 and 2 Data Files across 2017 and 2018

___________________

2 The match rate will be higher if the Type 6 reports are included.


Merging Algorithm

NORC suggested that the Component 1 data can be merged across years based on HDQ_NBR and UNIT_NBR in the Component 1 data, because the quality of these two variables in the Component 1 data is well maintained across years by EEOC. Therefore, we used this approach to merge the combined Component 1 and 2 data files across 2017 and 2018 (a minimal sketch appears below).
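Under the assumption that the combined files from Section 3 carry these Component 1 identifiers forward (hypothetical variable names below), the across-year link is a single establishment-level join, sketched here for illustration only.

```python
import pandas as pd

def merge_across_years(combined_2017: pd.DataFrame, combined_2018: pd.DataFrame) -> pd.DataFrame:
    """Link the combined Component 1 and 2 files for 2017 and 2018 by establishment."""
    # HDQ_NBR and UNIT_NBR are taken from the Component 1 side of each combined file.
    return combined_2017.merge(
        combined_2018,
        on=["HDQ_NBR", "UNIT_NBR"],
        suffixes=("_2017", "_2018"),
        validate="one_to_one",  # assumes the identifier pair is unique within each year
    )
```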

Match Rate

Match rate among establishments, after excluding large outliers (based on the Walmart rule) and the Type 6 reports

Among the 597,361 establishments that could be matched across the two components in 2017, 527,379 (88.3%) could also be matched to establishments that were matched across the two components in 2018. These 527,379 establishments account for 58.7 percent of the 897,770 establishments in the 2017 Component 2 data and 56.4 percent of the 935,610 establishments in the 2018 Component 2 data.


APPENDIX 5-4

Inconsistent Zeros for Number of Employees Comparing Component 2 Data for 2017 and 2018 at the SRO Level

| Firm and Establishment Characteristic | Number: >0 Employees, 2017 Only | Number: >0 Employees, 2018 Only | Number: >0 Employees, Both Years | Percent: >0 Employees, 2017 Only | Percent: >0 Employees, 2018 Only | Percent: >0 Employees, Both Years |
|---|---|---|---|---|---|---|
| Administration Mode | | | | | | |
| Online-Entry | 243,256 | 277,659 | 1,482,817 | 12.1 | 13.9 | 74.0 |
| Data-Upload | 849,513 | 896,738 | 4,498,270 | 13.6 | 14.4 | 72.0 |
| Establishment Size | | | | | | |
| Fewer than 100 | 729,700 | 789,222 | 3,290,336 | 15.2 | 16.4 | 68.4 |
| 100–249 | 231,687 | 245,237 | 1,543,519 | 11.5 | 12.1 | 76.4 |
| 250–499 | 85,843 | 89,616 | 658,356 | 10.3 | 10.7 | 79.0 |
| 500–999 | 29,070 | 31,948 | 276,911 | 8.6 | 9.5 | 81.9 |
| 1,000 or More | 16,469 | 18,302 | 211,965 | 6.7 | 7.4 | 85.9 |
| Establishment Quality | | | | | | |
| Red | 3,401 | 4,052 | 18,288 | 13.2 | 15.7 | 71.0 |
| Orange | 71,705 | 77,498 | 361,755 | 14.0 | 15.2 | 70.8 |
| Green | 1,017,663 | 1,092,847 | 5,601,044 | 13.2 | 14.2 | 72.6 |
| Overall | 1,092,769 | 1,174,397 | 5,981,087 | 13.2 | 14.2 | 72.5 |

SOURCE: Panel generated from Component 2 employer and establishment files for 2017 and 2018.

Suggested Citation:"5 Measurement Quality." National Academies of Sciences, Engineering, and Medicine. 2023. Evaluation of Compensation Data Collected Through the EEO-1 Form. Washington, DC: The National Academies Press. doi: 10.17226/26581.
×

This page intentionally left blank.

Suggested Citation:"5 Measurement Quality." National Academies of Sciences, Engineering, and Medicine. 2023. Evaluation of Compensation Data Collected Through the EEO-1 Form. Washington, DC: The National Academies Press. doi: 10.17226/26581.
×
Page 159
Suggested Citation:"5 Measurement Quality." National Academies of Sciences, Engineering, and Medicine. 2023. Evaluation of Compensation Data Collected Through the EEO-1 Form. Washington, DC: The National Academies Press. doi: 10.17226/26581.
×
Page 160
Suggested Citation:"5 Measurement Quality." National Academies of Sciences, Engineering, and Medicine. 2023. Evaluation of Compensation Data Collected Through the EEO-1 Form. Washington, DC: The National Academies Press. doi: 10.17226/26581.
×
Page 161
Suggested Citation:"5 Measurement Quality." National Academies of Sciences, Engineering, and Medicine. 2023. Evaluation of Compensation Data Collected Through the EEO-1 Form. Washington, DC: The National Academies Press. doi: 10.17226/26581.
×
Page 162
Suggested Citation:"5 Measurement Quality." National Academies of Sciences, Engineering, and Medicine. 2023. Evaluation of Compensation Data Collected Through the EEO-1 Form. Washington, DC: The National Academies Press. doi: 10.17226/26581.
×
Page 163
Suggested Citation:"5 Measurement Quality." National Academies of Sciences, Engineering, and Medicine. 2023. Evaluation of Compensation Data Collected Through the EEO-1 Form. Washington, DC: The National Academies Press. doi: 10.17226/26581.
×
Page 164
Suggested Citation:"5 Measurement Quality." National Academies of Sciences, Engineering, and Medicine. 2023. Evaluation of Compensation Data Collected Through the EEO-1 Form. Washington, DC: The National Academies Press. doi: 10.17226/26581.
×
Page 165
Suggested Citation:"5 Measurement Quality." National Academies of Sciences, Engineering, and Medicine. 2023. Evaluation of Compensation Data Collected Through the EEO-1 Form. Washington, DC: The National Academies Press. doi: 10.17226/26581.
×
Page 166
Suggested Citation:"5 Measurement Quality." National Academies of Sciences, Engineering, and Medicine. 2023. Evaluation of Compensation Data Collected Through the EEO-1 Form. Washington, DC: The National Academies Press. doi: 10.17226/26581.
×
Page 167
Suggested Citation:"5 Measurement Quality." National Academies of Sciences, Engineering, and Medicine. 2023. Evaluation of Compensation Data Collected Through the EEO-1 Form. Washington, DC: The National Academies Press. doi: 10.17226/26581.
×
Page 168
Suggested Citation:"5 Measurement Quality." National Academies of Sciences, Engineering, and Medicine. 2023. Evaluation of Compensation Data Collected Through the EEO-1 Form. Washington, DC: The National Academies Press. doi: 10.17226/26581.
×
Page 169
Suggested Citation:"5 Measurement Quality." National Academies of Sciences, Engineering, and Medicine. 2023. Evaluation of Compensation Data Collected Through the EEO-1 Form. Washington, DC: The National Academies Press. doi: 10.17226/26581.
×
Page 170
Suggested Citation:"5 Measurement Quality." National Academies of Sciences, Engineering, and Medicine. 2023. Evaluation of Compensation Data Collected Through the EEO-1 Form. Washington, DC: The National Academies Press. doi: 10.17226/26581.
×
Page 171
Suggested Citation:"5 Measurement Quality." National Academies of Sciences, Engineering, and Medicine. 2023. Evaluation of Compensation Data Collected Through the EEO-1 Form. Washington, DC: The National Academies Press. doi: 10.17226/26581.
×
Page 172
Suggested Citation:"5 Measurement Quality." National Academies of Sciences, Engineering, and Medicine. 2023. Evaluation of Compensation Data Collected Through the EEO-1 Form. Washington, DC: The National Academies Press. doi: 10.17226/26581.
×
Page 173
Suggested Citation:"5 Measurement Quality." National Academies of Sciences, Engineering, and Medicine. 2023. Evaluation of Compensation Data Collected Through the EEO-1 Form. Washington, DC: The National Academies Press. doi: 10.17226/26581.
×
Page 174
Suggested Citation:"5 Measurement Quality." National Academies of Sciences, Engineering, and Medicine. 2023. Evaluation of Compensation Data Collected Through the EEO-1 Form. Washington, DC: The National Academies Press. doi: 10.17226/26581.
×
Page 175
Suggested Citation:"5 Measurement Quality." National Academies of Sciences, Engineering, and Medicine. 2023. Evaluation of Compensation Data Collected Through the EEO-1 Form. Washington, DC: The National Academies Press. doi: 10.17226/26581.
×
Page 176
Suggested Citation:"5 Measurement Quality." National Academies of Sciences, Engineering, and Medicine. 2023. Evaluation of Compensation Data Collected Through the EEO-1 Form. Washington, DC: The National Academies Press. doi: 10.17226/26581.
×
Page 177
Suggested Citation:"5 Measurement Quality." National Academies of Sciences, Engineering, and Medicine. 2023. Evaluation of Compensation Data Collected Through the EEO-1 Form. Washington, DC: The National Academies Press. doi: 10.17226/26581.
×
Page 178
Suggested Citation:"5 Measurement Quality." National Academies of Sciences, Engineering, and Medicine. 2023. Evaluation of Compensation Data Collected Through the EEO-1 Form. Washington, DC: The National Academies Press. doi: 10.17226/26581.
×
Page 179
Suggested Citation:"5 Measurement Quality." National Academies of Sciences, Engineering, and Medicine. 2023. Evaluation of Compensation Data Collected Through the EEO-1 Form. Washington, DC: The National Academies Press. doi: 10.17226/26581.
×
Page 180
Suggested Citation:"5 Measurement Quality." National Academies of Sciences, Engineering, and Medicine. 2023. Evaluation of Compensation Data Collected Through the EEO-1 Form. Washington, DC: The National Academies Press. doi: 10.17226/26581.
×
Page 181
Suggested Citation:"5 Measurement Quality." National Academies of Sciences, Engineering, and Medicine. 2023. Evaluation of Compensation Data Collected Through the EEO-1 Form. Washington, DC: The National Academies Press. doi: 10.17226/26581.
×
Page 182
Suggested Citation:"5 Measurement Quality." National Academies of Sciences, Engineering, and Medicine. 2023. Evaluation of Compensation Data Collected Through the EEO-1 Form. Washington, DC: The National Academies Press. doi: 10.17226/26581.
×
Page 183
Suggested Citation:"5 Measurement Quality." National Academies of Sciences, Engineering, and Medicine. 2023. Evaluation of Compensation Data Collected Through the EEO-1 Form. Washington, DC: The National Academies Press. doi: 10.17226/26581.
×
Page 184
Suggested Citation:"5 Measurement Quality." National Academies of Sciences, Engineering, and Medicine. 2023. Evaluation of Compensation Data Collected Through the EEO-1 Form. Washington, DC: The National Academies Press. doi: 10.17226/26581.
×
Page 185
Suggested Citation:"5 Measurement Quality." National Academies of Sciences, Engineering, and Medicine. 2023. Evaluation of Compensation Data Collected Through the EEO-1 Form. Washington, DC: The National Academies Press. doi: 10.17226/26581.
×
Page 186
Suggested Citation:"5 Measurement Quality." National Academies of Sciences, Engineering, and Medicine. 2023. Evaluation of Compensation Data Collected Through the EEO-1 Form. Washington, DC: The National Academies Press. doi: 10.17226/26581.
×
Page 187
Suggested Citation:"5 Measurement Quality." National Academies of Sciences, Engineering, and Medicine. 2023. Evaluation of Compensation Data Collected Through the EEO-1 Form. Washington, DC: The National Academies Press. doi: 10.17226/26581.
×
Page 188
Suggested Citation:"5 Measurement Quality." National Academies of Sciences, Engineering, and Medicine. 2023. Evaluation of Compensation Data Collected Through the EEO-1 Form. Washington, DC: The National Academies Press. doi: 10.17226/26581.
×
Page 189
Suggested Citation:"5 Measurement Quality." National Academies of Sciences, Engineering, and Medicine. 2023. Evaluation of Compensation Data Collected Through the EEO-1 Form. Washington, DC: The National Academies Press. doi: 10.17226/26581.
×
Page 190
The U.S. Equal Employment Opportunity Commission (EEOC) expanded EEO-1 data collection for reporting years 2017 and 2018 in an effort to improve its ability to investigate and address pay disparities between women and men and between different racial and ethnic groups. These pay disparities are well documented in national statistics. For example, the U.S. Census Bureau (2021) found that Black and Hispanic women earned only 63 percent and 55 percent, respectively, of what non-Hispanic White men earned.

Evaluation of Compensation Data Collected Through the EEO-1 Form examines the quality of pay data collected using the EEO-1 form and provides recommendations for future data-collection efforts. The report finds that there is value in the expanded EEO-1 data, which are unique among federal surveys in providing employee pay, occupation, and demographic data at the employer level. Nonetheless, both short-term and longer-term improvements are recommended to address significant concerns about employer coverage, conceptual definitions, data measurement, and collection protocols. If implemented, these recommendations could improve the breadth and strength of EEOC data for addressing pay equity, potentially reduce employer burden, and better support employer self-assessment.
