


Suggested Citation: "Chapter 3 - Findings and Applications." National Academies of Sciences, Engineering, and Medicine. 2020. Procedures and Guidelines for Validating Contractor Test Data. Washington, DC: The National Academies Press. doi: 10.17226/25823.



CHAPTER 3

Findings and Applications

The procedures identified in the project were evaluated using numerical simulations to quantify risks and qualify acceptable procedures. The research approach described in Chapter 2 provided the basis for this evaluation, and SHA data were used to test the effectiveness of the validation procedures.

3.1 Numerical Simulation Findings

The identified statistical tests (see Section 2.2) were evaluated using numerical simulations to quantify risks and qualify acceptable tests. Multiple distribution types and construction material AQCs were considered (see Tables 5 and 6). The results of the numerical simulations for normally distributed data sets are described in this chapter, and the results for the skewed and bimodal distributions are presented in Appendix C.

3.1.1 Normal Distribution Results

For each AQC, four scenarios of distributions were examined using this iterative process (see Figure 8). These scenarios involved the following combinations of SHA distribution mean, µ1; standard deviation, σ1; contractor distribution mean, µ2; and standard deviation, σ2:

• Scenario 1: µ1 = µ2 and σ1 = σ2.
• Scenario 2: µ1 = µ2 and σ1 ≠ σ2.
• Scenario 3: µ1 ≠ µ2 and σ1 = σ2.
• Scenario 4: µ1 ≠ µ2 and σ1 ≠ σ2.

Hypothesis Tests

Figure 11 shows the numerical simulation results for a set of hypothesis tests under Scenario 1 for a SHA sample size of 7 and contractor sample size varying in increments of 7, from 7 (equal sample size) up to 70 (10 times SHA sample size). The success rate of the different tests is shown as a function of SHA sample CV (CV1). Because of the wide range of target means and standard deviations for the selected AQC values, the CV was the most suitable parameter to compare the test results (Table 5 shows the AQCs and corresponding CV values). Under Scenario 1, all hypothesis tests performed (as expected) at a success rate of 95% or above (represented by the horizontal dotted line in Figure 11).
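The kind of simulation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's simulation code: it takes "success rate" to mean the fraction of simulated lots in which Welch's t-test does not reject equality of means at α = 0.05, uses a hypothetical target mean of 100 with a CV of 10%, and computes the p-value by numerically integrating the Student's t density. The function names `t_two_sided_p`, `welch_p`, and `pass_rate` are illustrative only.

```python
import math
import random
import statistics


def t_two_sided_p(t, df, steps=200):
    """Two-sided p-value of Student's t, by Simpson integration of the density."""
    if t == 0:
        return 1.0
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    inside = 2 * total * h / 3  # probability mass between -|t| and |t|
    return max(0.0, 1.0 - inside)


def welch_p(a, b):
    """Two-sided p-value of Welch's (unequal variance) t-test."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t_two_sided_p(t, df)


def pass_rate(mu1, sd1, mu2, sd2, n1=7, n2=14, trials=2000, alpha=0.05, seed=1):
    """Fraction of simulated lots in which the test does not reject (assumed 'success')."""
    rng = random.Random(seed)
    passes = 0
    for _ in range(trials):
        sha = [rng.gauss(mu1, sd1) for _ in range(n1)]
        contractor = [rng.gauss(mu2, sd2) for _ in range(n2)]
        if welch_p(sha, contractor) >= alpha:
            passes += 1
    return passes / trials


# Hypothetical inputs: target mean 100 and CV = 10% (standard deviation 10).
scenario_1 = pass_rate(100, 10, 100, 10)  # equal means, equal standard deviations
scenario_3 = pass_rate(100, 10, 115, 10)  # shifted contractor mean, equal standard deviations
print(f"Scenario 1 pass rate: {scenario_1:.3f}")
print(f"Scenario 3 pass rate: {scenario_3:.3f}")
```

Under these assumptions the Scenario 1 pass rate should sit near 95%, while the Scenario 3 pass rate depends on how large the mean shift is relative to the standard deviation, which is consistent with presenting results as a function of CV.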
Figure 12 shows similar results for Scenario 2, where the sample means were equal while the standard deviations were unequal (µ1 = µ2 and σ1 ≠ σ2), as most of the hypothesis tests performed at the expected threshold of 95%. Figure 13 shows results for hypothesis tests under Scenario 3, where the sample means were unequal while the standard deviations were equal (µ1 ≠ µ2 and σ1 = σ2). Under Scenario 3, the tests are expected to perform at a success rate of 5% or below (represented by the horizontal dotted line in Figures 13 and 14). The hypothesis tests in this case did not perform at the expected threshold of 5%. However, the hypothesis tests performed better as the CV1 value got smaller. By comparison, the t-test performed best, followed by Welch's t-test (unequal variance t-test) and the Mann-Whitney test. Figure 14 shows similar results for Scenario 4, where the sample means and the standard deviations were unequal (µ1 ≠ µ2 and σ1 ≠ σ2).

Figure 11. Results for hypothesis tests for Scenario 1 (µ1 = µ2 and σ1 = σ2). [Plot: success rate (%) versus SHA sample CV; Sample 1 size = 7; series: t_test, UV_t_test, p_t_test, ks_test, U_test.]

Figure 12. Results for hypothesis tests for Scenario 2 (µ1 = µ2 and σ1 ≠ σ2). [Plot: same axes and series as Figure 11.]

Figure 13. Results for hypothesis tests for Scenario 3 (µ1 ≠ µ2 and σ1 = σ2). [Plot: same axes and series as Figure 11.]

Figure 14. Results for hypothesis tests for Scenario 4 (µ1 ≠ µ2 and σ1 ≠ σ2). [Plot: same axes and series as Figure 11.]

Variance Tests

Figure 15 shows the numerical simulation results for a set of variance tests under Scenario 1, where the sample means and standard deviations were equal (µ1 = µ2 and σ1 = σ2), for a SHA sample size of 7 and contractor sample size varying in increments of 7, from 7 (equal sample size) up to 70 (10 times SHA sample size). The variance tests performed at the expected threshold of 95% (represented by the horizontal dotted line in Figure 15). Figure 16 shows similar results for Scenario 3, where the sample means were unequal while the standard deviations were equal (µ1 ≠ µ2 and σ1 = σ2), except that the Ansari-Bradley test, which requires that the samples have equal medians, showed a lower success rate.

Figure 15. Numerical simulation results for variance tests for Scenario 1 (µ1 = µ2 and σ1 = σ2). [Plot: success rate (%) versus SHA sample CV; Sample 1 size = 7; series: f_test, Ansari-Bradley, Bartlett, Levene, Modified Levene.]

Figure 16. Numerical simulation results for variance tests for Scenario 3 (µ1 ≠ µ2 and σ1 = σ2). [Plot: same axes and series as Figure 15.]

Figure 17. Numerical simulation results for variance tests for Scenario 2 (µ1 = µ2 and σ1 ≠ σ2). [Plot: same axes and series as Figure 15.]

Figure 17 shows a similar set of results for variance tests under Scenario 2, where the sample means were equal while the standard deviations were unequal (µ1 = µ2 and σ1 ≠ σ2). Under Scenario 2, the tests are expected to perform at a success rate of 5% or below (represented by the horizontal dotted line in Figures 17 and 18). All variance tests in this case performed at the expected threshold of 5%. However, by comparison, the F-test had the best performance, followed by the Ansari-Bradley test, Levene's test, and Bartlett's test. Figure 18 shows similar results for Scenario 4, where the sample means and the standard deviations were unequal (µ1 ≠ µ2 and σ1 ≠ σ2). In this scenario, the Ansari-Bradley test performance was inconsistent since it requires that the samples have equal medians.
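For reference, the two statistics that recur in the comparisons above can be written in their standard textbook form. This is a general reminder rather than a formula reproduced from the report; the symbols for the sample mean, sample variance, and sample size of the SHA (i = 1) and contractor (i = 2) data are notation assumed here.

```latex
% F-test for equality of variances
F = \frac{s_1^2}{s_2^2}, \qquad \text{df} = (n_1 - 1,\; n_2 - 1)

% Welch's (unequal variance) t-test with Welch-Satterthwaite degrees of freedom
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}},
\qquad
\nu \approx \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}
                 {\dfrac{\left(s_1^2 / n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2 / n_2\right)^2}{n_2 - 1}}
```

Because the Welch statistic does not pool the two variances, it remains usable when the variances or the sample sizes differ between the SHA and the contractor.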

Summary of Numerical Analysis Findings

The following observations were made based on the numerical simulations conducted on the normally distributed and nonparametric data sets:

• The t-test and Welch's t-test (unequal variance t-test) showed consistent satisfactory results in the simulations at the selected significance level regardless of distribution type.
• Welch's t-test showed more consistency in detecting the difference in means than the t-test and other hypothesis tests regardless of distribution type.
• The effectiveness of the hypothesis tests decreased when the CV increased.
• The alternative tests (Mann-Whitney and Kolmogorov-Smirnov two-sample test) showed similar results to the t-tests in most cases.
• The F-test showed consistent satisfactory results at the selected significance level.
• The alternative variance tests (Levene's test and Bartlett's test) showed similar results to the F-test in most cases.
• The Ansari-Bradley test showed satisfactory results for samples having equal medians.

The numerical analysis provided the information needed to recommend validation tests on a statistical basis. The observations were then validated using SHA project data. The consistent results provided by the F- and t-tests support the finding reported in the literature that these tests are the most statistically appropriate tests to use for validation.

3.2 SHA Data Findings

Based on the findings of the numerical simulations, the F-test and Welch's t-test (unequal variance t-test) were used for the sampling, testing, and validation plan presented in Chapter 2. This plan was applied to SHA data; results are presented in the following sections.

Case 1: SHA Results

Case 1 used data from SHA 5 (the 5th state agency that provided data for the project), which requires contractors to perform QC tests on samples split from the same bulk samples used by the SHA for each lot.
The data contained recent SHA and contractor results of percent AV of HMA (presented in Appendix D). MATLAB code was developed to scan and sort the data based on SHA sample size per lot; all lots with fewer than six SHA samples were filtered out because a minimum of six sublots per lot is required for the proposed sampling, testing, and validation plan.

Figure 18. Numerical simulation results for variance tests for Scenario 4 (µ1 ≠ µ2 and σ1 ≠ σ2). [Plot: success rate (%) versus SHA sample CV; Sample 1 size = 7; series: f_test, Ansari-Bradley, Bartlett, Levene, Modified Levene.]

During the sampling stage, three sublots were randomly selected to represent the SHA test results. The results of the contractor tests on the sublots corresponding to the SHA test results were excluded from the contractor test results for the primary validation stage. Thus, the contractor test results for primary validation consisted of the total number of sublots minus the three SHA sublots; the SHA test results became independent of the contractor test results (not from the same sublot).

The initial step in the primary validation stage was testing the SHA and contractor data sets for outlying observations according to ASTM E178 procedure (48). The independent data set of the contractor was then validated against the SHA data set using the F-test and Welch's t-test at a significance level, α, of 0.05. If the contractor test results were not validated in the primary validation, a secondary validation was conducted to compare the SHA results to the contractor results from the same sublots using the paired t-test.

Eighty-six sets of percent AV data met the requirement of a minimum of six sublots per lot; results of the analysis carried out on these data sets are presented in Table 8. As shown, seven of the 86 data sets (8.1%) failed the F-test and three (3.5%) failed Welch's t-test. In total, 10 of the 86 data sets (11.6%) failed the primary validation, two (20%) of which failed the secondary validation. The results of Welch's t-test on the 86 data sets are presented in Figure 19.
The figure shows the means ratio [i.e., the ratio of SHA sample mean (µ1) to the contractor sample mean (µ2)] versus the p-values, expressed as the negative value of the logarithm to base 10 of the p-values [−log10 (p-value)]. As shown, the p-values take a nearly symmetrical shape around a means ratio of one. The horizontal dotted line in the figure is the threshold value for a 95% confidence level (α = 0.05) [represented by −log10 (0.05) or 1.3]; all values below this line represent "Fail" results.

Figure 19. Case 1, Welch's t-test results. [Plot: Welch's t-test −log10 (p-value) versus µ1 / µ2.]

Table 8. Case 1: Results of percent in-place AVs of HMA for SHA 5.

                       Independent Samples         Primary      Split Samples           Secondary
Analysis Results       F-test    Welch's t-test    Validation   Paired t-test   D2S     Validation
Pass or Validated      79        83                76           67              84      8
Fail or Nonvalidated   7         3                 10           21              4       2
Total                  86        86                86           88              88      10
Percent Fail           8.1%      3.5%              11.6%        23.9%           4.5%    20.0%

The results of the F-test on the 86 data sets are presented in Figure 20. It shows the standard deviations ratio [i.e., the ratio of SHA sample standard deviation (σ1) to the contractor sample standard deviation (σ2)] versus the p-values expressed as [−log10 (p-value)]. The results show a similar form to that observed in the Welch's t-test results (Figure 19); the p-values take a right skewed shape around a standard deviations ratio of one. The horizontal dotted line in the figure is the threshold value for a 95% confidence level (α = 0.05) [represented by −log10 (0.05) or 1.3]; all values below this line represent "Fail" results.

Figure 20. Case 1, F-test results. [Plot: F-test −log10 (p-value) versus σ1 / σ2.]

Although only 10 data sets were subjected to the secondary validation, the paired t-test was performed on all available data sets; the means ratio versus the p-values [expressed as −log10 (p-value)] are presented in Figure 21. The p-values appear to be random and do not take a pronounced symmetrical shape around a means ratio of one. The horizontal dotted line in the figure is the threshold value for a 95% confidence level (α = 0.05) [represented by −log10 (0.05) or 1.3]; all values below the horizontal dotted line (23.9% of the data sets) represent "Fail" results. (More SHA results are presented in the following examples and in Appendix D.)

Figure 21. Case 1, paired t-test results. [Plot: paired t-test −log10 (p-value) versus µ1 / µ2.]
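The sampling step used in Case 1 can be sketched in a few lines. The following is a minimal illustration and not the project's MATLAB code; the function name `split_lot`, the dictionary layout, and the air-void values are hypothetical, and skipping lots with fewer than six sublots is modeled by returning `None`.

```python
import random


def split_lot(contractor_by_sublot, n_sha=3, min_sublots=6, seed=None):
    """Pick SHA sublots at random and separate the contractor results.

    Returns None for lots with too few sublots (assumed to be skipped),
    otherwise (sha_sublots, independent, split), where `independent` holds
    contractor results from the remaining sublots and `split` holds the
    contractor results from the same sublots the SHA tested.
    """
    sublots = sorted(contractor_by_sublot)
    if len(sublots) < min_sublots:
        return None
    rng = random.Random(seed)
    sha_sublots = sorted(rng.sample(sublots, n_sha))
    independent = [contractor_by_sublot[s] for s in sublots if s not in sha_sublots]
    split = [contractor_by_sublot[s] for s in sha_sublots]
    return sha_sublots, independent, split


# Hypothetical percent air-void results for one lot with eight sublots.
lot = {1: 6.1, 2: 5.8, 3: 6.4, 4: 6.0, 5: 5.9, 6: 6.3, 7: 6.2, 8: 5.7}
sha_sublots, independent, split = split_lot(lot, seed=42)
print(sha_sublots, independent, split)
```

In this sketch, `independent` is what would be compared with the SHA results in the primary validation (F-test and Welch's t-test), while `split` would be reserved for the paired t-test in the secondary validation.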

D2S Limits

The results of applying D2S limits on split samples from the SHA data are presented in Table 8. As shown, 23.9% of the data sets failed the paired t-test but only 4.5% failed the D2S limits when applied on the same split samples. The survey of SHAs revealed that a number of SHAs use D2S and X̄ ± CR for validation; these low power tests may contribute to making inappropriate acceptance and payment decisions. More details of the D2S results are presented in Appendix D.

The observations of both numerical and SHA data analyses indicate appropriateness of using the F-test and t-tests. The t-test and Welch's t-test showed consistent satisfactory results, but Welch's t-test showed more consistency in detecting the difference in means than the t-test and other hypothesis tests regardless of whether the variances were equal or not. Welch's t-test is an adaptation of Student's t-test and is more reliable when the two samples have unequal variances and unequal sample sizes; it is the default hypothesis test in most statistical software (47).

3.3 Illustrative Examples

Data obtained from SHAs were used in five examples to illustrate use of the recommended three-step procedure for different scenarios:

• Sampling method: split versus independent.
• Sample size.
• Outlier detection.
• Retesting or resampling and retesting.
• Validation versus nonvalidation of contractor test results.

3.3.1 Sampling Method: Split Versus Independent

The survey of SHAs revealed that some SHAs use independent sampling methods and a few SHAs use split sampling methods in collecting SHA and contractor test results for validation purposes. Using split rather than independent samples can lead to inappropriate acceptance and payment decisions.
A split sample is defined as "a type of replicate sample that has been divided into two or more portions representing the same material," and an independent sample is defined as "a sample taken without regard to any other sample that may also have been taken to represent the material in question" (2). Independent samples contain up to four sources of variability: material, process, sampling, and test method, but split samples contain only test method variability (2). Therefore, as illustrated in Figure 22, independent samples are better suited for validating all sources of variability, but split samples are only suitable for validating test results.

Figure 22. Components of variance for independent and split samples (45). [Panels: (a) Independent Samples; (b) Split Samples.]

To illustrate the effect of using split versus independent samples when performing data validation, a data set was obtained from a SHA that only uses SHA data for acceptance and requires contractors to perform QC tests on samples split from the same SHA bulk samples used for each lot. A data set comprised of SHA and contractor data from multiple projects was used to illustrate and compare using split and independent samples. Figure 23 and Figure 24 show an example data set of random SHA and contractor results for AC obtained from the compiled SHA and contractor data. Figure 23 shows data for 28 samples split two ways between the SHA and contractor; the same data are also presented using box plots in Figure 24 to display the distribution of data based on the minimum, the first quartile (Q1), the median, the third quartile (Q3), and the maximum. ASTM E178 procedure (48) was applied on the data sets prior to conducting hypothesis testing to detect outlying results; no outlying observations were detected in either data set.

Figure 23. Scatter plot of SHA and contractor data. [Plot: asphalt binder content (%) versus sample number (1 to 28); series: SHA, SHA mean, Contractor, Contractor mean.]

Figure 24. Box plot of SHA and contractor data. [Legend: minimum, first quartile, median, mean, third quartile, maximum.]

Step 1. Independent Samples: The independent samples were obtained by randomly selecting half of the 28 samples (i.e., 14) to represent the SHA portion of test results for validation; the results of the contractor tests on portions corresponding to the SHA samples were excluded. Thus, only results from the remaining 14 samples were used in the independent sample validation testing of the contractor's results, making the SHA tests and the contractor tests independent (not from the same "samples"). Figure 25 shows the results for the 14 independent SHA samples and the 14 independent contractor samples.

Figure 25. Asphalt binder content data for SHA and contractor independent samples. [Panels: (a) Box Plot; (b) Scatter Plot. Axes: asphalt binder content (%) versus sample number (1 to 14); series: SHA, SHA mean, Contractor, Contractor mean.]

Step 2. Split Samples: The same 14 test results selected for the SHA in Step 1 were used for the split samples. The results of the contractor tests on portions corresponding to the SHA samples (i.e., the split portions of the 14 samples selected for the SHA) were used for split samples. Figure 26 shows a box plot and a scatter plot of the 14 split SHA sample results and the 14 split contractor sample results.

Figure 26. Asphalt binder content data for SHA and contractor split samples. [Panels: (a) Box Plot; (b) Scatter Plot. Axes: asphalt binder content (%) versus sample number (1 to 14); series: SHA, SHA mean, Contractor, Contractor mean.]

Step 3. Statistical Tests: Validation of the contractor test results was performed for both the independent sample sets and split sample sets using the F-test and Welch's t-test at a significance level, α, of 0.05; the results are presented together with summary statistics in Table 9. Welch's t-test indicated that for the independent sample sets the SHA results and the corresponding contractor results are statistically different at a significance level, α, of 0.05; thus designated "Fail." For the split sample sets, Welch's t-test indicated that the SHA results and the corresponding contractor results are not statistically different; thus designated "Pass." The results can also be clearly inferred from Figures 25 and 26, where the variability of the independent samples (Figure 25) is much larger than that of the split samples (Figure 26) and is much more representative of the original data (Figure 23 and Figure 24). This example illustrates the importance of using independent samples when using F- and t-tests in the validation process.

3.3.2 Sample Size

Review of SHA data indicated that the sample sizes fall under one of three categories:

• Category 1: Single SHA sample per lot (i.e., one per lot).
• Category 2: Good SHA sample per lot (i.e., greater than 2 and less than 20 per lot).
• Category 3: Large SHA sample lot (i.e., greater than 20 per lot).

Category 1: Single SHA sample per lot. A considerable amount of the SHA data contained a single SHA result per lot, compared to a contractor sample size of three or more observations per lot. In this situation, an F-test cannot be performed because it could lead to inappropriate acceptance and payment decisions. Cumulative sampling techniques are proposed to convert data sets from Category 1 to the more desirable Category 2. The cumulative sampling technique utilizes a concept similar to a moving average, where a fixed number of lots (e.g., three) are accumulated to form a single CVL. Lots 1, 2, and 3 form CVL 1, then lot 1 in the set is dropped, and a new lot is added (lot 4) to form CVL 2. The technique is illustrated in Figure 27. A window of three lots (or more) will continue until a nonconforming lot is encountered. The process then restarts, and a new CVL is formed. A SHA data set was used to illustrate the cumulative sampling technique when performing data validation.
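The moving-window idea behind the cumulative sampling technique can be sketched as below. This is a minimal illustration rather than the procedure from the proposed practice: the function name `rolling_cvls` is hypothetical, and the restart rule (the window is emptied when a nonconforming lot is met, and that lot is left out) is an assumption, since the text only states that the process restarts.

```python
from collections import deque


def rolling_cvls(lot_ids, window=3, nonconforming=frozenset()):
    """Group consecutive lots into cumulative validation lots (CVLs).

    A CVL is emitted each time the window is full. When a nonconforming
    lot is encountered the window is cleared (assumed restart rule) and
    accumulation begins again with the next lot.
    """
    current = deque(maxlen=window)  # oldest lot drops out automatically
    cvls = []
    for lot in lot_ids:
        if lot in nonconforming:
            current.clear()
            continue
        current.append(lot)
        if len(current) == window:
            cvls.append(tuple(current))
    return cvls


print(rolling_cvls([1, 2, 3, 4]))
print(rolling_cvls(range(1, 9), nonconforming={5}))
```

With four lots and a window of three, the sketch yields lots 1 to 3 as CVL 1 and lots 2 to 4 as CVL 2, which matches the description above.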
Table 10 summarizes PCC strength (psi) test results of four consecutive lots; each lot contained a single SHA result and four to five contractor results. This technique is part of the validation plan described in the proposed practice for validating contractor test data.

Figure 27. Illustration of the cumulative sampling technique. [Legend: SHA test result; contractor test result.]

Table 10. SHA and contractor data for PCC strength.

Cumulative Lot   Lot     Sublot     SHA      Contractor   F-test   UV t-test
                 Lot 1   Sublot 1   --       5990.3
                         Sublot 2   --       6221.4
                         Sublot 3   --       5900.9
                         Sublot 4   --       7006.6
                         Sublot 5   6806.2   --
CVL 1            Lot 2   Sublot 1   --       7671.9
                         Sublot 2   --       6324.6
                         Sublot 3   --       5748.3
                         Sublot 4   --       6020.6
                         Sublot 5   6913.8   5061.9
                 Lot 3   Sublot 1   --       6204.7       Pass     Fail
                         Sublot 2   --       6098.6
                         Sublot 3   --       4349.5
                         Sublot 4   --       3845.8
                         Sublot 5   6252.8   --
CVL 2            Lot 4   Sublot 1   --       5646.9       Pass     Fail
                         Sublot 2   --       4700.6
                         Sublot 3   --       5018.0
                         Sublot 4   --       5996.6
                         Sublot 5   6334.6   --
Note: -- = no data; UV = unequal variance.

Category 2: Good SHA sample per lot. A reasonable amount of the SHA data contained SHA sample sizes of three or more observations per lot, compared to a contractor sample size of six or more observations per lot. Three SHA samples versus six contractor samples is the minimum number of samples required to perform the statistical tests (F- and t-tests). However, the implications of sample size should always be assessed when establishing minimum sample sizes for both the SHA and contractor. In this case the SHA sample size is reasonable, and the application of statistical tests is possible, although a larger sample size reduces risk for both the SHA and contractor.

Table 9. SHA and contractor data for independent and split samples.

                        Independent Samples      Split Samples
                        SHA       Contractor     SHA       Contractor
Sample Size             14        14             14        14
Mean                    6.41      6.22           6.41      6.34
Standard Deviation      0.138     0.158          0.138     0.150
α                       0.050
F-test hypothesis       Pass                     Pass
F-test p-value          0.645                    0.769
UV t-test hypothesis    Fail                     Pass
UV t-test p-value       0.003                    0.250
Note: UV = unequal variance.

Category 3: Large SHA sample. One of the SHAs that provided data considers the entire project as a single lot. The validation is performed at the end of the project, pooling all of the testing performed during the project as one single lot and thus resulting in a large SHA sample size.

However, it leaves the contractor at risk of failing validation for the duration of the project, which may not be reasonable, especially with large projects. With this method of pooling project data, sample sizes tend to grow so large that the latter tests can lose relevance. Therefore, use of the entire project as a single lot is not desirable except for small projects and where the total number of samples does not exceed 20.

Examples of SHAs Currently Using Cumulative Sampling

Wisconsin Department of Transportation (WisDOT) and Kansas Department of Transportation (KDOT) use a form of CVL to validate contractor test results using F- and t-tests. WisDOT uses cumulative sampling techniques as a means to overcome the small SHA sample size problem (50–53), with the following features:

• Each lot maintains a constant 5:1 ratio (contractor to SHA) of sublots per lot, which results in a mixture testing lot size of 3,750 tons.
• Five lots are used to form its cumulative lot (referred to as a "rolling window").
• If nonvalidation occurs, the new individual lot added to the rolling window (but not the rest of the lots in the window) is investigated.
• Pay adjustment is determined on a lot basis.
• In case of nonvalidation, the D2S limits are checked. If within D2S limits, the contractor results are accepted; if not, the contractor could invoke Referee Testing to be performed by the Central Office Laboratory.

KDOT routinely compares the variances and the means of the verification test results with the QC test results using the F-test and t-test, respectively (54, 55). KDOT provides a series of spreadsheets used to compare the contractor's QC results and KDOT's verification (QA) results. For asphalt paving, four contractor results are validated against a single SHA result. Starting with lot 3, the F- and t-tests are used to compare the QC results and verification results.
All QC results and verification results are used in the comparison for lots 3, 4, and 5. Starting with lot 6, all of the QC results and verification results for the last five lots are used in the comparison. In other words, five lots are accumulated to form a CVL, then a rolling window of five lots continues.

3.3.3 Outlier Detection

One of the initial steps in the acceptance procedure is to test the SHA and contractor data sets for outlying observations. The outlier detection is included early in the process to ensure a high level of quality of the analyzed data and to eliminate the effect of including significant outliers on related decisions (48). An outlier is defined as "an observation that appears to deviate markedly in value from other members of the sample in which it appears" (2, 48). Outlying observations come from different sources, such as nonconformity with SHA procedures, errors in recording test values, poor sample integrity, errors associated with calculating results of test values, and possibly a valid test result of an extreme value.

Observations that do not pass the statistical criterion for outlier detection are to be flagged for further investigation to find a cause or support suspicions against the flagged observations. Eventually, this can lead to correctly discarding invalid observations. The process is not applicable to small sample sizes (less than three); almost all outlier detection criteria assume normally distributed data (48).

The criteria recommended in ASTM E178 procedure (48) were applied to a SHA data set that contained SHA and contractor results prior to conducting hypothesis testing. A comparison was made to assess the impact of including or excluding the outlier detection procedure on validating contractor test data for in-place density. The data together with summary statistics and statistical test results are illustrated in Figure 28 and also summarized in Table 11.
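A single-outlier screen of the general kind described above can be sketched as follows. This is a simplified illustration, not the ASTM E178 procedure itself: it computes a Grubbs-type statistic (distance of the most extreme value from the sample mean, in sample standard deviations) and compares it with a critical value that the caller must look up for the chosen sample size and significance level. The function names are hypothetical, and the data are the five SHA in-place density values reported in Table 11.

```python
import statistics


def most_extreme(values):
    """Return the value farthest from the mean and its Grubbs-type statistic."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)
    suspect = max(values, key=lambda v: abs(v - mean))
    return suspect, abs(suspect - mean) / sd


def flag_outlier(values, critical_value):
    """Flag the most extreme value if its statistic exceeds the supplied critical value.

    Returns None when nothing is flagged or when fewer than three values are given.
    """
    if len(values) < 3:
        return None
    suspect, stat = most_extreme(values)
    return suspect if stat > critical_value else None


# SHA in-place density results from Table 11 (79.2 is the suspect value).
sha_density = [90.3, 92.3, 92.9, 94.6, 79.2]
suspect, stat = most_extreme(sha_density)
print(f"suspect = {suspect}, statistic = {stat:.3f}")
```

A flagged value is a candidate for investigation rather than automatic removal, in line with the text above; the critical value is deliberately left as an input because it depends on the sample size and the significance level adopted in the specification.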

In this example, the ASTM E178 procedure was applied to the data set. A comparison of the raw data indicated that the SHA data contained a potential outlier. Summary statistics obtained by applying the F- and t-tests on the data sets with and without the outlying observation are also shown in Table 11. While the t-test hypothesis result did not change in this case (equal sample means), the F-test hypothesis result changed from "Fail," indicating unequal variances, to "Pass," indicating equal variances at a significance level of 0.01 (α = 0.01).

Figure 28. SHA data for HMA in-place density. [Panels: (a) Raw data; (b) Data after outlier detection and exclusion.]

Table 11. HMA percent in-place density for SHA data.

HMA in-place density   Contractor   SHA      Contractor   SHA (Outlier removed)
                       91.8         90.3     91.8         90.3
                       90.7         92.3     90.7         92.3
                       91.0         92.9     91.0         92.9
                       90.8         94.6     90.8         94.6
                       91.2         79.2*    91.2         --
                       90.8         --       90.8         --
                       92.3         --       92.3         --
                       91.3         --       91.3         --
                       92.5         --       92.5         --
                       92.3         --       92.3         --
Sample Size, n         10           5        10           4
Sample Mean            91.5         89.9     91.5         92.5
Sample Variance        0.47         37.65    0.47         3.15
t-test hypothesis      Pass                  Pass
t-test p-value         0.406                 0.128
F-test hypothesis      Fail                  Pass
F-test p-value         9.8E-07               0.023
Note: -- = no data; * = potential outlier.

The ASTM E178 procedure (48) does not provide guidance on how to address the reduction in the number of test results when outliers are detected and excluded. Because the power of the statistical tests decreases as the number of test results decreases, retesting or resampling and testing may be required depending on the adequacy of sample size.

3.3.4 Retesting or Resampling and Retesting

Some SHA specifications require retesting when a second test or set of tests is performed to replace a previous test or set of tests for a sound reason (e.g., when an outlier is statistically detected due to an abnormal condition). The retest may be on a split of a previous sample or a completely new sample. The retest results may then be included in the validation process and used as part of the acceptance and payment decisions. Specifications are expected to clearly address when retesting provisions apply and explicitly state if retesting is not allowed. Retesting is required for several reasons, such as when a statistical outlier is detected due to an abnormal condition, a sample is damaged (e.g., dropped or subjected to abnormal temperature because of malfunction of the temperature control system), or a testing error occurs (e.g., wrong loading rate). Retesting provisions should consider and discuss the following items:

• Definition of retesting. This should include when it applies, how the results will be used, and the process of notification, obtaining samples, security, and traceability.
• Labeling and identification of retest results in reporting. If a single sample is split into multiple portions with some portion to be retested at a later time, the provision should note if all or specific AQCs are to be obtained from the resample. For example, if a 50 lb sample of HMA is split into 25 lb samples for testing multiple AQCs (e.g., AC, theoretical maximum specific gravity, and compacted mix bulk specific gravity), and abnormal conditions were identified in the theoretical maximum specific gravity (e.g., due to poor vacuum pump performance) and retesting is required, retesting for all three AQCs is not required. However, if the abnormal condition is due to poor sample integrity (e.g., segregation), retesting for all three AQCs would be needed.
• The differences inherent in the AQC tested by virtue of sample type and condition. Examples would be pavement thickness and compressive strength in concrete pavements.
While retest- ing for thickness verification might be appropriate, the compressive strength obtained from molded cylinders cast during construction and cured in a controlled manner for seven or 28 days could vary significantly from that for cores taken from the pavement and cured under field conditions or cured for a period longer than the molded cylinders. This example is true for any resampling from a different population than the original. â¢ Impact of retesting on the number of samples being used for validation. â¢ Use of test results (e.g., for validation, payment calculations, or both). 3.3.5 Validation Versus Nonvalidation of Contractor Test Results This example illustrates the application of the recommended procedures for cases when con- tractor test results are validated. It also clarifies situations when the contractor test results are not validated and how that can be handled in the specifications. Two data sets were obtained from a SHA for percent AV of HMA. Figure 29 shows the SHA and contractor results for the first data set (Sample 1). The figure shows the scatter and box plots for the 15 samples split two ways between the SHA and contractor. Use of the recommended validation process is illustrated in Figure 30. The process includes the sampling, primary validation, and secondary validation steps represented by the dotted lines in the figure and described below. Step 1: SamplingâThe lot included 15 sublots, each of which is split into two portions desig- nated SHA and contractor portions; the contractor performs QC tests on all 15 portions. In the sampling stage, the SHA randomly selects three of the 15 sublots (samples number 3, 9, and 13) to test for validation; these three test results represent the SHA sample in Table 12; the remain- ing 12 sublots for the contractor (after the SHA random selection) are samples number 1, 2, 4, 5, 6, 7, 8, 10, 11, 12, 14, and 15. 
The results of the contractor tests on sublots corresponding to the SHA samples are excluded from the contractor sample for the primary validation stage, making the SHA test results independent of the contractor test results as they are not from the same sublots.

Step 2: Primary Validation. As explained previously, the initial step proposed in the acceptance procedure was to test the SHA and contractor data sets for outlying observations. The ASTM E178 procedure (48) was applied to both SHA and contractor samples prior to conducting hypothesis testing, and no outlying observations were detected in either data set. The independent data set of the contractor was validated against the SHA data set using the F-test and Welch’s t-test at a significance level, α, of 0.05. The original SHA and contractor data, randomly selected independent data sets, the results of the statistical tests, and summary statistics are presented in Table 12.

Both the F-test and Welch’s t-test indicated that the independent data set of the SHA and the corresponding contractor data set are not statistically different at a significance level, α, of 0.05; thus “Pass,” as shown in the table. Because both the SHA and contractor data sets in Sample 1 had similarly high variability (see Figure 29), the statistical tests indicated that the two data sets are not significantly different. In all cases, the variability in the data will be accounted for in the PWL calculation following the validation step. The target specification values and the upper and lower specification limits (USLs and LSLs) are presented in Figure 29 with dashed and dotted lines. An example involving PWL and pay adjustment factor calculations is presented in Section 3.4.

Scatter and box plots are shown in Figure 31 for a second data set (Sample 2). These data are independent of Sample 1 and are presented to illustrate a situation where the contractor data are not validated in the primary validation, and secondary validation (third step in Figure 30) is required.

Step 3: Secondary Validation. Following the primary validation, the independent data set of the contractor was validated against the SHA data set using the F-test and Welch’s t-test at a significance level, α, of 0.05.
The original SHA and contractor data for Sample 2 are shown in Figure 31 and presented in Table 13, together with the randomly selected independent data sets and the results of the statistical tests.

The F-test indicated that the independent data set of the SHA and the corresponding contractor data set are not statistically different at a significance level, α, of 0.05; thus “Pass,” but Welch’s t-test indicated that the two data sets are statistically different; thus “Fail,” as noted in Table 13. Therefore, the contractor test results are not validated in the primary validation, and secondary validation is required. In this validation, SHA results are compared to the contractor results from the same samples [i.e., the split portions of the three portions (or more) that the SHA tested] using the paired t-test. This test uses the difference between pairs of tests and determines whether the average difference is statistically different from zero; use of as many portions as possible is desired. In this example, all 15 portions were used in the paired t-test. As shown in Table 13, the paired t-test indicated that the SHA data set and the contractor data set are statistically different at a significance level, α, of 0.05; thus “Fail,” and the contractor test results are not validated (SHA test results are therefore used for pay factor calculations). The validation plan is described in detail in the proposed practice (Part II) together with a discussion of dispute resolution.

Figure 29. SHA and contractor results for original Sample 1: (a) box plot; (b) scatter plot of air voids (%) versus sample number, with the target, LSL, and USL marked. LSL stands for lower specification limit and USL stands for upper specification limit.

Figure 30. Sampling, testing, and validation process.

Table 12. SHA and contractor Sample 1 data sets.

| Sublot | Original sample, SHA | Original sample, Contractor | Independent, SHA | Independent, Contractor |
|---|---|---|---|---|
| 1 | 3.8 | 3.6 | 4.0 | 3.6 |
| 2 | 4.6 | 4.4 | 3.5 | 4.4 |
| 3 | 4.0 | 3.7 | 3.1 | 3.0 |
| 4 | 2.8 | 3.0 | – | 4.2 |
| 5 | 4.3 | 4.2 | – | 5.8 |
| 6 | 5.0 | 5.8 | – | 3.6 |
| 7 | 4.0 | 3.6 | – | 3.8 |
| 8 | 3.8 | 3.8 | – | 4.8 |
| 9 | 3.5 | 3.9 | – | 3.2 |
| 10 | 5.2 | 4.8 | – | 2.8 |
| 11 | 3.2 | 3.2 | – | 6.3 |
| 12 | 2.5 | 2.8 | – | 3.0 |
| 13 | 3.1 | 3.4 | – | – |
| 14 | 5.8 | 6.3 | – | – |
| 15 | 3.4 | 3.0 | – | – |
| Count | 15 | 15 | 3 | 12 |
| Mean | 3.93 | 3.97 | 3.53 | 4.04 |
| σ | 0.922 | 1.009 | 0.451 | 1.120 |

| Primary validation (independent data sets) | Hypothesis | p-value |
|---|---|---|
| F-test | Pass | 0.29547 |
| UV t-test | Pass | 0.25166 |

Note: – = no data; UV = unequal variance.

Figure 31. AVs for SHA and contractor data for Sample 2: (a) box plot; (b) scatter plot of air voids (%) versus sample number, with the target, LSL, and USL marked.

Table 13. Air voids for SHA and contractor data sets for Sample 2.

| Sublot | Original sample, SHA | Original sample, Contractor | Independent, SHA | Independent, Contractor | Split, SHA | Split, Contractor |
|---|---|---|---|---|---|---|
| 1 | 4.1 | 4.2 | 3.4 | 4.2 | 4.1 | 4.2 |
| 2 | 3.9 | 4.1 | 3.3 | 4.1 | 3.9 | 4.1 |
| 3 | 3.4 | 4.1 | 3.2 | 4.7 | 3.4 | 4.1 |
| 4 | 3.6 | 4.7 | – | 4.4 | 3.6 | 4.7 |
| 5 | 4.0 | 4.4 | – | 4.4 | 4.0 | 4.4 |
| 6 | 3.9 | 4.4 | – | 3.7 | 3.9 | 4.4 |
| 7 | 3.2 | 3.7 | – | 3.8 | 3.2 | 3.7 |
| 8 | 3.5 | 3.8 | – | 3.7 | 3.5 | 3.8 |
| 9 | 3.5 | 3.7 | – | 4.6 | 3.5 | 3.7 |
| 10 | 3.3 | 3.4 | – | 4.7 | 3.3 | 3.4 |
| 11 | 3.2 | 3.7 | – | 3.3 | 3.2 | 3.7 |
| 12 | 4.0 | 4.6 | – | 4.0 | 4.0 | 4.6 |
| 13 | 4.2 | 4.7 | – | – | 4.2 | 4.7 |
| 14 | 3.1 | 3.3 | – | – | 3.1 | 3.3 |
| 15 | 3.6 | 4.0 | – | – | 3.6 | 4.0 |
| Count | 15 | 15 | 3 | 12 | 15 | 15 |
| Mean | 3.63 | 4.05 | 3.30 | 4.13 | 3.63 | 4.05 |
| σ | 0.360 | 0.450 | 0.100 | 0.448 | 0.360 | 0.450 |

| Validation stage | Test | Data sets | Hypothesis | p-value |
|---|---|---|---|---|
| Primary validation | F-test | Independent | Pass | 0.09683 |
| Primary validation | UV t-test | Independent | Fail | 0.00005 |
| Secondary validation | Paired t-test | Split | Fail | 0.00002 |

Note: – = no data; UV = unequal variance.

3.4 Case Study Example

In this case study, data obtained from a SHA for the thickness and the compressive strength of PCC pavements were used for contractor QC data validation. The SHA specifications and the associated quality manual were used for determining the PWL values and pay adjustments. The validation procedure contained in the SHA specifications was applied to sampled contractor and SHA data, followed by applying the validation plan procedures described in the proposed practice (Part II). Table 14 summarizes SHA and contractor test results for three consecutive lots.

PCC Thickness Test Results

Prior to conducting hypothesis testing, the SHA and contractor data sets for PCC thickness were examined for outlying observations using the ASTM E178 procedure (48); no outlying observations were detected in either data set. The PCC thickness test results of the contractor were validated against the SHA test results using the F-test and Welch’s t-test at a significance level, α, of 0.05. Both tests indicated that the PCC thicknesses obtained from SHA tests and the corresponding contractor tests are not statistically different at this significance level; thus, “Pass” as noted in Table 14. The PCC thickness had a single-limit specification with a target value of 6.0 inches and an LSL of 5.75 inches (6 inches minus 0.25 inch).

PCC Compressive Strength Test Results

Prior to conducting hypothesis testing, the ASTM E178 procedure (48) was applied to both SHA and contractor PCC compressive strength test results; no outlying observations were detected in either data set. The PCC compressive strength test results of the contractor were validated against the SHA test results using the F-test and Welch’s t-test at the same significance level. Both tests indicated that the PCC compressive strength test results of the SHA and the corresponding contractor test results are not statistically different at this significance level; thus, “Pass” as noted in Table 14. The PCC compressive strength also had a single-limit specification with a target value of 4,200 psi and an LSL of 3,900 psi (4,200 psi minus 300 psi).

PWL Calculations

The Quality Index, Q, for PCC thickness and compressive strength was calculated by subtracting the LSL from the mean, $\bar{x}$, and dividing the result by the standard deviation, $s$, as follows:

$$Q = \frac{\bar{x} - LSL}{s}$$

Thus, for the thickness,

$$Q_T = \frac{6.44 - 5.75}{0.482} = 1.424$$

and for the compressive strength,

$$Q_S = \frac{5608.9 - 3900.0}{511.23} = 3.343$$

The percentage of the lot that falls above the specification limit (PWL) for thickness and compressive strength was estimated using Tables E.6 and E.7 in Appendix E, respectively, for the computed value of Q and sample size, n. For a $Q_T$ of 1.42 and n of 15, the PCC thickness PWL, $PWL_T$, is 92.63 (Table E.6), and for a $Q_S$ of 3.34 and n of 15, the PCC compressive strength PWL, $PWL_S$, is 100 (Table E.7).
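As a quick check of the Quality Index arithmetic, the sketch below recomputes Q from the contractor results listed in Table 14, using the sample standard deviation. The function name `quality_index` is illustrative only. The PWL lookup itself is not reproduced because it depends on Tables E.6 and E.7 in Appendix E.

```python
from statistics import mean, stdev


def quality_index(results, lsl):
    """Q = (sample mean - LSL) / sample standard deviation, for a lower limit only."""
    return (mean(results) - lsl) / stdev(results)


# Contractor results for the three lots in Table 14
thickness_in = [6.80, 5.31, 6.94, 6.73, 5.88, 6.43, 6.26, 6.14,
                7.00, 6.94, 6.00, 6.60, 6.42, 6.18, 6.91]
strength_psi = [5137.1, 6101.1, 5628.5, 6102.7, 6069.2, 6135.6, 5118.7, 5119.1,
                4479.4, 5670.5, 5870.1, 5010.9, 5765.0, 5967.5, 5958.5]

q_t = quality_index(thickness_in, lsl=5.75)    # LSL = 6.0 in. - 0.25 in.
q_s = quality_index(strength_psi, lsl=3900.0)  # LSL = 4,200 psi - 300 psi

print(f"Q_T = {q_t:.3f}, Q_S = {q_s:.3f}")
```

The text reports 1.424 for thickness and 3.343 for compressive strength. Working from the unrounded mean and standard deviation explains why the thickness value differs slightly from what the rounded figures 6.44 and 0.482 would give.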
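Table 14. SHA and contractor test results for concrete pavement.

All sublots below belong to cumulative validation lot CVL 1.

| Lot | Sublot | Thickness (inch), SHA | Thickness (inch), Contractor | Compressive strength (psi), SHA | Compressive strength (psi), Contractor |
|---|---|---|---|---|---|
| Lot 1 | Sublot 1 | – | 6.80 | – | 5137.1 |
| Lot 1 | Sublot 2 | – | 5.31 | – | 6101.1 |
| Lot 1 | Sublot 3 | – | 6.94 | – | 5628.5 |
| Lot 1 | Sublot 4 | – | 6.73 | – | 6102.7 |
| Lot 1 | Sublot 5 | 6.46 | 5.88 | 6064.0 | 6069.2 |
| Lot 2 | Sublot 1 | – | 6.43 | – | 6135.6 |
| Lot 2 | Sublot 2 | – | 6.26 | – | 5118.7 |
| Lot 2 | Sublot 3 | – | 6.14 | – | 5119.1 |
| Lot 2 | Sublot 4 | – | 7.00 | – | 4479.4 |
| Lot 2 | Sublot 5 | 7.58 | 6.94 | 5395.1 | 5670.5 |
| Lot 3 | Sublot 1 | – | 6.00 | – | 5870.1 |
| Lot 3 | Sublot 2 | – | 6.60 | – | 5010.9 |
| Lot 3 | Sublot 3 | – | 6.42 | – | 5765.0 |
| Lot 3 | Sublot 4 | – | 6.18 | – | 5967.5 |
| Lot 3 | Sublot 5 | 6.65 | 6.91 | 6623.6 | 5958.5 |
| Mean (x̄) | | 6.90 | 6.44 | 6027.6 | 5608.9 |
| Standard deviation (s) | | 0.599 | 0.482 | 615.05 | 511.23 |

| Validation | Thickness | Compressive strength |
|---|---|---|
| F-test hypothesis | Pass | Pass |
| F-test p-value | 0.4904 | 0.5366 |
| UV t-test hypothesis | Pass | Pass |
| UV t-test p-value | 0.2952 | 0.3613 |

Note: – = no data; UV = unequal variance.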

Thickness Pay Adjustment

The PCC thickness Pay Factor, $P_T$, is calculated as 0.01 using the following equation:

$$P_T = \left[0.30\left(\frac{PWL_T}{100}\right)\right] - 0.27 = 0.01$$

Combined Pay Adjustment

The Combined Pay Factor, $P$, for thickness and compressive strength is calculated as 0.04 using the following equation:

$$P = \left[0.60\left(\frac{PWL_T + PWL_S}{200}\right)\right] - 0.54 = 0.04$$

Data Manipulation Example

The contractor data for PCC thickness presented in this case study were intentionally modified to examine the effect of data manipulation on PWL and pay adjustment factor calculations. In this example, the SHA test results remained the same while the contractor test results were manipulated in two ways: by increasing the contractor test results mean while maintaining the variability of the test results at the same level, and by reducing the contractor test results variability while maintaining the mean value at the original level.

The original SHA and contractor test results are presented in Table 15 along with five rounds of contractor results manipulation, each of which increased the contractor test results mean by 0.1 inch while maintaining the variability of the test results at the same level (± 0.01 inch). The PCC thicknesses are reported to the hundredth of an inch to illustrate the effects of the very small changes. The analysis approach was introduced by Wani and Gharaibeh (10), who manipulated the AC of a contractor sample of 10 results validated against a SHA sample of five results. In these five rounds of contractor results manipulation (increased mean), the contractor test results passed the F- and t-tests, the $PWL_T$ increased from 97.86 to a maximum of 99.97, and the pay factor increased from 0.02 to 0.03.

Table 15. SHA and contractor results (original and increased mean) for PCC pavement thickness.

| Lot-Sublot | SHA (original) | Contractor (original) | Round 1 | Round 2 | Round 3 | Round 4 | Round 5 |
|---|---|---|---|---|---|---|---|
| 1-1 | 6.46 | 6.80 | 7.61 | 7.01 | 7.25 | 7.90 | 6.63 |
| 1-2 | 7.58 | 5.31 | 6.45 | 7.13 | 6.51 | 7.23 | 8.12 |
| 1-3 | 6.65 | 6.94 | 5.93 | 6.66 | 6.73 | 6.76 | 7.67 |
| 1-4 | – | 6.73 | 6.17 | 6.10 | 6.70 | 7.10 | 6.35 |
| 1-5 | – | 5.88 | 6.19 | 5.43 | 6.55 | 7.18 | 6.91 |
| 2-1 | – | 6.43 | 6.21 | 7.30 | 6.95 | 6.97 | 7.74 |
| 2-2 | – | 6.26 | 6.74 | 6.79 | 6.56 | 6.99 | 7.01 |
| 2-3 | – | 6.14 | 6.40 | 7.06 | 7.67 | 6.65 | 7.02 |
| 2-4 | – | 7.00 | 6.37 | 6.53 | 6.58 | 6.57 | 6.55 |
| 2-5 | – | 6.94 | 7.25 | 6.72 | 5.52 | 6.47 | 6.83 |
| 3-1 | – | 6.00 | 6.53 | 6.67 | 6.69 | 6.90 | 6.97 |
| 3-2 | – | 6.60 | 7.29 | 6.16 | 6.74 | 5.76 | 6.89 |
| 3-3 | – | 6.42 | 6.70 | 6.94 | 7.22 | 7.20 | 6.69 |
| 3-4 | – | 6.18 | 6.49 | 6.86 | 7.16 | 7.30 | 7.27 |
| 3-5 | – | 6.91 | 7.16 | 6.40 | 7.18 | 7.01 | 7.31 |
| Mean | 6.90 | 6.44 | 6.63 | 6.65 | 6.80 | 6.93 | 7.06 |
| Standard deviation | 0.599 | 0.482 | 0.487 | 0.480 | 0.490 | 0.476 | 0.484 |
| F-test p-value | – | 0.494 | 0.509 | 0.488 | 0.517 | 0.479 | 0.499 |
| F-test hypothesis | – | Pass | Pass | Pass | Pass | Pass | Pass |
| UV t-test p-value | – | 0.313 | 0.533 | 0.559 | 0.813 | 0.929 | 0.684 |
| UV t-test hypothesis | – | Pass | Pass | Pass | Pass | Pass | Pass |
| Sample size, n | – | 15 | 15 | 15 | 15 | 15 | 15 |
| LSL | – | 5.70 | 5.70 | 5.70 | 5.70 | 5.70 | 5.70 |
| Q | – | 1.914 | 1.982 | 2.243 | 2.590 | 2.822 | 1.914 |
| PWL_T | – | 97.86 | 98.27 | 99.31 | 99.87 | 99.97 | 97.86 |
| P_T | – | 0.02 | 0.02 | 0.03 | 0.03 | 0.03 | 0.02 |

Note: – = no data; UV = unequal variance.

Table 16. SHA and contractor results (original and reduced standard deviation) for PCC pavement thickness.

| Lot-Sublot | SHA (original) | Contractor (original) | Round 1 | Round 2 | Round 3 | Round 4 | Round 5 |
|---|---|---|---|---|---|---|---|
| 1-1 | 6.46 | 6.80 | 5.93 | 6.54 | 6.48 | 6.59 | 6.34 |
| 1-2 | 7.58 | 5.31 | 6.83 | 6.72 | 6.81 | 6.25 | 6.56 |
| 1-3 | 6.65 | 6.94 | 6.95 | 6.79 | 6.25 | 6.56 | 6.44 |
| 1-4 | – | 6.73 | 6.51 | 6.35 | 6.85 | 6.22 | 6.52 |
| 1-5 | – | 5.88 | 6.44 | 6.80 | 6.03 | 6.57 | 6.56 |
| 2-1 | – | 6.43 | 5.93 | 5.71 | 6.74 | 6.44 | 6.27 |
| 2-2 | – | 6.26 | 7.24 | 6.50 | 6.27 | 6.37 | 6.54 |
| 2-3 | – | 6.14 | 5.81 | 6.57 | 6.79 | 6.61 | 6.36 |
| 2-4 | – | 7.00 | 6.84 | 6.24 | 6.34 | 6.27 | 6.37 |
| 2-5 | – | 6.94 | 5.92 | 6.56 | 6.53 | 6.75 | 6.39 |
| 3-1 | – | 6.00 | 5.92 | 7.08 | 5.89 | 6.37 | 6.63 |
| 3-2 | – | 6.60 | 6.83 | 6.50 | 6.26 | 6.13 | 6.36 |
| 3-3 | – | 6.42 | 6.54 | 5.67 | 6.26 | 6.65 | 6.56 |
| 3-4 | – | 6.18 | 6.35 | 6.61 | 6.54 | 6.46 | 6.44 |
| 3-5 | – | 6.91 | 6.46 | 6.04 | 6.34 | 6.37 | 6.29 |
| Mean | 6.90 | 6.44 | 6.43 | 6.44 | 6.43 | 6.44 | 6.44 |
| Standard deviation | 0.599 | 0.482 | 0.451 | 0.393 | 0.287 | 0.178 | 0.112 |
| F-test p-value | – | 0.494 | 0.413 | 0.268 | 0.067 | 0.002 | 0.000 |
| F-test hypothesis | – | Pass | Pass | Pass | Pass | Fail | Fail |
| UV t-test p-value | – | 0.313 | 0.311 | 0.320 | 0.305 | 0.317 | 0.319 |
| UV t-test hypothesis | – | Pass | Pass | Pass | Pass | Pass | Pass |
| Sample size, n | – | 15 | 15 | 15 | 15 | 15 | 15 |
| LSL | – | 5.75 | 5.75 | 5.75 | 5.75 | 5.75 | 5.75 |
| Q_T | – | 1.424 | 1.517 | 1.769 | 2.356 | 3.869 | 6.204 |
| PWL_T | – | 92.63 | 94.07 | 96.80 | 99.58 | 100.00 | 100.00 |
| P_T | – | 0.01 | 0.01 | 0.02 | 0.03 | 0.03 | 0.03 |

Note: – = no data; UV = unequal variance.
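The pay adjustment arithmetic can be illustrated with a short sketch. It assumes the pay equations read P_T = 0.30(PWL_T/100) − 0.27 and P = 0.60[(PWL_T + PWL_S)/200] − 0.54, which is a reading of the printed equations that agrees with the reported values of 0.01 and 0.04. The function names are illustrative only.

```python
def thickness_pay_factor(pwl_t):
    """Thickness pay factor, assuming P_T = 0.30 * (PWL_T / 100) - 0.27."""
    return 0.30 * (pwl_t / 100.0) - 0.27


def combined_pay_factor(pwl_t, pwl_s):
    """Combined pay factor, assuming P = 0.60 * ((PWL_T + PWL_S) / 200) - 0.54."""
    return 0.60 * ((pwl_t + pwl_s) / 200.0) - 0.54


# PWL_T values from Table 16: original contractor data, then Rounds 1 to 5
pwl_t_by_round = [92.63, 94.07, 96.80, 99.58, 100.00, 100.00]
pay_factors = [round(thickness_pay_factor(p), 2) for p in pwl_t_by_round]

# Case study values: PWL_T = 92.63 and PWL_S = 100
combined = round(combined_pay_factor(92.63, 100.0), 2)

print(pay_factors)
print(combined)
```

Under this reading, the rounded thickness pay factors match the P_T row of Table 16, and the combined factor matches the 0.04 reported in the text.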
Table 16 lists the original SHA and contractor test results for PCC thickness along with five rounds of contractor results manipulation in which variability was reduced (by 0.05 inch to 0.1 inch) in each round while maintaining the test results mean at the same level (± 0.1 inch). In these rounds of contractor results manipulation (reduced standard deviation), the contractor test results passed the F- and t-tests in the first three rounds, the $PWL_T$ increased from 92.63 to a maximum of 99.58, and the pay factor increased from 0.01 to 0.03. However, the contractor test results failed the F-test in the last two rounds. This example shows that contractor data manipulation, especially when reducing variability, can lead to increased PWLs and pay adjustment factors while still passing the statistical tests. This suggests that the PWL is sensitive to changes in estimated variability, especially for a small sample size (the SHA sample size was three). Wani and Gharaibeh (10) recommended separating the contractor quality management team from the project management team and using larger lots for comparing contractor and SHA test results as potential means of reducing risk.
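The sensitivity to estimated variability can be seen directly from the Quality Index formula. The sketch below recomputes Q from the rounded means and standard deviations in Table 16, so the values only approximate the Q_T row of that table, which was presumably computed from unrounded statistics. Variable names are illustrative.

```python
LSL = 5.75  # lower specification limit for PCC thickness, inches

labels = ["Original", "Round 1", "Round 2", "Round 3", "Round 4", "Round 5"]
means = [6.44, 6.43, 6.44, 6.43, 6.44, 6.44]        # Table 16, rounded
std_devs = [0.482, 0.451, 0.393, 0.287, 0.178, 0.112]

# Q = (mean - LSL) / s: with the mean held nearly constant, Q grows as s shrinks
q_values = [(m - LSL) / s for m, s in zip(means, std_devs)]

for label, s, q in zip(labels, std_devs, q_values):
    print(f"{label:<9} s = {s:.3f}  Q = {q:.2f}")
```

With an essentially unchanged mean, cutting the standard deviation from 0.482 to 0.112 inch raises Q more than fourfold, which is why the PWL and pay factor climb even though the pavement thickness is no greater on average.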


Procedures and Guidelines for Validating Contractor Test Data (2020)
