**Suggested Citation:** "Section 4 - Procedures and Guidelines for Validating Contractor Test Data." National Academies of Sciences, Engineering, and Medicine. 2020.

*Procedures and Guidelines for Validating Contractor Test Data*. Washington, DC: The National Academies Press. doi: 10.17226/25823.



Proposed Practice for Validating Contractor Test Data

3.44 Student's t distribution - a family of continuous sampling distributions employed in small sampling theory where the standard deviation is unknown.

3.45 validation - the mathematical comparison of two independently obtained sets of data (e.g., SHA data versus Contractor data) to determine whether it can be assumed they came from the same population. Some references define validation as the process of verifying the soundness or effectiveness of a product (such as a model, a program, or specifications), thereby indicating official sanction.

3.46 variance ($s^2$) - a statistical measure of the dispersion of a population or a set of data. An unbiased measure of the dispersion of a set of data around its mean, calculated as the sum of the squares of the deviations from the mean of the results divided by the number of observations minus one. Variance is equal to the standard deviation squared.

3.47 verification - the process of determining or testing the truth or accuracy of test results by examining the data and/or providing objective evidence. Verification may be part of an Independent Assurance (IA) program or part of an acceptance program.

4. PROCEDURES AND GUIDELINES FOR VALIDATING CONTRACTOR TEST DATA

4.1 This practice refers to validation as the documented process used by a SHA to assess the reliability of Contractor data used to determine how well a material or construction meets the SHA's project specification for acceptance. Validation is a required part of a SHA's QA program when results from Contractor testing are used in acceptance decisions.

4.1.1 Acceptance decisions for a material or construction must be evaluated on a lot-by-lot basis as defined in the SHA's QA program.

4.1.2 The acceptance quality characteristics (AQCs) that are used to determine acceptability (i.e., compliance with the specifications) must be relevant to performance and must encourage quality. If multiple AQCs are used, they must not be so interdependent that there is a strong chance of double jeopardy, that is, failing one AQC also results in failing another. For example, asphalt paving mixtures are often accepted based on asphalt content, extracted gradation, laboratory compacted air void content, and voids in the mineral aggregate (VMA). The strong interrelationships among these properties increase the probability of additive penalties. Criteria for acceptance (i.e., specification limits) must be based on achievable limits. Each AQC for the lot of material must be evaluated independently. For example, if the material's acceptance is determined based on a composite pay factor using four measured AQCs, then each AQC must be validated.

4.1.3 Samples obtained for validation of Contractor results must be taken independently of the samples taken by the Contractor for acceptance testing.

4.1.4 The SHA must establish procedures regarding the security of samples (i.e., chain of custody) for material samples to be used in validation and dispute resolution testing.

4.2 The ability of the validation procedure to identify potential differences between SHA and Contractor test results depends on the SHA's selected level of significance and the number of test results (samples) that are being compared. The greater the number of test results in each set, the greater the ability of the comparison procedure to identify statistically valid differences. Using a single SHA sample to validate Contractor results is not valid because no measure of variability can be obtained from a single sample.

4.3 Overview of the Validation Process

4.3.1 Figure 1 illustrates use of the validation process within a SHA's acceptance plan for determining whether or not Contractor results should be used in acceptance or calculation of pay for a lot of material or construction item. Validation includes comparisons of test results from the SHA and the Contractor using formal statistical procedures that are simple to apply. A relevant detail is the 23 CFR 637B requirement of independent samples. Because test results from independent samples include three sources of variation, namely materials variability, sampling variability, and testing variability, it can be challenging to determine whether results obtained by a SHA and a Contractor from the same lot are statistically different. This practice therefore recommends an additional validation step that compares results from multiple split samples (see 4.4.4).

Annex A presents two example sampling and testing plans with efficient validation processes that utilize both independent and split samples.

Figure 1 Simplified Diagram of the Key Parts in an Acceptance Plan

1. Project Sampling & Testing for each AQC
   - Contractor sampling and testing for acceptance
   - SHA sampling and testing for validation
2. Validation Process
   - Outlier detection of Contractor and SHA data
   - Primary validation with F- and t-tests
   - Secondary validation with paired t-tests
   - Dispute Resolution
3. Acceptance/Calculation of Pay Factors
   - Use Contractor data if they pass either primary or secondary validation or win the dispute resolution
   - Use SHA data if both primary and secondary validation fail and SHA data win the dispute resolution

4.3.2 Validation may include up to four steps: (1) outlier detection, (2) primary validation using F- and t-tests, (3) secondary validation with paired t-tests, and (4) dispute resolution. Each step has a specific statistical procedure. The first step is to identify potential outlying results and, if there are outliers, replace them with results from additional tests or other available samples. This step is essential to minimize the potential of including data that are beyond normal variability. The second step is the primary validation step, and it has two parts. The first part uses the F-test to determine if the two data sets have similar variances. The second part uses the t-test to determine if the two data sets have similar means. If the F-test and the t-test validate the Contractor's data, the Contractor's data are used for acceptance and calculation of the pay factor for that AQC. Otherwise, the process moves to step 3, secondary validation. This step uses the paired t-test on multiple split samples. If the secondary validation shows that the difference in means of results from split samples is statistically different from zero, then the Contractor's results are not validated and the fourth step, dispute resolution, may be invoked. Dispute resolution is a process used to judge whether the SHA results or the Contractor results are better estimates of the population when significant differences between those results are evident.

4.3.3 Sampling and Testing - The goal of sampling as part of a QA program is to obtain a suitable number of small but representative portions of the lot that are of sufficient size to conduct tests of the AQCs for determining specification compliance. Materials for such purposes must be obtained in a random manner and in accordance with the appropriate standards such as AASHTO R 60, R 90, T 23, and T 168. In order to adequately represent the lot, multiple samples are needed to estimate the AQC's central tendency (i.e., mean) and dispersion (i.e., variability) within the lot. A minimum of three SHA results is needed to validate Contractor data. A fundamental truth of statistics is that having more samples provides better estimates of the population, and having better estimates reduces the risks to both SHAs and Contractors. However, increasing the number of samples increases the cost of assessing the product's quality and thus the cost of making the product. In general, the frequency of testing should be set to achieve a balance between the total cost for the completed set of AQCs and the risks associated with pay penalties and accepting poor quality materials and construction.

4.4 Description of Validation Steps

4.4.1 Testing of Replicates - As part of normal sampling and testing practices, all organizations that conduct tests for quality control, acceptance, validation, and Independent Assurance should consider the value of conducting replicate tests for each AQC on each sample and comparing the replicate results to the single-operator acceptable range, i.e., Difference Two-Sigma (D2S), in the precision statement of the given test method.
This practice is a worthwhile check on individual data points and will help identify procedural errors and variations that may affect subsequent steps in the acceptance decision process.

4.4.2 Dealing with Outliers - Contractor and SHA data sets should be evaluated for potential outlier observations using an established, statistically valid method such as ASTM E 178 Standard Practice for Dealing with Outlying Observations. ASTM E 178 describes several methods for determining if one or more observations are markedly different from other observations in the same data set and allows the user to select the level of significance for determining critical values. A simple approach for identifying outliers in small data sets of materials test results, developed by the Maine Department of Transportation on the basis of the ASTM procedure, is provided in Annex B. Possible causes for erroneous results include mishandling of the sample prior to testing, failure to properly follow the test method, malfunction of test equipment, errors in recording measurements, or computation errors. It is important to identify the cause(s) of outlying results and correct them so that they do not persist.

4.4.3 Primary Validation - The first part of primary validation is a test of the hypothesis that the variabilities of the Contractor results and the SHA results for the lot (or combined lots) are the same. This hypothesis is evaluated with the F-test. Underlying assumptions of the F-test are that the samples are independent and the population is approximately normally distributed. Inputs needed to conduct an F-test are the variances, $s^2$, of the Contractor results and the SHA results; the number of results, $n$, used to calculate those variances; and the level of significance, $\alpha$. The recommended value for $\alpha$ is 0.05, which means that there is only a five percent chance that the hypothesis will be incorrectly rejected.

Note 1 - Because the calculation of the F-statistic and determining the critical value from a statistical table are easily coded, programs or spreadsheets are commonly used to conduct the F-test. When the number of SHA results is small (and much smaller than the number of Contractor results), the power of the F-test is low, so relatively large differences between SHA and Contractor variances may not be found significant. In this case, combining data from consecutive lots is preferred over testing data with small numbers of results.

4.4.3.1 The F-statistic is calculated from the variances of Contractor results and SHA results for the lot or combined lots as follows:

$$F = \frac{s_1^2}{s_2^2}$$

where $s_1^2$ is the larger variance of either the Contractor results or the SHA results, and $s_2^2$ is the smaller variance of the two.

4.4.3.2 Obtain the F-critical value from an F table (Annex C, Tables C.2, C.3, C.4, or C.5) at a level of $\alpha/2$ and the degrees of freedom.
The degrees of freedom, $\nu$, are calculated as the sample sizes minus 1 as follows:

$$\nu_1 = n_1 - 1, \qquad \nu_2 = n_2 - 1$$

where $n_1$ is the number of samples corresponding to the larger variance, and $n_2$ is the number of samples corresponding to the smaller variance.

4.4.3.3 When F-statistic < F-critical, the hypothesis is not rejected and it is concluded that the variabilities of the two data sets are not different. Otherwise, it is concluded that the variabilities of the two data sets are different.
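The F-test steps in 4.4.3.1 through 4.4.3.3 can be sketched in Python as follows. This is a minimal illustration, not part of the practice: the function name `f_test`, the sample data, and the critical value are assumptions made for the example. The critical value is passed in by the caller because the practice obtains it from the Annex C tables at a level of α/2.

```python
from statistics import variance


def f_test(contractor, sha, f_critical):
    """Compare the variances of two independent sets of results.

    f_critical is assumed to be read by the caller from an F table at
    alpha/2 for the degrees of freedom returned here.
    Returns (F-statistic, (numerator df, denominator df), variances_similar).
    """
    var_c = variance(contractor)  # sample variance, n - 1 in the denominator
    var_s = variance(sha)

    # The larger variance goes in the numerator, so F is always >= 1.
    if var_c >= var_s:
        larger, n_larger, smaller, n_smaller = var_c, len(contractor), var_s, len(sha)
    else:
        larger, n_larger, smaller, n_smaller = var_s, len(sha), var_c, len(contractor)

    if smaller == 0:
        raise ValueError("smaller variance is zero; F-statistic is undefined")

    f_stat = larger / smaller
    dof = (n_larger - 1, n_smaller - 1)
    return f_stat, dof, f_stat < f_critical


# Hypothetical asphalt content results (percent) for one lot.
contractor_results = [5.1, 5.3, 4.9, 5.2, 5.0]
sha_results = [5.4, 4.8, 5.6, 4.7, 5.0]

# 9.60 is assumed here as the tabulated value for alpha/2 = 0.025 with
# (4, 4) degrees of freedom; confirm against the table actually used.
f_stat, dof, similar = f_test(contractor_results, sha_results, f_critical=9.60)
print(f"F = {f_stat:.2f}, df = {dof}, variances similar: {similar}")
```

For these made-up data the F-statistic is about 6.0 with (4, 4) degrees of freedom, so the hypothesis of equal variabilities would not be rejected at the assumed critical value.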

Note 2 - When the variabilities are found to be different, the causes of the difference should be investigated. As a starting point, the SHA and the Contractor personnel who take the samples and conduct the tests should review each other's methods. Because the primary validation tests are based on independent samples, there are more opportunities for differences in sampling, testing, and materials variability. Material segregation, whether it occurs in haul vehicles, placement, or sample handling, is a common source of variability.

4.4.3.4 The second part of primary validation is testing the hypothesis that the means of the Contractor results and the SHA results are equal. Welch's t-test (also known as the "unequal variance t-test") is recommended for comparing the means of the data sets. Welch's t-statistic is calculated as follows:

$$t = \frac{\bar{x}_{C} - \bar{x}_{SHA}}{\sqrt{\dfrac{s_{C}^2}{n_{C}} + \dfrac{s_{SHA}^2}{n_{SHA}}}}$$

where $\bar{x}_{C}$ and $\bar{x}_{SHA}$ are the means, $s_{C}^2$ and $s_{SHA}^2$ are the variances, and $n_{C}$ and $n_{SHA}$ are the numbers of Contractor results and SHA results, respectively.

4.4.3.5 Obtain the critical t-value from the t-table (Annex C, Table C.1) at a level of $\alpha/2$ and the estimated degrees of freedom, $\nu$, which is approximated as:

$$\nu \approx \frac{\left(\dfrac{s_{C}^2}{n_{C}} + \dfrac{s_{SHA}^2}{n_{SHA}}\right)^2}{\dfrac{\left(s_{C}^2 / n_{C}\right)^2}{n_{C} - 1} + \dfrac{\left(s_{SHA}^2 / n_{SHA}\right)^2}{n_{SHA} - 1}}$$

Note 3 - The estimated degrees of freedom should be rounded down to the nearest integer because the degrees of freedom presented in the t-tables are integers.

4.4.3.6 When the absolute value of the t-statistic < t-critical, the hypothesis is not rejected and it is concluded that the means of the SHA's results and the Contractor's results are not different. Otherwise, it is concluded that the means of the SHA's results and the Contractor's results from independent samples are different.

4.4.3.7 If the F-test and Welch's t-test indicate that the SHA results and Contractor's results are not statistically different, the Contractor's results for the lot are validated. In this case, the determination of the lot's acceptance for this AQC may proceed using the Contractor's results. If either the F-test or Welch's t-test indicates that the SHA results and Contractor's results are statistically different, then the Contractor's data fail the primary validation and the comparison should proceed to the secondary validation.
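The comparison of means in 4.4.3.4 through 4.4.3.6 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the function name `welch_t`, the sample data, and the critical value are hypothetical. The function returns the degrees of freedom already rounded down, as Note 3 describes, so the caller can look up the critical value afterward.

```python
from math import floor, sqrt
from statistics import mean, variance


def welch_t(contractor, sha):
    """Return (t-statistic, degrees of freedom rounded down) for Welch's t-test."""
    n_c, n_s = len(contractor), len(sha)
    # Squared standard error contributed by each data set: s^2 / n.
    se2_c = variance(contractor) / n_c
    se2_s = variance(sha) / n_s

    t_stat = (mean(contractor) - mean(sha)) / sqrt(se2_c + se2_s)

    # Approximate degrees of freedom for unequal variances.
    dof = (se2_c + se2_s) ** 2 / (se2_c ** 2 / (n_c - 1) + se2_s ** 2 / (n_s - 1))
    return t_stat, floor(dof)


# Hypothetical asphalt content results (percent) for one lot.
contractor_results = [5.1, 5.3, 4.9, 5.2, 5.0]
sha_results = [5.5, 4.9, 5.7, 4.8, 5.1]

t_stat, dof = welch_t(contractor_results, sha_results)

# 2.571 is assumed here as the tabulated two-tailed value for alpha = 0.05
# and 5 degrees of freedom; confirm against the table actually used.
t_critical = 2.571
means_similar = abs(t_stat) < t_critical
print(f"t = {t_stat:.3f}, df = {dof}, means similar: {means_similar}")
```

Returning the degrees of freedom separately, rather than taking the critical value as an argument, reflects the order of the steps: the table lookup cannot happen until the degrees of freedom have been estimated from the data.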

4.4.4 Secondary Validation - This comparison of SHA and Contractor results utilizes multiple split samples that are evaluated with the paired t-test. The paired t-test determines whether the average difference between pairs of results is statistically different from zero.

Note 4 - The paired t-test on split samples will only detect differences caused by testing variabilities of SHA and Contractor results. Because the results used in the paired t-test are obtained from split portions of the same sample, potential sources of variation from sampling and materials are eliminated. This can be helpful in identifying the cause of differences between SHA and Contractor results. If the paired t-test results indicate the data are similar but Welch's t-test indicates that the means of the SHA and Contractor data are not similar, the differences between the independent samples could be attributed to sampling differences or changes in the material. However, if the paired t-test and Welch's t-test both indicate that the SHA and Contractor data are different, there is a strong chance that the differences are caused by differences in the test methods used by the two laboratories.

4.4.4.1 The t-statistic for the paired t-test is calculated as follows:

$$t = \frac{\bar{d}}{s_d / \sqrt{n}}$$

where $\bar{d}$ is the average of the differences between the split sample test results, $s_d$ is the standard deviation of the differences between the split sample test results, and $n$ is the number of split samples.

4.4.4.2 Obtain the critical t-value from the t-table (Annex C, Table C.1) at a level of $\alpha/2$ and $(n - 1)$ degrees of freedom.

4.4.4.3 When the absolute value of the paired t-statistic < t-critical, it is concluded that the means of the SHA results and the Contractor's results are not statistically different and the Contractor's results are validated by secondary assessment. In this case, the determination of the lot's acceptance for this AQC may proceed using the Contractor's results.
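The paired comparison in 4.4.4.1 and 4.4.4.2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name `paired_t`, the split-sample data, and the critical value are hypothetical, and the two-tailed comparison uses the absolute value of the statistic.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(sha_split, contractor_split):
    """Return (t-statistic, degrees of freedom) for paired split-sample results.

    The two lists must be in the same order, so that element i of each list
    comes from the two portions of the same split sample.
    """
    if len(sha_split) != len(contractor_split):
        raise ValueError("each split sample needs one SHA and one Contractor result")

    diffs = [s - c for s, c in zip(sha_split, contractor_split)]
    n = len(diffs)
    # Mean difference divided by its standard error, s_d / sqrt(n).
    t_stat = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t_stat, n - 1


# Hypothetical results (percent) from five split samples.
sha_split = [5.2, 5.0, 5.4, 5.1, 5.3]
contractor_split = [5.0, 5.1, 5.2, 5.0, 5.2]

t_stat, dof = paired_t(sha_split, contractor_split)

# 2.776 is assumed here as the tabulated two-tailed value for alpha = 0.05
# and 4 degrees of freedom; confirm against the table actually used.
t_critical = 2.776
validated = abs(t_stat) < t_critical
print(f"t = {t_stat:.3f}, df = {dof}, Contractor results validated: {validated}")
```

With these made-up numbers the statistic is roughly 1.83 on 4 degrees of freedom, which is below the assumed critical value, so the Contractor's results would pass the secondary check.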
When the absolute value of the paired t-statistic ≥ t-critical, it is concluded that the means of the SHA results and the Contractor's results are statistically different. The Contractor's results have then failed both the primary and secondary validation, and the comparison may proceed to Referee Testing, also known as Dispute Resolution or Conflict Resolution.

Note 5 - Some SHAs give the Contractor the choice of whether or not to invoke Dispute Resolution. If the Contractor chooses to proceed with the Dispute Resolution, the cost of the Referee Testing is paid by the SHA if the Referee results favor the Contractor and paid by the Contractor if the Referee results favor the SHA. If the Contractor does not invoke Referee Testing, then the SHA's results are used to determine acceptance and pay factor of the AQC for the lot in question.

4.4.5 Dispute Resolution - Dispute Resolution is a defined process used when Contractor results are not validated and it is necessary to determine whether the Contractor's data or the SHA's verification data are correct. Dispute Resolution often consists of Referee Testing conducted by an accredited laboratory that is different (independent) from the SHA laboratory that conducted the verification testing. Often, the SHA's central laboratory conducts Referee Testing, but it may be another SHA laboratory or an independent consultant laboratory as long as it is accredited. A common practice in current Dispute Resolution systems is to conduct a Referee test on a single split sample and compare that result against the Contractor result and the SHA validation result. However, a more powerful statistical procedure for Dispute Resolution is to test the Referee portion of each sample corresponding to the samples used for verification. This recommended process for Dispute Resolution also utilizes the paired t-test for analysis of multiple split samples. The example sampling plans in Annex A provide additional illustrations of the recommended Dispute Resolution process.

4.4.5.1 Results of the Referee Testing are compared to both the SHA results and the Contractor results using the paired t-test steps described in 4.4.4.1 and 4.4.4.2.

4.4.5.2 There are three possible outcomes of the paired t-test comparisons using the results of the Referee tests:

1. The Referee test results are not significantly different from the SHA results but are significantly different from the Contractor's results. In other words, the Referee results favor the SHA's results. In this case, the SHA's results for the lot, excluding outliers, would be used to determine the pay factor for the AQC.
2. The Referee test results are not significantly different from the Contractor's results but are significantly different from the SHA results. In other words, the Referee test results favor the Contractor's results. In this case, all of the Contractor's data for the lot, excluding outliers, would be used to determine the pay factor for the AQC.
3. The Referee results are not significantly different from the Contractor's AND the SHA results, OR the Referee results are significantly different from