**Suggested Citation:** "Annex A - Recommended Sampling Plans." National Academies of Sciences, Engineering, and Medicine. 2020. *Procedures and Guidelines for Validating Contractor Test Data*. Washington, DC: The National Academies Press. doi: 10.17226/25823.

the Contractor's results and the SHA results. In either of these cases, the party whose paired t-test result yields the lowest p-value is the party whose results are considered validated.

ANNEX A - RECOMMENDED SAMPLING PLANS

This annex includes optional sampling plans that provide efficient validation processes for two cases. The first case utilizes a minimum of six Contractor results per lot for each AQC and a minimum of three SHA results per lot for validation of those Contractor results. The second case is referred to as a Cumulative Validation Lot, which features a smaller ratio of SHA validation tests to Contractor tests. SHA specifications should identify the minimum sample sizes for the AQCs.

Case 1: Minimum of six Contractor tests and a minimum of three SHA validation tests per lot.

A.1.1 For each lot, randomly take a minimum of six samples and split each into three equal portions. Label the samples 1-A, 1-C, 1-R, 2-A, 2-C, 2-R, etc., where the number designates the sample number (minimum of 6 in this case), A is the Agency (SHA) portion of the split, C is the Contractor portion of the split, and R is the Referee portion of the split. The process, as illustrated in Figure A.1, is divided into four parts: Sampling, Primary Validation, Secondary Validation, and Dispute Resolution.

Note A1 - The SHA must establish a policy for the security of validation and Referee sample portions (i.e., a chain of custody policy). All samples must be clearly labeled, securely sealed, and stored to avoid any concerns about sample integrity.

A.1.2 The Contractor tests one portion from each of the samples. This will yield at least six Contractor test results per lot. Figure A.1 illustrates six random samples tested by the Contractor.

A.1.3 The SHA randomly selects and tests at least three portions for validation. Three verification results are considered the minimum number to produce statistically valid results for the F- and t-tests. The results of the Contractor tests on portions corresponding to the SHA samples must be excluded from the F- and t-test statistical comparisons in the primary validation. As illustrated in Figure A.1, if the SHA randomly selects samples 1-A, 3-A, and 6-A for validation, the Contractor's results used in the primary validation testing would exclude results from samples 1-C, 3-C, and 6-C, and the Contractor's results from 2-C, 4-C, and 5-C will be used in the primary validation.

Note A2 - In this example, the Contractor's and SHA's results from the same split sample are not used in the primary validation (F- and t-tests), which satisfies the 23 CFR 637B requirement that verification testing be conducted on independent samples.

Note A3 - This example uses the minimum number of SHA and Contractor samples for a statistically valid verification. Larger sample sizes (e.g., up to 20 SHA samples per lot) will reduce both SHA and Contractor risk.
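The Case 1 sampling steps in A.1.1 through A.1.3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `plan_case1`, its parameters, and the dictionary keys are hypothetical and are not part of the proposed practice, and the random selection here stands in for whatever random sampling procedure the SHA specifies.

```python
import random


def plan_case1(n_samples=6, n_sha=3, seed=None):
    """Label split portions for one lot and pick the SHA validation samples.

    Hypothetical helper: names and structure are assumptions for illustration.
    """
    if n_samples < 6 or n_sha < 3:
        raise ValueError("Case 1 uses at least 6 samples and 3 SHA tests per lot")
    if n_sha > n_samples:
        raise ValueError("cannot select more SHA samples than were taken")
    rng = random.Random(seed)
    samples = list(range(1, n_samples + 1))
    # The SHA randomly selects which samples it will test (A.1.3).
    sha_picks = sorted(rng.sample(samples, n_sha))
    return {
        # Agency portions tested by the SHA.
        "agency": [f"{i}-A" for i in sha_picks],
        # Contractor portions from the other samples feed the F- and t-tests.
        "contractor_primary": [f"{i}-C" for i in samples if i not in sha_picks],
        # Contractor portions of the SHA-selected samples are excluded from the
        # primary validation and reserved for the paired comparison.
        "contractor_excluded": [f"{i}-C" for i in sha_picks],
        # Referee portions held for possible Dispute Resolution.
        "referee": [f"{i}-R" for i in sha_picks],
    }


plan = plan_case1(seed=7)
print(plan)
```

With the minimum plan of six samples and three SHA selections, exactly three Contractor results remain for the primary validation, which is why Note A3 describes this as the minimum configuration.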

Figure A.1 Sampling and Validation Process Using a Minimum of Six Contractor Results and Three SHA Results per Lot.

A.1.4 Conduct an assessment of outliers among the SHA results and the Contractor results. A simple outlier detection procedure is provided in Annex B.

A.1.4.1 If an outlier is detected in either set, then an investigation must be conducted to determine the probable cause(s) of the outlying data. The cause(s) must be corrected for subsequent sampling and testing.

Note A4 - If an outlier is detected among either the SHA results or the Contractor results and it is discarded, there are several options for replacing the outlier result or combining data from consecutive lots for a proper validation. One option is resampling and testing the material or construction. The SHA should establish a resampling and testing policy and procedure as part of its QA plan. Another option exists when more than six samples were obtained in the initial sampling of the lot as described in A.1.1. If the outlier result is from the Contractor's data, and seven or more samples were taken in the original sampling plan for the lot, then the Contractor data set will still have at least three independent test results after excluding samples used for validation. If the outlier result is from the SHA's data, then the SHA could randomly select and test one of the other original sample portions, and the Contractor's result of the corresponding portion must be excluded from the primary validation analysis.

A.1.4.2 If no outlier results are detected, then validation analysis of the Contractor's data proceeds.

A.1.5 Primary Validation - The first part of the primary validation is a comparison of the variabilities of the Contractor results and the SHA results for the lot. The statistical comparison of variabilities is evaluated with the F-test. Inputs required for the F-test are the variances, $s^2$, of the Contractor results and the SHA results, the number of results, $n$, used to calculate those variances, and the level of significance, α. The recommended value for α is 0.05.

A.1.5.1 The F-statistic is calculated from the variances from the Contractor results and SHA results from the lot as follows:

$$F = \frac{s_L^2}{s_S^2}$$

Where $s_L^2$ is the larger variance of either the Contractor results or the SHA results, and $s_S^2$ is the smaller variance of the two results.

A.1.5.2 The F-critical value is obtained from an F table (Table C.2, C.3, C.4, or C.5 in Annex C) based on the degrees of freedom and the selected level of significance. The degrees of freedom, $\nu$, is calculated as the sample sizes minus 1 as follows:

$$\nu_L = n_L - 1, \qquad \nu_S = n_S - 1$$

Where $n_L$ is the number of samples corresponding to the larger variance, and $n_S$ is the number of samples corresponding to the smaller variance.

A.1.5.3 When F-statistic ≥ F-critical, it is concluded that the variabilities of the Contractor data and SHA data are statistically different; the cause of this difference should be investigated.

A.1.5.4 The second part of the primary validation is a statistical comparison of the means of the Contractor results and the SHA results. The statistical comparison of means is evaluated using Welch's t-test. Inputs required for Welch's t-test are the means, $\bar{x}_i$, of the Contractor results and the SHA results, the variances, $s_i^2$, of the Contractor results and the SHA results, the number of results for each set, $n_i$, and the level of significance, α. The recommended value for α is 0.05.

A.1.5.5 Welch's t-statistic is calculated as follows:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

A.1.5.6 The t-critical value is obtained from Table C.1 in Annex C at a level of α/2 based on the estimated degrees of freedom, $\nu$, which is approximated as follows:

$$\nu \approx \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$$

The estimated degrees of freedom should be rounded down to the nearest integer.

A.1.5.7 When the absolute value of Welch's t-statistic > t-critical, then the means of the SHA results and the Contractor's results are statistically different.

A.1.5.8 If both the F-test and Welch's t-test indicate that the SHA results and the Contractor results are not statistically different (i.e., the data sets are from the same population), the Contractor's data are considered "validated by primary assessment" and all of the Contractor's results (excluding outlier data) are used to determine the pay factor for this AQC.
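The primary validation calculations in A.1.5.1 through A.1.5.7 can be sketched with the Python standard library. This is a minimal illustration, not the report's implementation: the function names and the test results are hypothetical, and the critical values are typed in from standard statistical tables on the assumption that they match the Annex C entries for the same degrees of freedom. Which Annex C table applies for a chosen α is defined by that annex, not by this sketch.

```python
import math
from statistics import mean, variance


def f_statistic(a, b):
    """Return (F, df_larger, df_smaller) with the larger variance on top."""
    va, vb = variance(a), variance(b)
    if min(va, vb) == 0:
        raise ValueError("F is undefined when the smaller variance is zero")
    if va >= vb:
        return va / vb, len(a) - 1, len(b) - 1
    return vb / va, len(b) - 1, len(a) - 1


def welch_t(a, b):
    """Return (t, df) with the approximate df rounded down to an integer."""
    qa, qb = variance(a) / len(a), variance(b) / len(b)
    if qa + qb == 0:
        raise ValueError("t is undefined when both variances are zero")
    t = (mean(a) - mean(b)) / math.sqrt(qa + qb)
    df = (qa + qb) ** 2 / (qa ** 2 / (len(a) - 1) + qb ** 2 / (len(b) - 1))
    return t, math.floor(df)


# Hypothetical results: Contractor portions not paired with SHA samples.
contractor = [5.1, 5.3, 5.0]
sha = [5.2, 5.4, 5.3]

f, df_num, df_den = f_statistic(contractor, sha)
t, df_t = welch_t(contractor, sha)

# Assumed table values, looked up after the degrees of freedom are known:
F_CRITICAL = 19.00  # upper 5 percent point of F with (2, 2) degrees of freedom
T_CRITICAL = 3.182  # t at a level of 0.025 with 3 degrees of freedom

variances_differ = f >= F_CRITICAL
means_differ = abs(t) > T_CRITICAL
validated = not variances_differ and not means_differ
print(f, (df_num, df_den), t, df_t, validated)
```

The functions return the degrees of freedom alongside each statistic because the table lookup depends on them; the critical values cannot be chosen before the statistics are computed.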

A.1.5.9 If either the F-test or Welch's t-test indicates that the Contractor results and the SHA results are statistically different, then the Contractor's data are not validated by the primary assessment and the process moves into Secondary Validation.

A.1.6 Secondary Validation - When the Contractor's data are not validated by the primary assessment, the next step is to compare SHA results and the Contractor results from the same samples (i.e., the split portions) using the paired t-test. As illustrated in Figure A.1, the secondary validation compares the results of 1-A to 1-C, 3-A to 3-C, and 6-A to 6-C. The paired t-test is used to determine if the average difference between these pairs of results is statistically different from zero.

A.1.6.1 The t-statistic for the paired t-test is calculated as follows:

$$t = \frac{\bar{d}}{s_d / \sqrt{n}}$$

Where $\bar{d}$ is the average of the differences between the split sample test results, $s_d$ is the standard deviation of the differences between the split sample test results, and $n$ is the number of split samples.

A.1.6.2 The critical t-value is obtained from Table C.1 in Annex C at a level of α/2 and (n - 1) degrees of freedom.

A.1.6.3 When the absolute value of the paired t-statistic > t-critical, the means of the SHA results and the Contractor's results are statistically different and the process should proceed to Dispute Resolution. Otherwise, the Contractor's data are considered "validated by secondary assessment" and all of the Contractor's results (excluding outlier data) are used to calculate the pay factor for this AQC.

A.1.7 Dispute Resolution - The process for Dispute Resolution utilizes Referee Testing on each of the split samples corresponding to the samples already tested by the SHA for validation and tested by the Contractor in Section A.1.2. The results of the Referee tests are compared to the SHA validation results and the Contractor's results on the splits from the same samples. Two paired t-tests are conducted. One of the paired t-tests is used to examine the average difference between the Referee test results and the SHA verification test results (i.e., the differences between 1-R and 1-A, 3-R and 3-A, and 6-R and 6-A). The second paired t-test is used to examine the average difference between the Referee test results and the Contractor test results (i.e., the differences between 1-R and 1-C, 3-R and 3-C, and 6-R and 6-C).
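The paired t-test used in Secondary Validation and again in Dispute Resolution can be sketched as follows. The function name `paired_t`, the test results, and the hard-coded critical value are assumptions for illustration; the critical value is the standard table entry for a level of 0.025 with 2 degrees of freedom, assumed to match Table C.1.

```python
import math
from statistics import mean, stdev


def paired_t(x, y):
    """Return (t, df) for the paired differences x[i] - y[i]."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sets with at least 2 pairs")
    d = [xi - yi for xi, yi in zip(x, y)]
    sd = stdev(d)
    if sd == 0:
        raise ValueError("t is undefined when all differences are identical")
    # Average difference divided by its standard error.
    return mean(d) / (sd / math.sqrt(len(d))), len(d) - 1


# Hypothetical split-sample results for samples 1, 3, and 6.
referee = [5.2, 5.4, 5.3]     # 1-R, 3-R, 6-R
sha = [5.1, 5.4, 5.2]         # 1-A, 3-A, 6-A
contractor = [5.6, 5.9, 5.7]  # 1-C, 3-C, 6-C

T_CRITICAL = 4.303  # assumed table value: level 0.025, 2 degrees of freedom

t_sha, df_sha = paired_t(referee, sha)
t_con, df_con = paired_t(referee, contractor)

referee_differs_from_sha = abs(t_sha) > T_CRITICAL
referee_differs_from_contractor = abs(t_con) > T_CRITICAL
print(t_sha, referee_differs_from_sha)
print(t_con, referee_differs_from_contractor)
```

In this made-up data set the Referee results are statistically indistinguishable from the SHA results but differ from the Contractor results. The same function applies to the Secondary Validation comparison by passing the SHA and Contractor split results directly.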

A.1.7.1 The paired t-test statistic and critical values for the Referee - SHA pairs and the Referee - Contractor pairs are determined following the same steps described in A.1.6.1 and A.1.6.2.

A.1.7.2 The three possible outcomes of the Dispute Resolution paired t-tests using Referee test results are:

1. The difference between the Referee test results and the SHA results is not significantly different than zero, but the difference between Referee test results and Contractor results is significantly different than zero (i.e., the Referee results agree with the SHA's results but not with the Contractor results). In this case, the SHA's results would be used to determine the acceptance and pay factor for the AQC.

2. The difference between the Referee test results and the Contractor results is not significantly different than zero, but the difference between Referee test results and SHA results is significantly different than zero (i.e., the Referee results agree with the Contractor's results but not with the SHA results). In this case, all Contractor's data for the lot, excluding outliers, would be used to determine acceptance and pay factor for the AQC.

3. The Referee results do not agree (statistically) with either the Contractor's or the SHA's results, or the Referee results agree with both the Contractor's results and the SHA results. In either case, the results of the party whose paired t-test result yields the lowest p-value are considered validated and that party's data are used to determine acceptance and pay factor for the AQC.

Case 2: Cumulative Validation Lots

A.2 This case enables the use of smaller lots from which three or more samples are randomly taken and tested by the Contractor and one sample is randomly obtained for validation testing by the SHA. Since this yields only a single SHA result per lot, which is insufficient for proper validation using F- and t-tests, validation results from three consecutive lots are combined as a Cumulative Validation Lot (CVL). This approach is similar to a moving average, where a fixed number of consecutive lots (e.g., 3) are combined to form a larger "validation" lot that includes at least three validation results. CVL 1 includes results from lot 1, lot 2, and lot 3. The second CVL drops the results from lot 1 and combines the results from lots 2, 3, and 4. This moving CVL process continues as long as the validation process confirms the Contractor's data. Figure A.2 illustrates this case.
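The moving-window grouping of lots into CVLs can be illustrated with a short Python sketch. The function names and the SHA result values are hypothetical. The sketch only forms the windows; it does not model the condition that the process continues only while the Contractor's data are validated.

```python
def cumulative_validation_lots(lot_ids, size=3):
    """Return the moving windows of consecutive lots that form each CVL."""
    if size < 1:
        raise ValueError("size must be at least 1")
    # Window k starts at lot k and spans `size` consecutive lots.
    return [tuple(lot_ids[i:i + size]) for i in range(len(lot_ids) - size + 1)]


def pooled_sha_results(sha_by_lot, cvl):
    """Collect the single SHA validation result from each lot in a CVL."""
    return [sha_by_lot[lot] for lot in cvl]


cvls = cumulative_validation_lots([1, 2, 3, 4, 5])
# CVL 1 covers lots 1-3, CVL 2 drops lot 1 and adds lot 4, and so on.
print(cvls)

# Hypothetical SHA results, one per lot.
sha_by_lot = {1: 5.2, 2: 5.4, 3: 5.3, 4: 5.1, 5: 5.5}
print(pooled_sha_results(sha_by_lot, cvls[1]))
```

Fewer lots than the window size produce no CVL, which matches the requirement that three SHA results be available before the F- and t-tests are run.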

For each lot, randomly take a minimum of four samples and split each of them into three equal portions. Label the samples 1-1-A, 1-1-C, 1-1-R, 1-2-A, 1-2-C, 1-2-R, etc., where the first number designates the lot number, the second number designates the sample number (minimum of 4 in this case), A is the Agency (SHA) portion of the split, C is the Contractor portion of the split, and R is the Referee portion of the split. The process, as illustrated in Figure A.2, is divided into four parts: Sampling, Primary Validation, Secondary Validation, and Dispute Resolution.

Note A5 - The SHA must establish a policy for the security of validation and Referee sample portions (i.e., a chain of custody policy). All samples must be clearly labeled, securely sealed, and stored to avoid any concerns about sample integrity.

A.2.1 The Contractor tests one portion from each of the samples. This will yield at least four Contractor test results per lot. Figure A.2 shows four Contractor samples per lot.

A.2.2 For each lot, the SHA randomly selects and tests one portion for validation. As illustrated in Figure A.2, if the SHA randomly selects sample 1-1-A from lot 1 for validation, results from sample 1-1-C would be excluded from the Contractor's results used in the primary validation testing; the Contractor's results from 1-2-C, 1-3-C, and 1-4-C will be used in the primary validation.

Note A6 - In this example, the Contractor's and SHA's results from the same split sample are not used in the primary validation (F- and t-tests), which satisfies the 23 CFR 637B requirement that verification testing be conducted on independent samples. To reduce risk on the initiation of a new CVL, the SHA may choose to select a total of three samples from the first two lots so that the first two lots can be validated, then use only one validation sample in lots 2 and 3 to form the first CVL. This will provide additional assurance that the material or construction meets the specification at the start of the work.
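The Case 2 labeling and per-lot selection can be sketched as follows. The function names, dictionary keys, and the use of Python's `random` module are assumptions for illustration only; the sketch covers the standard one-selection-per-lot plan and not the optional start-up variation described in Note A6.

```python
import random


def plan_lot(lot, n_samples=4, rng=None):
    """Label one lot's portions and pick the single SHA validation sample."""
    if n_samples < 4:
        raise ValueError("Case 2 uses at least 4 samples per lot")
    rng = rng or random.Random()
    pick = rng.randint(1, n_samples)  # inclusive on both ends
    return {
        "agency": f"{lot}-{pick}-A",
        # Contractor portions from the other samples go to primary validation.
        "contractor_primary": [
            f"{lot}-{s}-C" for s in range(1, n_samples + 1) if s != pick
        ],
        # The Contractor split of the SHA sample is kept out of primary validation.
        "contractor_excluded": f"{lot}-{pick}-C",
        "referee": f"{lot}-{pick}-R",
    }


def plan_cvl(lots, n_samples=4, seed=None):
    """Pool the per-lot plans for the lots that make up one CVL."""
    rng = random.Random(seed)
    plans = [plan_lot(lot, n_samples, rng) for lot in lots]
    return {
        "agency": [p["agency"] for p in plans],
        "contractor_primary": [
            label for p in plans for label in p["contractor_primary"]
        ],
    }


cvl_plan = plan_cvl([1, 2, 3], seed=0)
print(cvl_plan["agency"])
print(cvl_plan["contractor_primary"])
```

With four samples per lot and three lots per CVL, the pooled plan holds three SHA portions and nine independent Contractor portions.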

Figure A.2 Illustration of the Cumulative Validation Lot Approach

A.2.3 When the SHA has completed testing of three consecutive lots, an outlier analysis is conducted on the three SHA results. Outlier detection is also conducted on the Contractor results from the same lots. If an outlier is detected in either data set, then an investigation must be conducted to determine the probable cause(s) of the outlying data. The cause(s) must be corrected for subsequent sampling and testing.

A.2.3.1 If an outlier is detected among the SHA results, then the SHA must randomly select and test one of the other original sample portions, and the Contractor's result on the corresponding portion must be excluded from the primary validation analysis. If an outlier is detected among the Contractor data for the CVL, then that result is discarded and no additional sample is needed since the Contractor's data set for the CVL contains at least eight other results.

A.2.4 Primary Validation - The three SHA results from the CVL are compared to all of the Contractor results from the three lots in the CVL. In this case, the number of validation results will be at least three, and the number of Contractor results will be nine unless an outlier was detected and discarded.

A.2.4.1 The first statistical procedure for primary validation is a test of the hypothesis that the variabilities of the Contractor results and the SHA results for the lot are the same. This hypothesis is evaluated with the F-test. Inputs needed to conduct an F-test are the variances, $s^2$, of the Contractor results and the SHA results, the number of results, $n$, used to calculate those variances, and the level of significance, α. The recommended value for α is 0.05, which means that there is only a five percent chance that the hypothesis will be incorrectly rejected.

A.2.4.2 The F-statistic is calculated from the variances from the Contractor results and SHA results from the lot, or combined lots, as follows:

$$F = \frac{s_L^2}{s_S^2}$$

Where $s_L^2$ is the larger variance of either the Contractor results or the SHA results, and $s_S^2$ is the smaller variance of the two results.

A.2.4.3 The F-critical value is obtained from the F-table in Annex C, based on the degrees of freedom and the selected level of significance. The degrees of freedom, $\nu$, is calculated as the sample sizes minus 1 as follows:

$$\nu_L = n_L - 1, \qquad \nu_S = n_S - 1$$

Where $n_L$ is the number of samples corresponding to the larger variance, and $n_S$ is the number of samples corresponding to the smaller variance.

A.2.4.4 When F-statistic ≥ F-critical, it is concluded that the variabilities of the Contractor data and SHA data are different. The cause of the difference should be investigated.

A.2.4.5 The second part of the primary validation is a statistical comparison of the means of the Contractor results and the SHA results. The statistical comparison of means is evaluated using Welch's t-test. Inputs required for Welch's t-test are the means, $\bar{x}_i$, of the Contractor results and the SHA results, the variances, $s_i^2$, of the Contractor results and the SHA results, the number of results for each set, $n_i$, and the level of significance, α. The recommended value for α is 0.05.

A.2.4.6 Welch's t-statistic is calculated as follows:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$

A.2.4.7 The t-critical value is obtained from Table C.1 in Annex C at a level of α/2 based on the estimated degrees of freedom, $\nu$, which is approximated as follows:

$$\nu \approx \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}$$

The estimated degrees of freedom should be rounded down to the nearest integer.

A.2.4.8 When the absolute value of Welch's t-statistic > t-critical, then the means of the SHA results and the Contractor's results are statistically different.

A.2.4.9 If both the F-test and Welch's t-test indicate that the SHA results and the Contractor results are not statistically different (i.e., the data sets are from the same population), the Contractor's data from the entire CVL are considered "accepted by Primary Validation" and the Contractor's results (excluding outlier data) corresponding to each lot are used to calculate the pay factor for this AQC. For example, if CVL 1 is validated, then the Contractor's results from lot 1 are used to determine the pay factor for lot 1, the Contractor's results from lot 2 are used to determine the pay factor for lot 2, and so on.
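For a CVL, the same statistics are computed on results pooled across three lots, so the two data sets have unequal sizes and the order of the F-test degrees of freedom depends on which set has the larger variance. The sketch below illustrates that pooling step only. The function name, dictionary layout, and test values are hypothetical, and the critical values must still be looked up in Annex C for the degrees of freedom the function reports.

```python
import math
from statistics import mean, variance


def cvl_primary_statistics(contractor_by_lot, sha_by_lot):
    """Pool one CVL's results and return the statistics needed for table lookups."""
    contractor = [x for lot in sorted(contractor_by_lot) for x in contractor_by_lot[lot]]
    sha = [sha_by_lot[lot] for lot in sorted(sha_by_lot)]
    var_c, var_s = variance(contractor), variance(sha)
    n_c, n_s = len(contractor), len(sha)
    if min(var_c, var_s) == 0:
        raise ValueError("F is undefined when the smaller variance is zero")
    # Larger variance goes in the numerator; its n - 1 is listed first.
    if var_c >= var_s:
        f, f_df = var_c / var_s, (n_c - 1, n_s - 1)
    else:
        f, f_df = var_s / var_c, (n_s - 1, n_c - 1)
    q_c, q_s = var_c / n_c, var_s / n_s
    t = (mean(contractor) - mean(sha)) / math.sqrt(q_c + q_s)
    df = (q_c + q_s) ** 2 / (q_c ** 2 / (n_c - 1) + q_s ** 2 / (n_s - 1))
    return {"F": f, "F_df": f_df, "t": t, "t_df": math.floor(df)}


# Hypothetical CVL 1: independent Contractor results and one SHA result per lot.
contractor_by_lot = {1: [5.1, 5.3, 5.0], 2: [5.2, 5.4, 5.1], 3: [5.3, 5.2, 5.0]}
sha_by_lot = {1: 5.2, 2: 5.4, 3: 5.3}

stats = cvl_primary_statistics(contractor_by_lot, sha_by_lot)
print(stats)
```

If the CVL is accepted, the per-lot dictionary already keeps each lot's Contractor results separate, which is the grouping A.2.4.9 calls for when pay factors are calculated lot by lot.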

A.2.4.10 If either the F-test or Welch's t-test indicates that the Contractor results and the SHA results are statistically different, then the Contractor's data are not accepted by Primary Validation and the process moves into Secondary Validation.

A.2.5 Secondary Validation - When the Contractor's data are not validated by the primary assessment, the next step is to compare SHA results and the Contractor results from the same samples (i.e., the split portions) using a paired t-test.

A.2.5.1 The Contractor tests the split portions of the three SHA verification samples from the CVL (e.g., 1-1-C, 2-3-C, and 3-2-C, as illustrated in Figure A.2).

A.2.5.2 The paired t-test is used to determine if the average difference between the pairs of results (e.g., 1-1-A and 1-1-C) is statistically different from zero. The t-statistic for the paired t-test is calculated as follows:

$$t = \frac{\bar{d}}{s_d / \sqrt{n}}$$

Where $\bar{d}$ is the average of the differences between the split sample test results, $s_d$ is the standard deviation of the differences between the split sample test results, and $n$ is the number of split samples.

A.2.5.3 The critical t-value is obtained from Table C.1 in Annex C at a level of α/2 and (n - 1) degrees of freedom.

A.2.5.4 When the absolute value of the paired t-statistic > t-critical, the means of the SHA results and the Contractor's results are statistically different and the process should proceed to Dispute Resolution. Otherwise, the Contractor's data are considered "validated by secondary assessment" and all of the Contractor's results (excluding outlier data) are used to calculate the pay factor for this AQC.

A.2.6 Dispute Resolution - The process for Dispute Resolution for this case is the same as for Case 1 as described in A.1.7. However, in Case 2, when the Dispute Resolution favors SHA results, the SHA or the Referee is required to test all portions for at least one lot.

Note A7 - If a CVL is not validated through the entire validation process, only the latest lot added to the CVL is considered not validated. For example, if the Contractor's results of CVL 1 (i.e., results from lot 1, lot 2, and lot 3) are validated, and the Contractor's results of CVL 2 (i.e., results from lot 2, lot 3, and lot 4) are not validated, only lot 4 results are not validated at the secondary