Appendix E
Sampling Variability and Uncertainty Analyses

In Appendix D, uncertainty in the analytical measurement process was considered and confidence intervals that reflect that uncertainty in an unknown true concentration x were developed. However, if one obtains a series of n measurements of a given piece of equipment, or of an area of potential contamination such as a room, or n soil samples in an area where contamination may have occurred, then inferences about the potential area of concern must incorporate the sampling variability associated with the n measured concentrations. In a perfect world, one would compute a (1 − α)100 percent normal upper confidence limit (UCL), and if the UCL was less than the regulatory standard, one could conclude with (1 − α)100 percent certainty that the true mean concentration for the piece of equipment or spatial area was less than the regulatory standard of interest. Note that this does not require all measurements to be below the regulatory standard. Of course, the converse is also true: all of the individual measurements can be below the regulatory standard but the UCL may still exceed the standard. It should be noted that there is considerable EPA guidance supporting this approach, including but not limited to the SW-846 guidance (EPA, 2007) and the EPA unified statistical guidance document (EPA, 2009). In addition, this general approach is also clearly recommended in the ASTM consensus standard D7048 (ASTM, 2010).

Factors that complicate the simple use of a normal UCL are these: (1) the distribution of measured concentrations is rarely normal and generally has a long right tail, which is characteristic of a lognormal or gamma distribution; (2) the analyte is often not detected in a substantial proportion of the samples; and (3) the large number of statistical comparisons that are made will, by chance alone, produce a number of positive results that are likely to be false positives. The following sections outline a general statistical methodology that can be followed to address these factors.

NORMAL CONFIDENCE LIMITS FOR THE MEAN1

For a normally distributed constituent that is detected in all cases, the (1 − α)100 percent normal lower confidence limit (LCL) (assessment sampling and monitoring) for the mean of n measurements is computed as

$$\bar{x} - t_{[n-1,\alpha]}\,\frac{s}{\sqrt{n}} \qquad (1)$$

1 The remainder of this appendix is largely an adaptation from Gibbons, 2009.




The (1 − α)100 percent normal UCL (corrective action) for the mean of n measurements is computed as

$$\bar{x} + t_{[n-1,\alpha]}\,\frac{s}{\sqrt{n}} \qquad (2)$$

When nondetects are present, several reasonable options are possible. If n < 8, nondetects are replaced by one-half of the detection limit (DL), since with fewer than eight measurements more sophisticated statistical adjustments are typically not appropriate. Similarly, a normal UCL is typically used because seven or fewer samples are insufficient to confidently determine the distributional form of the data. Because a lognormal limit based on small samples can result in extreme limit estimates, it is reasonable and conservative to default to normality for cases in which n < 8. If n ≥ 8, a good choice is to use the method of Aitchison (1955) to adjust for nondetects and to test for normality and lognormality of the data using the Shapiro-Wilk test. However, the ability of the Shapiro-Wilk test (and other distributional tests) to detect nonnormality is highly dependent on sample size. For most applications, 95 percent confidence is a reasonable choice. Note that alternatives such as the method of Cohen (1961) can be used; however, the DL must be constant.

LOGNORMAL CONFIDENCE LIMITS FOR THE MEDIAN

For a lognormally distributed constituent, that is, $y = \log_e(x)$ is distributed $N(\mu_y, \sigma_y^2)$, the (1 − α)100 percent LCL for the median or 50th percentile of the distribution is given by

$$\exp\!\left(\bar{y} - t_{[n-1,\alpha]}\,\frac{s_y}{\sqrt{n}}\right) \qquad (3)$$

where $\bar{y}$ and $s_y$ are the mean and standard deviation of the natural-log-transformed concentrations. Note that the exponentiated limit is, in fact, an LCL for the median and not the mean concentration. In general, the median and its corresponding LCL will be lower than the mean and its corresponding LCL. The (1 − α)100 percent UCL for the median or 50th percentile of the distribution is given by

$$\exp\!\left(\bar{y} + t_{[n-1,\alpha]}\,\frac{s_y}{\sqrt{n}}\right) \qquad (4)$$
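As a concrete illustration, the following is a minimal sketch in Python of Equations (1) through (4). SciPy is assumed to be available, the function names are illustrative, and any substitution of one-half the DL for nondetects (for n < 8) is assumed to have been done beforehand.

```python
import numpy as np
from scipy import stats

def normal_mean_limits(x, conf=0.95):
    """One-sided normal LCL and UCL for the mean, Equations (1) and (2)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    t = stats.t.ppf(conf, n - 1)                   # t_[n-1, alpha] for a one-sided limit
    half_width = t * x.std(ddof=1) / np.sqrt(n)
    return x.mean() - half_width, x.mean() + half_width

def lognormal_median_limits(x, conf=0.95):
    """One-sided LCL and UCL for the lognormal median, Equations (3) and (4)."""
    y = np.log(np.asarray(x, dtype=float))
    n = len(y)
    t = stats.t.ppf(conf, n - 1)
    half_width = t * y.std(ddof=1) / np.sqrt(n)
    return np.exp(y.mean() - half_width), np.exp(y.mean() + half_width)
```

For example, normal_mean_limits([1.2, 0.8, 1.5, 1.1]) returns the 95 percent LCL and UCL for the mean as a pair.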

LOGNORMAL CONFIDENCE LIMITS FOR THE MEAN

The Exact Method

Land (1971) developed an exact method for computing confidence limits for linear functions of the normal mean and variance. The classic example is the normalization of a lognormally distributed random variable x through the transformation $y = \log_e(x)$, where, as noted previously, y is distributed normal with mean $\mu_y$ and variance $\sigma_y^2$, that is, $y \sim N(\mu_y, \sigma_y^2)$. Using Land's (1975) tabled coefficients $H_\alpha$, the one-sided (1 − α)100 percent lognormal LCL for the mean is

$$\exp\!\left(\bar{y} + 0.5\,s_y^2 + \frac{s_y H_{\alpha}}{\sqrt{n-1}}\right) \qquad (5)$$

Alternatively, using $H_{1-\alpha}$, the one-sided (1 − α)100 percent lognormal UCL for the mean is

$$\exp\!\left(\bar{y} + 0.5\,s_y^2 + \frac{s_y H_{1-\alpha}}{\sqrt{n-1}}\right) \qquad (6)$$

The factors H are given by Land (1975), and $\bar{y}$ and $s_y$ are the mean and standard deviation of the natural-log-transformed data (i.e., $y = \log_e(x)$). Gilbert (1987) provides a small subset of these extensive tables for n = 3 through 101, $s_y$ = 0.1 through 10.0, and α = .05 and .10 (i.e., upper and lower 90 percent and 95 percent confidence limit factors). Because these tables had historically been difficult to find, Gibbons and Coleman (2001) reproduced the complete set of Land's (1975) tables and also included computing approximations that can be used for automated applications. Land (1975) suggests that cubic interpolation (i.e., four-point Lagrangian interpolation) be used when working with these tables (Abramowitz and Stegun, 1964). A much easier and quite reasonable alternative is to use logarithmic interpolation.
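A minimal sketch of Equations (5) and (6) follows. The H coefficient must be supplied from Land's (1975) tables (or an interpolation of them); it is not computed here, and the function name is illustrative.

```python
import numpy as np

def land_lognormal_limit(x, h_factor):
    """One-sided exact lognormal confidence limit for the mean, Equations (5) and (6).
    Pass H_alpha from Land's tables for an LCL, or H_(1-alpha) for a UCL."""
    y = np.log(np.asarray(x, dtype=float))
    n = len(y)
    y_bar, s_y = y.mean(), y.std(ddof=1)
    return np.exp(y_bar + 0.5 * s_y**2 + s_y * h_factor / np.sqrt(n - 1))
```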

Approximate Lognormal Confidence Limit Methods

Several approximations to lognormal confidence limits for the mean have also been proposed. These have been conveniently classified as either transformation methods or direct methods (Land, 1970). A transformation method is one in which the confidence limit is obtained for the expected value of some function of x and then transformed by some appropriate function to give an approximate limit for the expectation of x, that is, E(x), which in the lognormal case is $E(x) = \exp(\mu_y + \tfrac{1}{2}\sigma_y^2)$. This estimate is assumed to be normally distributed, and approximate confidence limits are computed accordingly. The simplest transformation method is the naive transformation, which simply involves taking a log transformation of the data, computing the confidence limit on the log scale, and then exponentiating the limit. As previously noted, this is, in fact, a confidence limit for the median and not the mean. The method provides somewhat reasonable results as a confidence limit for the mean when $\sigma_y$ is very small but deteriorates quickly as $\sigma_y$ increases (Land, 1970). Patterson (1966) proposed use of the transformation

$$\hat{x} = \exp\!\left(\bar{y} + \tfrac{1}{2}s_y^2\right) \qquad (7)$$

to remove the obvious bias of the naive method. Patterson's transformation would be exact if $\sigma_y^2$ were known; however, when the variance is unknown, it too behaves poorly as $\sigma_y$ increases (Land, 1970). More complicated alternatives described by Finney (1941) and Hoyle (1968) provide results similar to those of Patterson's transformation and are therefore not presented.

Direct methods offer an advantage over transformation methods in that they obtain confidence intervals directly for E(x) or some function of E(x). In light of this, these methods do not suffer from the bias introduced by failing to take into account the dependence of E(x) on both $\mu_y$ and $\sigma_y^2$. However, by applying normality assumptions to E(x), direct estimates can produce inadmissible confidence limits for E(x). To this end, Aitchison and Brown (1957) suggested computing the usual normal confidence limit, which under the central limit theorem should converge to exact limits as n becomes large. Hoyle (1968) suggested replacing $\bar{x}$ and $s_x^2/n$ by their minimum variance unbiased estimates (MVUE). Finney (1941) derived the MVUE of E(x) as

$$\hat{\mu} = \exp(\bar{y})\,\Psi\!\left(\tfrac{1}{2}s_y^2\right) \qquad (8)$$

and Hoyle (1968) derived the MVUE for the variance of E(x) as

$$\hat{\sigma}^2 = \exp(2\bar{y})\left[\Psi^2\!\left(\tfrac{1}{2}s_y^2\right) - \Psi\!\left(\frac{(n-2)\,s_y^2}{n-1}\right)\right] \qquad (9)$$

where

$$\Psi(g) = 1 + \frac{n-1}{n}\,g + \frac{(n-1)^3}{n^2\,(n+1)}\,\frac{g^2}{2!} + \frac{(n-1)^5}{n^3\,(n+1)(n+3)}\,\frac{g^3}{3!} + \cdots \qquad (10)$$

is a Bessel function with argument g. In this method, the normal quantile $z_\alpha$ replaces $t_{[n-1,\alpha]}$, since there is no reason to believe that $\hat{\sigma}^2$ is chi-squared distributed and independent of $\hat{\mu}$. Unfortunately, Land (1970) has shown that these methods are useful only for large n (i.e., n > 100), and even then only for small values of $s_y$.
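The series in Equation (10) and the point estimate in Equation (8) can be evaluated directly. The sketch below truncates the series after a fixed number of terms; the function names are illustrative, and it implements the equations as reconstructed here rather than any particular published code.

```python
import numpy as np

def psi_series(g, n, terms=30):
    """Bessel-type series of Equation (10), truncated after `terms` terms."""
    total, term = 1.0, 1.0
    for k in range(1, terms):
        if k == 1:
            term *= (n - 1) * g / n
        else:
            # ratio of successive coefficients in Equation (10)
            term *= (n - 1) ** 2 * g / (n * k * (n + 2 * k - 3))
        total += term
    return total

def lognormal_mean_mvue(x):
    """MVUE of the lognormal mean E(x), Equation (8)."""
    y = np.log(np.asarray(x, dtype=float))
    n = len(y)
    return np.exp(y.mean()) * psi_series(0.5 * y.var(ddof=1), n)
```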

The final direct method, which is attributed to D. R. Cox, has been shown to give the best overall results of any of the approximate methods (Land, 1970). The MVUE of $\theta = \log_e E(x)$ is $\hat{\theta} = \bar{y} + \tfrac{1}{2}s_y^2$, and the MVUE of the variance of $\hat{\theta}$ is

$$\hat{\sigma}^2_{\hat{\theta}} = \frac{s_y^2}{n} + \frac{s_y^4}{2(n-1)} \qquad (11)$$

Assuming approximate normality for $\hat{\theta}$, one may obtain approximate confidence limits for E(x) of the form

$$LCL = \exp\!\left(\hat{\theta} - z_\alpha\,\hat{\sigma}_{\hat{\theta}}\right) \qquad (12)$$

and

$$UCL = \exp\!\left(\hat{\theta} + z_\alpha\,\hat{\sigma}_{\hat{\theta}}\right) \qquad (13)$$
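Cox's method is simple enough to state as a short sketch (Python, SciPy assumed; the function name is illustrative):

```python
import numpy as np
from scipy import stats

def cox_lognormal_mean_limits(x, conf=0.95):
    """Approximate LCL and UCL for a lognormal mean by Cox's direct method,
    Equations (11) through (13)."""
    y = np.log(np.asarray(x, dtype=float))
    n = len(y)
    s2 = y.var(ddof=1)
    theta = y.mean() + 0.5 * s2                     # MVUE of log E(x)
    var_theta = s2 / n + s2**2 / (2 * (n - 1))      # Equation (11)
    z = stats.norm.ppf(conf)                        # one-sided normal quantile
    lcl = np.exp(theta - z * np.sqrt(var_theta))    # Equation (12)
    ucl = np.exp(theta + z * np.sqrt(var_theta))    # Equation (13)
    return lcl, ucl
```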

NONPARAMETRIC CONFIDENCE LIMITS FOR THE MEDIAN

When data are neither normally nor lognormally distributed, or the detection frequency is too low (e.g., less than 50 percent) for a meaningful distributional analysis, nonparametric confidence limits become the method of choice. The nonparametric confidence limit is defined by an order statistic (i.e., a ranked observation) of the n measurements. Note that in the nonparametric case, one is restricted to computing confidence limits on percentiles of the distribution, for example, the 50th percentile or median of the on-site/downgradient distribution. Unless the distribution is symmetric (i.e., the mean and median are equivalent), there is no direct nonparametric way of constructing a confidence limit for the mean concentration.

To construct a confidence limit for the median concentration, one uses the fact that the number of samples falling below the p(100)th percentile of the distribution (e.g., p = .5, where p is between 0 and 1) out of a set of n samples will follow a binomial distribution with parameters n and success probability p, where success is defined as the event that a sample measurement is below the p(100)th percentile. The cumulative binomial distribution, Bin(x; n, p), represents the probability of getting x or fewer successes in n trials with success probability p, and can be evaluated as

$$Bin(x;\, n,\, p) = \sum_{i=0}^{x}\binom{n}{i}\, p^{i}\,(1-p)^{n-i} \qquad (14)$$

The notation $\binom{n}{i}$ denotes the number of combinations of n things taken i at a time, where

$$\binom{n}{i} = \frac{n!}{i!\,(n-i)!} \qquad (15)$$

and $k! = 1 \cdot 2 \cdot 3 \cdots k$ for any counting number k. For example, the number of ways in which two things can be selected from three things is

$$\binom{3}{2} = \frac{3!}{2!\,(1)!} = \frac{1 \cdot 2 \cdot 3}{(1 \cdot 2)(1)} = \frac{6}{2} = 3 \qquad (16)$$

To compute a nonparametric confidence limit for the median, begin by rank ordering the n measurements from smallest to largest as $x_{(1)}, x_{(2)}, \ldots, x_{(n)}$. Denote the candidate end points selected to bracket the 50th percentile (i.e., $(n+1) \times 0.5$) as $L^*$ and $U^*$ for the lower and upper bounds, respectively. For the LCL, compute the probability

$$1 - Bin(L^* - 1;\, n,\, 0.5) \qquad (17)$$

If the probability is less than the desired confidence level, 1 − α, select a new value $L^* = L^* - 1$ and repeat the process until the desired confidence level is achieved. For the UCL, compute the probability

$$Bin(U^* - 1;\, n,\, 0.5) \qquad (18)$$

If the probability is less than the desired confidence level, 1 − α, select a new value $U^* = U^* + 1$ and repeat the process until the desired confidence level is achieved. If the desired confidence level cannot be achieved, set the LCL to the smallest value or the UCL to the largest value and report the achieved confidence level.
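The rank search described above can be written as a short sketch. The version below simply finds the extreme ranks that achieve the requested confidence (equivalent to stepping L* down and U* up), with illustrative names and SciPy's binomial distribution standing in for Equation (14).

```python
import numpy as np
from scipy import stats

def nonparametric_median_limits(x, conf=0.95):
    """Nonparametric LCL and UCL for the median via order statistics,
    Equations (17) and (18); ranks are 1-based as in the text."""
    xs = np.sort(np.asarray(x, dtype=float))
    n = len(xs)
    # LCL: largest rank L with 1 - Bin(L - 1; n, 0.5) >= conf, else the minimum.
    lcl_rank = next((L for L in range(n, 0, -1)
                     if 1 - stats.binom.cdf(L - 1, n, 0.5) >= conf), 1)
    # UCL: smallest rank U with Bin(U - 1; n, 0.5) >= conf, else the maximum.
    ucl_rank = next((U for U in range(1, n + 1)
                     if stats.binom.cdf(U - 1, n, 0.5) >= conf), n)
    return xs[lcl_rank - 1], xs[ucl_rank - 1]
```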

Another distribution that is often used for skewed data is the gamma distribution. Suppose x follows a gamma distribution with shape parameter α and scale parameter β. Then the gamma density is given by

$$f(x) = \frac{1}{\beta^{\alpha}\,\Gamma(\alpha)}\, x^{\alpha-1}\, e^{-x/\beta} \qquad (19)$$

Let $x_{(1)}, x_{(2)}, \ldots, x_{(n)}$ be a random sample of size n drawn from this population to estimate the unknown parameters. Denote the arithmetic and geometric means based on this random sample by $\bar{x}$ and $\tilde{x}$, respectively. The maximum likelihood estimators of α and β, denoted by $\hat{\alpha}$ and $\hat{\beta}$, are solutions to the following equations:

$$\ln(\hat{\alpha}) - \psi(\hat{\alpha}) = \ln(\bar{x}/\tilde{x}), \quad \text{and} \quad \hat{\alpha}\hat{\beta} = \bar{x} \qquad (20)$$

where ψ denotes the digamma (Euler's psi) function. The mean and variance of x are

$$E(x) = \alpha\beta \quad \text{and} \quad V(x) = \alpha\beta^{2} \qquad (21)$$

To construct the UCL for this type of data, Aryal et al. (2009) constructed the following statistic:

$$T = \frac{9(n-1)\left(X^{1/3} - (n\theta)^{1/3}\right)^{2}}{2\,R_n\,(n\theta)^{2/3}} \qquad (22)$$

where $R_n$ is the logarithm of the ratio of the arithmetic mean to the geometric mean, θ is the mean of the population, and X is the sum of all the observations. The UCL of θ is obtained by solving the following equation and taking the largest root:

$$T \leq F_{1-\alpha,\,1,\,n-1} \qquad (23)$$

where $F_{1-\alpha}$ is the (1 − α)100th percentile of the F distribution with degrees of freedom 1 and n − 1. To compute the (1 − α)100 percent UCL, invert the test statistic T, from which one obtains

$$UCL = \frac{\sum x_i}{n\,(1-U)^{3}} \qquad (24)$$

where

$$U = \sqrt{\frac{2\ln(\bar{x}/\tilde{x})\,F_{1-\alpha}}{9(n-1)}} \qquad (25)$$
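A minimal sketch of the closed-form UCL in Equations (24) and (25) follows, assuming SciPy is available. It is a sketch of the formulas as reconstructed here, not a validated implementation of Aryal et al. (2009), and the function name is illustrative.

```python
import numpy as np
from scipy import stats

def gamma_mean_ucl(x, conf=0.95):
    """Approximate UCL for a gamma mean, Equations (24) and (25)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    r_n = np.log(x.mean()) - np.log(x).mean()        # log of arithmetic/geometric mean ratio
    f_q = stats.f.ppf(conf, 1, n - 1)                # F_(1-alpha, 1, n-1)
    u = np.sqrt(2.0 * r_n * f_q / (9.0 * (n - 1)))   # Equation (25)
    return x.sum() / (n * (1.0 - u) ** 3)            # Equation (24), valid when u < 1
```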

REFERENCES

Abramowitz, M., and I. Stegun. 1964. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. Washington, D.C.: National Bureau of Standards.
Aitchison, J. 1955. On the distribution of a positive random variable having a discrete probability mass at the origin. Journal of the American Statistical Association 50: 901-908.
Aitchison, J., and J. Brown. 1957. The Log-normal Distribution. Cambridge, UK: Cambridge University Press.
Aryal, S., D. Bhaumik, S. Santra, and R. Gibbons. 2009. Confidence interval for random-effects calibration curves with left-censored data. Environmetrics 20(2): 181-189.
ASTM (American Society for Testing and Materials). 2010. ASTM D7048-04, Standard Guide for Applying Statistical Methods for Assessment and Corrective Action Environmental Monitoring Programs. West Conshohocken, Pa.: ASTM International.
Cohen, A. 1961. Tables for maximum likelihood estimates: Singly truncated and singly censored samples. Technometrics 3: 535-541.
EPA (U.S. Environmental Protection Agency). 2007. SW-846, Test Methods for Evaluating Solid Waste, Physical/Chemical Methods. Washington, D.C.: Environmental Protection Agency.
EPA. 2009. Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities: Unified Guidance. EPA 530/R-09-007. Washington, D.C.: Environmental Protection Agency, Office of Resource Conservation and Recovery.
Finney, D. 1941. On the distribution of a variate whose logarithm is normally distributed. Journal of the Royal Statistical Society, Series B 7: 155-161.
Gibbons, R. 2009. Assessment and corrective action monitoring. Pp. 317-335 in Statistical Methods for Groundwater Monitoring, edited by R. Gibbons, D. Bhaumik, and S. Aryal. Hoboken, N.J.: John Wiley & Sons, Inc.
Gibbons, R., and D. Coleman. 2001. Statistical Methods for Detection and Quantification of Environmental Contamination. New York, N.Y.: John Wiley & Sons, Inc.
Gilbert, R. 1987. Statistical Methods for Environmental Pollution Monitoring. New York, N.Y.: John Wiley and Sons, Inc.
Hoyle, M. 1968. The estimation of variances after using a gaussianating transformation. Annals of Mathematical Statistics 39: 1125-1143.
Land, L. 1970. Phreatic Versus Vadose Meteoric Diagenesis of Limestones: Evidence from a Fossil Water Table.
Land, C. 1971. Confidence intervals for linear functions of the normal mean and variance. Annals of Mathematical Statistics 42: 1187-1205.
Land, C. 1975. Tables of confidence limits for linear functions of the normal mean and variance. Selected Tables in Mathematical Statistics 3: 385-419.

Patterson, C., and D. Settle. 1966. 7th Materials Research Symposium. National Bureau of Standards Special Publication 422. Washington, D.C.: U.S. Government Printing Office.
