**Appendix E**
**Sampling Variability and Uncertainty Analyses**

In Appendix D, uncertainty in the analytical measurement process was considered and confidence intervals that reflect that uncertainty in an unknown true concentration *x* were developed. However, if one obtains a series of *n* measurements of a given piece of equipment, or of an area of potential contamination such as a room, or *n* soil samples in an area where contamination may have occurred, then inferences about the potential area of concern must incorporate the sampling variability associated with the *n* measured concentrations. In a perfect world, one would compute a (1 − α)100 percent normal upper confidence limit (UCL), and if the UCL was less than the regulatory standard, one could conclude with (1 − α)100 percent certainty that the true concentration mean for the piece of equipment or spatial area was less than the regulatory standard of interest. Note that this does not require all measurements to be below the regulatory standard. Of course, the converse is also true: all of the individual measurements can be below the regulatory standard, but the UCL may still exceed the standard. It should be noted that there is considerable EPA guidance supporting this approach, including but not limited to the SW-846 guidance (EPA, 2007) and the EPA unified statistical guidance document (EPA, 2009). In addition, this general approach is also clearly recommended in the ASTM consensus standard D7048 (ASTM, 2010).

Factors that complicate the simple use of a normal UCL are these: (1) the distribution of measured concentrations is rarely normal and generally has a long right tail, which is characteristic of a lognormal or gamma distribution; (2) the analyte is often not detected in a substantial proportion of the samples; and (3) the large number of statistical comparisons that are made leads to a large number of positive results, consistent with chance expectations but likely to be false positives. In the following sections, a general statistical methodology that can be followed to address such factors is outlined.

**NORMAL CONFIDENCE LIMITS FOR THE MEAN**^{1}

For a normally distributed constituent that is detected in all cases, the (1 − α)100 percent normal lower confidence limit (LCL) (assessment sampling and monitoring) for the mean of *n* measurements is computed as

^{1}The remainder of this appendix is largely an adaptation from Gibbons, 2009.

ASSESSMENT OF AGENT MONITORING STRATEGIES FOR BGCAPP AND PCAPP
$$\bar{x} - t_{[n-1,\alpha]}\frac{s}{\sqrt{n}} \qquad (1)$$
The (1 − α)100 percent normal UCL (corrective action) for the mean of *n* measurements is computed as

$$\bar{x} + t_{[n-1,\alpha]}\frac{s}{\sqrt{n}} \qquad (2)$$
When nondetects are present, several reasonable options are possible. If *n* < 8, nondetects are replaced by one-half of the detection limit (DL), since with fewer than eight measurements, more sophisticated statistical adjustments are typically not appropriate. Similarly, a normal UCL is typically used because seven or fewer samples are insufficient to confidently determine the distributional form of the data. Because a lognormal limit with small samples can result in extreme limit estimates, it is reasonable and conservative to default to normality for cases in which *n* < 8.
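As a concrete sketch of the *n* < 8 rule above, the following snippet substitutes one-half the DL for nondetects and computes the normal limits of equations (1) and (2). The data, the DL, and the hardcoded t quantile are illustrative assumptions, not values from the text.

```python
import math

# Hypothetical example: four detects plus two nondetects with DL = 0.5
# (all values illustrative, not from the text).
detects = [1.2, 0.9, 1.5, 0.7]
DL = 0.5
n_nondetects = 2

# n < 8, so each nondetect is replaced by one-half the detection limit.
data = detects + [DL / 2.0] * n_nondetects
n = len(data)

mean = sum(data) / n
sd = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))

# One-sided 95 percent Student's t quantile for n - 1 = 5 degrees of
# freedom, hardcoded from standard tables to keep the sketch stdlib-only.
t_crit = 2.015

ucl = mean + t_crit * sd / math.sqrt(n)   # equation (2)
lcl = mean - t_crit * sd / math.sqrt(n)   # equation (1)
```

The UCL, not the individual measurements, is then compared to the regulatory standard.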
If *n* ≥ 8, a good choice is to use the method of Aitchison (1955) to adjust for nondetects and to test for normality and lognormality of the data using the Shapiro-Wilk test. However, the ability of the Shapiro-Wilk test (and other distributional tests) to detect nonnormality is highly dependent on sample size. For most applications, 95 percent confidence is a reasonable choice. Note that alternatives such as the method of Cohen (1961) can be used; however, the DL must be constant.
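Aitchison's (1955) adjustment can be sketched as below. The formulas follow the form commonly given in groundwater-statistics texts (e.g., Gibbons and Coleman, 2001), the data are hypothetical, and the exact expressions should be checked against the original source.

```python
def aitchison_adjust(detects, n_total):
    """Aitchison (1955) adjusted mean and variance, treating nondetects as
    a point mass at zero. Form as commonly presented in groundwater-
    statistics texts; verify against the source before relying on it."""
    nd = len(detects)          # number of detected values
    n = n_total                # total sample size (detects + nondetects)
    xbar_d = sum(detects) / nd
    s2_d = sum((x - xbar_d) ** 2 for x in detects) / (nd - 1)
    mean_adj = (nd / n) * xbar_d
    var_adj = ((nd - 1) / (n - 1)) * s2_d \
        + (nd * (n - nd) / (n * (n - 1))) * xbar_d ** 2
    return mean_adj, var_adj

# Illustrative call: 8 detects out of 12 samples (hypothetical values).
m_adj, v_adj = aitchison_adjust([2.1, 1.8, 2.6, 3.0, 1.5, 2.2, 2.9, 1.9], 12)
```

With no nondetects the adjustment reduces to the ordinary sample mean and variance.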
**LOGNORMAL CONFIDENCE LIMITS FOR THE MEDIAN**
For a lognormally distributed constituent (that is, $y = \log_e(x)$ is distributed $N(\mu_y, \sigma_y^{2})$), the (1 − α)100 percent LCL for the median or 50th percentile of the distribution is given by

$$\exp\left(\bar{y} - t_{[n-1,\alpha]}\frac{s_y}{\sqrt{n}}\right) \qquad (3)$$

where $\bar{y}$ and $s_y$ are the mean and standard deviation of the natural log transformed concentrations. Note that the exponentiated limit is, in fact, an LCL for the median and not the mean concentration. In general, the median and corresponding LCL will be lower than the mean and its corresponding LCL. The (1 − α)100 percent UCL for the median or 50th percentile of the distribution is given by

$$\exp\left(\bar{y} + t_{[n-1,\alpha]}\frac{s_y}{\sqrt{n}}\right) \qquad (4)$$
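Equations (3) and (4) amount to an ordinary t interval on the log scale, exponentiated at the end. A minimal sketch with hypothetical data; the t quantile for 7 degrees of freedom is hardcoded from standard tables:

```python
import math

# Hypothetical fully detected concentrations (illustrative values only).
x = [0.8, 1.3, 2.4, 0.6, 1.9, 3.1, 1.1, 0.9]
y = [math.log(v) for v in x]          # y = log_e(x)
n = len(y)

ybar = sum(y) / n
sy = math.sqrt(sum((v - ybar) ** 2 for v in y) / (n - 1))

t_crit = 1.895   # one-sided 95 percent t quantile, 7 degrees of freedom

# Equations (3) and (4): limits for the MEDIAN, not the mean.
lcl_median = math.exp(ybar - t_crit * sy / math.sqrt(n))
ucl_median = math.exp(ybar + t_crit * sy / math.sqrt(n))
```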

**LOGNORMAL CONFIDENCE LIMITS FOR THE MEAN**

**The Exact Method**
Land (1971) developed an exact method for computing confidence limits for linear functions of the normal mean and variance. The classic example is the normalization of a lognormally distributed random variable x through the transformation $y = \log_e(x)$, where, as noted previously, y is distributed normal with mean $\mu_y$ and variance $\sigma_y^{2}$, or $y \sim N(\mu_y, \sigma_y^{2})$. Using Land's (1975) tabled coefficients $H_{\alpha}$, the one-sided (1 − α)100 percent lognormal LCL for the mean is

$$\exp\left(\bar{y} + 0.5\, s_y^{2} + \frac{H_{\alpha}\, s_y}{\sqrt{n-1}}\right) \qquad (5)$$

Alternatively, using $H_{1-\alpha}$, the one-sided (1 − α)100 percent lognormal UCL for the mean is

$$\exp\left(\bar{y} + 0.5\, s_y^{2} + \frac{H_{1-\alpha}\, s_y}{\sqrt{n-1}}\right) \qquad (6)$$

The factors $H$ are given by Land (1975), and $\bar{y}$ and $s_y$ are the mean and standard deviation of the natural log transformed data (i.e., $y = \log_e(x)$). Gilbert (1987) has a small subset of these extensive tables for n = 3 through 101, $s_y$ = .1 through 10.0, and α = .05 and .10 (i.e., upper and lower 90 percent and 95 percent confidence limit factors). Because these tables had historically been difficult to find, Gibbons and Coleman (2001) reproduced the complete set of Land's (1975) tables and also included computing approximations that can be used for automated applications. Land (1975) suggests that cubic interpolation (i.e., four-point Lagrangian interpolation) be used when working with these tables (Abramowitz and Stegun, 1964). A much easier and quite reasonable alternative is to use logarithmic interpolation.
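Given a tabled H factor, equations (5) and (6) are a one-line computation. In this sketch the H value is a placeholder for illustration, not an entry from Land's (1975) tables:

```python
import math

def land_limit(ybar, sy, n, H):
    """One-sided lognormal confidence limit for the mean via Land's exact
    method (equations 5 and 6). H must be taken from Land's (1975) tables:
    H_alpha for an LCL, H_(1-alpha) for a UCL. It is not computed here."""
    return math.exp(ybar + 0.5 * sy ** 2 + H * sy / math.sqrt(n - 1))

# Illustrative call with a placeholder H (NOT a tabled value).
ucl = land_limit(ybar=0.2, sy=0.8, n=10, H=2.0)
```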
**Approximate Lognormal Confidence Limit Methods**
There are also several approximations to lognormal confidence limits for the
mean that have been proposed. These have been conveniently classified as either
transformation methods or direct methods (Land, 1970). A transformation method is one
in which the confidence limit is obtained for the expected value of some function of x
and then transformed by some appropriate function to give an approximate limit for the expectation of x (i.e., E(x)), which in the lognormal case is $E(x) = \exp\left(\mu_y + \tfrac{1}{2}\sigma_y^{2}\right)$. This estimate is assumed to be normally distributed, and approximate confidence limits are computed accordingly.
The simplest transformation method is the naive transformation, which simply involves taking a log transformation of the data, computing the confidence limit on a log scale, and then exponentiating the limit. As previously noted, this is, in fact, a confidence limit for the median and not the mean. The method provides somewhat reasonable results as a confidence limit for the mean when $\sigma_y$ is very small but deteriorates quickly as $\sigma_y$ increases (Land, 1970).
Patterson (1966) proposed use of the transformation

$$\hat{x} = \exp\left(\bar{y} + \tfrac{1}{2}\, s_y^{2}\right) \qquad (7)$$

to remove the obvious bias of the naive method. Patterson's transformation would be exact if $\sigma_y^{2}$ were known; however, when the variance is unknown, it too behaves poorly when $\sigma_y$ increases (Land, 1970). More complicated alternatives described by Finney (1941) and Hoyle (1968) provide results similar to those of Patterson's transformation and are therefore not presented.
Direct methods offer an advantage over transformation methods in that they obtain confidence intervals directly for E(x) or some function of E(x). In light of this, these methods do not suffer from the bias introduced by failing to take into account the dependence of E(x) on both $\mu$ and $\sigma^{2}$. However, by applying normality assumptions to E(x), direct estimates can produce inadmissible confidence limits for E(x). To this end, Aitchison and Brown (1957) have suggested computing the usual normal confidence limit, which under the central limit theorem should converge to exact limits as n becomes large. Hoyle (1968) suggested replacing $\bar{x}$ and $s_x^{2}/n$ by their minimum variance unbiased estimates (MVUE). Finney (1941) derived the MVUE of E(x) as

$$\hat{\mu} = \exp(\bar{y})\,\Psi\!\left((1 - n^{-1})\, s_y^{2}\right) \qquad (8)$$

and Hoyle (1968) derived the MVUE for the variance of E(x) as

$$\hat{\sigma}^{2} = \exp(2\bar{y})\left[\Psi^{2}\!\left((1 - n^{-1})\, s_y^{2}\right) - \Psi\!\left((2 - 4n^{-1})\, s_y^{2}\right)\right] \qquad (9)$$

where

$$\Psi(g) = 1 + \frac{n-1}{n}\, g + \frac{(n-1)^{3}}{n^{2}(n+1)} \frac{g^{2}}{2!} + \frac{(n-1)^{5}}{n^{3}(n+1)(n+3)} \frac{g^{3}}{3!} + \cdots \qquad (10)$$
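The series in equation (10) truncates quickly for moderate g. A sketch of the function, based on the reconstructed series (so it should be checked against the source before use):

```python
import math

def bessel_psi(g, n, terms=25):
    """Truncated series for Psi(g) of equation (10). Term j contributes
    (n-1)^(2j-1) * g^j / (n^j * (n+1)(n+3)...(n+2j-3) * j!)."""
    total = 1.0
    for j in range(1, terms):
        denom = n ** j * math.factorial(j)
        for k in range(1, j):
            denom *= n + 2 * k - 1
        total += (n - 1) ** (2 * j - 1) * g ** j / denom
    return total
```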

is a Bessel function with argument g. In this method, the normal quantile $z_{1-\alpha}$ replaces $t_{[n-1,\alpha]}$, since there is no reason to believe that $\hat{\sigma}^{2}$ is chi-squared and independent of $\hat{\mu}$. Unfortunately, Land (1970) has shown that these methods are only useful for large n (i.e., n > 100) and even there only for small values of $s_y$.
The final direct method, which is attributed to D.R. Cox, has been shown to give the best overall results of any of the approximate methods (Land, 1970). The MVUE of $\phi = \log E(x)$ is $\hat{\phi} = \bar{y} + \tfrac{1}{2}\, s_y^{2}$, and the MVUE of the variance $\sigma^{2}$ of $\hat{\phi}$ is

$$\hat{\sigma}^{2} = \frac{s_y^{2}}{n} + \frac{s_y^{4}}{2(n-1)} \qquad (11)$$

Assuming approximate normality for $\hat{\phi}$, one may obtain approximate confidence limits for E(x) of the form

$$\mathrm{LCL} = \exp\left(\hat{\phi} - z_{1-\alpha}\,\hat{\sigma}\right) \qquad (12)$$

and

$$\mathrm{UCL} = \exp\left(\hat{\phi} + z_{1-\alpha}\,\hat{\sigma}\right) \qquad (13)$$
**NONPARAMETRIC CONFIDENCE LIMITS FOR THE MEDIAN**
When data are neither normally nor lognormally distributed, or the detection frequency is too low (e.g., < 50 percent) for a meaningful distributional analysis, nonparametric confidence limits become the method of choice. The nonparametric confidence limit is defined by an order statistic (i.e., a ranked observation) of the n measurements. Note that in the nonparametric case, one is restricted to computing confidence limits on percentiles of the distribution, for example, the 50th percentile or median of the on-site/downgradient distribution. Unless the distribution is symmetric (i.e., the mean and median are equivalent), there is no direct nonparametric way of constructing a confidence limit for the mean concentration.

To construct a confidence limit for the median concentration, one uses the fact that the number of samples falling below the p(100)th percentile of the distribution (e.g., p = .5, where p is between 0 and 1) out of a set of n samples will follow a binomial distribution with parameters n and success probability p, where success is defined as the event that a sample measurement is below the p(100)th percentile. The cumulative binomial distribution, Bin(x; n, p), represents the probability of getting x or fewer successes in n trials with success probability p, and can be evaluated as

$$\mathrm{Bin}(x;\, n, p) = \sum_{i=0}^{x} \binom{n}{i}\, p^{i} (1-p)^{n-i} \qquad (14)$$

The notation $\binom{n}{i}$ denotes the number of combinations of n things taken i at a time, where

$$\binom{n}{i} = \frac{n!}{i!\,(n-i)!} \qquad (15)$$

and $k! = 1 \cdot 2 \cdot 3 \cdots k$ for any counting number k. For example, the number of ways in which two things can be selected from three things is

$$\binom{3}{2} = \frac{3!}{2!\,(1)!} = \frac{1 \cdot 2 \cdot 3}{(1 \cdot 2)(1)} = \frac{6}{2} = 3 \qquad (16)$$
To compute a nonparametric confidence limit for the median, begin by rank ordering the n measurements from smallest to largest as $x_{(1)}, x_{(2)}, \ldots, x_{(n)}$. Denote the candidate end points selected to bracket the 50th percentile (i.e., $(n+1) \times .5$) as $L^{*}$ and $U^{*}$ for the lower and upper bound, respectively. For the LCL, compute the probability

$$1 - \mathrm{Bin}(L^{*} - 1;\, n, .5) \qquad (17)$$

If the probability is less than the desired confidence level, 1 − α, select a new value of $L^{*} = L^{*} - 1$ and repeat the process until the desired confidence level is achieved. For the UCL, compute the probability

$$\mathrm{Bin}(U^{*} - 1;\, n, .5) \qquad (18)$$

If the probability is less than the desired confidence level, 1 − α, select a new value of $U^{*} = U^{*} + 1$ and repeat the process until the desired confidence level is achieved. If the desired confidence level cannot be achieved, set the LCL to the smallest value or the UCL to the largest value and report the achieved confidence level.
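The rank-selection procedure above can be sketched directly with the exact binomial of equation (14); the starting ranks and fallback behavior follow the description in the text:

```python
import math

def binom_cdf(x, n, p):
    """Cumulative binomial Bin(x; n, p): probability of x or fewer successes."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(x + 1))

def median_ci_ranks(n, conf=0.95):
    """Order-statistic ranks (L*, U*) bracketing the median with at least
    the desired one-sided confidence on each end, per equations (17)-(18).
    Falls back to the extreme ranks when the confidence cannot be met."""
    L = max(1, math.floor((n + 1) * 0.5))
    U = min(n, math.ceil((n + 1) * 0.5))
    # Walk L* down until 1 - Bin(L* - 1; n, .5) reaches the confidence level.
    while L > 1 and 1 - binom_cdf(L - 1, n, 0.5) < conf:
        L -= 1
    # Walk U* up until Bin(U* - 1; n, .5) reaches the confidence level.
    while U < n and binom_cdf(U - 1, n, 0.5) < conf:
        U += 1
    return L, U
```

For n = 10 this yields ranks (2, 9): the second-smallest and second-largest measurements serve as the one-sided 95 percent LCL and UCL for the median.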
**GAMMA CONFIDENCE LIMITS FOR THE MEAN**

Another distribution that is often used for skewed data is the gamma distribution. Suppose x follows a gamma distribution with shape parameter α and scale parameter β. Then the gamma density is given by

$$f(x) = \frac{1}{\Gamma(\alpha)\,\beta^{\alpha}}\, x^{\alpha-1} e^{-x/\beta} \qquad (19)$$

Let $x_{(1)}, x_{(2)}, \ldots, x_{(n)}$ be a random sample of size n drawn from this population to estimate the unknown parameters. Denote the arithmetic and geometric means based on this random sample by $\bar{x}$ and $\tilde{x}$, respectively. The maximum likelihood estimators of α and β, denoted by $\hat{\alpha}$ and $\hat{\beta}$, are solutions to the following equations:

$$\ln(\hat{\alpha}) - \psi(\hat{\alpha}) = \ln(\bar{x}/\tilde{x}), \quad \text{and} \quad \hat{\alpha}\hat{\beta} = \bar{x} \qquad (20)$$

where $\psi$ denotes the digamma (Euler's psi) function. The mean and variance of x are

$$E(x) = \alpha\beta \quad \text{and} \quad V(x) = \alpha\beta^{2} \qquad (21)$$
To construct the UCL for this type of data, Aryal et al. (2009) constructed the following statistic:

$$T = \frac{9\,(n\mu)^{1/3}\,(n-1)\left(X^{1/3} - (n\mu)^{1/3}\right)^{2}}{2nR_n\,\mu} \qquad (22)$$

where $R_n$ is the logarithm of the ratio of the arithmetic mean to the geometric mean, μ is the mean of the population, and X is the sum of all the observations. The UCL of μ is obtained by solving the following equation and taking the largest root:

$$T = F_{1-\alpha;\,1,\,n-1} \qquad (23)$$

where $F_{1-\alpha}$ is the (1 − α)100th percentile of the F distribution with degrees of freedom 1 and n − 1. To compute the (1 − α)100 percent UCL, invert the test statistic T, from which one obtains

$$\mathrm{UCL} = \frac{\sum x_i}{n(1-U)^{3}} \qquad (24)$$

where

$$U = \sqrt{\frac{2\ln(\bar{x}/\tilde{x})\,F_{1-\alpha}}{9(n-1)}} \qquad (25)$$
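Equations (24) and (25) can be sketched as follows. The F quantile is supplied from tables (roughly 5.12 for α = .05 and n = 10, i.e., 1 and 9 degrees of freedom), the data are hypothetical, and the formulas follow the reconstruction above, so they should be verified against Aryal et al. (2009):

```python
import math

def gamma_ucl(x, F_crit):
    """UCL for a gamma mean via equations (24)-(25). F_crit is the
    (1 - alpha) quantile of F with (1, n - 1) degrees of freedom,
    supplied from tables rather than computed here."""
    n = len(x)
    arith = sum(x) / n
    geom = math.exp(sum(math.log(v) for v in x) / n)
    Rn = math.log(arith / geom)     # log of arithmetic/geometric mean ratio
    U = math.sqrt(2 * Rn * F_crit / (9 * (n - 1)))   # equation (25)
    return sum(x) / (n * (1 - U) ** 3)               # equation (24)

# Hypothetical right-skewed sample; F quantile for (1, 9) df, 95 percent.
ucl = gamma_ucl([0.4, 1.1, 0.8, 2.9, 0.6, 1.7, 0.9, 3.4, 0.5, 1.2],
                F_crit=5.12)
```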
**REFERENCES**
Abramowitz, M. and I. Stegun. 1964. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. Washington, D.C.: National Bureau of Standards.
Aitchison, J. 1955. On the distribution of a positive random variable having a discrete probability mass at the origin. Journal of the American Statistical Association 50: 901-908.

Aitchison, J. and J. Brown. 1957. The Log-normal Distribution. Cambridge, UK:
Cambridge University Press.
Aryal, S., D. Bhaumik, S. Santra, and R. Gibbons. 2009. Confidence interval for random-
effects calibration curves with left-censored data. Environmetrics 20(2): 181-189.
ASTM (American Society for Testing and Materials). 2010. ASTM D7048-04 Standard
Guide for Applying Statistical Methods for Assessment and Corrective Action
Environmental Monitoring Programs. West Conshohocken, Pa.: ASTM International.
Cohen, A. 1961. Tables for maximum likelihood estimates: singly truncated and singly
censored samples. Technometrics 3: 535-541.
U.S. Environmental Protection Agency (EPA). 2007. SW-846 Test Methods for
Evaluating Solid Waste, Physical/Chemical Methods. Washington, D.C.:
Environmental Protection Agency.
EPA. 2009. Statistical Analysis of Groundwater Monitoring Data at RCRA Facilities
Unified Guidance. EPA 530/R-09-007. Washington, D.C.: Environmental Protection
Agency Office of Resource Conservation and Recovery.
Finney, D. 1941. On the distribution of a variate whose logarithm is normally distributed.
Journal of the Royal Statistical Society, Series B 7: 155-161.
Gibbons, R. 2009. Assessment and corrective action monitoring. Pp. 317-335 in
Statistical Methods for Groundwater Monitoring, edited by R. Gibbons, D. Bhaumik,
and S. Aryal. Hoboken, N.J.: John Wiley & Sons, Inc.
Gibbons, R. and D. Coleman. 2001. Statistical Methods for Detection and Quantification
of Environmental Contamination. New York, N.Y.: John Wiley & Sons, Inc.
Gilbert, R. 1987. Statistical Methods for Environmental Pollution Monitoring. New York,
N.Y.: John Wiley and Sons, Inc.
Hoyle, M. 1968. The estimation of variances after using a gaussianating transformation.
Annals of Mathematical Statistics 39: 1125-1143.
Land, L. 1970. Phreatic Versus Vadose Meteoric Diagenesis of Limestones: Evidence
from a Fossil Water Table.
Land, C. 1971. Confidence intervals for linear functions of the normal mean and
variance. Annals of Mathematical Statistics 42:1187-1205.
Land, C. 1975. Tables of confidence limits for linear functions of the normal mean and
variance. Selected Tables in Mathematical Statistics 3: 385-419.

Patterson, C. and D. Settle. 1966. 7th Materials Research Symposium. National Bureau of
Standards Special Publication 422. Washington, D.C.: U.S. Government Printing
Office.
