**Appendix D**

**Statistical Calibration**

In the following, the ordinary least squares (OLS) and the weighted least squares (WLS) approaches to estimating the calibration function and related interval are reviewed.

**OLS ESTIMATION ^{1}**

As preparation for the following discussion, consider the relationship between response signal $y$ and spiking concentration $x$ in the region of the detection and quantification limits as a linear function of the form

$$y = \beta_0 + \beta_1 x + \varepsilon \tag{1}$$

where $\varepsilon$ is a random variable that describes the deviations from the regression line, distributed with mean 0 and constant variance $\sigma^2_{y \cdot x}$. The assumption of constant variance is not critical to this approach and will be relaxed in a later section; however, it is useful to simplify the initial exposition. The sample regression coefficient

$$b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})\, y_i}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \tag{2}$$

provides an estimate of the population parameter $\beta_1$ (i.e., the slope of the calibration function). The sample intercept

$$b_0 = \bar{y} - b_1 \bar{x} \tag{3}$$

provides an estimate of the population parameter $\beta_0$ (i.e., the intercept of the calibration function, which describes the mean instrument response or measured concentration when

^{1}This section is adapted from Gibbons, 1995.

152 ASSESSMENT OF AGENT MONITORING STRATEGIES FOR BGCAPP AND PCAPP
true concentration $x = 0$). An unbiased sample estimate of $\sigma^2_{y \cdot x}$ (i.e., the variance of deviations from the population regression line) is given by

$$s^2_{y \cdot x} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 / (n - 2) \tag{4}$$

where $\hat{y}_i = b_0 + b_1 x_i$.
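As a minimal sketch, the OLS slope, intercept, and residual variance described above can be computed directly from paired calibration data. The function name `ols_calibration` and the data values are hypothetical and serve only to illustrate the arithmetic.

```python
def ols_calibration(x, y):
    """Return (b0, b1, s2_yx) for the straight-line calibration y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    # Slope: sum of (x_i - x_bar) * y_i over sum of (x_i - x_bar)^2
    b1 = sum((xi - x_bar) * yi for xi, yi in zip(x, y)) / sxx
    # Intercept: y_bar - b1 * x_bar
    b0 = y_bar - b1 * x_bar
    # Residual variance about the fitted line, on n - 2 degrees of freedom
    resid_ss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s2_yx = resid_ss / (n - 2)
    return b0, b1, s2_yx


# Hypothetical spiking concentrations and instrument responses
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.1, 2.1, 3.9, 6.2, 7.9]
b0, b1, s2_yx = ols_calibration(x, y)
print(b0, b1, s2_yx)
```

For these illustrative values the fitted slope is about 1.97 and the intercept about 0.10.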
**WLS ESTIMATION ^{2}**

^{2}This section is adapted from Gibbons and Bhaumik, 2001.

When variance is not constant, as is typically the case in the calibration setting, then the previous OLS solution for constant or "homoscedastic" errors no longer applies. There are several approaches to this problem, but in general the most widely accepted approach is to model the variance as a function of true concentration $x$ and to then use the estimated variances as weights in estimating the calibration parameters, which are now denoted as $\beta_{0w}$ and $\beta_{1w}$.

The weighted least squares regression of measured concentration or instrument response ($y$) on true concentration ($x$) is denoted by

$$\hat{y}_{wi} = b_{0w} + b_{1w} x_i \tag{5}$$

where

$$b_{1w} = \frac{\sum_{i=1}^{n} (x_i - \bar{x}_w)\, y_i / k_i}{\sum_{i=1}^{n} (x_i - \bar{x}_w)^2 / k_i} \tag{6}$$

$$b_{0w} = \bar{y}_w - b_{1w} \bar{x}_w \tag{7}$$

$$\bar{y}_w = \frac{\sum_{i=1}^{n} y_i / k_i}{\sum_{i=1}^{n} [1/k_i]} \tag{8}$$

$$\bar{x}_w = \frac{\sum_{i=1}^{n} x_i / k_i}{\sum_{i=1}^{n} [1/k_i]} \tag{9}$$

and the weight $k_i = s^2_{x_i}$ is the variance for sample $i$, which is computed from those samples with true concentration $x_i = x$. The weighted residual variance is

$$s_w^2 = \sum_{i=1}^{n} \left[ (y_i - \hat{y}_{wi})^2 / k_i \right] / (n - 2) \tag{10}$$
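The weighted estimates can be sketched in the same way. The following is an illustrative implementation of Equations 6 through 10 in which `k` holds the variance assigned to each observation; the function name and the data are hypothetical.

```python
def wls_calibration(x, y, k):
    """Weighted fit of y = b0w + b1w*x; k[i] is the variance assigned to point i."""
    n = len(x)
    sum_inv_k = sum(1.0 / ki for ki in k)
    # Weighted means of x and y (Equations 9 and 8)
    x_w = sum(xi / ki for xi, ki in zip(x, k)) / sum_inv_k
    y_w = sum(yi / ki for yi, ki in zip(y, k)) / sum_inv_k
    # Weighted slope (Equation 6) and intercept (Equation 7)
    num = sum((xi - x_w) * yi / ki for xi, yi, ki in zip(x, y, k))
    den = sum((xi - x_w) ** 2 / ki for xi, ki in zip(x, k))
    b1w = num / den
    b0w = y_w - b1w * x_w
    # Weighted residual variance (Equation 10)
    s2_w = sum((yi - (b0w + b1w * xi)) ** 2 / ki
               for xi, yi, ki in zip(x, y, k)) / (n - 2)
    return b0w, b1w, s2_w


# Hypothetical data in which the variance grows with concentration
x = [0.0, 0.0, 1.0, 1.0, 5.0, 5.0]
y = [0.9, 1.1, 2.9, 3.2, 10.4, 11.8]
k = [0.01, 0.01, 0.05, 0.05, 1.0, 1.0]
b0w, b1w, s2_w = wls_calibration(x, y, k)
print(b0w, b1w, s2_w)
```

Setting every `k[i]` to the same value reproduces the OLS slope and intercept, which is a convenient sanity check.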
**ESTIMATING THE WEIGHTS ^{3}**

^{3}This section is adapted from Gibbons and Bhaumik, 2001.

When the number of replicates at each concentration is small, as is typically the case, or there are no replicates, the observed variance at each concentration provides a poor estimate of the true population variance. Two better alternatives are to (1) model the observed variance or standard deviation as a function of true concentration or (2) model the sum of squared residuals as a function of concentration. The latter approach can also be performed iteratively, in which improved estimates of $\beta_0$ and $\beta_1$ are obtained from weights computed from the current sum of squared residuals on each iteration. These new estimates of $\beta_0$ and $\beta_1$ are in turn used to obtain a new set of estimated weights and so forth until convergence. This algorithm is commonly termed "iteratively reweighted least squares." An essential element of either approach is to identify a plausible model for the variance function. The following sections consider a few models that are particularly well suited to this problem.

**Rocke and Lorenzato Model**

To measure the true concentration of an analyte ($x$), the traditional simple linear calibration model, $y = \beta_0 + \beta_1 x + e$ with the standard normality assumption on errors, is not appropriate, as it fails to explain increasing measurement variation with increasing analyte concentration, which is commonly observed in analytical data. To overcome this situation, one may propose a log linear model, for example, $y = x e^{\eta}$, where $\eta$ is a normal variable with mean 0 and standard deviation $\sigma_\eta$. This model also fails to explain near-constant measurement variation of $y$ for low true concentration level $x$ (Rocke and Lorenzato, 1995). To better model the calibration curve, Rocke and Lorenzato (1995) proposed a combined model that has both types of errors:

$$y_{jr} = \beta_0 + \beta_1 x_{jr} e^{\eta_{jr}} + e_{jr} \tag{11}$$

where $y_{jr}$ is the $r$th measurement at the $j$th concentration level, $x_{jr}$ is the corresponding true concentration, and $\beta_0$ and $\beta_1$ are the fixed calibration parameters. In this model, $\eta_{jr}$ represents proportional error at higher true concentrations and the $e_{jr}$'s are the additive errors that are present primarily at low concentrations. Now assume that the $\eta_{jr}$'s and the $e_{jr}$'s are independent and follow normal distributions with means 0 and variances $\sigma_\eta^2$ and $\sigma_e^2$, respectively. Data near zero (i.e., $x \approx 0$) determine $\sigma_e^2$, and data for large concentrations determine $\sigma_\eta^2$. The model specification also indicates that errors at larger concentrations are lognormally distributed and at low concentrations are normally distributed, which agrees with common experience.
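In their original paper, Rocke and Lorenzato (1995) derived the maximum likelihood estimators for their model based on maximizing the likelihood function:

$$\prod_{i=1}^{n} \frac{1}{2\pi \sigma_\eta \sigma_e} \int_{-\infty}^{\infty} \exp\left\{ -\frac{\eta^2}{2\sigma_\eta^2} - \frac{(y_i - \beta_0 - \beta_1 x_i e^{\eta})^2}{2\sigma_e^2} \right\} d\eta \tag{12}$$

These computations require complex numerical evaluation of the required integrals.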
Alternatively, Gibbons et al. (1997) and Rocke and Durbin (1998) have described a WLS solution that involves the following algorithm:

1. Use OLS regression to find initial estimates of $\beta_0$ and $\beta_1$ by fitting the linear model:

$$y = \beta_0 + \beta_1 x + e \tag{13}$$

2. Using the sample standard deviation of the lowest concentration as an estimate for $\sigma_e$ and the standard deviation of the log of the replicates at the highest concentration as an initial estimate for $\sigma_\eta$, refit the model in step 1 using WLS with weights equal to

$$w(x) = 1 / \left[ \sigma_e^2 + \beta_1^2 x^2 e^{\sigma_\eta^2} \left( e^{\sigma_\eta^2} - 1 \right) \right] \tag{14}$$

3. Using the new estimates of $\beta_0$ and $\beta_1$, compute the predicted response $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$ and standard error of the calibration curve at each concentration $x$:

$$s^2(x) = \frac{\sum_{i=1}^{m(x)} (y_i - \hat{y})^2}{m(x)} \tag{15}$$

where $m(x)$ is the number of replicates for concentration $x$.

4. Using WLS, fit the variance function:

$$s^2(x) = \alpha + \beta x^2 + e \tag{16}$$

where

$$\alpha = \sigma_e^2 \tag{17}$$

and

$$\beta = \beta_1^2 e^{\sigma_\eta^2} \left( e^{\sigma_\eta^2} - 1 \right) \tag{18}$$

using weights:

$$w(x) = m(x) / s^2(x) \tag{19}$$

5. Compute the new estimates of $\sigma_e^2 = \alpha$ and

$$\sigma_\eta^2 = \log_e \left[ \left( 1 + \sqrt{1 + 4\beta / \beta_1^2} \right) / 2 \right] \tag{20}$$

6. Iterate until convergence.

In general, this algorithm will converge to positive values of $\alpha$ and $\beta$. Note that this algorithm uses WLS to compute the parameters of the calibration curve ($\beta_0$ and $\beta_1$) as well as the parameters of the variance function ($\alpha$ and $\beta$). In this way, the lowest concentrations with the smallest variances provide the greatest weight in the estimation. The net result is to not sacrifice precision in estimating the calibration function and corresponding interval estimates at low levels by including higher concentrations in the analysis. This is quite useful if the interest is in low-level detection and quantification.
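The two algebraic steps that are easiest to get wrong in this algorithm are the weight in step 2 and the back-transformation in step 5. The sketch below shows both under the two-component model, with the slope of the fitted variance function passed in as `slope`; all function names are hypothetical, and the round trip at the end simply confirms that the step 5 formula inverts the definition of the variance-function slope.

```python
import math


def rl_variance(x, beta1, sigma_e2, sigma_eta2):
    """Model variance of y at concentration x (the reciprocal of the step 2 weight)."""
    u = math.exp(sigma_eta2)
    return sigma_e2 + beta1 ** 2 * x ** 2 * u * (u - 1.0)


def rl_weight(x, beta1, sigma_e2, sigma_eta2):
    """WLS weight used when refitting the calibration line in step 2."""
    return 1.0 / rl_variance(x, beta1, sigma_e2, sigma_eta2)


def variance_slope(beta1, sigma_eta2):
    """Coefficient of x^2 in the variance function implied by sigma_eta^2."""
    u = math.exp(sigma_eta2)
    return beta1 ** 2 * u * (u - 1.0)


def sigma_eta2_from_slope(slope, beta1):
    """Step 5: recover sigma_eta^2 from the fitted coefficient of x^2."""
    return math.log((1.0 + math.sqrt(1.0 + 4.0 * slope / beta1 ** 2)) / 2.0)


# Round trip with hypothetical parameter values
beta1, sigma_eta2 = 2.0, 0.04
slope = variance_slope(beta1, sigma_eta2)
recovered = sigma_eta2_from_slope(slope, beta1)
print(slope, recovered)
```

At $x = 0$ the model variance reduces to the additive component alone, which is why the lowest concentrations carry the largest weights.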
**Exponential Model**

An alternative parameterization of the variance function involves modeling the relationship between $\sigma_x$ and $x$ as an exponential function of the following form:

$$\sigma_x = a_0 e^{a_1 (x)} \tag{21}$$

Although less well theoretically motivated than the Rocke and Lorenzato model, the exponential model provides excellent fit to a wide variety of analytical data (see Gibbons et al., 1997). The model can be applied either to the observed standard deviations at each concentration or, iteratively, to the sum of squared residuals. For estimating $a_0$ and $a_1$, the traditional approach involves substituting $s_x$ for $\sigma_x$ and using nonlinear least squares (e.g., Gauss-Newton) or using OLS regression of the natural log transformed observed standard deviation on true concentration (Snedecor and Cochran, 1989). Similarly, WLS can also be used on the regression of $\log_e(s)$ on $x$ using weights

$$w(x) = m(x) / s^2(x) \tag{22}$$
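A minimal sketch of the log-transform route follows: regressing $\log_e(s_x)$ on $x$ by OLS gives an intercept equal to $\log_e(a_0)$ and a slope equal to $a_1$. The function name and the standard deviations are hypothetical, and the unweighted fit is shown for brevity.

```python
import math


def fit_exponential_sd(x_levels, sd_levels):
    """Estimate a0 and a1 in sd = a0 * exp(a1 * x) by OLS on log(sd)."""
    logs = [math.log(s) for s in sd_levels]
    n = len(x_levels)
    x_bar = sum(x_levels) / n
    l_bar = sum(logs) / n
    a1 = (sum((xv - x_bar) * lv for xv, lv in zip(x_levels, logs))
          / sum((xv - x_bar) ** 2 for xv in x_levels))
    a0 = math.exp(l_bar - a1 * x_bar)  # back-transform the intercept
    return a0, a1


# Hypothetical observed standard deviations at five spiking levels
x_levels = [0.0, 1.0, 2.0, 5.0, 10.0]
sd_levels = [0.52, 0.60, 0.73, 1.40, 3.60]
a0, a1 = fit_exponential_sd(x_levels, sd_levels)
print(a0, a1)
```

Because the fit is done on the log scale, the predicted standard deviation $a_0 e^{a_1 x}$ is positive at every concentration.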
**Linear Model**

The linear model has also been used to model the variance function (Currie, 1995). The linear model is of the form

$$\sigma_x = a_0 + a_1 (x) \tag{23}$$

The primary disadvantage of the linear model is that the small sampling fluctuations in the observed sample variance at each concentration can lead to a negative intercept (i.e., $a_0 < 0$) and negative variance estimates. This can lead to improper detection and quantification limit estimates and corresponding interval estimates. As such, the linear model is generally not recommended for routine use. This is not a problem for either of the two preceding models, which can mimic a linear model if required.
**ITERATIVELY REWEIGHTED LEAST SQUARES ESTIMATION**

An alternative to modeling the observed variance at each concentration is to model the squared residuals as a function of $x$, and then to use this estimated variance function to obtain weights that are then used in estimating the regression coefficients. This process is iterated until convergence, hence the term "iteratively reweighted least squares" (Carroll and Ruppert, 1988). As noted by Neter et al. (1990), the methods of maximum likelihood and weighted least squares lead to the same estimators for linear regression models of the form considered here. The previous example of the WLS estimator for the Rocke and Lorenzato model is an example of iteratively reweighted least squares. The general algorithm is as follows:

1. Use OLS estimation to find initial estimates of $\beta_0$ and $\beta_1$ by fitting the linear model

$$y = \beta_0 + \beta_1 x + \varepsilon \tag{24}$$

2. Using the OLS estimates of $\beta_0$ and $\beta_1$, compute the predicted response $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$ and the standard error of the calibration curve at each concentration $x$:

$$s^2(x) = \frac{\sum_{i=1}^{m(x)} (y_i - \hat{y})^2}{m(x)} \tag{25}$$

where $m(x)$ is the number of replicates for concentration $x$.

3. Using an appropriate model for the variance function, fit the variance function to the sum of squared residuals:

$$s^2(x) = f(x) \tag{26}$$

4. Using the provisional weights:

$$w(x) = m(x) / s^2(x) \tag{27}$$

recompute $\beta_0$ and $\beta_1$ using WLS.

5. Iterate until convergence.
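The steps above can be sketched as a short loop. This is an illustration under stated assumptions, not a reference implementation: the variance function $f(x)$ is taken to be log-linear, $\log_e s^2(x) = c_0 + c_1 x$, fitted by OLS, and each replicate is given variance $f(x)$, so that a level with $m(x)$ replicates receives total weight $m(x)/f(x)$. All function names and data are hypothetical.

```python
import math


def wls_line(x, y, k):
    """Weighted straight-line fit; k[i] is the variance assigned to point i."""
    s = sum(1.0 / ki for ki in k)
    x_w = sum(xi / ki for xi, ki in zip(x, k)) / s
    y_w = sum(yi / ki for yi, ki in zip(y, k)) / s
    b1 = (sum((xi - x_w) * yi / ki for xi, yi, ki in zip(x, y, k))
          / sum((xi - x_w) ** 2 / ki for xi, ki in zip(x, k)))
    return y_w - b1 * x_w, b1


def level_variances(x, y, b0, b1):
    """Step 2: mean squared residual about the fitted line at each distinct x."""
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(xi, []).append((yi - (b0 + b1 * xi)) ** 2)
    return {xv: sum(r) / len(r) for xv, r in groups.items()}


def fit_log_linear_variance(s2_by_level):
    """Step 3 (assumed model): ln s^2(x) = c0 + c1*x, fitted by OLS."""
    xs = list(s2_by_level)
    # A small floor avoids log(0) when a level has zero residuals
    ls = [math.log(max(s2_by_level[xv], 1e-12)) for xv in xs]
    n = len(xs)
    x_bar, l_bar = sum(xs) / n, sum(ls) / n
    c1 = (sum((xv - x_bar) * lv for xv, lv in zip(xs, ls))
          / sum((xv - x_bar) ** 2 for xv in xs))
    c0 = l_bar - c1 * x_bar
    return lambda xv: math.exp(c0 + c1 * xv)


def irls(x, y, tol=1e-8, max_iter=50):
    b0, b1 = wls_line(x, y, [1.0] * len(x))      # step 1: OLS start
    for _ in range(max_iter):
        s2 = level_variances(x, y, b0, b1)       # step 2
        f = fit_log_linear_variance(s2)          # step 3
        k = [f(xi) for xi in x]                  # step 4: variance per replicate
        new_b0, new_b1 = wls_line(x, y, k)
        converged = abs(new_b0 - b0) < tol and abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:                            # step 5
            break
    return b0, b1


# Hypothetical triplicate measurements at four spiking levels
x = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 10.0, 10.0, 10.0]
y = [0.9, 1.0, 1.1, 2.8, 3.0, 3.2, 10.0, 11.0, 12.0, 19.0, 21.0, 23.0]
b0, b1 = irls(x, y)
print(b0, b1)
```

Swapping `fit_log_linear_variance` for another variance model changes only step 3; the rest of the loop is unchanged.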
**WLS PREDICTION INTERVALS**

For WLS estimates of $\beta_0$ and $\beta_1$, the estimated variance for a predicted value $\hat{y}_{wj}$ is

$$V(\hat{y}_{wj}) = s_w^2 \left[ k_j + \frac{1}{\sum_{i=1}^{n} (1/k_i)} + \frac{(x_j - \bar{x}_w)^2}{\sum_{i=1}^{n} (x_i - \bar{x}_w)^2 / k_i} \right] \tag{28}$$

where $k_j$ is the estimated variance at concentration $x_j$. An upper $(1 - \alpha)100$ percent confidence interval for $\hat{y}_{wj}$ (i.e., an upper prediction limit for a new measured concentration or instrument response at true concentration $x_j$) is

$$\hat{y}_{wj} + t \sqrt{V(\hat{y}_{wj})} \tag{29}$$

where $t$ is the upper $(1 - \alpha)100$ percent point of Student's $t$-distribution on $n - 2$ degrees of freedom. For example, at $x = 0$, the upper prediction limit (UPL) is

$$\mathrm{UPL} = \frac{t s_w}{b_{1w}} \sqrt{ s_0^2 + \frac{1}{\sum_{i=1}^{n} (1/k_i)} + \frac{\bar{x}_w^2}{\sum_{i=1}^{n} (x_i - \bar{x}_w)^2 / k_i} } \tag{30}$$

where $s_0^2$ is the variance of the measured concentrations or instrument responses for a sample that does not contain the analyte.
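As a sketch, Equations 28 and 29 translate into two small functions. The $t$ quantile is passed in as an argument because the Python standard library does not provide one; the value would come from tables or a statistics package. Function names and numbers are hypothetical.

```python
import math


def prediction_variance(x_j, k_j, x, k, s2_w):
    """Equation 28: estimated variance of a new response predicted at x_j."""
    sum_inv_k = sum(1.0 / ki for ki in k)
    x_w = sum(xi / ki for xi, ki in zip(x, k)) / sum_inv_k
    sxx_w = sum((xi - x_w) ** 2 / ki for xi, ki in zip(x, k))
    return s2_w * (k_j + 1.0 / sum_inv_k + (x_j - x_w) ** 2 / sxx_w)


def upper_prediction_limit(y_hat_j, t, var_j):
    """Equation 29: predicted value plus t times the prediction standard error."""
    return y_hat_j + t * math.sqrt(var_j)


# Hypothetical calibration design with equal variances
x = [0.0, 1.0, 2.0, 3.0]
k = [1.0, 1.0, 1.0, 1.0]
v_centre = prediction_variance(1.5, 1.0, x, k, s2_w=2.0)
v_zero = prediction_variance(0.0, 1.0, x, k, s2_w=2.0)
print(v_centre, v_zero)
```

The variance is smallest at the weighted mean concentration and grows as $x_j$ moves away from it, which is why limits evaluated at $x = 0$ benefit from a design that concentrates calibration points at low levels.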
**CONFIDENCE REGION FOR AN UNKNOWN TRUE CONCENTRATION**

In general practice, measured concentrations are reported as if they are true concentrations, without the benefit of an index of uncertainty. There are two problems with this. First, the measured concentration may provide a biased estimate of the true concentration to the extent that $\beta_0 \neq 0$. Second, even in the absence of bias, the measured concentration is only an estimate of the true concentration, and it has a level of uncertainty that is ignored by simply presenting the measured concentration in the absence of a proper uncertainty interval.
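To provide an estimate of true concentration $x$ from measured concentration $y$ for the Rocke and Lorenzato model, compute

$$\hat{x} = \frac{y - \hat{\beta}_0}{\hat{\beta}_1 \hat{\gamma}} \tag{31}$$

where $\gamma = e^{\sigma_\eta^2 / 2}$ is the mean of $e^{\eta}$, so that $\gamma^4 - \gamma^2 = e^{\sigma_\eta^2}(e^{\sigma_\eta^2} - 1)$ as in Equation 14. Bhaumik and Gibbons (2005) derived the asymptotic variance of $\hat{x}$ as

$$\mathrm{Var}(\hat{x}) = \frac{\sigma_e^2}{\beta_1^2 \gamma^2} (1 + 1/n_0) + x^2 (\gamma^2 - 1) \tag{32}$$

where $n_0$ is the number of calibration measurements at or near zero. As expected, the variance of $\hat{x}$ depends on $x$ and increases with increasing concentration.

Bhaumik and Gibbons (2005) developed a confidence interval for an unknown true concentration $x$ given a measured concentration $y$, separately for true concentrations at or near $x = 0$ and for larger non-zero true concentrations. For a low-level true concentration $x_0$, the $(1 - \alpha)100$ percent confidence region for $x_0$ is

$$\left( \max\left( 0,\ y_0 - z_{\alpha/2} \sqrt{\hat{\sigma}_e^2 / n_0} \right),\ y_0 + z_{\alpha/2} \sqrt{\hat{\sigma}_e^2 / n_0} \right)$$

To construct a confidence interval for an unknown higher level concentration $x$, they use a lognormal approximation. Let

$$c_1 = \mathrm{Var}(y) = \beta_1^2 x^2 (\gamma^4 - \gamma^2) + \sigma_e^2 \tag{33}$$

$$c_2 = \frac{c_1}{\beta_1^2 x^2} \tag{34}$$

$$c_3 = \ln\left( \frac{1 + \sqrt{1 + 4 c_2}}{2} \right) \tag{35}$$

where $c_3$ is the approximate variance of

$$\ln\left( \frac{y - \beta_0}{\beta_1 x} \right) \tag{36}$$

The quantity

$$z(x) = \frac{\ln(y - \beta_0) - \ln(\beta_1 x)}{\sqrt{c_3}} \tag{37}$$

is distributed $N(0, 1)$ so that the $(1 - \alpha)100$ percent confidence region for $x$ is obtained by iteratively solving for the set

$$\left\{ x : -z_{\alpha/2} \leq z(x) \leq z_{\alpha/2} \right\} \tag{38}$$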
In addition to reporting measured concentrations, the point estimate of x and its
95 percent confidence interval should also be routinely reported; it can be used for the
purpose of making both detection decisions and comparisons to regulatory standards. If,
for example, the lower 95 percent confidence limit is greater than zero, there is 95
percent confidence that the true concentration is greater than zero. By contrast, if the
upper 95 percent confidence limit is less than a regulatory standard, there is 95 percent
confidence that the true concentration is less than the regulatory standard, and the
corresponding (and potentially less costly) disposal options can be pursued.
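One simple way to solve for the higher-level confidence region numerically is to scan a grid of candidate concentrations and keep those whose standardized statistic lies within the normal critical values. The sketch below does this on a log-spaced grid. It is an illustration only: the proportional-error term of the variance is written as $e^{\sigma_\eta^2}(e^{\sigma_\eta^2} - 1)$, the same term that appears in the weight function of Equation 14, the parameter values are hypothetical, and the grid scan is a stand-in for whatever iterative solver is preferred.

```python
import math
from statistics import NormalDist


def z_stat(x, y, beta0, beta1, sigma_e2, sigma_eta2):
    """Standardized log-scale statistic for candidate concentration x."""
    u = math.exp(sigma_eta2)
    c1 = beta1 ** 2 * x ** 2 * u * (u - 1.0) + sigma_e2      # Var(y) at x
    c2 = c1 / (beta1 ** 2 * x ** 2)
    c3 = math.log((1.0 + math.sqrt(1.0 + 4.0 * c2)) / 2.0)   # log-scale variance
    return (math.log(y - beta0) - math.log(beta1 * x)) / math.sqrt(c3)


def confidence_region(y, beta0, beta1, sigma_e2, sigma_eta2,
                      alpha=0.05, n_grid=20001, span=100.0):
    """Return (lower, upper) of the grid points with |z(x)| <= z_{alpha/2}.

    Requires y > beta0. The grid is centred on (y - beta0) / beta1 and
    covers a factor of `span` on either side; both are arbitrary choices.
    """
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    centre = (y - beta0) / beta1
    lo, hi = centre / span, centre * span
    inside = []
    for i in range(n_grid):
        x = lo * (hi / lo) ** (i / (n_grid - 1))
        if abs(z_stat(x, y, beta0, beta1, sigma_e2, sigma_eta2)) <= z_crit:
            inside.append(x)
    return (min(inside), max(inside)) if inside else None


# Hypothetical measured response and fitted parameters
lower, upper = confidence_region(y=11.0, beta0=1.0, beta1=2.0,
                                 sigma_e2=0.01, sigma_eta2=0.01)
print(lower, upper)
```

With these illustrative values the region runs from roughly 4.1 to 6.1 around a central value of 5, and it is asymmetric on the original scale because the approximation is made on the log scale.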
**DETECTION AND QUANTIFICATION**

The previously described WLS prediction limit for $\hat{y}$ at $x = 0$ corresponds to the concept of a decision limit $L_C$ defined by Currie (1968) for the case in which the data arise from a calibration experiment, the population parameters at $x = 0$ are unknown, and one wishes to make a detection decision for a single future test sample. Measured concentrations (or instrument responses) that exceed the UPL should yield the binary decision of "detected" with $(1 - \alpha)100$ percent confidence. Note that when the true concentration $x = L_C$, the probability of exceeding the UPL is only 50 percent. As such, Currie defined the detection limit $L_D$ as the 95 percent UPL for a true concentration at $L_C$. The WLS estimate of $L_C$ is therefore

$$L_C = \frac{t s_w}{b_{1w}} \sqrt{ s_{L_C}^2 + \frac{1}{\sum_{i=1}^{n} (1/k_i)} + \frac{(L_C - \bar{x}_w)^2}{\sum_{i=1}^{n} (x_i - \bar{x}_w)^2 / k_i} } \tag{39}$$

and the WLS estimate of $L_D$ is

$$L_D = L_C + \frac{t s_w}{b_{1w}} \sqrt{ s_{L_D}^2 + \frac{1}{\sum_{i=1}^{n} (1/k_i)} + \frac{(L_D - \bar{x}_w)^2}{\sum_{i=1}^{n} (x_i - \bar{x}_w)^2 / k_i} } \tag{40}$$
Note that in order to compute $L_C$ and $L_D$, one must have estimates of $s_{L_C}^2$ and $s_{L_D}^2$, which are often unavailable and must be estimated using a model of standard deviation versus concentration, as previously described. The final estimates of $L_C$ and $L_D$ are obtained from simple repeated substitution beginning from $L_C = 0$ and $L_D = L_C$ until convergence (i.e., change of less than $10^{-4}$ in estimates of $L_C$ and $L_D$ on successive iterations).

Finally, Currie (1968) defined the limit of determination $L_Q$ as the concentration at which the signal-to-noise ratio is 10 to 1. In the current context, one can estimate $L_Q$ by identifying the true concentration at which the estimated standard deviation is one-tenth of its magnitude. Again, a simple iterative approach generally performs quite well (Gibbons and Coleman, 2001; Gibbons et al., 1997).
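The repeated substitution for $L_C$ and $L_D$ can be sketched as two fixed-point iterations, the second started from the converged value of the first. The variance at a given concentration is supplied by a user-chosen model `var_at`; the quadratic form used here, the $t$ value, and the summary statistics are all hypothetical.

```python
import math


def limit_term(conc, var_at, t, s_w, b1w, sum_inv_k, x_w, sxx_w):
    """(t*s_w/b1w) * sqrt(s^2(conc) + 1/sum(1/k) + (conc - x_w)^2 / sxx_w)."""
    return (t * s_w / b1w) * math.sqrt(
        var_at(conc) + 1.0 / sum_inv_k + (conc - x_w) ** 2 / sxx_w)


def fixed_point(update, start, tol=1e-4, max_iter=200):
    """Repeated substitution until successive values differ by less than tol."""
    current = start
    for _ in range(max_iter):
        new = update(current)
        if abs(new - current) < tol:
            return new
        current = new
    raise RuntimeError("repeated substitution did not converge")


def critical_and_detection_limits(var_at, t, s_w, b1w, sum_inv_k, x_w, sxx_w):
    args = (var_at, t, s_w, b1w, sum_inv_k, x_w, sxx_w)
    # L_C: start from 0 and substitute into the right-hand side of Equation 39
    l_c = fixed_point(lambda c: limit_term(c, *args), 0.0)
    # L_D: start from L_C and substitute into the right-hand side of Equation 40
    l_d = fixed_point(lambda d: l_c + limit_term(d, *args), l_c)
    return l_c, l_d


def var_at(conc):
    """Hypothetical variance model: constant part plus a proportional part."""
    return 0.04 + 0.01 * conc ** 2


l_c, l_d = critical_and_detection_limits(
    var_at, t=2.0, s_w=1.0, b1w=2.0, sum_inv_k=10.0, x_w=5.0, sxx_w=100.0)
print(l_c, l_d)
```

Plain substitution converges here because the right-hand sides change slowly with concentration; a design in which they change rapidly might need a damped update or a root finder instead.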
**REFERENCES**

Bhaumik, D. and R. Gibbons. 2005. Confidence regions for random-effects calibration curves with heteroscedastic errors. Technometrics 47(2): 223-231.

Carroll, R. and D. Ruppert. 1988. Transformation and Weighting in Regression. Boca Raton, FL: CRC Press.

Currie, L. 1968. Limits for qualitative detection and quantitative determination: Application to radiochemistry. Analytical Chemistry 40(3): 586-593.

Currie, L. 1995. Nomenclature in evaluation of analytical methods including detection and quantification capabilities. Pure and Applied Chemistry 67: 1699-1723.

Gibbons, R. 1995. Some statistical and conceptual issues in the detection of low-level environmental pollutants. Environmental and Ecological Statistics 2(2): 125-145.

Gibbons, R. and D. Bhaumik. 2001. Weighted random-effects regression models with applications to interlaboratory calibration. Technometrics 43(2): 192-198.

Gibbons, R. and D. Coleman. 2001. Statistical Methods for Detection and Quantification of Environmental Contamination. New York, N.Y.: John Wiley & Sons, Inc.

Gibbons, R., D. Coleman, and R. Maddalone. 1997. An alternate minimum level definition for analytical quantification. Environmental Science & Technology 31(7): 2071-2077.

Neter, J., W. Wasserman, and M. Kutner. 1990. Applied Linear Regression Models, 2nd Ed. Homewood, IL: McGraw-Hill/Irwin.

Rocke, D. and B. Durbin. 1998. Models and Estimators for Analytical Measurement Methods with Non-constant Variance. Davis, Calif.: University of California at Davis, Center for Image Processing and Integrated Computing.

Rocke, D. and S. Lorenzato. 1995. A two-component model for measurement error in analytical chemistry. Technometrics 37(2): 176-184.

Snedecor, G. and W. Cochran. 1989. Statistical Methods. Ames, IA: Iowa State University Press.
