Appendix D
Statistical Calibration

In the following, the ordinary least squares (OLS) and the weighted least squares (WLS) approaches to estimating the calibration function and related interval are reviewed.

OLS ESTIMATION1

As preparation for the following discussion, consider the relationship between response signal y and spiking concentration x in the region of the detection and quantification limits as a linear function of the form

$$y = \beta_0 + \beta_1 x + \varepsilon \qquad (1)$$

where $\varepsilon$ is a random variable that describes the deviations from the regression line, distributed with mean 0 and constant variance $\sigma^2_{y \cdot x}$. The assumption of constant variance is not critical to this approach and will be relaxed in a later section; however, it is useful to simplify the initial exposition. The sample regression coefficient

$$b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})\, y_i}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \qquad (2)$$

provides an estimate of the population parameter β1 (i.e., the slope of the calibration function). The sample intercept

$$b_0 = \bar{y} - b_1 \bar{x} \qquad (3)$$

provides an estimate of the population parameter β0 (i.e., the intercept of the calibration function, which describes the mean instrument response or measured concentration when


1This section is adapted from Gibbons, 1995.




true concentration $x = 0$). An unbiased sample estimate of $\sigma^2_{y \cdot x}$ (i.e., the variance of deviations from the population regression line) is given by

$$s^2_{y \cdot x} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 / (n - 2) \qquad (4)$$

where $\hat{y}_i = b_0 + b_1 x_i$.

WLS ESTIMATION2

When variance is not constant, as is typically the case in the calibration setting, then the previous OLS solution for constant or "homoscedastic" errors no longer applies. There are several approaches to this problem, but in general the most widely accepted approach is to model the variance as a function of true concentration $x$ and to then use the estimated variances as weights in estimating the calibration parameters, which are now denoted as $\beta_{0w}$ and $\beta_{1w}$. The weighted least squares regression of measured concentration or instrument response ($y$) on true concentration ($x$) is denoted by

$$\hat{y}_{wi} = b_{0w} + b_{1w} x_i \qquad (5)$$

where

$$b_{1w} = \frac{\sum_{i=1}^{n} (x_i - \bar{x}_w)\, y_i / k_i}{\sum_{i=1}^{n} (x_i - \bar{x}_w)^2 / k_i} \qquad (6)$$

$$b_{0w} = \bar{y}_w - b_{1w} \bar{x}_w \qquad (7)$$

$$\bar{y}_w = \frac{\sum_{i=1}^{n} y_i / k_i}{\sum_{i=1}^{n} [1/k_i]} \qquad (8)$$

2This section is adapted from Gibbons and Bhaumik, 2001.

$$\bar{x}_w = \frac{\sum_{i=1}^{n} x_i / k_i}{\sum_{i=1}^{n} [1/k_i]} \qquad (9)$$

and the weight $k_i = s^2_{x_i}$ is the variance for sample $i$, which is computed from those samples with true concentration $x_i = x$. The weighted residual variance is

$$s_w^2 = \sum_{i=1}^{n} \left[ (y_i - \hat{y}_{wi})^2 / k_i \right] / (n - 2) \qquad (10)$$

ESTIMATING THE WEIGHTS3

When the number of replicates at each concentration is small, as is typically the case, or there are no replicates, the observed variance at each concentration provides a poor estimate of the true population variance. Two better alternatives are to (1) model the observed variance or standard deviation as a function of true concentration or (2) model the sum of squared residuals as a function of concentration. The latter approach can also be performed iteratively, in which improved estimates of $\beta_0$ and $\beta_1$ are obtained from weights computed from the current sum of squared residuals on each iteration. These new estimates of $\beta_0$ and $\beta_1$ are in turn used to obtain a new set of estimated weights and so forth until convergence. This algorithm is commonly termed "iteratively reweighted least squares." An essential element of either approach is to identify a plausible model for the variance function. The following sections consider a few models that are particularly well suited to this problem.

Rocke and Lorenzato Model

To measure the true concentration of an analyte ($x$), the traditional simple linear calibration model, $y = \beta_0 + \beta_1 x + e$ with the standard normality assumption on errors, is not appropriate, as it fails to explain increasing measurement variation with increasing analyte concentration, which is commonly observed in analytical data. To overcome this situation, one may propose a log linear model, for example, $y = x e^{\eta}$, where $\eta$ is a normal variable with mean 0 and standard deviation $\sigma_\eta$. This model also fails to explain near-constant measurement variation of $y$ for low true concentration level $x$ (Rocke and Lorenzato, 1995).
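The weighted estimators in equations (5) through (10) reduce to a handful of weighted sums. The following is a minimal sketch in plain Python, assuming the variances $k_i$ are already available for every observation; the function name `wls_fit` and its argument names are illustrative and do not come from the source.

```python
def wls_fit(x, y, k):
    """Weighted least squares line using weights 1/k_i.

    x, y : sequences of true concentrations and responses
    k    : sequence of variances k_i (assumed known or pre-estimated)
    Returns (b0w, b1w, s2w): intercept, slope, weighted residual variance.
    """
    n = len(x)
    sum_inv_k = sum(1.0 / ki for ki in k)
    # Weighted means, equations (8) and (9)
    y_bar_w = sum(yi / ki for yi, ki in zip(y, k)) / sum_inv_k
    x_bar_w = sum(xi / ki for xi, ki in zip(x, k)) / sum_inv_k
    # Weighted slope and intercept, equations (6) and (7)
    sxy = sum((xi - x_bar_w) * yi / ki for xi, yi, ki in zip(x, y, k))
    sxx = sum((xi - x_bar_w) ** 2 / ki for xi, ki in zip(x, k))
    b1w = sxy / sxx
    b0w = y_bar_w - b1w * x_bar_w
    # Weighted residual variance on n - 2 degrees of freedom, equation (10)
    s2w = sum((yi - (b0w + b1w * xi)) ** 2 / ki
              for xi, yi, ki in zip(x, y, k)) / (n - 2)
    return b0w, b1w, s2w
```

Setting every $k_i$ to the same constant gives the OLS slope and intercept of equations (2) and (3), which is a convenient sanity check on any implementation.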
To better model the calibration curve, Rocke and Lorenzato (1995) proposed a combined model that has both types of errors:

3This section is adapted from Gibbons and Bhaumik, 2001.

$$y_{jr} = \beta_0 + \beta_1 x_{jr} e^{\eta} + e_{jr} \qquad (11)$$

where $y_{jr}$ is the $r$th measurement at the $j$th concentration level, $x_{jr}$ is the corresponding true concentration, and $\beta_0$ and $\beta_1$ are the fixed calibration parameters. In this model, $\eta$ represents proportional error at higher true concentrations and the $e_{jr}$'s are the additive errors that are present primarily at low concentrations. Now assume that $\eta$ and the $e_{jr}$'s are independent and follow normal distributions with means 0 and variances $\sigma_\eta^2$ and $\sigma_e^2$, respectively. Data near zero (i.e., $x \approx 0$) determine $\sigma_e^2$, and data for large concentrations determine $\sigma_\eta^2$. The model specification also indicates that errors at larger concentrations are lognormally distributed and at low concentrations are normally distributed, which agrees with common experience.

In their original paper, Rocke and Lorenzato (1995) derived the maximum likelihood estimators for their model based on maximizing the likelihood function:

$$\prod_{i=1}^{n} \frac{1}{2\pi \sigma_\eta \sigma_e} \int_{-\infty}^{\infty} \exp\left[-\frac{\eta^2}{2\sigma_\eta^2}\right] \exp\left[-\frac{(y_i - \beta_0 - \beta_1 x_i e^{\eta})^2}{2\sigma_e^2}\right] d\eta \qquad (12)$$

These computations require complex numerical evaluation of the required integrals. Alternatively, Gibbons et al. (1997) and Rocke and Durbin (1998) have described a WLS solution that involves the following algorithm:

1. Use OLS regression to find initial estimates of $\beta_0$ and $\beta_1$ by fitting the linear model:

$$y = \beta_0 + \beta_1 x + e \qquad (13)$$

2. Using the sample standard deviation of the lowest concentration as an estimate for $\sigma_e$ and the standard deviation of the log of the replicates at the highest concentration as an initial estimate for $\sigma_\eta$, refit the model in step 1 using WLS with weights equal to

$$w(x) = 1 / \left[\sigma_e^2 + \beta_1^2 x^2 e^{\sigma_\eta^2} \left(e^{\sigma_\eta^2} - 1\right)\right] \qquad (14)$$

3. Using the new estimates of $\beta_0$ and $\beta_1$, compute the predicted response $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$ and standard error of the calibration curve at each concentration $x$:

$$s^2(x) = \frac{\sum_{i=1}^{m(x)} (y_i - \hat{y})^2}{m(x)} \qquad (15)$$

where $m(x)$ is the number of replicates for concentration $x$.

4. Using WLS, fit the variance function:

$$s^2(x) = \varphi + \theta x^2 + e \qquad (16)$$

where

$$\varphi = \sigma_e^2 \qquad (17)$$

and

$$\theta = \beta_1^2 e^{\sigma_\eta^2} \left(e^{\sigma_\eta^2} - 1\right) \qquad (18)$$

using weights:

$$w(x) = m(x) / s^2(x) \qquad (19)$$

5. Compute the new estimates of

$$\sigma_e^2 = \varphi \quad \text{and} \quad \sigma_\eta^2 = \log_e\left[\left(1 + \sqrt{1 + 4\theta / \beta_1^2}\right) / 2\right] \qquad (20)$$

6. Iterate until convergence.

In general, this algorithm will converge to positive values of $\sigma_e$ and $\sigma_\eta$. Note that this algorithm uses WLS to compute the parameters of the calibration curve ($\beta_0$ and $\beta_1$) as well as the parameters of the variance function ($\sigma_e$ and $\sigma_\eta$). In this way, the lowest concentrations with the smallest variances provide the greatest weight in the estimation. The net result is to not sacrifice precision in estimating the calibration function and corresponding interval estimates at low levels by including higher concentrations in the analysis. This is quite useful if the interest is in low-level detection and quantification.

Exponential Model

An alternative parameterization of the variance function involves modeling the relationship between $\sigma_x$ and $x$ as an exponential function of the following form:

$$\sigma_x = a_0 e^{a_1 (x)} \qquad (21)$$

Although less well theoretically motivated than the Rocke and Lorenzato model, the exponential model provides excellent fit to a wide variety of analytical data (see Gibbons

et al., 1997). The model can be applied either to the observed standard deviations at each concentration or, iteratively, to the sum of squared residuals. For estimating $a_0$ and $a_1$, the traditional approach involves substituting $s_x$ for $\sigma_x$ and using nonlinear least squares (e.g., Gauss-Newton) or using OLS regression of the natural log transformed observed standard deviation on true concentration (Snedecor and Cochran, 1989). Similarly, WLS can also be used on the regression of $\log_e(s)$ on $x$ using weights

$$w(x) = m(x) / s^2(x) \qquad (22)$$

Linear Model

The linear model has also been used to model the variance function (Currie, 1995). The linear model is of the form

$$\sigma_x = a_0 + a_1 (x) \qquad (23)$$

The primary disadvantage of the linear model is that the small sampling fluctuations in the observed sample variance at each concentration can lead to a negative intercept (i.e., $a_0 < 0$) and negative variance estimates. This can lead to improper detection and quantification limit estimates and corresponding interval estimates. As such, the linear model is generally not recommended for routine use. This is not a problem for either of the two preceding models, which can mimic a linear model if required.

ITERATIVELY REWEIGHTED LEAST SQUARES ESTIMATION

An alternative to modeling the observed variance at each concentration is to model the squared residuals as a function of $x$, and then to use this estimated variance function to obtain weights that are then used in estimating the regression coefficients. This process is iterated until convergence, hence the term "iteratively reweighted least squares" (Carroll and Ruppert, 1988). As noted by Neter et al. (1990), the methods of maximum likelihood and weighted least squares lead to the same estimators for linear regression models of the form considered here.
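A minimal sketch of this iteration in plain Python follows, assuming the exponential variance function of equation (21) is the chosen variance model and that it is fit by unweighted OLS regression of $\log_e(s)$ on $x$. The function names, the convergence tolerance, and the small floor on the squared residuals are illustrative assumptions rather than part of the source. Giving each observation the weight $1/s^2(x)$ is equivalent to giving the mean of the $m(x)$ replicates at $x$ the weight $m(x)/s^2(x)$.

```python
import math

def weighted_line(x, y, w):
    """Weighted least squares line with per-observation weights w_i."""
    sw = sum(w)
    xb = sum(wi * xi for wi, xi in zip(w, x)) / sw
    yb = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xb) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xb) * yi for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    return yb - b1 * xb, b1

def irls_exponential(x, y, tol=1e-8, max_iter=100):
    """Iteratively reweighted least squares with s(x) = a0 * exp(a1 * x).

    Requires replicates at two or more distinct concentrations.
    Returns (b0, b1, a0, a1).
    """
    b0, b1 = weighted_line(x, y, [1.0] * len(x))  # OLS starting values
    levels = sorted(set(x))
    for _ in range(max_iter):
        # Mean squared residual at each concentration level
        ln_s = []
        for lev in levels:
            sq = [(yi - (b0 + b1 * xi)) ** 2
                  for xi, yi in zip(x, y) if xi == lev]
            s2 = max(sum(sq) / len(sq), 1e-12)  # assumed floor, avoids log(0)
            ln_s.append(0.5 * math.log(s2))
        # Fit the variance function: OLS of log_e(s) on x
        ln_a0, a1 = weighted_line(levels, ln_s, [1.0] * len(levels))
        # Provisional weights: reciprocal of the fitted variance
        w = [math.exp(-2.0 * (ln_a0 + a1 * xi)) for xi in x]
        new_b0, new_b1 = weighted_line(x, y, w)
        done = abs(new_b0 - b0) < tol and abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if done:
            break
    return b0, b1, math.exp(ln_a0), a1
```

The loop stops when successive slope and intercept estimates change by less than `tol`, or after `max_iter` passes; convergence is not guaranteed for every data set, so the iteration cap is a safeguard.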
The previous example of the WLS estimator for the Rocke and Lorenzato model is an example of iteratively reweighted least squares. The general algorithm is as follows:

1. Use OLS estimation to find initial estimates of $\beta_0$ and $\beta_1$ by fitting the linear model

$$y = \beta_0 + \beta_1 x + \varepsilon \qquad (24)$$

2. Using the OLS estimates of $\beta_0$ and $\beta_1$, compute the predicted response $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$ and the standard error of the calibration curve at each concentration $x$:

$$s^2(x) = \frac{\sum_{i=1}^{m(x)} (y_i - \hat{y})^2}{m(x)} \qquad (25)$$

where $m(x)$ is the number of replicates for concentration $x$.

3. Using an appropriate model for the variance function, fit the variance function to the sum of squared residuals:

$$s^2(x) = f(x) \qquad (26)$$

4. Using the provisional weights:

$$w(x) = m(x) / s^2(x) \qquad (27)$$

recompute $\beta_0$ and $\beta_1$ using WLS.

5. Iterate until convergence.

WLS PREDICTION INTERVALS

For WLS estimates of $\beta_0$ and $\beta_1$ the estimated variance for a predicted value $\hat{y}_{wj}$ is

$$V(\hat{y}_{wj}) = s_w^2 \left[ k_j + \frac{1}{\sum_{i=1}^{n} (1/k_i)} + \frac{(x_j - \bar{x}_w)^2}{\sum_{i=1}^{n} (x_i - \bar{x}_w)^2 / k_i} \right] \qquad (28)$$

where $k_j$ is the estimated variance at concentration $x_j$. An upper $(1 - \alpha)100$ percent confidence interval for $\hat{y}_{wj}$ (i.e., an upper prediction limit for a new measured concentration or instrument response at true concentration $x_j$) is

$$\hat{y}_{wj} + t \sqrt{V(\hat{y}_{wj})} \qquad (29)$$

where $t$ is the upper $(1 - \alpha)100$ percent point of Student's $t$-distribution on $n - 2$ degrees of freedom. For example, at $x = 0$, the upper prediction limit (UPL) is

$$\mathrm{UPL} = \frac{t s_w}{b_{1w}} \sqrt{ s_0^2 + \frac{1}{\sum_{i=1}^{n} (1/k_i)} + \frac{(\bar{x}_w)^2}{\sum_{i=1}^{n} (x_i - \bar{x}_w)^2 / k_i} } \qquad (30)$$

where $s_0^2$ is the variance of the measured concentrations or instrument responses for a sample that does not contain the analyte.

CONFIDENCE REGION FOR AN UNKNOWN TRUE CONCENTRATION

In general practice, measured concentrations are reported as if they are true concentrations, without the benefit of an index of uncertainty. There are two problems with this. First, the measured concentration may provide a biased estimate of the true concentration to the extent that $\beta_0 \neq 0$. Second, even in the absence of bias, the measured concentration is only an estimate of the true concentration, and it has a level of uncertainty that is ignored by simply presenting the measured concentration in the absence of a proper uncertainty interval. To provide an estimate of true concentration $x$ from measured concentration $y$ for the Rocke and Lorenzato model, compute

$$\hat{x} = \frac{y - \hat{\beta}_0}{\hat{\beta}_1} \qquad (31)$$

Bhaumik and Gibbons (2005) derived the asymptotic variance of $\hat{x}$ as

$$\mathrm{Var}(\hat{x}) = \frac{\sigma_e^2}{\beta_1^2} (1 + 1/n_0) + x^2 \gamma^2 (\gamma^2 - 1) \qquad (32)$$

where $\gamma = e^{\sigma_\eta^2 / 2}$ and $n_0$ is the number of calibration measurements at or near zero. As expected, the variance of $\hat{x}$ depends on $x$ and increases with increasing concentration. Bhaumik and Gibbons (2005) developed a confidence interval for an unknown true concentration $x$ given a measured concentration $y$, separately for true concentrations at or near $x = 0$ and for larger non-zero true concentrations. For a low-level true concentration $x_0$, the $(1 - \alpha)100$ percent confidence region for $x_0$ is

$$\left( \max\left(0,\; y_0 - z_{\alpha/2} \sqrt{\hat{\sigma}_e^2 / n_0}\right),\; y_0 + z_{\alpha/2} \sqrt{\hat{\sigma}_e^2 / n_0} \right)$$

To construct a confidence interval for an unknown higher level concentration $x$, they use a lognormal approximation. Let

$$c_1 = \mathrm{Var}(y) = \beta_1^2 x^2 (\gamma^4 - \gamma^2) + \sigma_e^2 \qquad (33)$$

$$c_2 = \frac{c_1}{\beta_1^2 x^2} \qquad (34)$$

$$c_3 = \ln\left( \frac{1 + \sqrt{1 + 4 c_2}}{2} \right) \qquad (35)$$

where $c_3$ is the approximate variance of

$$\ln\left( \frac{y - \beta_0}{\beta_1 x} \right) \qquad (36)$$

The quantity

$$z(x) = \frac{\ln(y - \beta_0) - \ln(\beta_1 x)}{\sqrt{c_3}} \qquad (37)$$

is distributed $N(0,1)$ so that the $(1 - \alpha)100$ percent confidence region for $x_0$ is obtained by iteratively solving for the set of $x$ that satisfies

$$-z_{\alpha/2} \le z(x) \le z_{\alpha/2} \qquad (38)$$

In addition to reporting measured concentrations, the point estimate of $x$ and its 95 percent confidence interval should also be routinely reported; it can be used for the purpose of making both detection decisions and comparisons to regulatory standards. If, for example, the lower 95 percent confidence limit is greater than zero, there is 95 percent confidence that the true concentration is greater than zero. By contrast, if the upper 95 percent confidence limit is less than a regulatory standard, there is 95 percent confidence that the true concentration is less than the regulatory standard, and the corresponding (and potentially less costly) disposal options can be pursued.

DETECTION AND QUANTIFICATION

The previously described WLS prediction limit $\hat{y}_0$ corresponds to the concept of a decision limit $L_C$ defined by Currie (1968) for the case in which the data arise from a calibration experiment, the population parameters at $x = 0$ are unknown, and one wishes to make a detection decision for a single future test sample. Measured concentrations (or instrument responses) that exceed the UPL should yield the binary decision of "detected" with $(1 - \alpha)100$ percent confidence. Note that when the true concentration $x = L_C$, the probability of exceeding the UPL is only 50 percent. As such, Currie defined the

detection limit $L_D$ as the 95 percent UPL for a true concentration at $L_C$. The WLS estimate of $L_C$ is therefore

$$L_C = \frac{t s_w}{b_{1w}} \sqrt{ s_{L_C}^2 + \frac{1}{\sum_{i=1}^{n} (1/k_i)} + \frac{(L_C - \bar{x}_w)^2}{\sum_{i=1}^{n} (x_i - \bar{x}_w)^2 / k_i} } \qquad (39)$$

and the WLS estimate of $L_D$ is

$$L_D = L_C + \frac{t s_w}{b_{1w}} \sqrt{ s_{L_D}^2 + \frac{1}{\sum_{i=1}^{n} (1/k_i)} + \frac{(L_D - \bar{x}_w)^2}{\sum_{i=1}^{n} (x_i - \bar{x}_w)^2 / k_i} } \qquad (40)$$

Note that in order to compute $L_C$ and $L_D$, one must have estimates of $s_{L_C}^2$ and $s_{L_D}^2$, which are often unavailable and must be estimated using a model of standard deviation versus concentration, as previously described. The final estimates of $L_C$ and $L_D$ are obtained from simple repeated substitution beginning from $L_C = 0$ and $L_D = L_C$ until convergence (i.e., change of less than $10^{-4}$ in estimates of $L_C$ and $L_D$ on successive iterations).

Finally, Currie (1968) defined the limit of determination $L_Q$ as the concentration at which the signal-to-noise ratio is 10 to 1. In the current context, one can estimate $L_Q$ by identifying the true concentration at which the estimated standard deviation is one-tenth of its magnitude. Again, a simple iterative approach generally performs quite well (Gibbons and Coleman, 2001; Gibbons et al., 1997).

REFERENCES

Bhaumik, D. and R. Gibbons. 2005. Confidence regions for random-effects calibration curves with heteroscedastic errors. Technometrics 47(2): 223-231.

Carroll, R. and D. Ruppert. 1988. Transformation and Weighting in Regression. Boca Raton, FL: CRC Press.

Currie, L. 1968. Limits for qualitative detection and quantitative determination: Application to radiochemistry. Analytical Chemistry 40(3): 586-593.

Currie, L. 1995. Nomenclature in evaluation of analytical methods including detection and quantification capabilities. Pure and Applied Chemistry 67: 1699-1723.

Gibbons, R. 1995. Some statistical and conceptual issues in the detection of low-level environmental pollutants. Environmental and Ecological Statistics 2(2): 125-145.

Gibbons, R. and D. Bhaumik. 2001. Weighted random-effects regression models with applications to interlaboratory calibration. Technometrics 43(2): 192-198.

Gibbons, R. and D. Coleman. 2001. Statistical Methods for Detection and Quantification of Environmental Contamination. New York, N.Y.: John Wiley & Sons, Inc.

Gibbons, R., D. Coleman, and R. Maddalone. 1997. An alternate minimum level definition for analytical quantification. Environmental Science & Technology 31(7): 2071-2077.

Neter, J., W. Wasserman, and M. Kutner. 1990. Applied Linear Regression Models, 2nd Ed. Homewood, IL: McGraw-Hill/Irwin.

Rocke, D. and B. Durbin. 1998. Models and Estimators for Analytical Measurement Methods with Non-constant Variance. Davis, Calif.: University of California at Davis, Center for Image Processing and Integrated Computing.

Rocke, D. and S. Lorenzato. 1995. A two-component model for measurement error in analytical chemistry. Technometrics 37(2): 176-184.

Snedecor, G. and W. Cochran. 1989. Statistical Methods. Ames, IA: Iowa State University Press.
