Evaluation of the Methodology to Determine Background Concentrations in the Lower Basin
The determination of background concentrations for compounds of potential concern (COPCs) in the lower basin is described in the Final Technical Memorandum (Rev. 3) Estimation of Background Concentrations in Soil, Sediment, and Surface Water in the Coeur d’Alene and Spokane River Basins (URS Greiner, Inc. and CH2M Hill 2001) (Background Technical Memo). Although the upper basin, lower basin, and Spokane River are addressed in the memo, only the lower basin is considered in this Appendix.
The data to determine these background concentrations were derived from an ambitious coring study conducted in the lower basin to determine the vertical extent of metal contamination and estimate the volume of contaminated sediments within the basin (URS Greiner, Inc. and CH2M Hill 1998). In this study, a multitude of cores were taken in the lateral lakes, lower basin floodplain, and the river.
The metals concentration data from these cores were assembled into a database, which was processed by the ten-step method described in the Background Technical Memo (Section 3.2, pp. 3-4 to 3-6) and is evaluated below.
It appears that the proposed basis of the ten-step method is this statement made in Step 1:
For each COPC, the distribution of the pooled data was identified as lognormal and a lognormal CFD (cumulative frequency distribution) of the pooled data set (283 samples for each COPC) was plotted with log concentration in milligrams per kilograms (mg/kg) as the independent variable and the normal standard variate of the population as the dependent variable using the methods described in Section 3.1 (see URS Greiner, Inc. and CH2M Hill 2001, Fig. A-11).
On a lognormal CFD plot, a pooled data set containing both background and contaminant concentrations will ideally show two distinct populations identifiable by their distinct slopes, separated by a transition zone of rapidly escalating concentrations. The population with lower concentrations represents background, while the population to the upper right of the distribution is taken to represent contaminated sediments.
No clearer definition of what is considered background is provided; it appears from the procedures adopted that the “distinct population” with lowest concentration is assumed to be the distribution of background concentrations, and this is how we interpret the data below. The memo does not describe how “the pooled data was identified as lognormal,” and the pooled data are plainly not lognormal for any COPC. A single lognormal distribution would plot as approximately one straight line on the plots constructed,¹ whereas the pooled data do not fall along any single straight line.
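The straight-line property can be checked numerically. The following is a minimal sketch using synthetic data only (none of the values come from the coring study); the plotting position (i - 0.5)/n, the function names, and the distribution parameters are all assumptions made for illustration. It computes the correlation of the plotted points, which approaches 1 for a single lognormal sample and falls off when two populations are pooled.

```python
import math
import random
from statistics import NormalDist


def probability_plot_points(concentrations):
    """Return (log10 concentration, normal score) pairs for a lognormal probability plot."""
    logs = sorted(math.log10(c) for c in concentrations)
    n = len(logs)
    std_normal = NormalDist()
    # Plotting position (i - 0.5) / n is one common convention (an assumption here).
    scores = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(logs, scores))


def straightness(points):
    """Pearson correlation of the plotted points; 1.0 means a perfectly straight line."""
    xs, ys = zip(*points)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)
# Synthetic data only: 283 values, matching the pooled sample size quoted above.
single = [10 ** rng.gauss(1.3, 0.3) for _ in range(283)]
mixed = [10 ** (rng.gauss(1.3, 0.2) if rng.random() < 0.4 else rng.gauss(3.3, 0.4))
         for _ in range(283)]

r_single = straightness(probability_plot_points(single))
r_mixed = straightness(probability_plot_points(mixed))
print(f"single lognormal r = {r_single:.4f}, two-component mixture r = {r_mixed:.4f}")
```

A correlation of this kind is only a rough screen; a formal test of normality applied to the log concentrations would be needed to support a statement that a data set "was identified as lognormal."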
It appears to be implied that the observed data are necessarily a probabilistic sum of two lognormal distributions that would plot as two distinct straight lines. However, this implication is false. A probabilistic sum of two lognormal distributions does not plot as two straight lines, and there is no guarantee that there are only two component distributions, nor is there a guarantee that any component distributions are lognormal. In practice, the data on individual COPCs often show plots that approximate the description given in Step 2, and the distributions for individual COPCs often can be approximated as a sum of lognormals, but it is not necessarily possible to discern by eye on such plots how many component lognormals are necessary to fit the data adequately.
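That a probabilistic sum of two lognormals does not plot as two straight lines can be seen by computing the curve directly. The sketch below evaluates the normal score of a two-component mixture distribution and its local slope on the probability plot. The weights, means, and standard deviations are hypothetical values chosen for illustration, not estimates from the report.

```python
from statistics import NormalDist

STD = NormalDist()


def probit_of_mixture(x, w, m1, s1, m2, s2):
    """Normal score of the mixture CDF at log10 concentration x."""
    cdf = w * NormalDist(m1, s1).cdf(x) + (1 - w) * NormalDist(m2, s2).cdf(x)
    return STD.inv_cdf(cdf)


def local_slope(x, *params, h=1e-4):
    """Central-difference slope of the probability-plot curve at x."""
    return (probit_of_mixture(x + h, *params) - probit_of_mixture(x - h, *params)) / (2 * h)


# Hypothetical parameters (log10 mg/kg): 40% "background", 60% "contaminated".
mix = (0.4, 1.3, 0.2, 3.3, 0.4)
# Same function with w = 1 reduces to a single lognormal with sigma = 0.2.
single = (1.0, 1.3, 0.2, 3.3, 0.4)

grid = [1.0, 1.3, 1.6, 2.0, 2.5, 3.0, 3.3, 3.6, 4.0]
mix_slopes = [local_slope(x, *mix) for x in grid]
single_slopes = [local_slope(x, *single) for x in [0.9, 1.1, 1.3, 1.5, 1.7]]

for x, s in zip(grid, mix_slopes):
    print(f"log10 conc = {x:.1f}  slope = {s:.3f}")
print("single lognormal slopes:", [round(s, 3) for s in single_slopes])
```

For the single lognormal the slope is constant at 1/sigma, as expected for a straight line. For the mixture the slope changes continuously along the curve, including within the lower limb, so neither limb is a straight line whose slope gives the standard deviation of a component.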
Practically, there is reason to suggest that the assumption of two populations (background and contaminated sediments) is too simplistic, especially considering the environment being modeled. These proposed sediment populations would exist in a continuum with each other and vary greatly through time as background sediments and tailings interacted in varying proportions, depending on flooding events, tailings production, changing mining technologies (for example, stamp and jig techniques versus flotation), tailings disposal practices, secondary releases of tailings, and input of sediments from unaffected watersheds and floodplains. Also, as mentioned in the text of Chapter 4, the large sample intervals used in the coring studies have the potential for sampling both pre- and postmining sediments in a single analysis.
Steps 2 and 3 of the procedure are subjective because they call for visually selecting a straight line “through the lower bound population” and selecting a location where the data plot “diverges from” this straight line.
Steps 4 and 5 call for plotting on a similar “lognormal CFD plot” the data lying below the point of divergence identified in Step 3 and the least-squares fitting of a line to those data. Although least-squares fitting is an objective procedure, there is no objective basis for selecting an unweighted least-squares fitting procedure, and there is good reason not to, because even for a true lognormal distribution the variation of plotting points away from their expected values is heteroskedastic.
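The heteroskedasticity of the plotting points can be illustrated by simulation. The sketch below draws repeated samples from a single normal distribution of log concentrations (that is, a true lognormal) and compares the variability of the lowest, middle, and highest ordered values across replicates. The sample size, number of replicates, and seed are arbitrary assumptions.

```python
import random
from statistics import variance

rng = random.Random(1)
n, replicates = 50, 2000

# Each replicate: n log concentrations from one normal distribution (a true
# lognormal in concentration), sorted so position i holds the i-th order statistic.
sorted_samples = [sorted(rng.gauss(0.0, 1.0) for _ in range(n))
                  for _ in range(replicates)]


def order_stat_variance(rank):
    """Variance across replicates of the order statistic at a 0-based rank."""
    return variance([sample[rank] for sample in sorted_samples])


var_lowest = order_stat_variance(0)
var_middle = order_stat_variance(n // 2)
var_highest = order_stat_variance(n - 1)
print(f"variance: lowest = {var_lowest:.3f}, "
      f"middle = {var_middle:.3f}, highest = {var_highest:.3f}")
```

The extreme points scatter considerably more than the central ones, so an unweighted least-squares fit gives the least reliable points the same influence as the most reliable ones. The ordered values are also correlated with one another, which unweighted least squares likewise ignores.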
Step 6 calls for constructing a line bisecting the two lines constructed so far (the “visually fit tangent line and the lower bound data population regression line”). No basis is supplied for selecting a bisecting line rather than any other. Step 7 selects the 95th percentile on this line (the value of the abscissa at ordinate 1.645). Again, no basis is supplied for selecting the 95th percentile.
Steps 8-10 then select the data points below the selected 95th percentile as being representative of the background lognormal distribution and use least-squares fitting to estimate its parameters.
The overall effect of this ten-step process is to obtain estimates that artificially truncate the background distribution of concentrations, assuming that it is lognormal.
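The consequence of truncation can be sketched with a simplified simulation. It assumes an idealized case that is more favorable than the actual procedure: a pure lognormal background sample with no contaminated material, cut at its true 95th percentile, and then fitted by unweighted least squares on a probability plot as if the retained data were a complete sample. The function name, sample size, and standardized units are assumptions for illustration.

```python
import random
from statistics import NormalDist, mean

STD = NormalDist()
cutoff_z = STD.inv_cdf(0.95)  # ordinate of the 95th percentile, about 1.645


def fit_probability_plot(log_values):
    """Unweighted least-squares line of sorted log values on their normal scores.

    Returns (intercept, slope), read as estimates of the mean and standard
    deviation of the log concentrations.
    """
    xs = sorted(log_values)
    n = len(xs)
    zs = [STD.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    z_bar, x_bar = mean(zs), mean(xs)
    slope = (sum((z - z_bar) * (x - x_bar) for z, x in zip(zs, xs))
             / sum((z - z_bar) ** 2 for z in zs))
    return x_bar - slope * z_bar, slope


rng = random.Random(2)
true_mean, true_sd = 0.0, 1.0  # hypothetical background, standardized log units
background = [rng.gauss(true_mean, true_sd) for _ in range(5000)]

# Keep only values below the true 95th percentile, then fit as if complete.
kept = [x for x in background if x < true_mean + cutoff_z * true_sd]

full_mean, full_sd = fit_probability_plot(background)
trunc_mean, trunc_sd = fit_probability_plot(kept)
print(f"full sample:      mean = {full_mean:.3f}, sd = {full_sd:.3f}")
print(f"truncated sample: mean = {trunc_mean:.3f}, sd = {trunc_sd:.3f}")
```

Even in this idealized case, both the fitted mean and the fitted standard deviation of the truncated sample fall below the values used to generate the data, which is the direction of bias expected when the upper tail of a lognormal distribution is discarded before fitting.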
The Background Technical Memo states (p. 3-6) the following:
This approach is believed to provide a reliable means of estimating background concentrations for each COPC in the Lower Basin. This approach is supported by both empirical testing and statistical evaluation of the best-estimate background data set. In all cases, the identity of the best estimate background data set as a distinct population representative of background is supported by high r² values.
No indication is given of what empirical testing or what statistical evaluation has been performed. Overall, the evaluation indicates that the procedure is subjective and contains several assumptions unsupported by any documented statistical theory. However, as mentioned in the text, the background concentration for lead in lower basin sediments appears reasonable, considering evaluation of the metals analysis data from the cores and other studies assessing background concentrations in the lower basin.
If this type of mathematical analysis is to be used, the following suggestions are provided:
Explicitly define the assumptions behind the analysis applied to obtain estimates of background distribution.
Adopt objective techniques to obtain the parameters of interest with known uncertainty bounds (for example, the ten-step process relies on subjective approaches).
Use appropriate statistical techniques, either explicitly proving any required statistical properties or citing literature for such support (for example, there is no evidence that the ten-step process is reasonably unbiased, and no estimator of its uncertainties is available).
Implement adequate quality control to ensure that all the data used are included in the report—for example, the data for zinc concentrations in sediments of the lower basin are not provided in the report as they are for the other metals (URS Greiner, Inc. and CH2M Hill 2001, Table C-2).
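As one illustration of what an objective technique might look like, the sketch below fits a two-component lognormal mixture by maximum likelihood using the expectation-maximization (EM) algorithm. This is not a method proposed in the Background Technical Memo or prescribed here; it is only an example of a procedure with explicit assumptions (exactly two lognormal components) and no visual judgment. The function names, starting values, and synthetic data are all hypothetical.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def fit_two_lognormals(concentrations, iterations=200):
    """EM fit of a two-component normal mixture to log10 concentrations.

    Returns (weight, mean, sd) for the lower and the upper component.
    """
    x = sorted(math.log10(c) for c in concentrations)
    half = len(x) // 2
    # Crude start: split the sorted data at the median (an arbitrary choice).
    comps = []
    for part in (x[:half], x[half:]):
        mu = sum(part) / len(part)
        sd = math.sqrt(sum((v - mu) ** 2 for v in part) / len(part)) or 1e-3
        comps.append([0.5, mu, sd])
    for _ in range(iterations):
        # E-step: probability that each point belongs to the lower component.
        resp = []
        for v in x:
            p0 = comps[0][0] * normal_pdf(v, comps[0][1], comps[0][2])
            p1 = comps[1][0] * normal_pdf(v, comps[1][1], comps[1][2])
            # Tiny constants guard against division by zero on underflow.
            resp.append((p0 + 1e-300) / (p0 + p1 + 2e-300))
        # M-step: weighted maximum-likelihood updates of each component.
        for k, weights in enumerate((resp, [1 - r for r in resp])):
            total = sum(weights)
            mu = sum(w * v for w, v in zip(weights, x)) / total
            var = sum(w * (v - mu) ** 2 for w, v in zip(weights, x)) / total
            comps[k] = [total / len(x), mu, max(math.sqrt(var), 1e-3)]
    return tuple(tuple(c) for c in comps)


rng = random.Random(3)
# Synthetic data only: 40% from a low population, 60% from a high population.
synthetic = [10 ** (rng.gauss(1.3, 0.2) if rng.random() < 0.4 else rng.gauss(3.3, 0.4))
             for _ in range(283)]
lower, upper = fit_two_lognormals(synthetic)
print("lower component (weight, mean, sd):", [round(v, 3) for v in lower])
print("upper component (weight, mean, sd):", [round(v, 3) for v in upper])
```

Uncertainty bounds on the fitted parameters could then be obtained by a resampling procedure such as the bootstrap, and the adequacy of the two-component assumption would itself need to be tested against alternatives with more components.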
Royston, P. 1993. A toolkit for testing for non-normality in complete and censored samples. Statistician 42(1):37-43.
Royston, P. 1995. AS R94 A remark on algorithm AS 181: The W-test for normality. Appl. Statist. 44(4):547-551.
URS Greiner, Inc., and CH2M Hill. 1998. Sediment Contamination in the Lower Coeur d’Alene River Basin (LCDARB): Geophysical and Sediment Coring Investigations in the River Channel, Lateral Lakes, and Floodplains. Bunker Hill Facility Basin-Wide RI/FS Data Report, Vol. 1, Section 1-9. Contract No. 68-W-98-228. Prepared for U.S. Environmental Protection Agency, Region 10, Seattle, WA, by URS Greiner, Inc., Seattle, WA, and CH2M Hill, Bellevue, WA. October 1998.
URS Greiner, Inc., and CH2M Hill. 2001. Final Technical Memorandum (Rev. 3) Estimation of Background Concentrations in Soil, Sediment, and Surface Water in the Coeur d’Alene and Spokane River Basins. URSG DCN 4162500.6790.05a. EPA Site File No. 2.7. Prepared for U.S. Environmental Protection Agency, Region 10, Seattle, WA, by URS Greiner, Inc., Seattle, WA, and CH2M Hill, Bellevue, WA. October 2001.