Appendix B
In-Lake Analysis Methodology for Metals, Nutrients, and Dissolved Oxygen
This appendix details the committee’s methodology for the trend analyses of CDA Lake data on dissolved oxygen and nutrients (Chapter 5) and metals (Chapter 6).
DISSOLVED OXYGEN TRENDS ANALYSIS METHODOLOGY
The first step in the analyses of dissolved oxygen trends in Chapter 5 was screening out problematic raw data and then standardizing the analyses within each sampling station to maximize data comparability. When analyzing the vertical profile data, the committee started by screening out any data collected on sonde upcasts (see Chapter 8). An upcast occurs when the sonde is retrieved from the bottom of the profile to the top while still recording data; it can be identified by comparing the time stamps of the sonde readings with their corresponding depths. Upcast data are invalid because a probe pulled upward through the water column creates a void behind it, which is filled by water that has been turbulently mixed and displaced upward. By contrast, downcast data are collected as the probe is lowered through the water column, so the sensors, which are located on the bottom of the probe, encounter laminar conditions as the sonde descends. As a result, upcast data are very noisy and offset compared to downcast data. In most cases, only a small portion of the profile data was discarded; however, many profiles collected during 2016 were taken as upcasts. Upcast data should not be used, and those collecting profile data from CDA Lake in the future should be cautioned not to collect vertical profiles in the reverse order.
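The downcast-only screening step can be sketched as follows. This is an illustrative Python sketch, not the committee's actual processing code, and the column names (`time`, `depth_m`, `do_mg_l`) are hypothetical:

```python
import pandas as pd

def drop_upcasts(profile: pd.DataFrame) -> pd.DataFrame:
    """Keep only downcast readings from a single sonde profile.

    Assumes hypothetical 'time' and 'depth_m' columns. Everything
    recorded after the deepest reading was collected while the sonde
    was being retrieved (the upcast) and is discarded.
    """
    profile = profile.sort_values("time").reset_index(drop=True)
    bottom = profile["depth_m"].idxmax()  # first row at the deepest reading
    return profile.loc[:bottom]

# Example: the last two rows (depths decreasing over time) are upcast readings.
prof = pd.DataFrame({
    "time": pd.date_range("2016-07-01 10:00", periods=6, freq="min"),
    "depth_m": [1, 10, 20, 30, 20, 10],
    "do_mg_l": [9.1, 8.8, 7.5, 4.2, 6.0, 8.5],
})
clean = drop_upcasts(prof)
```

Because the readings are sorted by time stamp, any reading taken after the maximum depth is reached is by definition part of the upcast.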
The first goal when analyzing the profile data was to characterize seasonal and depth trends in the dissolved oxygen data collected at each station. Once the times and depths with the lowest oxygen concentrations were identified, the committee used this information to design its long-term trend analyses. To look for general seasonal/depth trends, the committee sorted all of the profile data by Julian date (i.e., days 1–365 within a year) and plotted the data from different years as a single annual time series. Because there was considerable year-to-year heterogeneity in sampling dates and even sampling depths, the committee binned these data into weekly and 1-meter intervals. During these preliminary analyses, it became evident that the script used to interpolate the data for these plots (rLakeAnalyzer1) produced odd results
___________________
1 https://cran.r-project.org/web/packages/rLakeAnalyzer/rLakeAnalyzer.pdf
in regions of the overall time-depth sampling matrix that were poorly represented by real data. For example, at station C1 the majority of the vertical profiles ended at a depth of 41 m, but a small number of profiles extended down to 42–44 m (n = 6 for the entire C1 dataset). Because the deepest profile extended to 44 m, the script produced estimated results for all dates down to this depth. In the vast majority of cases, the results for depths of 42–44 m were based on interpolations, and some of these interpolated values were suspiciously different from the observed data at adjacent depths. Therefore, the committee changed all of the 42–44 m depths in the dataset to 41 m to eliminate the strange interpolated values. Because most profiles collected at the C4 station ended at approximately 39 m, all of the observations below that depth at this station were rounded to 39 m. Similarly, the deeper data collected from station C5 were rounded to 13 m, and the deeper data from station C6 were rounded to 11 m. The committee then used the R script (rLakeAnalyzer) to create annual time-depth dissolved oxygen isopleths for stations C1 and C4, where the most data were available, and for stations C5 and C6, where dissolved oxygen depletion was most evident. Because these plots are based on a calculated time-by-depth matrix of oxygen concentrations, they should not be biased by inconsistencies in the temporal and spatial frequency of sample collection.
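The depth-rounding rule described above amounts to capping each station's depths at its most common deepest value. A minimal sketch, with the station caps taken from the text and hypothetical column names:

```python
import numpy as np
import pandas as pd

# Most common deepest sonde depth per station (values from the text).
MAX_DEPTH = {"C1": 41, "C4": 39, "C5": 13, "C6": 11}

def cap_depths(df: pd.DataFrame, station_col="station", depth_col="depth_m"):
    """Round the rare deeper readings down to the station's most common
    deepest depth, so interpolation is not driven by a handful of
    observations. Column names are hypothetical."""
    df = df.copy()
    cap = df[station_col].map(MAX_DEPTH)
    df[depth_col] = np.minimum(df[depth_col], cap)
    return df

df = pd.DataFrame({"station": ["C1", "C1", "C4"], "depth_m": [44.0, 30.0, 39.5]})
capped = cap_depths(df)
```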
Once the annual patterns in depth-averaged dissolved oxygen concentrations were characterized, the committee analyzed the long-term dataset for inter-annual trends at the same sites. In this case, the data were binned by 1-meter and monthly increments. As already noted, the occasional deeper sonde readings were rounded to the most common deepest sonde reading for each sampling station. When examining the long-term vertical profile datasets, it was also apparent that approximately 20 percent of sampling dates had a maximum sampling depth shallower than the most common deepest depth noted above. For example, at the C1 station, 40 m was the most common deepest sampling depth, but in 21 percent of the profiles the deepest depth sampled was shallower. Missing 40-m data were common before 2010 and rare afterward. Because the script used to interpolate the depth profiles is quite sensitive to a missing deepest depth, the committee estimated the oxygen concentrations at 40 m from the next available depth, which in 75 percent of cases was either 38 or 39 m. The missing 40-m data were estimated by generating statistical relationships between the oxygen concentrations at lesser depths and at 40 m when data were available for both, and then applying this equation when direct observations at 40 m were not available. For example, when predicting the 40-m oxygen concentrations from the 35-m data, the fit between the two datasets was strong (r² = 0.96), with the predicted dissolved oxygen concentration at 40 m being 0.74 mg L⁻¹ lower when the observed concentration at 35 m averaged 7.0 mg L⁻¹ (the lower end of observed values) and 0.17 mg L⁻¹ lower when the observed concentration at 35 m averaged 12.0 mg L⁻¹ (the upper end of observed values).
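The gap-filling procedure can be illustrated with synthetic data: fit a regression between the deepest depth and a shallower depth on the dates where both were observed, then predict the deepest value where it is missing. The data and column layout here are invented for illustration; the committee's actual fits are the ones reported in the text:

```python
import numpy as np

def fill_deepest(shallow_do, deep_do):
    """Estimate missing deepest-depth (e.g., 40-m) oxygen values from a
    shallower depth (e.g., 35 m) by least squares, fit on the dates where
    both depths were observed. All values here are synthetic."""
    shallow_do = np.asarray(shallow_do, dtype=float)
    deep_do = np.asarray(deep_do, dtype=float)
    both = ~np.isnan(shallow_do) & ~np.isnan(deep_do)
    slope, intercept = np.polyfit(shallow_do[both], deep_do[both], deg=1)
    filled = deep_do.copy()
    missing = np.isnan(deep_do) & ~np.isnan(shallow_do)
    filled[missing] = intercept + slope * shallow_do[missing]
    return filled

do_35 = [7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 8.5]      # observed at 35 m
do_40 = [6.3, 7.4, 8.5, 9.6, 10.7, 11.8, np.nan]    # last 40-m value missing
filled = fill_deepest(do_35, do_40)
```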
These calculations generated a series of vertical dissolved oxygen profiles interpolated at 1-meter increments from the surface to the most common deepest depth sampled for each month from January 1991 to April 2021 for stations C1, C4, and C5. These profiles were then used to generate monthly matrices of surface and bottom dissolved oxygen concentrations for all of the years for which data were available. For stations C1 and C4, which were deep, the upper average was for 1–10 m and roughly represented the surface mixed layer or epilimnion when CDA Lake is stratified. The bottom layer for these stations was the deepest 10 m for which data were available. Ideally, the bottom layer average would only represent the area immediately above the sediment–water interface because the presence or absence of oxygen at this interface determines the fate of many biogeochemical reactions in the lake sediments. However, the exact depth of this interface was not known when analyzing these data, and the availability of oxygen data at the deepest depths in these profiles was not consistent. At the C5 station, which is much shallower, the surface was represented by the 1–5 m average and the bottom was represented by the 14–17 m average. The dissolved oxygen data for the C6 station were analyzed slightly differently than described above.2 For this station, the surface averages were calculated from all data collected at depths ≤ 4 m, and the bottom averages were
___________________
2 C6 data were analyzed slightly differently because this was the first station analyzed (it had the lowest O2), because it was shallow, and because using the R script to interpolate the data would not have saved time.
calculated from all data collected at depths ≥ 8 m (i.e., 8–11 m) without any data interpolation. When two profiles were available for the same month (as was often the case for June–November since 2011), these values were averaged.
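Computing the surface and bottom layer averages from an interpolated 1-m profile reduces to masking on depth ranges. A sketch using the C1/C4 layer definitions from the text (1–10 m surface, deepest 10 m bottom), with hypothetical column names and synthetic values:

```python
import numpy as np
import pandas as pd

def layer_means(profile: pd.DataFrame, surface=(1, 10), bottom=(31, 40)):
    """Average an interpolated 1-m profile into a surface layer and a
    bottom layer. The default depth ranges follow the text for the deep
    stations (C1/C4); column names are hypothetical."""
    d = profile["depth_m"]
    surf = profile.loc[d.between(*surface), "do_mg_l"].mean()
    bot = profile.loc[d.between(*bottom), "do_mg_l"].mean()
    return surf, bot

# Synthetic profile: oxygenated surface water, depleted bottom water.
depths = np.arange(1, 41)
do = np.where(depths <= 10, 8.0, np.where(depths >= 31, 2.0, 5.0))
prof = pd.DataFrame({"depth_m": depths, "do_mg_l": do})
surf, bot = layer_means(prof)
```

When two profiles fall in the same month, the same layer means would simply be averaged, as described above.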
METHODS FOR IN-LAKE TREND STATISTICAL ANALYSIS
Locally Weighted Scatterplot Smoothing
One important feature of any method for evaluating long-term trends in water quality data is that it must be able to reveal non-monotonic trends. As discussed in Chapter 3 and Appendix A for the river data, some of the constituents of interest (specifically, lead and phosphorus) at some sites, when characterized using the Weighted Regressions on Time, Discharge, and Season method, showed increasing trends over a decade or two, followed by decreasing trends in the most recent decade. Given that pattern in river inputs to the Lake, it is reasonable to assume that similar non-monotonic trends are possible for lake chemistry data; hence, the committee used locally weighted scatterplot smoothing (loess).
The first step in processing the data was to plot them and determine the date after which the detection limits appear to be relatively stable and the frequency of data collection is generally monthly or better, including both surficial and deeper water. All of the datasets considered here end at the close of calendar year 2020, but the starting year varies from as early as 1991 to as late as 2004 (record lengths are given in the tables associated with these results). The data were separated into deeper water and shallower water. For C1 and C4 (sampled by the Idaho Department of Environmental Quality), the definition of deep is > 21 meters. For C5, C6, and SJ1 (sampled by the CDA Tribe), the shallower category is denoted euphotic zone or epilimnion (typically with sampled depths of 6 to 10 meters); other samples are referred to as bottom water, and their sampling depths are not recorded in the database. Censored data (observations reported as “less than” some reporting limit) were treated as values equal to half the reporting limit. Had the number of censored values been large, this would have been an inappropriate method, but in these datasets the percentage of censored data points is low, so this approach makes little difference (and is certainly preferable to deleting the data or treating them as zero or as the reporting limit). Of the 19 datasets evaluated here, 6 had no censored data, the median percentage of censored data was 0.7 percent, and the maximum was 5 percent. The metals datasets from C6 and SJ1 had at least 52 percent of their data censored, and most had 94–100 percent censored; hence, they were eliminated from consideration.
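The half-reporting-limit substitution can be sketched as follows. The “<” prefix convention and the data layout are assumptions; the key point is that the substitution is defensible only because censoring is rare in these datasets:

```python
import pandas as pd

def substitute_censored(values: pd.Series) -> pd.Series:
    """Replace '<DL'-style censored entries with half the reporting limit.
    Appropriate only when censoring is rare, as in these datasets."""
    def one(v):
        s = str(v).strip()
        if s.startswith("<"):
            return 0.5 * float(s[1:])  # half the reporting limit
        return float(s)
    return values.map(one)

raw = pd.Series(["12.3", "<1.0", "4.7", "<0.5"])
vals = substitute_censored(raw)
```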
Locally weighted scatterplot smoothing uses the loess function in R (Cleveland et al., 2017) to show the change in concentration as a function of time. It was applied separately to the shallow and deep datasets. The smoothing parameter (the “span”) was set to its default value of 0.75, and the fitting used a second-degree polynomial. Because replicate samples are often taken on a given day, the data were pre-processed by computing a median value for each day in the record, for both the shallow and deep water datasets. The fitting was applied to the logarithms of these daily median concentrations, so the fitted line is an estimate of the median of the data (rather than the mean). Logarithms are used to avoid oversensitivity to some of the extreme high values and to make the model residuals more nearly normal. The idea of this approach (loess on the logarithms) is to show the central tendency of the data as it changes over time.
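The procedure (log transform, local polynomial smooth, back-transform) can be illustrated with a minimal tricube-weighted smoother. This Python sketch mirrors the same defaults (span = 0.75, second-degree polynomial) but is not a reimplementation of R's loess:

```python
import numpy as np

def loess_smooth(x, y, span=0.75, degree=2):
    """Minimal loess-style smoother: for each point, fit a weighted
    polynomial to the nearest span*n neighbors using tricube weights.
    A sketch for illustration only."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    k = max(degree + 2, int(np.ceil(span * n)))      # neighborhood size
    fitted = np.empty(n)
    for i in range(n):
        d = np.abs(x - x[i])
        idx = np.argsort(d)[:k]                      # k nearest neighbors
        w = (1 - (d[idx] / d[idx].max()) ** 3) ** 3  # tricube weights
        # Center on x[i] so the constant term is the fitted value there.
        coef = np.polyfit(x[idx] - x[i], y[idx], degree, w=np.sqrt(w))
        fitted[i] = coef[-1]
    return fitted

# Smooth the logarithms of synthetic concentrations, then back-transform.
t = np.linspace(2010, 2020, 60)            # decimal years (synthetic)
conc = 5.0 * np.exp(0.05 * (t - 2010))     # synthetic concentration data
fit_median = np.exp(loess_smooth(t, np.log(conc)))
```

Because the smooth is fit to the logarithms, exponentiating the fitted values yields an estimate of the median concentration rather than the mean.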
Looking at the plotted data points, particularly from 2017 to 2020, many of the datasets suggest very steep downward trends in these final years. By design, these loess curves avoid abrupt changes in slope and tend to de-emphasize rapid changes in values near the start or end of the dataset. One could apply a loess smooth with a substantially narrower smoothing window (a lower span value). This would show the downward trends in the more recent years as being substantially steeper, but it would be difficult to argue that the many oscillations that would then appear are realistic interpretations of changes in the system. The steep downward trend in the final years of some of these datasets is rather unexpected; reevaluating the data after another one or two years have gone by will be critical to evaluating how the system is changing. The amounts of
change presented in the subsequent tables are based on these loess curves and are an average of the changes in the shallow and deep waters. For example, “Change over past 10 years” is computed as follows:

Change over past 10 years = (Δs + Δd)/2

Δs = 100 · [cs(tm) − cs(tm − 10)] / cs(tm − 10)

Δd = 100 · [cd(tm) − cd(tm − 10)] / cd(tm − 10)

where:
Δs and Δd are the changes in percent for the shallow and deep water, respectively;
cs(tm) is the loess estimate of median concentration for the shallow data at time tm, which is the maximum of the time variable in the loess model expressed in decimal years (typically about 2020.95, near the end of calendar year 2020);
cs(tm − 10) is the estimated median at a time 10 years before tm; and
cd values are the loess estimates for the median concentration for the deep water data.
The loess model results are also used to provide estimates of the median at the end of 2020 for both the shallow and deep water data, in order to quantify differences among sites and between the shallow and deep water data.
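Given the loess estimates of the median at the two time points, the tabulated change is a direct transcription of the definitions above:

```python
def change_over_10_years(cs_tm, cs_tm10, cd_tm, cd_tm10):
    """Average percent change of the shallow (cs) and deep (cd) loess
    median estimates over the trailing 10 years, per the definitions in
    the text."""
    delta_s = 100.0 * (cs_tm - cs_tm10) / cs_tm10
    delta_d = 100.0 * (cd_tm - cd_tm10) / cd_tm10
    return 0.5 * (delta_s + delta_d)

# Example with synthetic values: both layers declined by 20 percent.
change = change_over_10_years(8.0, 10.0, 4.0, 5.0)
```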
Seasonal Kendall Test
The second approach uses the Seasonal Kendall test (Hirsch et al., 1982), which is widely used in environmental science applications for datasets that are expected to exhibit seasonality. The test is based on comparisons of data for each season (typically a month) across the years in the record and does not make comparisons across seasons. For example, it compares June values from one year to June values from another year, but not June values to October values, because the probability distribution for June may differ from that for October. It is a non-parametric test, meaning that it requires only minimal assumptions about the data: no probability distribution is assumed, no particular seasonal pattern (such as a sine wave) is assumed, and the trend is not assumed to be linear in time. However, the test detects only monotonic trends (a dataset with a trend shaped like the letter “U,” or an inverted “U,” would not be seen as having a trend). It also is not sensitive to substantial differences in the amount of trend in different seasons; for example, if six months showed upward trends and six months showed downward trends, the result might indicate no trend in the record. Censored data create no particular problems for the test provided the reporting limits do not change over time (censoring is minor here, and the reporting limits do not change greatly). The test is performed separately on the median value for each sampled day, for both the deep water and the shallow water datasets.
The implementation used in this analysis (the kendallSeasonalTrendTest in the EnvStats package in R) is documented in Millard (2013). The purpose of using it here is simply to evaluate whether the observed trend evidence is stronger than what might be expected by chance alone if the true conditions were entirely trend free. The results are categorized in the tables by colors, where deep blue (downward trends) and deep red (upward trends) indicate trends that are statistically highly significant (p < 0.01). Lighter shades of blue and red indicate trends whose p values lie between 0.01 and 0.10 (moderate statistical significance). Trends that do not attain p < 0.10 are shown with no shading.
Commonly, when the Seasonal Kendall test is used, the seasons identified are equivalent to the calendar months. This application departs from that in two ways. The first departure arises because sampling in the months of January and February is rather limited. Thus, data from those two months were grouped into a single two-month period while the rest of the year the periods are equivalent to the calendar months. The other departure is based on the recognition that there is typically a difference in the distributions of the data between the shallow and the
deep data. This can be seen in the rather regular separation of the red and blue curves on some of the loess plots. As such, the “seasons” used here are actually defined by a particular depth group and month of the year. Thus, for each site and analyte pair, the number of “blocks” of data is 22. (In statistics, a “block” is a subset of the full dataset within which comparisons are made; comparisons between blocks are not made.) The blocks are combinations of the 11 time periods and the two depth groups. The Seasonal Kendall test performs a Mann-Kendall trend test on each of the 22 blocks of data and sums these results to form an overall test statistic; the variance of that test statistic under the null hypothesis (that all 22 blocks are trend free) is the sum of the variances for the individual blocks. The p-value for the test is based on the overall test statistic and its variance; this statistic has been shown to be well approximated by a normal distribution under the null hypothesis, regardless of the underlying distribution of the data.
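The block-summing logic of the test can be sketched compactly. This is an illustrative Python implementation of the Hirsch et al. (1982) normal approximation, without the tie correction that a production implementation such as EnvStats includes; each block holds one value per year:

```python
import numpy as np
from math import erfc, sqrt

def mann_kendall(y):
    """Mann-Kendall S statistic and its null variance (no tie correction)."""
    y = np.asarray(y, float)
    n = len(y)
    s = sum(np.sign(y[j] - y[i]) for i in range(n) for j in range(i + 1, n))
    return s, n * (n - 1) * (2 * n + 5) / 18.0

def seasonal_kendall(blocks):
    """Sum S and Var(S) across blocks (season-by-depth-group subsets) and
    compute a two-sided p-value from the normal approximation."""
    stats = [mann_kendall(y) for y in blocks]
    s_tot = sum(s for s, _ in stats)
    var_tot = sum(v for _, v in stats)
    if s_tot == 0:
        z = 0.0
    else:
        z = (s_tot - np.sign(s_tot)) / sqrt(var_tot)  # continuity correction
    return s_tot, erfc(abs(z) / sqrt(2))              # two-sided p-value

# Four blocks of steadily increasing annual values -> highly significant trend.
blocks = [np.arange(10) + i for i in range(4)]
s_tot, p = seasonal_kendall(blocks)
```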
The Seasonal Kendall test for each dataset is done twice: first on the full duration of the dataset and second on the last 10 years of the dataset (2011–2020). The Seasonal Kendall test does not directly inform us about differences among the trends for the different blocks. For example, there may be trends in some months and not in others, and there could even be positive trends in some months and negative trends in others. These types of comparisons can all be made from the full set of outputs, but they were not evaluated or reported here. As part of a long-term program of trend studies to be carried out on the Lake, these kinds of differences would be a topic of interest.
In the analyses of lake dissolved oxygen (Table 5-10) and lake chlorophyll (Table 5-11), the Seasonal Kendall test was used. In the case of dissolved oxygen, the test was run on all 12 months, but only for the deep water values. The slopes are based on the Theil-Sen slope estimator applied to the mean concentration for each month; trend slopes are expressed in mg/L/yr, and the color codes are based on the significance of the trend for that month. These results are then summarized by the Seasonal Kendall test, with the overall slope computed as the median of all pairwise slopes over all 12 months. These trend results were computed using the rkt function in the rkt R package. The same approach was used for the shallow water chlorophyll data, except that the seasons were collapsed to a total of 10: because data are very scarce in the months of November through February, November and December were combined into a single season, January and February were combined into a single season, and each of the eight remaining months was defined as its own season. Here again, the rkt function was used to analyze the data.
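The Theil-Sen estimator used for these slopes is simply the median of all pairwise slopes, which can be sketched directly (the data here are synthetic; the committee's computations used the rkt function in R):

```python
import numpy as np

def theil_sen(t, y):
    """Theil-Sen slope estimate: the median of all pairwise slopes."""
    t, y = np.asarray(t, float), np.asarray(y, float)
    slopes = [(y[j] - y[i]) / (t[j] - t[i])
              for i in range(len(t)) for j in range(i + 1, len(t))
              if t[j] != t[i]]
    return float(np.median(slopes))

# Synthetic monthly-mean dissolved oxygen values (mg/L) by year.
years = np.array([2011, 2012, 2013, 2014, 2015])
do_mean = np.array([9.0, 8.8, 8.7, 8.4, 8.3])
slope = theil_sen(years, do_mean)   # mg/L/yr
```

Unlike an ordinary least-squares slope, the median of pairwise slopes is resistant to outlying values, which is one reason it is paired with the (also non-parametric) Seasonal Kendall test.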
REFERENCES
Cleveland, W. S., E. Grosse, and W. M. Shyu. 2017. Local regression models. Pp. 309–376 In: Statistical models in S. Routledge.
Hirsch, R. M., J. R. Slack, and R. A. Smith. 1982. Techniques of trend analysis for monthly water quality data. Water Resour. Res. 18(1):107–121. doi:10.1029/WR018i001p00107.
Millard, S. P. 2013. Hypothesis Tests. In: EnvStats. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8456-1_7.