Review of Chapter 4
Chapter 4 raises the following question: what is our understanding of the contribution made by observational or methodological uncertainties to the previously reported vertical differences in temperature trends? The chapter discusses separately the three main layers of the atmosphere (stratosphere, mid/upper troposphere, and lower troposphere) and surface data over both sea and land. For the atmospheric data, radiosonde and satellite records are discussed separately, although less attention is given to the radiosonde data because of the well-known historical bias in radiosonde stratospheric measurements (which also affects averages computed for the troposphere). Much of the discussion is devoted to the conflict between the University of Alabama in Huntsville (UAH) and Remote Sensing Systems (RSS) reconstructions of the middle and upper troposphere, with particular emphasis on the consequences of the different calibration corrections the two groups apply for one satellite (NOAA-9). The possibility of combining Microwave Sounding Unit (MSU) channels (work by Fu et al.) is mentioned, though dismissed as controversial. The chapter also mentions, but similarly dismisses, the use of reanalyses. The last part of the chapter covers the different sources of bias in the surface data.
As discussed previously, there is some overlap between Chapters 2 and 4. The report would benefit from some reorganization to focus the two chapters more clearly; a suggestion for which material belongs in which chapter is provided in the committee’s comments on Chapter 2. The emphasis throughout is on “structural” sources of uncertainty, as opposed to the statistical uncertainty dealt with in Chapter 2 (a division whose rationale is not quite clear, since it would seem more logical to treat all sources of uncertainty together). Within its thus-defined scope, the chapter does a generally good job of describing the different sources of bias in temperature reconstructions. The following comments are aimed at improving the chapter’s discussion.
1. There should be a better discussion of the Fu et al. approach. Simply stating that it is controversial is a value judgment, not an adequate reason for dismissing it. The review panel sensed that some of the authors had more specific objections to the approach, but these are not adequately documented. For example, why should it be a problem that the approach uses negative weights for part of the signal? The ultimate goal here is to eliminate or reduce stratospheric contributions to mid-tropospheric temperature trends. As described on line 184, about 10 percent of the weight of Channel 2 comes from the stratosphere, but the integrated stratospheric weight of the Fu et al. weighting function is near zero. As stated on line 187, the stratospheric contamination of TMid-Trop trends is about 0.05 K/decade, while the trend uncertainty due to uncertainty in the derived coefficients of the Fu et al. method is only about 0.01 K/decade. The potential for incorrect stratospheric temperatures to corrupt the mid-tropospheric values should receive greater emphasis in Chapters 4 and 5. In conclusion, the Fu et al. method appears to reduce stratospheric contributions and may represent a valuable resource for this report. The report could, where appropriate, include references to more recent work by Fu et al. and possibly other authors; the newer papers might give more insight into the controversial issue of negative weights in the Fu et al. method and its impact on trends. Finally, the Fu et al. (2004) reference cited on line 487 is missing from the chapter’s references.
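To illustrate the arithmetic behind the near-zero integrated weight, consider the following toy calculation. The stratospheric fraction, trends, and derived coefficients below are illustrative assumptions chosen so that the stratospheric weight cancels exactly; they are not the published Fu et al. values.

```python
# Toy illustration of the Fu et al. channel-combination idea: combine
# MSU channel 2 (T2, with an assumed ~10% stratospheric weight) and
# channel 4 (T4, taken here as purely stratospheric) so that the net
# stratospheric weight of the combination is zero. Coefficients are
# derived from the toy assumptions, not taken from the literature.

F_STRAT = 0.10                 # assumed stratospheric fraction of T2
a2 = 1.0 / (1.0 - F_STRAT)     # scale so tropospheric weight equals 1
a4 = -a2 * F_STRAT             # negative weight cancels the stratosphere

def combined_trend(t2, t4):
    """Linear combination of channel trends (K/decade)."""
    return a2 * t2 + a4 * t4

trop_trend, strat_trend = 0.20, -0.50   # assumed true layer trends
t2 = (1 - F_STRAT) * trop_trend + F_STRAT * strat_trend
t4 = strat_trend

print(round(t2, 3))                      # 0.13: biased ~0.05 K/decade low
print(round(combined_trend(t2, t4), 3))  # 0.2: stratospheric bias removed
```

The toy numbers reproduce the magnitudes quoted in the chapter: a stratospheric cooling of 0.5 K/decade weighted at 10 percent depresses the raw Channel 2 trend by about 0.05 K/decade, and the negative-weight combination recovers the tropospheric trend.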
2. The report gives a very even-handed discussion of the reasons for the different trend estimates from UAH and RSS. Is there any way to go further, for example by stating which approach is better or proposing ways to reconcile the two? Would the authors recommend further statistical analyses, and if so, what form should these take? It appears that, although issues with the diurnal corrections and the calibration target are important, they are not the major reason the two groups obtain different trend estimates; the differences appear to hinge on the different treatments of the NOAA-9 satellite. It may be possible to do better using Bayesian statistical methods. For example, one could treat the unknown shift in the time series (resulting from the change of satellites) as a parameter with a prior distribution, construct its posterior distribution from the data of both satellites, and then integrate over that posterior using Monte Carlo methods to derive a reconstructed time series that allows for uncertainty in the shift. Such a method could work better than current approaches when only a very limited amount of overlapping data is available. In lines 294-300, the authors use the lack of a diurnal correction in the University of Maryland (UMD) dataset as a reason for not discussing it. Given the differences between UAH and RSS and the small residual uncertainty attributed to diurnal sampling, it could nevertheless be informative to use the UMD dataset as an independent check to understand, and possibly reconcile, those differences. The suggestion that the correction for target temperature should be a function of latitude (or of orbit relative to the Sun), as done by the UMD group (Grody et al., 2004) but not by UAH and RSS, is an interesting one and builds in some diurnal-cycle corrections. These issues ought to be discussed openly.
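To make the Bayesian suggestion concrete, here is a minimal synthetic sketch. All data, the prior, and the noise levels are invented for illustration; this is not either group's actual merging procedure. The inter-satellite offset gets a Gaussian prior, its Gaussian posterior is formed from a short overlap period, and Monte Carlo draws from that posterior yield an ensemble of merged series carrying the offset uncertainty.

```python
# Sketch of a Bayesian satellite merge with an uncertain offset.
# Synthetic example only; priors, noise, and overlap length are invented.
import random

random.seed(0)

# Synthetic monthly anomalies: satellite A covers months 0-35, satellite B
# covers months 30-71 with a true offset of +0.30 K and 0.10 K noise.
true_offset, noise = 0.30, 0.10
sat_a = {m: 0.01 * m + random.gauss(0, noise) for m in range(0, 36)}
sat_b = {m: 0.01 * m + true_offset + random.gauss(0, noise) for m in range(30, 72)}

# Conjugate normal posterior for the offset: prior N(0, tau^2), overlap
# differences distributed N(offset, 2*sigma^2) since both records are noisy.
overlap = [sat_b[m] - sat_a[m] for m in range(30, 36)]
n, tau2, s2 = len(overlap), 1.0, 2 * noise ** 2
post_var = 1.0 / (1.0 / tau2 + n / s2)
post_mean = post_var * sum(overlap) / s2

# Monte Carlo over the posterior: each draw of the offset gives one
# candidate merged series; the spread across draws is the merge uncertainty.
draws = [random.gauss(post_mean, post_var ** 0.5) for _ in range(2000)]
merged_month40 = [sat_b[40] - d for d in draws]
mean40 = sum(merged_month40) / len(merged_month40)
spread40 = (sum((x - mean40) ** 2 for x in merged_month40)
            / len(merged_month40)) ** 0.5
print(round(post_mean, 2), round(spread40, 3))
```

The point of the sketch is the last step: rather than committing to a single estimated shift, the reconstruction carries the posterior spread forward, which matters most precisely when the overlap is short, as with NOAA-9.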
3. Overall, the satellite uncertainty is summarized in detail and in depth, while the radiosonde uncertainty is described in less detail and less quantitatively (see below for more detailed comments). There is no discussion of the strengths and weaknesses of the homogenization methods used by the different dataset groups, and little attention is given to developing physically based correction schemes. For example, radiation error is the main source of error in upper-tropospheric and lower-stratospheric radiosonde temperatures, yet it appears that none of the groups has implemented radiation corrections for uncorrected historical data or adjusted the corrections already applied. It is true that trend analysis relies more on long-term homogeneity than on absolute accuracy, but data that are accurate throughout the period would minimize temporal inhomogeneity and could also be used for other studies.
Also, the report has no discussion of missing data within a month for the radiosonde records. In the Hadley Centre Radiosonde Temperature (HadAT) dataset, only 12 soundings are required to make a monthly mean, and only two monthly means to make a season; there is no allowance for this in the error bars. Missing months are especially an issue in the tropics, where records are woefully incomplete, as shown by Hurrell et al. (2000). Free and Seidel (2005), however, find missing monthly data to have a fairly minor effect on trends.
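The completeness rule described above can be stated explicitly. A small sketch follows; the thresholds are taken from the text, while the function names and structure are ours. Error bars reflecting this subsampling are exactly what the report omits.

```python
# Sketch of a HadAT-style completeness rule: at least 12 soundings to
# form a monthly mean, and at least 2 of the 3 monthly means to form a
# seasonal mean. Thresholds follow the text; the interface is invented.

def monthly_mean(soundings, min_n=12):
    """Mean of a month's soundings, or None if too few were taken."""
    return sum(soundings) / len(soundings) if len(soundings) >= min_n else None

def seasonal_mean(months, min_months=2):
    """Mean of the available monthly means (3 expected), or None."""
    avail = [m for m in months if m is not None]
    return sum(avail) / len(avail) if len(avail) >= min_months else None

# A season built from one missing month and two valid ones still passes,
# which is why the effective sample size varies silently from cell to cell.
jan = monthly_mean([0.5] * 11)        # None: only 11 soundings
feb = monthly_mean([1.0] * 14)        # 1.0
mar = monthly_mean([3.0] * 20)        # 3.0
print(seasonal_mean([jan, feb, mar])) # 2.0, despite the missing month
```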
4. There is no discussion of statistical uncertainties in methodologies for calculating trends, calculating monthly mean values and creating global time series (i.e., spatial averaging techniques for radiosonde data). Some of this discussion appears instead in Chapter 2. Somewhere (Chapter 2, Chapter 4, or a separate appendix) there should be a separate section on statistical methods for estimating trends in time series, including standard errors or other measures of statistical uncertainty.
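As an illustration of what such a section might cover, here is a minimal sketch of a least-squares trend with a naive standard error and a common lag-1-autocorrelation adjustment via an effective sample size. The adjustment formula and the clamping choices are one conventional option among several, not a prescription for the report.

```python
# Sketch: OLS trend with naive and AR(1)-adjusted standard errors.
# The effective-sample-size formula n_eff = n*(1-r1)/(1+r1) is a common
# convention for serially correlated climate series; details vary by author.
import random

def trend_with_se(y):
    """Return (slope per time step, naive SE, autocorrelation-adjusted SE)."""
    n = len(y)
    t = list(range(n))
    tbar, ybar = sum(t) / n, sum(y) / n
    sxx = sum((ti - tbar) ** 2 for ti in t)
    slope = sum((ti - tbar) * (yi - ybar) for ti, yi in zip(t, y)) / sxx
    resid = [yi - ybar - slope * (ti - tbar) for ti, yi in zip(t, y)]
    s2 = sum(r * r for r in resid) / (n - 2)          # residual variance
    r1 = sum(a * b for a, b in zip(resid, resid[1:])) / sum(r * r for r in resid)
    r1 = min(max(r1, -0.98), 0.98)                    # guard the ratio below
    n_eff = n * (1 - r1) / (1 + r1)                   # effective sample size
    se_naive = (s2 / sxx) ** 0.5
    se_adj = se_naive * ((n - 2) / max(n_eff - 2, 1.0)) ** 0.5
    return slope, se_naive, se_adj

random.seed(1)
series = [0.05 * i + random.gauss(0, 0.2) for i in range(120)]  # synthetic
slope, se_naive, se_adj = trend_with_se(series)
print(round(slope, 3), round(se_naive, 4), round(se_adj, 4))
```

Whichever formulation the authors prefer, documenting it once, with its assumptions, would let readers judge whether the reported trend differences exceed their combined statistical uncertainties.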
5. The largest discrepancy between radiosonde and satellite estimates of trends is in the stratosphere, and more detailed discussion of this discrepancy is needed in Section 2. Section 2.1 briefly describes two uncertainties, associated with undetected changes in instrumentation and with the early bursting of balloons in early radiosondes. There can also be significant biases in stratospheric radiosonde temperatures due to radiation errors, and neither radiosonde dataset includes a physical model for radiation adjustments. Durre et al. (2002) show that the Luers and Eskridge (1998) adjustments make radiosonde temperatures more homogeneous in the stratosphere, although they frequently amplify discontinuities in the troposphere. Regarding the statement “The discrepancy … is likely to be mostly due to pervasive uncorrected biases in the radiosonde measurements” on lines 96-98, can the authors be more specific about what those uncorrected biases are? What about time-lag errors in the radiosonde data that could cause a cold bias in the stratosphere? There is minimal discussion of the large disparities in the tropics, both between the two radiosonde datasets and between the radiosonde and satellite data, shown in Figure 6.2.2 of Chapter 3. How does the difference in station distributions between the two radiosonde datasets contribute to this discrepancy? Is the enhanced stratospheric cooling in the tropics relative to the midlatitudes in the radiosonde datasets due to a lack of sampling over the open oceans, or to the larger adjustments associated with the switch to Vaisala radiosondes at most tropical stations? It seems that the former has only a minor impact, because Figure 6.2.3 in Chapter 3 shows that the stratospheric trends in the tropics are zonally uniform.
6. It seems that the difference in homogeneity adjustment methods is the main contributor to the disagreements in trends among the radiosonde datasets presented in Sections 2.1 and 3.1. Do the adjustments reduce or increase the discrepancies in trends? This could be assessed by comparing the trends before and after adjustment.
1. In lines 176-193, does the bias in the radiosonde-derived TMid-Trop from stratospheric errors have the same magnitude, about 0.05 K/decade, for the NOAA and UK Met Office datasets? As shown in the middle panel of Figure 6.2.2 in Chapter 3, the difference between TMid-Trop-U and TMid-Trop-N at around 5°N is about 0.1 K/decade; adding ~0.05 K/decade to both datasets still cannot explain the large disparity between the two datasets at this latitude.
2. In lines 335-347, how can the uncertainty of the lower-tropospheric temperature record be consistent with the mid-tropospheric uncertainty, especially given that the mid-tropospheric record is biased low by contaminating lower-stratospheric influences?
3. Section 4.3 deals only with average trends and fails to examine root-mean-square (RMS) differences (e.g., Hurrell et al., 2000).
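The reason RMS differences add information is simple to demonstrate: two trend fields can share the same spatial average while disagreeing strongly point by point. The station trend values in this toy example are invented.

```python
# Toy demonstration: identical average trends can hide large
# station-by-station disagreement that only an RMS measure reveals.

def mean_diff(a, b):
    """Difference of spatial averages (hides compensating errors)."""
    return sum(x - y for x, y in zip(a, b)) / len(a)

def rms_diff(a, b):
    """Root-mean-square of pointwise differences (exposes them)."""
    return (sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) ** 0.5

trends_a = [0.10, 0.30, 0.10, 0.30]   # invented station trends, K/decade
trends_b = [0.30, 0.10, 0.30, 0.10]   # same mean, opposite spatial pattern

print(round(mean_diff(trends_a, trends_b), 3))  # 0.0: averages agree
print(round(rms_diff(trends_a, trends_b), 3))   # 0.2: patterns disagree
```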
4. The surface record also has problems that are not discussed in Section 5.1 of Chapter 4. In particular, no error bars are assigned to the systematic corrections.