a problem that the approach uses negative weights for part of the signal? The ultimate goal here is to eliminate or reduce stratospheric contributions to middle troposphere temperature trends. As described on line 184, about 10 percent of the weight of Channel 2 comes from the stratosphere, whereas the integrated stratospheric weight of the Fu et al. weighting function is near zero. As stated on line 187, the stratospheric contamination of TMid-Trop trends is about 0.05 K/decade, while the trend uncertainty due to the uncertainty in the derived coefficients of the Fu et al. method is only about 0.01 K/decade. The potential for incorrect stratospheric temperatures to corrupt the mid-tropospheric values should receive greater emphasis in Chapter 4 and Chapter 5. In conclusion, the Fu et al. method appears to reduce stratospheric contributions and may represent a valuable resource for this report. The report could, if appropriate, include references to more recent work by Fu et al. and possibly other authors. The new papers might give more insight into the controversial issue of negative weights in the Fu et al. method and its impact on trends. The Fu et al. (2004) reference on line 487 is missing from the chapter’s references.
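As a rough illustration of why a negative weight need not be a problem in itself, the following Python sketch uses a hypothetical two-layer atmosphere. Only the 10 percent stratospheric share of Channel 2 is taken from the text above. The idealized Channel 4 weights and the layer trends are assumptions chosen for illustration, and the resulting coefficients are not those of Fu et al.

```python
# Toy two-layer atmosphere: each channel is a weighted mean of a
# tropospheric and a stratospheric layer temperature.
# Channel 2's 10% stratospheric share follows the text; Channel 4 is
# ASSUMED here to be purely stratospheric (an idealization).
w2_trop, w2_strat = 0.9, 0.1
w4_trop, w4_strat = 0.0, 1.0

# Choose coefficients a2, a4 so that the combination a2*T2 + a4*T4 has
# unit tropospheric weight and zero stratospheric weight:
#   a2*w2_trop  + a4*w4_trop  = 1
#   a2*w2_strat + a4*w4_strat = 0
det = w2_trop * w4_strat - w4_trop * w2_strat
a2 = w4_strat / det
a4 = -w2_strat / det  # negative: it subtracts the stratospheric signal

# HYPOTHETICAL layer trends in K/decade, for illustration only.
trend_trop, trend_strat = 0.15, -0.5

# Trends that each channel would observe.
trend_ch2 = w2_trop * trend_trop + w2_strat * trend_strat
trend_ch4 = w4_trop * trend_trop + w4_strat * trend_strat

# Stratospheric contamination of the Channel 2 trend.
contamination = w2_strat * trend_strat

# The combined product recovers the tropospheric trend in this toy case.
trend_combined = a2 * trend_ch2 + a4 * trend_ch4
```

In this toy setting the negative coefficient on Channel 4 simply cancels the stratospheric share of Channel 2. With real, overlapping weighting functions the cancellation holds only in the integrated sense, which is why the uncertainty in the derived coefficients matters.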
2. The report gives a very even-handed discussion of the reasons for different trend estimates by UAH and RSS. Is there any way to go further, for example by stating which approach is better or proposing ways to reconcile the two approaches? Would the authors recommend further statistical analyses? If so, what form should these take? It appears that, although issues with diurnal corrections and the calibration target are important, these are not the major reason why the two groups obtain different trend estimates. These differences appear to hinge on the different treatments of the NOAA-9 satellite. It may be possible to do better using Bayesian statistical methods. For example, one could treat the unknown shift of the time series (resulting from the change of satellites) as a parameter with a prior distribution, construct a posterior distribution by analyzing data from both satellites, and then integrate over that posterior distribution using Monte Carlo methods to derive a reconstructed time series that allows for uncertainty in the shift. This method could potentially work better than current methods when there is only a very limited amount of overlapping data. In lines 294-300, the authors use the lack of a diurnal correction in the University of Maryland (UMD) dataset as an excuse for not discussing it. Because of the differences between UAH and RSS and the small residual uncertainty from diurnal sampling, it could be informative to use the UMD dataset as an independent check to understand and possibly reconcile the differences between UAH and RSS. The suggestion that the correction for target temperature is a function of latitude (or orbit relative to the Sun), as done by the UMD group (Grody et al., 2004) but not by UAH and RSS, is an interesting one and builds in some diurnal cycle corrections. These issues ought to be discussed openly.
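The Bayesian treatment of the inter-satellite shift suggested in comment 2 could be sketched along the following lines. This is a minimal Python illustration under strong assumptions, not a method used by UAH, RSS, or UMD: the shift is a single constant, measurement noise is Gaussian with a known standard deviation, the prior on the shift is normal, and the data are synthetic. All function names and parameter values are hypothetical.

```python
import random
import statistics

def shift_posterior(diffs, noise_sd, prior_mean=0.0, prior_sd=1.0):
    """Conjugate normal posterior (mean, sd) for a constant offset,
    given overlap differences with known Gaussian noise."""
    prec = 1.0 / prior_sd**2 + len(diffs) / noise_sd**2
    mean = (prior_mean / prior_sd**2 + sum(diffs) / noise_sd**2) / prec
    return mean, prec ** -0.5

def ols_slope(y):
    """Least-squares slope of y against its index 0, 1, 2, ..."""
    n = len(y)
    t_bar = (n - 1) / 2
    y_bar = sum(y) / n
    num = sum((t - t_bar) * (v - y_bar) for t, v in enumerate(y))
    den = sum((t - t_bar) ** 2 for t in range(n))
    return num / den

def merged_trend_samples(sat_a, sat_b, n_overlap, noise_sd,
                         n_draws=2000, seed=0):
    """Monte Carlo over the shift posterior. The last n_overlap values
    of sat_a coincide in time with the first n_overlap values of sat_b.
    Returns the mean and sd of the trend of the merged series."""
    rng = random.Random(seed)
    diffs = [b - a for a, b in zip(sat_a[-n_overlap:], sat_b[:n_overlap])]
    mean, sd = shift_posterior(diffs, noise_sd)
    slopes = []
    for _ in range(n_draws):
        delta = rng.gauss(mean, sd)  # one plausible value of the shift
        merged = sat_a + [v - delta for v in sat_b[n_overlap:]]
        slopes.append(ols_slope(merged))
    return statistics.mean(slopes), statistics.stdev(slopes)

# Synthetic demo: true trend of 0.01 per time step, satellite B offset
# by +0.3, and only three overlapping time steps.
rng = random.Random(1)
truth = [0.01 * t for t in range(120)]
sat_a = [truth[t] + rng.gauss(0, 0.1) for t in range(0, 63)]
sat_b = [truth[t] + 0.3 + rng.gauss(0, 0.1) for t in range(60, 120)]

# The difference of two series with sd 0.1 has sd of about 0.14.
trend_mean, trend_sd = merged_trend_samples(
    sat_a, sat_b, n_overlap=3, noise_sd=0.14)
```

The spread `trend_sd` reflects only the uncertainty in the shift, propagated to the merged trend. With a short overlap the posterior on the shift is wide and this spread grows accordingly, which is the situation in which the comment expects such an approach to be most informative.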
3. Overall, the satellite uncertainty is summarized in detail and in depth, while the radiosonde uncertainty is described in less detail and less quantitatively (see below for more detailed comments). There is no discussion of the strengths and weaknesses of the homogenization methods used by the different dataset groups. There is a lack of attention to developing physically based correction schemes. For example, radiosonde radiation error is the main source of error for upper troposphere and lower stratosphere temperatures. It appears that none of the groups has applied radiation corrections to uncorrected historical data or adjusted corrections that were already applied. It is true that trend analysis relies more on long-term homogeneity than on absolute accuracy. But accurate data throughout the period would minimize temporal inhomogeneity and could be used for other studies.