Uncertainty Management in Remote Sensing of Climate Data: Summary of a Workshop

2 Cross-Cutting Issues

The December 2008 workshop on uncertainty management for remotely sensed climate data identified three major issues that cut across multiple areas of climate research, illustrating the need for a more sophisticated framework for data analysis in order to better understand climate processes:

The challenge of validating remotely sensed data.
The need to stay focused on the users of climate information, including policymakers. Even if an analysis is perfect and the statistical tools work, the results might not be interpretable and useful for the end-user.
The need for strengthening collaboration between the earth science and statistical communities, the importance of leveraging the strengths of each, and the challenges inherent in doing so.

VALIDATION OF REMOTELY SENSED CLIMATE DATA

Validation of parameters is an essential component of nearly all remote sensing-based studies and there are many considerations in performing validation. Errors in different validation techniques are complex and difficult to quantify. The workshop participants discussed many challenges of validating remotely sensed clouds, precipitation, winds, and aerosols, though the presentations did not go into great detail on methodologies. Some common questions include: Can the data meet the needs
of the remote sensing application? What is the accuracy of the data product and of the retrieval product? The following section offers examples of validation techniques for remotely sensed climate data.

Techniques for validating remotely sensed data vary for different geophysical parameters (e.g., winds, aerosols, precipitation, clouds). The most common technique is comparing ground measurements to remote sensing observations or modeled results. In many situations, a mismatch exists between the sensor’s field of view and the scale at which in situ measurements are collected. Ground-based measurements cover small spatial scales while satellite retrievals cover an area of many kilometers. However, in the process of working through a validation, the real structure of the data can be revealed. This was nicely described at the workshop by Tom Bell from the National Aeronautics and Space Administration (NASA), who presented challenges in remote sensing of precipitation. The use of in situ measurements for model calibration and validation requires a robust method to spatially aggregate ground measurements to the scale at which the remotely sensed data are acquired (Box 2-1).

As previously mentioned, defining the uncertainty of model parameters is a continuing challenge, but there are multiple methodologies in validation studies that can be combined in an optimal way. Examples described at the workshop were studies to understand fluxes in atmospheric carbon dioxide (CO2). Some studies employ measurements of CO2 concentration to infer sources and sinks, while other studies build biosphere models in an attempt to predict the fluxes. These can be combined, using the biospheric models as a first guess followed by a Bayesian framework to integrate the modeled outputs with atmospheric data to get a best estimate of carbon sources and sinks.
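The idea of using a model as a first guess and then updating it with data can be illustrated with a deliberately simplified sketch. It assumes a single scalar flux, Gaussian uncertainties, and an observation-based estimate of the same flux; real flux inversions work with many regions and an atmospheric transport model linking fluxes to measured concentrations. The function name and all numbers are hypothetical and are not taken from the workshop.

```python
def combine_gaussian(prior_mean, prior_sd, obs_mean, obs_sd):
    """Conjugate Gaussian update for one scalar quantity.

    The posterior mean is the precision-weighted average of the prior
    (model first guess) and the observation-based estimate.
    """
    prior_prec = 1.0 / prior_sd ** 2   # precision = 1 / variance
    obs_prec = 1.0 / obs_sd ** 2
    post_var = 1.0 / (prior_prec + obs_prec)
    post_mean = post_var * (prior_prec * prior_mean + obs_prec * obs_mean)
    return post_mean, post_var ** 0.5


# Hypothetical values in arbitrary flux units (illustration only).
prior_flux, prior_sd = 2.0, 1.0   # biosphere-model first guess
obs_flux, obs_sd = 1.0, 0.5       # estimate inferred from atmospheric data

post_flux, post_sd = combine_gaussian(prior_flux, prior_sd, obs_flux, obs_sd)
print(f"posterior flux = {post_flux:.3f} +/- {post_sd:.3f}")
```

In this toy setting the combined estimate lies between the two inputs, closer to the one with the smaller uncertainty, and its uncertainty is smaller than either input's.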
In this Bayesian approach, there is an opportunity to account for the uncertainty in the individual parameters as well as in the modeling framework that is used to predict the processes of interest.

Inherent in many of the techniques used in processing remotely sensed data is the issue of biases. A workshop participant explained that bias in validation studies of some geophysical parameters occurs because of the uneven global distribution of surface cloud observations. The oceans tend to be cloudier on average than most of the land, but there are fewer surface observations over the oceans. For example, if a threshold is set for the number of surface observations that must be present in a 2.5-degree grid box before a data point is accepted, the global mean that is calculated will depend on that threshold. A threshold will therefore force parts of the earth (i.e., the southern oceans), which are known to be very cloudy, to be omitted from the averaged data. Furthermore, the samples do not stay constant; measurements in a grid box can change from month to month, which introduces a source of variability that is generated from the sampling, rather than the cloud itself. Hence, not covering the domain of a phenomenon completely (in either space or time) leads to biases.

BOX 2-1
Challenges in Validation of Remote Sensing Data: Comparison of Error for Monthly Average Precipitation

A common challenge in remote sensing datasets is accounting for errors due to sparse sampling. Tom Bell, a NASA scientist, discussed the challenges for precipitation. In general, it does not rain very often, and it is difficult to quantify rain amounts through rain gauges and compare these data with satellite measurements taken over many kilometers. The most common method for validating satellite rain estimates is to compare the satellite estimates with rain gauge data collected over a time interval during which the satellite passes over. However, rain gauges do not actually measure what the satellite sees, and the average rain estimates from rain gauges and satellites differ over spatial and temporal scales. Therefore, it is problematic to treat the validation problem as a simple difference between the satellite average and the rain gauge average. The best approach for validating satellite rain estimates is to create a model of errors in precipitation estimates. A spectral model can be used to help predict the best time interval over which to average the rain gauge data when comparing the gauge measurement with a single overflight of the satellite (Figure 2-1). In addition to the time interval over which to average gauge data, the averaging area also needs to be determined.

Figure 2-1 shows the various sampling intervals between different satellite visits occurring over one month and demonstrates how the error between satellite averages and (surface) rain gauge averages varies in a complex way with the sampling interval of different satellites and the choice of spatial area over which averages are taken. For example, the error between the rain gauge and the satellite with one sample per day is minimized over larger areas, with a swath width of approximately 300 km, as shown by the blue curve, representing the Tropical Rainfall Measuring Mission (TRMM) and Aqua missions. However, if satellite measurements were to be taken every three hours, as shown in the black curve, representing the planned Global Precipitation Measurement (GPM) mission, the error would be minimized by spatially averaging over a swath of width about 70 km. This figure demonstrates not only that there is an optimal point for sampling but also that temporal and spatial sampling are related. If satellite data are averaged over 100-km swaths, it is best to look at three-hour data, whereas with 500-km swaths, sampling every 24 hours is sufficient. The curves also show that rain events tend to occur on the 100-km scale and that there is a “sweet spot” of error for each spatial scale (i.e., three-hr sampling for 70-km swaths).

FIGURE 2-1 Relationship between spatial and temporal sampling of different satellites used for measuring precipitation (TRMM: blue curve; GPM: black curve). Figure courtesy of Tom Bell, NASA.

Another bias can result in studies of cloud processes if spatial and temporal autocorrelation is ignored. William Rossow, City College of New York, explained that the polar orbiting satellite samples low latitudes twice per day and has spatial scales of approximately 2,000 km. Physical processes that evolve rapidly and on scales smaller than 2,000 km are therefore difficult to quantify. To study cloud processes, both the space and the time scales need to be measured appropriately to see the physical process unfold. It is also important not to ignore temporal autocorrelation in the tropics, where convection is rapid, or this will introduce bias into the results. A monthly mean is not always based on 30 independent samples. Caution should be taken in studies investigating interannual variability, as this is often based on monthly means. The assumption that a monthly mean is based on 30 independent samples can lead to what looks like climate variation in the data, when it could actually be statistical noise. Box 2-2 gives an example by Tom Bell, NASA, of biases arising in rainfall measurements.

BOX 2-2
The Need for a Model of Biases: An Example from Rainfall Measurements

Regression analyses between satellite and parameter measurements are commonly used for calibration studies and to evaluate the error level of the retrievals. However, validation with regression analysis is a common source of bias in precipitation studies. For example, comparing long-term averages of rain gauge data over three different windows of time around the satellite overpass reveals that the relationship between what the gauge sees and what the satellite sees improves as the averaging interval around the satellite overpass is reduced (Bowman, 2005). Conversely, the agreement is poor when the averaging interval is increased. This comparison can be problematic given that the common methodology (e.g., linear regression analysis) for assessing the agreement between the remote sensing estimate and the ground estimate is based on the assumption that the ground measurements are accurate. Rain gauges, however, are an imperfect measure of how much it actually rains and have their own set of biases and sampling issues. This methodology is therefore not appropriate for precipitation studies, given that the ground-based estimates are not true values of what the satellite sees. The regression analysis also does not take into account the minimum overlap in sampling. For example, a poor agreement between the satellite and ground-based estimates does not necessarily mean that the satellite is performing poorly. Rather, the regression technique may not be the proper tool for calibration studies. This is where a statistical model will be helpful, not only for describing retrieval error but also for validation exercises. An error model can help disentangle spurious biases, those generated from methodology, from the real biases of the instruments. Validation exercises can generate spurious biases, such as trends that look like biases in the remote sensing method but are not really present in the data and are instead byproducts of the analysis method.

At the 2008 workshop, Jay Mace, University of Utah, compared and validated cloud microphysical properties with data from multiple instruments, including satellite measurements, ground-based remote sensing measurements, and aircraft measurements. The premise for this approach is that clouds and precipitation influence the radiation and hydrology of the earth through an evolving vertical profile of microphysics. Therefore, scientists need to become more skillful at deriving the vertical profile of microphysics from remote sensing data in a statistically meaningful way. An example is the comparison of Moderate Resolution Imaging Spectroradiometer (MODIS) derived ice water paths with ice water paths derived from a ground-based validated radar algorithm. One is a spatially averaged snapshot measurement and the other is a time-averaged point measurement. For cirrus clouds, a spatial average is generated by averaging the MODIS measurements over a rectangle that is oriented
along the mean wind at the cloud level, while the ground-based radar data are averaged over a period of time when the cloud layer remained uniform (Mace, 2001). Comparing these measurements over a long period of time will aid in determining the error characteristics of the satellite data. However, the uncertainty in the data, combined with the uncertainty in the science, requires techniques to quantitatively assess these errors, and more sophisticated statistical approaches would be helpful in accomplishing that objective.

The validation problem for measurements of aerosols presents other issues. Lorraine Remer of NASA spoke at the workshop about validation of aerosol optical depth (AOD) measurements. AOD is a measure of the column-integrated extinction, that is, the amount of light that is either scattered or absorbed as it passes through the aerosol layer, and it gives an indication of the amount of aerosol. A satellite measures not just the radiation scattered from the aerosol layer; it also collects some radiation from the earth’s surface that made it through the aerosol layer. The surface effect needs to be removed from the satellite signal to estimate the extinction, which requires assumptions about the aerosol and the surface, thus leaving room for error. A more direct way of measuring AOD is to use a sunphotometer on the ground to measure the transmitted light directly. By combining measurements of the sunlight at the top of the atmosphere with the amount of sunlight at the surface, the extinction can be determined. This approach uses fewer assumptions. Under the best conditions MODIS can retrieve AOD to within ±0.03, whereas a well-calibrated sunphotometer can measure it to within ±0.01. The widespread network of sunphotometers called AERONET retrieves data globally.
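The statement that extinction follows from the sunlight at the top of the atmosphere and at the surface can be sketched with the Beer-Lambert law. The sketch below is a simplification, not the AERONET or MODIS algorithm: it assumes a single wavelength, a plane-parallel atmosphere (air mass equal to 1 / cos of the solar zenith angle), and it returns the total optical depth without removing molecular scattering or gas absorption, which an operational retrieval would subtract to isolate the aerosol part. All function names and numbers are hypothetical.

```python
import math


def air_mass(solar_zenith_deg):
    """Plane-parallel approximation of the relative path length through the atmosphere."""
    if not 0.0 <= solar_zenith_deg < 90.0:
        raise ValueError("solar zenith angle must be in [0, 90) degrees")
    return 1.0 / math.cos(math.radians(solar_zenith_deg))


def total_optical_depth(surface_signal, toa_signal, solar_zenith_deg):
    """Invert the Beer-Lambert law: surface = toa * exp(-tau * air_mass)."""
    if not 0.0 < surface_signal <= toa_signal:
        raise ValueError("surface signal must be positive and not exceed the TOA signal")
    return -math.log(surface_signal / toa_signal) / air_mass(solar_zenith_deg)


# Hypothetical direct-beam signals in arbitrary units (illustration only).
tau = total_optical_depth(surface_signal=0.8, toa_signal=1.0, solar_zenith_deg=30.0)
print(f"total column optical depth = {tau:.3f}")
```

Because the only inputs are two direct measurements and the sun's position, this kind of inversion depends on fewer assumptions than a satellite retrieval that must first separate the surface contribution from the aerosol signal.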
The primary challenge with this technique is the mismatch between spatially varying MODIS data and the temporally varying sunphotometer data, so there are only select areas with coincident coverage in measurements between MODIS and AERONET. Since AERONET is a land-based network, it is difficult to match an overpass with an aerosol observation, and, as addressed earlier, the location of the ground-based observation within the satellite grid square is a consideration in the validation process.

There are many types of aerosols (e.g., sulfates, black carbon, sea salt). Some are natural, while others are anthropogenic. Uncertainties in aerosol models, as presented by Joyce Penner of the University of Michigan, result from uncertainties in the sources, types, and radiative properties of aerosols. Validation of these models cannot come solely from comparisons with ground-based data like that of AERONET, because the measured AOD is a composite of the effects of the different aerosol types the models are attempting to simulate. Aerosol models also cannot be validated with satellite data alone, but require a suite of observational data from surface
stations of different aerosol species. Sources of uncertainty are also associated with cloud interaction, chemical production, vertical transport, and grid resolution. As noted by several workshop participants, the climate science community does not have a good understanding of several relevant processes, potentially creating biases in the models.

Today it is common to produce daily and monthly mean maps of the distribution of aerosols, which can carry their own uncertainty. For example, a grid square with few available retrievals will have a higher degree of uncertainty, ultimately affecting the mean distribution maps. Different methods of weighting and averaging the data result in different distribution maps and mean optical depths. Moreover, not all retrievals carry the same level of confidence. A number of options and methods need to be considered when analyzing satellite and sunphotometer data, all within the context of the application of the dataset.

THE NEED TO STAY FOCUSED ON THE END-USER

Several participants at the workshop noted that greater attention to uncertainties in climate data would help to address important questions in climate research and policy. Likewise, greater attention to uncertainty quantification can be driven, in part, by the specific needs of researchers, policy makers, or other end-users of the remotely sensed product. This is in contrast, for example, to the situation in the nuclear weapons laboratories, which have already developed sophisticated methods of uncertainty quantification in order to address the policy question of whether aging warheads remain safe and functional in the absence of complete testing. With regard to testing of climate data and models, experiments are run to better understand specific physical processes, but there is no option for controlled experiments of complete systems.
As the uncertainties of remote sensing instruments become better characterized, the question arises of how to represent that information in a way that is useful to researchers who examine the data. How does one capture the uncertainty of an entire dataset so that the end-user can use the information? This remains an open question and a difficult one. Researchers, both in the geoscience and statistics communities, need to become very familiar with a dataset, the data-collection process, and the applications of the data in order to understand all the issues and assumptions that are associated with the data and its use. Understanding the uncertainties of different processes in the climate system requires different approaches. For example, a challenge recognized by the community is reconciling initial states in models with the observational data. The initial states must reflect the uncertainty in those data; thus, every dataset needs attention from scientists so that its uncertainty is accounted for in the forecast model.
Collaborations between remote sensing climate scientists and statisticians can potentially result in products that are ultimately more useful to end-users. Participants pointed out the value of true collaborations, wherein statisticians who become familiar with an entire data-collection system will help characterize the uncertainty or the variability in the equations that model the physical processes, and not just the uncertainty in the data collection. The two communities working together may lead to different ways to look at the data for the science questions.

Jay Mace, University of Utah, presented an example of how uncertainty analysis can contribute to climate modeling. Sanderson et al. (2008) published results of a statistical study of climate models in which the investigators varied the parameterizations over some parameter space and looked at the sensitivity of the model results to the tuning. One of the tuning parameters examined was the ice crystal fall speed, and that paper concluded that it is an important parameter in a climate model, in part because this process takes ice out of the upper troposphere, where it shields the upward infrared radiation. Thus, ice fall speed turns out to be a powerful tuning knob for a climate modeler. In response to that paper, Deng and Mace (2008) published a study that looked at Doppler velocity data from the Atmospheric Radiation Measurement (ARM) sites and parameterized the ice crystal fall speed as a function of ice water path and temperature. This is an example of how information from various remote sensors can feed directly into model parameterizations. Because the many different climate models in existence use different parameters and generate different results, better knowledge of the uncertainties is critical for building the next generation of climate models.
Workshop participants noted that statisticians and earth scientists need to consider the end-user of the product because different end-users will require different applications of remotely sensed data, and this will determine how the dataset will be processed. For example, end-users, such as policy makers, want to know about uncertainty in climate projections, which includes model uncertainty, observational uncertainty, and the overall uncertainty of our knowledge. As the climate community focuses more on addressing the questions posed by policy makers and other end-users, the collaborations between earth scientists and statisticians will likely be encouraged. A remote sensing scientist produces a product, which has biases and uncertainties that are spatiotemporally correlated as a function of the statistical properties of the observed fields and the manner in which they were sampled. A modeler uses the products, for example as gridded, averaged fields, and introduces biases and uncertainties into the predictions, projections, and analyses. The remote sensing scientist and the modeler must collaborate to ensure the accuracy
of the product. This implies an enterprise-wide need for collaboration between the data producers and data users.

CROSS-DISCIPLINARY COLLABORATIONS BETWEEN CLIMATE SCIENTISTS AND STATISTICIANS

The challenge of understanding a system as complex as climate requires a partnership between geoscientists and statisticians because neither community has all of the expertise that is required. Each community is attempting to understand climate processes interacting at multiple scales, and the workshop participants called for more sophisticated techniques to study these interactions that can benefit both communities. The workshop demonstrated that the community is still at a fairly rudimentary stage of understanding the complexity of the climate system. Because the field is at an early stage of understanding, it is not always obvious to outsiders, including those who might fund such investigations, that these lines of research and their necessary collaborations are essential to progress in climate science.

The development of tools and methods to approach climate datasets for analyses is an ongoing endeavor. Research into better statistical methods enables us to, among other things, account for autocorrelation and provide ways to infer greater information from sparse data. A primary motivation for this workshop, and a fundamental area for collaboration recognized by the workshop participants, lies in quantifying uncertainty in climate records, given that a better understanding of the climate system can be obtained with a more sophisticated approach to handling uncertainties. This includes accounting for the uncertainty in the individual model parameters and also the uncertainty that is inherent in the modeling framework. Statisticians can help formalize how geoscientists can use well-characterized uncertainties to ultimately understand the uncertainties in the forecast model.
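The point about accounting for autocorrelation, together with the earlier caution that a monthly mean is not always based on 30 independent samples, can be made concrete with a small sketch. It assumes the daily values behave like a first-order autoregressive, AR(1), process and uses the common approximation n_eff = n (1 - r1) / (1 + r1) for the effective number of independent samples in a mean, where r1 is the lag-1 autocorrelation. The simulated series, the coefficient 0.7, and all function names are hypothetical; this is one textbook adjustment, not a method attributed to the workshop participants.

```python
import random
import statistics


def lag1_autocorrelation(x):
    """Sample lag-1 autocorrelation of a sequence."""
    mean = statistics.fmean(x)
    num = sum((a - mean) * (b - mean) for a, b in zip(x[:-1], x[1:]))
    den = sum((a - mean) ** 2 for a in x)
    return num / den


def effective_sample_size(x):
    """Approximate number of independent samples, assuming AR(1) behaviour."""
    r1 = max(lag1_autocorrelation(x), 0.0)  # clamp so n_eff never exceeds n
    return len(x) * (1.0 - r1) / (1.0 + r1)


def simulate_ar1(n, phi, seed):
    """Toy AR(1) series x[t] = phi * x[t-1] + noise (start-up transient ignored)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x


daily = simulate_ar1(n=30, phi=0.7, seed=0)       # one hypothetical "month"
n_eff = effective_sample_size(daily)
naive_se = statistics.stdev(daily) / len(daily) ** 0.5
adjusted_se = statistics.stdev(daily) / n_eff ** 0.5
print(f"n = {len(daily)}, n_eff = {n_eff:.1f}")
print(f"naive SE = {naive_se:.3f}, adjusted SE = {adjusted_se:.3f}")
```

When the daily values are positively correlated, the adjusted standard error of the monthly mean is larger than the naive one, which is why month-to-month differences that look like climate variation can fall within sampling noise.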
Other areas were suggested as being ripe for collaboration between climate scientists and statisticians, including the monitoring of simple state variables in the atmosphere, and understanding the large cycles in the atmosphere, such as the water cycle and the carbon cycle and the interactions between them. It is difficult to examine these cycles due to their complexities, and statistical methods can be useful for teasing out information. Another area for collaboration is looking deeper into the interactions between variables, including a better understanding of forcing variables such as CO2 concentrations and aerosols. We need a conceptual framework for applying different statistical techniques to these areas. Statisticians need to tackle the full state space at the full resolution, and physically quantify and validate the results of any models that are generated. The agencies that fund remote sensing benefit when statistical
investigations explore data in a new way and find additional value in systems that are already aloft, as well as when they provide information that will guide resource decisions in the future.

Collaborations between geoscientists and statisticians open up the opportunity for tailoring methods to fit a specific geophysical situation, with specific improvements in accuracy, precision, and/or run time. Statistical textbooks, publications, and software do not necessarily provide the necessary technology transition, making collaborations critical. For example, in a method like kriging, presented by Noel Cressie at Ohio State University,[1] it can be difficult to discern how to define a statistical model and probability distributions that will lead to optimal interpolation. The standard formulation assumes that parameters follow simple probability distributions (e.g., Gaussian distribution), but real data have many structures. Creating a prior distribution that incorporates an understanding, sometimes quite subtle, of the actual physical processes is a task best performed through collaborations.

[1] A detailed description of kriging is presented in Appendix B, in the talk summary for Noel Cressie.

Workshop participants described the most productive collaborations as two-way, in that both perspectives, of the geoscientist and the statistician, are applied to understanding all aspects of the problem. Certainly, the geoscientist may evaluate the appropriateness of statistical steps and the assumptions embedded in them. Conversely, the statistician may also be intimately familiar with how the geoscientists developed the mathematical models, because some of the uncertainties and assumptions are embedded in the equations. As described earlier, a primary goal for climate scientists is to understand the physical processes that are directly relevant to climate models, and this can be addressed through the use of statistical models.

Many of the workshop participants recognized that interdisciplinary work is hard. It is difficult to get funding because research proposals need to convince two separate communities that often do not communicate. Additionally, if a study generates a new approach that is too complicated, that approach will often not be utilized. An advance that requires others to learn a new technique has to be of obvious value and explainable. Complex solutions can be dangerous if they hide assumptions that were made during the derivations of the method, but which are not appropriate for the remote sensing application. This is another reason why a statistician involved in collaboration must be intimately involved in the geosciences modeling to recognize such a situation. Climate research is ripe for additional statistical sophistication, even at the risk of adding complexity, because the climate model predictions are critical to society.

There are also impediments to cross-disciplinary collaborations. First, communication across the disciplines is a necessity that was highlighted in the workshop discussion. Both communities, geosciences and statistics, will benefit from an ongoing interaction as well as continued effort to understand the literature outside their field to find what has been done in a certain area in order to make improvements. Second, if done well, cross-disciplinary collaboration can be very productive, but it requires commitment from both communities. And third, workshop participants expressed that the funding is not adequate to support enough cross-disciplinary collaboration.

What incentives exist for collaborations? For many years, statisticians have had little trouble finding interesting work. It takes a special kind of statistician to want to be in this cross-disciplinary area. Cross-disciplinary collaboration also needs support from the climate community; that is, geoscientists need to make known that they want statistical expertise to solve some of these complex problems. Participants also felt that the federal agencies that fund climate research need to be aware of this constraint.

Box 2-3 illustrates the nature of the problem as ensemble approaches to large datasets are becoming more common. A key point in the discussion is that a simple mean is not the complete answer. Rather, the statistics and earth science communities can come together to take advantage of the variability in the data. The structure of the dataset needs to be analyzed to better understand the multiple physical processes that make up the climate system.
BOX 2-3
The Nature of the Problem: Lessons from a Santa Claus

The use of ensembles, permutations of data, provides a sampling of the space over which the data can range and can be an effective way to begin thinking about complicated statistical problems. For example, Figure 2-2 presented by Doug Nychka, National Center for Atmospheric Research, includes 100 variants of a depiction of Santa Claus. Kriging, a common statistical technique, allows us to generate the best statistical estimate of the mean of the variants, and an ensemble around that mean can provide information about the uncertainty. However, a traditional method such as kriging may not be the optimal way to describe Santa’s features, as this approach represents a point-wise “expected value” and does not preserve the spatial relationships present in individual sample images. The “mean Santa” shown in the large image does not capture specific information such as Santa’s nose or other particulars. However, having the 100 realizations of Santa enables one to query what is known. In addition, the ability to infer details about the physical processes associated with Santa (e.g., his delivery of presents on Christmas Eve) is complicated by the fact that the only information available is through parameterizations that come from modeling efforts, much like the parameterizations in global climate models. Moreover, when a parameterization is needed, it might be better to rely on one or more of the 100 variants rather than basing the parameterization on the mean Santa, which does not directly represent any underlying model of processes as do the individual depictions. It is important to remember that the ensemble of variants is generated according to some assumptions about how to sample parameter space, and it might not be the best sampling for every purpose.
Workshop participants argued that climate is defined by characteristic variation, not by average values. Climate is the product of a complex fluid dynamical system, and the products of such systems are defined by the system’s history, not its equilibrium. The atmosphere is constantly evolving, whereas the different versions of Santa in Figure 2-2 are not related in a fundamentally dynamical way. Rather than trying to understand the outcome of the averaging, it may be more beneficial to solve the problem of inaccurate model representation of climate. While kriging and generating an ensemble provide a valuable comprehensive view of a dataset, the ultimate goal of analysis is to discover the more subtle features of the structure of the distribution, which are often lost in simple analyses focusing on the mean, or average, of an observed process. For example, the average of a Mozart sonata is a single note, but that does not convey anything of importance about the piece of music.
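The observation that a point-wise mean does not preserve the features present in individual ensemble members can be shown with a toy example. The sketch below is not kriging and is unrelated to how Figure 2-2 was produced; it assumes a hypothetical ensemble of 100 one-dimensional "images", each containing one sharp feature of height 1 whose position varies from member to member. All names and values are illustrative.

```python
def make_member(length, feature_pos):
    """A toy 1-D 'image': zero everywhere except one sharp feature of height 1."""
    return [1.0 if i == feature_pos else 0.0 for i in range(length)]


def pointwise_mean(ensemble):
    """Average the ensemble independently at each position."""
    n = len(ensemble)
    return [sum(values) / n for values in zip(*ensemble)]


length, n_members = 20, 100
# Feature positions cycle deterministically through positions 5..14.
positions = [5 + (k % 10) for k in range(n_members)]
ensemble = [make_member(length, p) for p in positions]

mean_image = pointwise_mean(ensemble)
peak_of_mean = max(mean_image)
member_peaks = [max(member) for member in ensemble]

print(f"every member has a feature of height {min(member_peaks):.1f}")
print(f"the point-wise mean peaks at only {peak_of_mean:.2f}")
```

Every member contains a sharp feature, yet the mean image contains none: the feature is smeared into a low plateau. Questions about the feature itself (how sharp it is, where it tends to be) can only be answered by querying the individual members.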
FIGURE 2-2 Top: 100 variants of Santa Claus; Bottom: the average Santa Claus based on the 100 variants. Figure courtesy of Jason Salavon.