1
Introduction

Great advances have been made in our understanding of the climate system over the past few decades, and remotely sensed data have played a key role in supporting many of these advances. Improvements in satellites and in computational and data-handling techniques have yielded high quality, readily accessible data. However, rapid increases in data volume have also led to large and complex datasets that pose significant challenges in data analysis (NRC, 2007). Uncertainty characterization is needed for every satellite mission and scientists continue to be challenged by the need to reduce the uncertainty in remotely sensed climate records and projections. The approaches currently used to quantify the uncertainty in remotely sensed data, including statistical methods used to calibrate and validate satellite instruments, lack an overall mathematically based framework. An additional challenge is characterizing uncertainty in ways that are useful to a broad spectrum of end-users.

In December 2008, three standing committees of the National Academies held a workshop to survey how statisticians, climate scientists, and remote sensing experts might address the challenges of uncertainty management in remote sensing of climate data. The emphasis of the workshop was on raising and discussing issues that could be studied more intently by individual researchers or teams of researchers, and on setting the stage for possible future collaborative activities. Issues and questions that were addressed at the workshop include the following:



Copyright © National Academy of Sciences. All rights reserved.





1. What methods are currently used to compare time series at single points in space with instantaneous but sparsely sampled area averages to “validate” remotely sensed climate data? Are there more sophisticated or advanced methods that could be applied to improve validation tools or uncertainty estimates? Are there alternative means of measuring the same phenomena to confirm the accuracy of satellite observations?

2. How can fairly short-term, spatially dense remote sensing observations inform climate models operating at long time scales and relatively coarse spatial resolutions? Are there remotely sensed data that could, through the use of modern statistical methods, be useful for improving climate models or informing other types of climate research?

3. What are the practical and institutional barriers (e.g., lack of qualified statisticians working in the field) to making progress on developing and improving statistical techniques for processing, validating, and analyzing remotely sensed climate data?

In her introductory remarks at the workshop, planning team chair Amy Braverman from the Jet Propulsion Laboratory presented Table 1-1 to illustrate how statistical methods (rows) can help address three major challenges in the use of remotely sensed climate data (columns). The first of these three major challenges is the validation of remote sensing retrievals. When a remote sensing instrument retrieves a measurement that is used to infer a geophysical value (e.g., atmospheric temperature), uncertainties exist both in the measured values and in the statistical model used to validate the remotely sensed parameter. The second challenge is improving the representation of physical processes within all types of climate models.
Workshop participants stressed the need to better represent physical processes within global earth system models, a critical component of projecting future climate accurately, reducing uncertainty, and ultimately aiding policy decisions. The third major challenge in climate research where statistics plays an important role is aggregating the observed and modeled knowledge, each with its associated uncertainties, to develop a better understanding of the climate system that can lead to useful predictions.

Complex and multifaceted relationships in the physics of the climate system contribute uncertainty over and above that which is normally present in making inferences from massive, spatio-temporal data. Isolating and quantifying these uncertainties in the face of multiple scales of spatial and temporal resolution, nonlinear relationships, feedbacks, and varying levels of a priori knowledge poses major challenges to achieving the linkages shown in Table 1-1. A formal statistical model that articulates relationships among both known and unknown quantities of interest and observations can sharpen the picture and make the problem more tractable. Random variables can represent uncertain quantities and describe relationships through joint and conditional distributions. Random variables can also be infused into systems of physical equations to carry information about uncertainties along with the information provided by physical knowledge.

TABLE 1-1 Three Major Challenges in the Use of Remotely Sensed Climate Data (Columns) and Three Roles Played by Statistical Methods (Rows)

Role for statistics: Clarify and characterize sources of uncertainty in remote sensing data
- Validation of remote sensing retrievals: Characterize spatio-temporal mismatches and retrieval algorithm differences; address data sparseness or absence of ground truth.
- Improving physical representations and understanding: Develop new statistical methods to make the most of new data types to address new science questions.
- Extrapolating to future climate predictions: Maximize the value of limited data and hard-to-formalize assumptions about relationships among past, present, and future.

Role for statistics: Develop statistical methods to quantify and reduce uncertainty
- Validation of remote sensing retrievals: Develop formal statistical error measures for both bias and variance.
- Improving physical representations and understanding: Develop new methods to exploit massive datasets in an inferential setting.
- Extrapolating to future climate predictions: Develop formalisms for combining output from different models in light of available data.

Role for statistics: Provide an overarching framework
- Validation of remote sensing retrievals: Overcome mismatches by statistical modeling of relationships between observed and unobserved quantities.
- Improving physical representations and understanding: Pose problems as formal questions of statistical inference.
- Extrapolating to future climate predictions: Combine physical and statistical models.

SOURCE: Table courtesy of Amy Braverman, JPL.
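As a toy illustration of infusing random variables into a physical equation, consider propagating two uncertain inputs through a simple physical relation by Monte Carlo sampling. This sketch is our own, not from the workshop: the Stefan-Boltzmann law stands in for a generic physical equation, and the emissivity and temperature uncertainties are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical example: treat an uncertain surface emissivity and an
# uncertain temperature retrieval as random variables and push them
# through the Stefan-Boltzmann law L = eps * sigma * T^4, yielding a
# distribution (not a single number) for the emitted radiance.
SIGMA = 5.670374419e-8                # Stefan-Boltzmann constant [W m^-2 K^-4]

n = 100_000
eps = rng.normal(0.96, 0.01, n)       # emissivity: mean 0.96, sd 0.01 (assumed)
temp = rng.normal(288.0, 1.5, n)      # temperature [K]: mean 288, sd 1.5 (assumed)

radiance = eps * SIGMA * temp**4      # each draw carries its uncertainty through

print(f"mean radiance = {radiance.mean():.1f} W/m^2")
print(f"sd of radiance = {radiance.std():.1f} W/m^2")
```

Because every draw is pushed through the physical relation, the spread of the output reflects both input uncertainties at once, including the nonlinearity of the T^4 dependence, which a single-number calculation would hide.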

Crafting such hybrid physical-statistical models to capture the essence of our understanding is not easy. The climate system is inherently nonlinear and includes feedback loops where variables directly and indirectly affect one another. Figure 1-1, presented at the workshop by William Rossow from the City College of New York, is a simple diagram of the energy and water cycles of the climate system that demonstrates how the system is interconnected. In order to gain a true understanding of climate feedbacks it is important to understand multiple variables in the climate system and their interactions. Clouds and precipitation, for example, play a crucial role in both the water cycle and the earth’s energy balance, affecting the sources and sinks of heat in the climate system. The release of latent heat during precipitation events provides energy that drives atmospheric circulations, and, in turn, atmospheric circulation processes that affect the distribution of water vapor and the formation of clouds have a pronounced effect on the transfer of radiation through the atmosphere. Therefore, analyzing the interrelationships between multiple variables in the climate system is key to understanding processes of interest.

FIGURE 1-1 Schematic of energy and water cycles. Red lines represent transfers of energy, while blue lines show transfers of water. Figure courtesy of William Rossow, City College of New York.

Large volumes of remote sensing data are available to assist in refining models of physical systems like that shown in Figure 1-1. Data provide information about physical mechanisms at work in the atmosphere, and also about the uncertainties or gaps in our understanding of how those mechanisms operate. Making use of data in this way, however, requires that inherent uncertainties and biases in the data themselves be known and quantified. The problem therefore requires a holistic approach to uncertainty management, beginning with data collection and validation strategies that are cognizant of the uses of the data. These challenges can be addressed in two ways:

1. By identifying data collection and analysis methods that minimize the uncertainties; and
2. By identifying the contributions to uncertainty at the various steps in collection and analysis, thereby pointing out the most promising targets for improvement.

Uncertainty quantification, in the broadest sense, accounts not only for uncertainty in individual parameters within the models that are used, but also for the uncertainty inherent in the models themselves, which are only approximate representations of physical processes. Workshop participants emphasized that improving physical process representation is critical both for improving climate models and for better characterizing their uncertainties. Statistics can contribute to solving this problem by moving beyond linear analysis of individual parameters to capture more complex relationships that have a physical meaning. A good statistical model is built in a way that captures some of the physical processes that control elements of the climate system, or alternative hypotheses about those processes.

The classical method, described at the workshop, for characterizing uncertainty in earth science modeling is sensitivity analysis. Simply put, this method involves changing parameter values in a model to learn how much each parameter affects the model output. This method does not account for the possibility that more than one process represented in the model might rely on the same parameter, which will affect the uncertainty estimate. In addition, the compounding effect of different sources of uncertainty on different parameters is difficult to quantify through sensitivity analyses.

Alternative statistical approaches define uncertainty through joint probability distributions of parameters. While it is difficult to use this approach to identify the correct parameters and distributions when datasets are small, advances in data collection, management, and processing technologies are increasingly resulting in large datasets. Statistical distributions and their parameters can be estimated accurately when large
volumes of data are available. Scientists in the satellite era have this luxury, but are concomitantly faced with massive data volumes that create challenges for processing and analysis techniques. In principle, large, complex, and detailed datasets offer the promise of new knowledge from which to better understand the climate system. Statistical methods that are developed specifically for new data types can exploit these large, complex datasets in ways that traditional methods (e.g., sensitivity analysis) cannot.

Understanding the uncertainties of different processes in the climate system requires a variety of approaches. Collaborations involving climate scientists and statisticians were identified at the workshop as an effective way to promote the development of targeted new methods that would help the science community question all aspects of the data and of the geophysical and statistical models. Workshop participants also remarked that modern statistical methods can be useful for fusing data from two different instruments, which is a more challenging problem than is generally appreciated. For example, data assimilation techniques are one approach to addressing the spatial and temporal mismatch between models and observations (Daley, 1991; Luo et al., 2007). As described by workshop participant Anna Michalak from the University of Michigan, such approaches need to account for the spatial and temporal structure of the dataset to allow a better understanding of the physical processes that make up the climate system.

WHY WORRY ABOUT STATISTICAL STRUCTURE: AN EXAMPLE FROM MODELING SNOW DEPTH

Anna Michalak of the University of Michigan described how the statistical properties of remote sensing datasets offer both a challenge and an opportunity.
For example, understanding and accounting for statistical dependence, including spatial and temporal correlations, can improve the utility of observational datasets. The opportunity is that, by skillfully handling these complexities, we can better take advantage of the full information content of the available data and use this information to guide high-payoff improvements in models of the Earth system.

In some cases, statisticians and earth scientists use similar techniques to evaluate, and take advantage of, the spatial and temporal structure of observations of environmental parameters. For example, spatial statistical techniques allow one to interpolate (the earth scientist’s term) or predict (the statistician’s term) the value of specific environmental parameters at unsampled locations. The vast majority of environmental parameters (e.g., clouds, precipitation, winds) exhibit spatial and/or temporal correlation, with associated characteristic scales of variability. As stated
by Tobler in his “first law of geography”: “Everything is related to everything else, but near things are more related than distant things” (Tobler, 1970). Both statisticians and earth scientists have used quantitative tools to assess the spatial autocorrelation exhibited by sampled data; both use variograms and/or covariance functions to quantify the degree of spatial autocorrelation. An accurate assessment of the spatial variability of observed parameters can be used to better understand the underlying physical processes.

Figure 1-2 illustrates how understanding and exploiting the spatial and temporal structure of data can be useful. In this example, a limited number of measurements that are clustered in a non-ideal way are used to estimate the mean snow depth in a valley. Simply averaging the ten measurements of snow depth does not provide a good representation of mean snow depth. Instead, the clustered observations in the left portion of the valley clearly need to be weighted less relative to the isolated observation in the right region. However, how much weight should be assigned to each data point? Spatial statistics methods can be used to determine the degree of spatial variability in the snow-depth distribution based on an analysis of how similar nearby measurements are to one another, and how dissimilar far-away measurements are to one another. This information, in turn, can be used to quantify the optimal weights to be assigned to each measurement. This simple method in spatial statistics allows one to calculate an unbiased estimate of mean snow depth in the valley based on an uneven distribution of measurements.

In Figure 1-3, we consider a hypothetical dataset describing snow depth as a function of elevation, assuming that the snow depth is also autocorrelated in space (top panel).
These synthetic data were generated in such a way that, in reality, there is no overall trend of snow depth with elevation; any observed trend is therefore the result of randomness introduced in generating the data. This hypothetical dataset is then used to test whether two competing approaches are able to correctly conclude that there is no relationship between snow depth and elevation (middle panel). In the first approach (red line), classical linear regression is used, which ignores the spatial correlation in the data. In the second approach (green line), the spatial correlation is accounted for in the estimation process. In the example shown in the figure, the classical approach incorrectly rejects the null hypothesis that there is no trend between snow depth and elevation, whereas the approach based on spatial statistics correctly does not reject this hypothesis at the 95 percent confidence level. As the experiment is repeated multiple times with new synthetic data (bottom panel), we observe that the linear regression approach incorrectly concludes that there is a trend between elevation and snow depth approximately 20 percent of the time, which is much too high given that the test was run
in a way that should have yielded only a 5 percent chance of incorrectly concluding that there was a trend. The approach that accounts for spatial correlation concludes that there is a trend 5 percent of the time, as expected.

FIGURE 1-2 Example of sampling snow depth in a watershed. Top: aerial map of an alpine basin with sample locations. Bottom: snow depth at each sampling location versus distance from the left edge of the valley. The red line represents the biased estimate of average snow depth obtained from a simple average of the available observations (which assumes spatial independence); the green line represents the unbiased estimate obtained by assigning weights to the observations based on an understanding of the scales of spatial variability of the snow depth in the valley. Figure courtesy of Anna Michalak, University of Michigan. Original figure by Tyler Erickson, Michigan Tech Research Institute.

FIGURE 1-3 Hypothetical data on snow depth as a function of elevation. Top: one realization of the generated data, with the slope between snow depth and elevation estimated by simple linear regression (red line) and by an approach that accounts for the spatial correlation of the data (green line). Middle: the probability distribution of the trend of snow depth with elevation under the two approaches. Bottom: if the experiment were repeated many times, simple linear regression would too often lead to the erroneous conclusion that there is a relationship between snow depth and elevation. Figure courtesy of Anna Michalak, University of Michigan. Original figure by Tyler Erickson, Michigan Tech Research Institute.

Overall, this example illustrates that statistical approaches that ignore the spatial and/or temporal correlation inherent in environmental data carry an increased risk of erroneously concluding that significant relationships exist between physical phenomena (snow depth and elevation, in this case) and, more generally, yield biased estimates because of their assumption of independent observations.
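The repeated experiment behind Figure 1-3 can be sketched as a short simulation. This is a hypothetical reconstruction, not the original analysis: the valley length, exponential covariance model, correlation range, and sample size are our own choices, so the inflated rejection rate will not match the reported 20 percent exactly. The qualitative result is the same: ordinary least squares, which assumes independent observations, rejects the true null hypothesis of no trend far more often than the nominal 5 percent, while generalized least squares using the true spatial covariance stays near 5 percent.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_once(n=50, corr_range=300.0, domain=1000.0):
    """One synthetic 'snow depth vs. elevation' dataset with NO true trend."""
    s = np.sort(rng.uniform(0.0, domain, n))      # measurement locations [m]
    elev = 0.1 * s                                # elevation rises along the valley (toy choice)
    # Spatially correlated snow-depth noise: exponential covariance in distance
    d = np.abs(s[:, None] - s[None, :])
    C = np.exp(-d / corr_range)
    y = np.linalg.cholesky(C + 1e-10 * np.eye(n)) @ rng.standard_normal(n)
    X = np.column_stack([np.ones(n), elev])
    return X, y, C

def rejects_ols(X, y, tcrit=2.01):                # ~t_{0.975} with 48 df
    """Classical regression t-test, ignoring spatial correlation."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (len(y) - 2)
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    return abs(beta[1] / se) > tcrit

def rejects_gls(X, y, C, zcrit=1.96):
    """Generalized least squares using the true spatial covariance C."""
    Ci = np.linalg.inv(C)
    cov = np.linalg.inv(X.T @ Ci @ X)
    beta = cov @ X.T @ Ci @ y
    return abs(beta[1] / np.sqrt(cov[1, 1])) > zcrit

reps = 500
ols_rate = gls_rate = 0.0
for _ in range(reps):
    X, y, C = simulate_once()
    ols_rate += rejects_ols(X, y) / reps
    gls_rate += rejects_gls(X, y, C) / reps

print(f"OLS false-positive rate: {ols_rate:.2f}")  # substantially above the nominal 0.05
print(f"GLS false-positive rate: {gls_rate:.2f}")  # close to the nominal 0.05
```

The contrast mirrors the bottom panel of Figure 1-3: because elevation varies smoothly along the valley, just like the correlated snow-depth field, ignoring spatial correlation does not merely add noise but systematically overstates the significance of the fitted trend.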