Appendix B
Summaries of Workshop Presentations

DIFFERENCES IN TERMINOLOGY, TECHNIQUES, AND APPROACHES BETWEEN STATISTICIANS AND EARTH SCIENTISTS

Anna M. Michalak, University of Michigan


Characterizing the complexity and quantifying the uncertainty in environmental systems will aid in better understanding the systems and improving model forecasts. However, the differences in terminology and approaches used by the statistics and earth science communities are just one of the impediments to successfully analyzing remote sensing climate datasets. This talk illustrates some applications of statistical methods for optimizing the use of in situ and remote sensing datasets of the climate system. In order to assess the predictions obtained using models that integrate such datasets, tools must be developed that quantify the full uncertainty associated with such models, rather than simply evaluating the sensitivity of model predictions to a set of model parameters. In many cases, the uncertainty associated with the conceptual framework of the models and the specific parameterizations included in the models outweighs the uncertainty caused by incomplete knowledge of individual parameters.

When sparse spatial data are integrated in analyses, uncertainties that arise due to spatially and temporally non-uniform sampling can be accounted for using the principles of spatial statistics. Because classical statistics is based on the assumption of independent observations,
these tools do not account for the spatial and/or temporal autocorrelation inherent to the majority of environmental phenomena. This can lead to biased estimates and erroneous identification of relationships between parameters. Developing statistical tools that explicitly account for spatial or temporal autocorrelation avoids such errors. In addition, using models that quantify and account for spatial and/or temporal correlation can decrease the uncertainty associated with model predictions because the spatial or temporal information footprint of available data can be assessed and used to inform the model.

Spatial statistics tools can be used to combine data collected from different instruments with differing resolutions, and to reduce uncertainties associated with data interpolation, among other things. This talk emphasized that principles of spatial statistics can address many of the challenges in geoscience. The simplest examples of spatial statistics are interpolating data and generating realizations (i.e., equally likely scenarios) of a given process given sparse data. These principles can be applied to data at any scale.

On a global scale, for example, data from the Orbiting Carbon Observatory, a satellite designed to measure carbon dioxide (CO2) from space to improve our understanding of global CO2 concentrations, would have contained large gaps due to the satellite track and the presence of clouds and aerosols. Methods based on geostatistics are being developed to generate estimates of the global distribution of CO2 based on such data, by first characterizing the degree of spatial variability in the CO2 observations, and using this information to estimate CO2 for portions of the globe that are not measured. On a local scale, similar principles have been applied to a project that assesses areas of low oxygen in Lake Erie.
It is difficult to quantify the extent of the Lake Erie dead zone and how it varies from year to year because the in situ measurements are sparse. Therefore, new statistical techniques were developed to identify remote sensing variables that are good predictors of the dissolved oxygen concentration, and these variables are then merged in a geostatistical framework with available in situ data to estimate the spatial extent of hypoxia, and how it varies across years. Results show that fusing the in situ and remotely sensed data yields a more realistic distribution of the extent of hypoxia, with lower associated uncertainties, when compared with the results using only the in situ data.

Spatial aggregation, averaging, and linear interpolation are often used to merge data collected from multiple remote sensing instruments. Such approaches, however, do not yield an optimal estimate at the target scale of analysis, and estimated values can be influenced by samples in neighboring pixels. This problem is exacerbated when remote sensing data are "re-sampled" multiple times. Spatial statistical tools applied to measurements from one or more instruments that may have different
resolutions, coupled with an understanding of the point spread function, and a quantification of the spatial and temporal covariance, can be used to optimally combine data from multiple sensors to yield estimates at any target resolution.

Three areas that are ripe for collaboration between statisticians and earth scientists are (i) the development of rigorous tools for quantifying the uncertainty associated with parameters used in climate models, (ii) the development of tools for comprehensively quantifying the uncertainty associated with model predictions, including errors caused by the model formulation, and (iii) the development of tools for integrating data across spatial and temporal scales. Furthermore, these two groups can come together to build a better understanding of the physical processes that are relevant to climate models, ultimately leading to improved physics-based models and projections. This could be achieved by developing statistical models that emulate many of the underlying physical processes, and explicitly account for the uncertainty associated with all model components.

REMOTE SENSING OF SURFACE WINDS

Ralph Milliff, Northwest Research Associates

Bayesian hierarchical modeling (BHM) is a fundamental statistical approach for addressing problems in remote sensing climate datasets. Building blocks for BHM include the data-stage distribution (i.e., likelihood), which quantifies the uncertainty in observations, and the process-model-stage distribution (i.e., prior), which quantifies the uncertainty in the physics of the process. These stages introduce parameters, and estimates in the posterior of the parameters can be determined. One advantage of the BHM approach is that it brings multiple data sources into one model. The first two stages allow for satellite data and data from other platforms to be combined with the physics.
Estimates in the posterior of the parameters introduced in a model may be influenced by both the data-distribution and process-model stages. BHM has been successful in characterizing surface-wind processes over the Labrador Sea, the tropical ocean, and the Mediterranean.

This talk describes how BHM may be used to characterize surface-vector winds for ensemble data assimilation and an ocean forecast model for the Mediterranean Sea. The research further demonstrates that surface-vector winds can be used to identify uncertainties in dependent variables in the forecast model. The Mediterranean ocean forecast system uses realizations from the posterior of the wind BHM to generate physically realistic spreads in the forecast initial conditions from which ocean ensemble forecasts can be launched. The strategy is to exploit the precisely characterized uncertainty of satellite observations.

The prediction grid shows clusters of wind vectors, and the uncertainty in the clusters is a function of space and time. This is an alternative way of thinking about uncertainty. Traditionally, geoscientists think of an error in the satellite as projected to the error in the field. However, the uncertainty in this case depends on whether or not the satellite was there and the strength of the surface winds. Thinking about uncertainty in this way can lead to important understandings of processes that depend on the dynamics of the system. Subtle shifts in the spread of wind vectors from the posterior generate different wind stress curls, the vorticity in the wind that drives the eddy field in the ocean. The vorticity-driven eddies occur on a scale (i.e., the ocean mesoscale) that is a primary source of uncertainty in the ocean forecast (i.e., at synoptic scales). Over the Mediterranean Sea, the eddies affect the general circulation of the basin including deep water formation. Mistral events, where cold dry air blows off the European continent in late winter, generate large wind stress curl, the uncertainty of which can be characterized. The forecast model generates physically realistic spreads in the forecast initial condition, and the spread is focused on uncertain scales of the general circulation in the Mediterranean. The uncertainty is temporally and regionally specific.

This methodology can also be used to better understand the physics from the posterior distribution of BHM parameters in the process model (i.e., the pressure gradient terms in the present case). The posterior distributions for those parameters on the geostrophic and ageostrophic terms, with and without the QuikSCAT data, are examined. The scatterometer is providing more ageostrophic information than the weather center fields. The example demonstrates that the prior can be more complex.
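
The BHM building blocks named in this summary can be written compactly. The following is a generic sketch of the factorization, not the specific wind model presented in the talk; the symbols $D$ (observations), $P$ (the process, e.g., the surface wind field), and $\theta_d$, $\theta_p$ (data-stage and process-stage parameters) are notation assumed here for illustration.

```latex
[P, \theta_d, \theta_p \mid D] \;\propto\;
\underbrace{[D \mid P, \theta_d]}_{\text{data stage (likelihood)}}\;
\underbrace{[P \mid \theta_p]}_{\text{process stage (prior)}}\;
\underbrace{[\theta_d, \theta_p]}_{\text{parameter stage}}
```

If several data sources $D_1, \dots, D_K$ are assumed to be conditionally independent given the process, the data stage factors further as $\prod_{k} [D_k \mid P, \theta_{d,k}]$, which is one way that satellite data and data from other platforms can enter a single model.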

REMOTE SENSING AND PRECIPITATION

Tom Bell, National Aeronautics and Space Administration

Precipitation is highly erratic in space and time, and success in developing adequate statistical descriptions of precipitation has been mixed. The fact that precipitation rates are zero much of the time over much of an area is a special nuisance. In addition, remote sensing errors in this area are complex and difficult to quantify. Rain gauges, providing the most accurate ground-based measurements of precipitation, cover only a very small area, essentially the size of a bucket, and are used to validate time-specific satellite measurements taken over many kilometers. This poses many challenges in modeling the errors associated with these estimates to characterize the types of biases in the satellite estimates.

Rain gauges provide time-averaged rain rates, whereas satellites take measurements from a volume of space above the ground. It is, therefore, problematic to try to make validations and comparisons between these
two very different sources of data. Analogous challenges can arise in comparing estimates collected from different satellites. For example, the Tropical Rainfall Measuring Mission (TRMM) satellite has a radar instrument that measures rainfall amounts at every level in the atmosphere, as well as a visible infrared instrument and a microwave instrument that detects raindrops. The observational swath of the microwave instrument is 800 km wide, and the satellite makes roughly one observation per day. This results in a very sparse dataset distribution in time that also becomes increasingly complex to describe once one considers the orbital variations of the satellite and its measurements. It is important for statisticians to appreciate the complexity of the dataset to gain an understanding of the issue prior to analysis.

The temporal and spatial discrepancies between the rain gauge data and the satellite measurements lead to several questions about the validation process that must be considered: what size area should be used in averaging satellite data, what is the optimal time period over which the rain gauge data should be averaged, and if a satellite passes several times in one month, how long before and after the pass should the rain gauge data be averaged?

A spectral model can be very useful for spatial and temporal statistics. The model is designed to represent the larger areas that tend to evolve more slowly in time and have very long correlation times versus small areas with short correlation times. This is an important characteristic in rainfall and other geophysical fields. Small events evolve and move over an area more quickly than large events. An example of the uses for the spectral model is represented in Box 2-1, illustrating that there is an optimal time-averaged sampling interval.

The complexity of rain distributions requires better statistical models that can account for the rain-rate distribution, the space-time behavior distribution, the multivariate distribution, and also the error. The underlying assumptions need to be carefully considered before the models are used for any specific application. In addition, it is highly beneficial to both the earth science and statistics communities when techniques are well understood and accessible.

DIFFERENT TYPES OF UNCERTAINTIES IN CLOUD DATASETS

William Rossow, City College of New York

Clouds pose multiple challenges for interpreting datasets to detect climate variability. Better statistical methods can aid the geoscience community by helping to better define important measurements related to clouds, such as cloud area and coverage, point-area comparisons of clouds, sampling of cloud variability, and monitoring cloud evolution.

Remote sensing of clouds is very difficult in part because of resolution
and the effects of finite sensitivity and detection. The latter effects had been largely ignored until the last couple of decades. Cloud size varies greatly, and this complicates the measurement of cloud fraction. A well-designed detection threshold that produces a good estimate of cloud fraction despite the problems associated with size and area measurement has been demonstrated by the International Satellite Cloud Climatology Project (ISCCP).

In terms of point-area comparisons, it is important to be cautious when comparing ground measurements and satellite data. The two represent very different measurements and have different accuracies and errors associated with them.

Cloud sampling requires knowledge of the scales of the variability, the sampling scale, the precision of the measurements, and the end goal to which the estimates will be applied. Clouds exhibit the most variability at larger spatial scales. The variability is dominated by the same scales as is the atmospheric circulation. In addition, temporal autocorrelation must be accounted for when considering inferences about monthly means. For example, monthly data may consist of only 10 truly independent samples rather than 30 because of day-to-day variations. If not properly calculated, this can lead to monthly means that, for instance, give misleading information about inter-annual variability, such as El Niño-Southern Oscillation (ENSO) patterns. Monthly means are commonly used in discussions of climate variation, but the mean can also mask small variations in the dataset. In addition, statistical methods need to be improved so we can remove the effects of autocorrelations.

Cloud interactions with the climate system should be considered in the context of multiple processes and the relationships between those processes. Atmospheric circulation, clouds, precipitation, and radiative fluxes are all related and often occur within the same cloud system.
Clouds and precipitation, for example, are fluid systems that are coupled spatially and temporally and should be considered together. Studying just one parameter at a time does not provide a full picture of the dynamics that are actually occurring. Similarly, averaging can provide valuable information, but averaged values do not help us determine the structure of the physics. Thus we need more sophisticated ways to analyze the non-Gaussian, highly nonlinear, multi-variate, and multi-scale coupled data that are available. A partnership between the fields of geosciences and statistics is needed to develop techniques and models capable of rendering the available data into something interpretable and useful.

There is a multitude of climate models that utilize different parameters and generate different results. The community needs better ways to determine which of these models best represent the climate system.
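
The earlier remark in this summary about monthly means (roughly 10 independent samples rather than 30) can be illustrated with a common approximation. The sketch below assumes the daily series behaves like a first-order autoregressive, AR(1), process with lag-1 autocorrelation rho, for which the effective number of independent samples for the mean is approximately n(1 - rho)/(1 + rho). The AR(1) assumption, the function names, and the value rho = 0.5 are illustrative choices, not taken from the talk.

```python
import math


def effective_sample_size(n, rho):
    """Approximate number of independent samples in n AR(1) observations.

    Uses the large-n approximation n_eff = n * (1 - rho) / (1 + rho),
    where rho is the lag-1 autocorrelation of a stationary AR(1) series.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return n * (1.0 - rho) / (1.0 + rho)


def standard_error_of_mean(sigma, n, rho=0.0):
    """Standard error of the mean, based on the effective sample size."""
    return sigma / math.sqrt(effective_sample_size(n, rho))


# Hypothetical month of 30 daily values with lag-1 autocorrelation 0.5.
n_days, rho = 30, 0.5
n_eff = effective_sample_size(n_days, rho)
naive_se = standard_error_of_mean(1.0, n_days)          # assumes independence
adjusted_se = standard_error_of_mean(1.0, n_days, rho)  # accounts for AR(1)
print(f"n_eff = {n_eff:.1f}, naive SE = {naive_se:.3f}, adjusted SE = {adjusted_se:.3f}")
```

Under these assumptions, ignoring the autocorrelation understates the standard error of the monthly mean, which is one way that apparent inter-annual differences can be overinterpreted.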

MACHINE LEARNING TECHNIQUES FOR CLOUD CLASSIFICATION

Bin Yu, University of California at Berkeley

The uncertainties in cloud radiation feedback on global climate remain a great obstacle in understanding and predicting future climate changes. This talk describes a case study of cloud detection over the Arctic region using machine learning methods and data generated from the Multi-angle Imaging SpectroRadiometer (MISR). The MISR's algorithm retrieves the cloud height and cloud movement by matching the same cloud from three angles. However, clouds above snow- and ice-covered surfaces are particularly difficult to detect because the temperature and reflectivity of the clouds are similar to the snow and ice surfaces. Retrieval is also particularly difficult when trying to detect thin clouds in polar regions.

This talk describes a methodology for addressing these challenges. The starting point was to measure the ground rather than clouds, because the ground is fixed. It was found that correlations between angles are strong over snow- and ice-covered surfaces and weak in areas covered by high clouds. Exploiting the multiple angles in MISR, the linear-correlation matching clustering (LCMC) (Shi et al., 2002) technique was developed to distinguish between smooth surfaces and thin clouds. However, the LCMC in polar regions was insufficient for detecting smooth surfaces, such as frozen rivers, and areas of thin clouds, which led to the development of an enhanced LCMC (ELCMC) (Shi et al., 2004).

The question remains of how to quantify the clouds. This talk compares performance of the ELCMC against two other machine learning techniques, the quadratic discriminant analysis (ELCMC-QDA) and the ELCMC support vector machine. The latter was determined to be computationally slow. Expert labeling provides the highest accuracy, but that is too slow for most purposes; the ELCMC-QDA provides about 92 percent of the accuracy of expert labeling.
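
To make the classification step more concrete, the following is a minimal sketch of quadratic discriminant analysis on synthetic two-feature data. It is not the ELCMC-QDA algorithm from the talk: the features, class means, and covariances are invented for illustration, and the helper names (`fit_qda`, `qda_posterior`) are hypothetical. It only shows how QDA fits one Gaussian per class and returns a posterior probability for each label, the kind of quantity that can serve as an uncertainty measure.

```python
import numpy as np


def fit_qda(X, y):
    """Fit one Gaussian (mean, covariance) and a prior for each class label."""
    params = {}
    for label in np.unique(y):
        Xc = X[y == label]
        params[label] = (Xc.mean(axis=0), np.cov(Xc, rowvar=False), len(Xc) / len(X))
    return params


def qda_posterior(params, X):
    """Return the class labels and an (n_samples, n_classes) posterior matrix."""
    labels = sorted(params)
    log_scores = []
    for label in labels:
        mean, cov, prior = params[label]
        diff = X - mean
        # Squared Mahalanobis distance of each row to the class mean.
        maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
        _, logdet = np.linalg.slogdet(cov)
        log_scores.append(-0.5 * (maha + logdet) + np.log(prior))
    log_scores = np.column_stack(log_scores)
    # Normalize across classes in a numerically stable way.
    log_scores -= log_scores.max(axis=1, keepdims=True)
    post = np.exp(log_scores)
    return np.array(labels), post / post.sum(axis=1, keepdims=True)


# Synthetic stand-ins for "clear" (0) and "cloudy" (1) pixels.
rng = np.random.default_rng(0)
clear = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.3], [0.3, 1.0]], size=500)
cloudy = rng.multivariate_normal([4.0, 4.0], [[2.0, -0.5], [-0.5, 0.5]], size=500)
X = np.vstack([clear, cloudy])
y = np.repeat([0, 1], 500)

params = fit_qda(X, y)
labels, posterior = qda_posterior(params, X)
predicted = labels[posterior.argmax(axis=1)]
accuracy = (predicted == y).mean()
print(f"training accuracy on synthetic data: {accuracy:.3f}")
```

Because each class has its own covariance matrix, the decision boundary is quadratic rather than linear. The accuracy printed here describes only the synthetic data and says nothing about real MISR performance.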

The use of expert labels improves the accuracy rates; however, they are expensive and impossible to obtain for every block of data. Information from the Moderate Resolution Imaging Spectroradiometer (MODIS) was then utilized to inform accurate labels, and it gives complementary information to MISR. The MISR and MODIS consensus pixels are more accurate than MISR or MODIS alone, and fusing the data from MISR ELCMC and MODIS improves the average accuracy of polar cloud detection.

This is an example of truly interdisciplinary work where statistical machine learning merges statistics with computational sciences. This was an iterative process with feedback and input from the MISR team at every step. The goal was to solve the scientific problem with the streaming data
constraint, and an uncertainty measure was given based on posterior probability.

VALIDATION OF CLOUD PROPERTY MEASUREMENTS FROM MULTIPLE INSTRUMENTS

Jay Mace, University of Utah

The influence of clouds and precipitation on the radiation and hydrology of Earth depends fundamentally on the evolution of the vertical profile of microphysics. The understanding of this evolution is possible through observation-based models. The validity of the observations, however, is dependent on how well the vertical profile of microphysics is derived from remote sensing data in a statistically meaningful way. This talk demonstrates how to compare and validate microphysical cloud properties using multiple instruments including satellite measurements, ground-based remote sensing measurements, and aircraft measurements.

Cloud properties play an important role in the climate system. Reducing the uncertainties in climate models is necessary to improve our understanding of feedback mechanisms in the climate system, including feedback between the hydrologic cycle and cloud aerosol precipitation. The ability to infer cloud properties using remote sensing will aid in reducing these uncertainties in the climate models and allow us to gain a better understanding of the climate system and feedback mechanisms.

There are multiple phases in the evolution of clouds and cloud-size distributions. The interaction of clouds and their evolution occurs at the particle level, and therefore the particle size distribution is a critical component to the cloud problem. The evolution of clouds from aerosols to precipitation fundamentally occurs at the particle level and therefore, to understand feedback mechanisms, we must drill down to the particle size distribution (PSD). Statistical distributions are used to estimate PSD in climate models. Integrating across the PSD estimates how many aerosol particles are nucleating to create a cloud.
This is one of the ill-defined quantities derived from remote sensing data. The mass of a cloud is also of interest. Some clouds with very different properties (e.g., type, water content, ice content) can exhibit the same radar reflectivity. Remote sensing algorithms are required that can identify differences in extinction for different clouds with similar properties. A simple PSD characterization requires 3-4 parameters, which is the essence of the cloud problem. With satellite data, we seldom have more than one or two independent pieces of information to describe a given cloud volume or cloud column, which makes it fundamentally impossible to derive the details of the more complicated size distribution.

Many instruments are used in conjunction to measure clouds, including in situ aircraft, ground sites, and satellite sensor suites. In situ aircraft measurement within the cloud volume is the common tool used to provide "ground truth" against which satellite measurements are compared. Ground sites through the ARM program have upward-looking remote sensors that provide detailed profiles and have the advantage of working over long periods of time. However, these sites are constrained in coverage because they only measure one point. Satellite sensor suites move beyond single-instrument measurements, but the measurement profiles are limited, and they are very expensive.

Each of these techniques has associated uncertainties. For aircraft, uncertainties are associated with sample volume relevance and artifacts, such as those caused by the shattering of ice crystals on aircraft surfaces. Uncertainty is also associated with the algorithm used to determine cloud properties from remote sensing at ground sites. Like all algorithms, the algorithms used for cloud properties include key assumptions, such as the mass of an ice crystal as a function of particle size. The retrieval technique also adds to the error. Algorithms incorporate assumptions about physical parameters, which contribute to the uncertainties in the algorithms in subtle ways.

An example of comparing ground-based and satellite data is seen in examining Moderate Resolution Imaging Spectroradiometer (MODIS) derived ice water paths. Ice water paths may be derived from a validated radar algorithm and compared with the MODIS data, showing reasonable agreement. However, one is a spatially averaged snapshot measurement and the other is a time-averaged point measurement. For cirrus clouds, a spatial average is generated by averaging the MODIS measurements over a rectangle that is oriented along the mean wind at the cloud level, and then averaging the ground-based radar data over a period of time when the cloud layer remains uniform.
Comparing these measurements over a long period of time will aid in determining the error characteristics of the satellite. The uncertainty in the data, however, combined with the uncertainty in the science, requires techniques to quantitatively assess these errors. While such techniques exist, more sophisticated approaches would be helpful. A systematic approach to collecting and managing aircraft data to define empirical relationships is suggested. Combining multiple measurements in instrument suites and continuing and improving in situ measurements are critical for validation.

Several elements are required to make progress in this area. Using single instruments is outdated, and a suite of sensors that provide multiple independent measurement profiles needs to be continued and improved upon. A long-term systematic global in situ measurement program is a critical adjunct to remote sensors.

UNCERTAINTY ISSUES ASSOCIATED WITH REMOTELY SENSED DATASETS FOR AEROSOLS

Lorraine Remer, National Aeronautics and Space Administration

A number of uncertainties are associated with deriving aerosols from the Moderate Resolution Imaging Spectroradiometer (MODIS) satellite instrument. While many varieties of substances are considered aerosols, the diffuse and smooth aerosols are the most important when studying the climate system, as they reflect and absorb sunlight. It is difficult to distinguish different types of aerosols from clouds in remote sensing retrievals. The satellite measures the aerosol optical depth (AOD), a measure of the column-integrated extinction, that is, the amount of light that is either scattered or absorbed as it passes through the aerosol layer, which gives an indication of the amount of aerosol. A satellite measures not just the radiation scattered from the aerosol layer, but it also collects some radiation that made it through the aerosol layer from the earth's surface. The surface effect needs to be removed from the satellite signal to estimate the extinction, which requires assumptions about the aerosol and the surface, leaving room for error.

A more direct way of measuring AOD is to use a sunphotometer on the ground to measure the transmitted light directly. Combining measurements of the sunlight at the top of the atmosphere with the amount of sunlight at the surface, the extinction can be determined. This approach uses fewer assumptions: under the best conditions MODIS can retrieve AOD to within ±0.03, whereas a well-calibrated sunphotometer can measure it to within ±0.01. The widespread network of sunphotometers called the Aerosol Robotic Network (AERONET) retrieves data globally. The primary challenge with this technique is the mismatch between the spatially varying MODIS data and temporally varying sunphotometer data. There are select areas with coincident coverage in measurements between MODIS and AERONET.
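
The sunphotometer principle described in this summary follows the Beer-Lambert law: direct sunlight is attenuated as I = I0 * exp(-tau * m), where tau is the column optical depth and m is the relative air mass. The sketch below inverts that relation. It assumes a plane-parallel atmosphere (m approximated by 1/cos of the solar zenith angle) and returns the total optical depth; isolating the aerosol part would additionally require subtracting molecular scattering and gas absorption, which is not shown. All function names and numbers are illustrative, not taken from the talk or from AERONET processing.

```python
import math


def relative_air_mass(solar_zenith_deg):
    """Plane-parallel approximation; reasonable away from the horizon."""
    if not 0.0 <= solar_zenith_deg < 90.0:
        raise ValueError("solar zenith angle must be in [0, 90) degrees")
    return 1.0 / math.cos(math.radians(solar_zenith_deg))


def total_optical_depth(surface_irradiance, toa_irradiance, solar_zenith_deg):
    """Invert I = I0 * exp(-tau * m) for the column optical depth tau."""
    if not 0.0 < surface_irradiance <= toa_irradiance:
        raise ValueError("need 0 < surface irradiance <= top-of-atmosphere irradiance")
    m = relative_air_mass(solar_zenith_deg)
    return -math.log(surface_irradiance / toa_irradiance) / m


# Hypothetical direct-beam measurement: 80 percent transmission, sun overhead.
tau = total_optical_depth(surface_irradiance=0.8, toa_irradiance=1.0, solar_zenith_deg=0.0)
print(f"total optical depth = {tau:.3f}")
```

The same transmission measured at a larger solar zenith angle implies a smaller optical depth, because the light has traveled a longer slant path through the atmosphere.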

AERONET is a land-based network, and matching an overpass with an observation is rare. The location of the ground-based observation within the satellite grid square is a consideration in the validation process. Collocating AOD observations does not address uncertainty issues with other retrieved parameters like aerosol particle size. Collocating with sunphotometers takes advantage of the ground instruments' cloud-clearing algorithm. Comparing the spatial statistics with the temporal statistics increases the number of points that can be used from AERONET. With these results, information about the uncertainties of the remote measurements of AOD can be determined.

Another challenge is using the validated retrievals to generate daily and monthly mean maps of aerosol distributions that improve our understanding of climate. A complication arises when the weighting of the data
is considered in the calculation of the monthly means. With few retrievals available for a particular grid square, there is a greater likelihood of high uncertainty. Different methods of weighting and averaging the data result in significantly different distribution maps and mean optical depths. Not all retrievals carry the same level of confidence. There are a number of options and methods to consider when analyzing the means from satellite and sunphotometer data, and finding a suitable solution is largely situational and dependent on the specific application under consideration.

SPATIAL STATISTICS WITH AN EMPHASIS ON AEROSOL DATA

Noel Cressie, Ohio State University

Statistical science is a paradigm that incorporates uncertainty into the description, explanation, and prediction of processes and parameters in the world. Spatio-temporal statistics incorporates space and time into statistical models of all aspects of the uncertainty. There are a multitude of statistical techniques that can be applied to climate datasets, and hierarchical statistical modeling is one of them. The uncertainty can be modeled in a hierarchical manner where, at each level of uncertainty, a statistical model is used. There are at least two levels in the hierarchy, resulting in data models and process models, the latter being a representation of a physical process of interest. Fundamental to hierarchical statistical models is the use of conditional distributions. This results in a product form for the joint distribution; then Bayes' Theorem can be used to obtain the posterior distribution of unknown processes or parameters (e.g., aerosol optical depth in a modeling of data from the MISR instrument on NASA's Terra satellite).

Kriging is a spatial regression technique applied to datasets to filter out noise and fill gaps, and it can be shown to arise from a Gaussian-Gaussian hierarchical statistical model.
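
As a small illustration of kriging as gap filling with an attached uncertainty, the sketch below implements simple kriging in one dimension. It assumes a zero-mean Gaussian process with a known exponential covariance and Gaussian measurement noise, so the predictor is the posterior mean of a Gaussian-Gaussian model and the kriging standard error is the square root of the posterior variance. The covariance parameters, the data, and the function names are made up for illustration; in practice the covariance would be estimated from the data.

```python
import numpy as np


def exp_cov(a, b, sill=1.0, length=2.0):
    """Exponential covariance between 1-D location arrays a and b."""
    return sill * np.exp(-np.abs(a[:, None] - b[None, :]) / length)


def simple_krige(x_obs, z_obs, x_new, noise_var=0.01, sill=1.0, length=2.0):
    """Posterior mean and standard error of a zero-mean process at x_new."""
    C_oo = exp_cov(x_obs, x_obs, sill, length) + noise_var * np.eye(len(x_obs))
    C_no = exp_cov(x_new, x_obs, sill, length)
    # Kriging weights: solve the linear system rather than inverting C_oo.
    weights = np.linalg.solve(C_oo, C_no.T).T
    mean = weights @ z_obs
    var = sill - np.einsum("ij,ij->i", weights, C_no)
    return mean, np.sqrt(np.clip(var, 0.0, None))


# Hypothetical sparse observations with a gap between x = 3 and x = 8.
x_obs = np.array([0.0, 1.0, 2.0, 3.0, 8.0, 9.0])
z_obs = np.array([0.2, 0.5, 0.4, 0.1, -0.6, -0.3])
x_new = np.array([2.0, 5.5])  # one observed location, one inside the gap

mean, se = simple_krige(x_obs, z_obs, x_new)
for x, m, s in zip(x_new, mean, se):
    print(f"x = {x:4.1f}: prediction = {m:+.3f}, standard error = {s:.3f}")
```

The standard error is small at the observed location and much larger in the middle of the gap, which is the kind of uncertainty information that smoothers such as inverse-distance weighting do not provide.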

Kriging methodology requires estimation of the spatial covariances in the dataset, from which a mathematical formula can be constructed to fill in the gaps in the dataset. This formula is simply the mean of the posterior distribution referred to earlier, with suitable substitution of estimates of unknown parameters. If we build a spatio-temporal model at the finest scale, even at a scale where there is a lack of data, and then aggregate up from that scale, the relationships will allow computation of any covariance at the scale of interest. This approach also allows one to determine the kriging standard error, which is a measure of the uncertainty in the smoothed, gap-filled product. Algorithms that can be used with the data, such as nearest-neighbor smoothing and inverse-distance weighting, do not allow the estimation of uncertainty that kriging allows.

One of the limitations of kriging is that it cannot be applied to

OCR for page 32
 APPENDIX B large datasets because it does not scale well computationally. However, fixed rank kriging (FRK), a rather new statistical method (Cressie and Johannesson, 2006) using a spatial mixed-effects model, is a technique that can be applied to large, remote sensing datasets. This technique is computationally feasible and is able to deal with the dimension reduction that other spatial and temporal statistical methods have not been able to address. The “fixed” adjective refers to a fixed number of basis functions; as the number of bases grows, more information can be gleaned about the finer-scale variability. If the finer-scale variability is not present, then fewer basis functions can be used in this method. Importantly, the spatial covariances for FRK do not have to be stationary (Shi and Cressie, 2007; Cressie and Johannesson, 2008). The covariances determined by FRK can also be used in a cross-validation technique to estimate the uncertainty of the data that lie within a grid box. AEROSOL AND CLOUD REPRESENTATION IN GLOBAL MODELS Joyce Penner, Uniersity of Michigan Global climate modelers continually work to improve climate models by analyzing observational data to gain better insights into the phys- ics of atmospheric processes. The challenges associated with comparing climate models and data are similar to validation studies that compare satellite retrievals with ground-based measurements. Global climate models (GCMs) use a grid that is based on how long a period is to be represented by the simulation, typically 100 years, and how many times the model is run. Within each grid cell, the model physics and dynamics are represented and provide information about various attributes of the atmosphere, such as aerosol and cloud concentrations. The resolution, however, is too coarse to resolve clouds. The coarse grid resolution requires parameterization of physical pro- cesses. 
Scientists must approximate many of the prognostic variables needed to draw conclusions about significant parts of the climate system, such as clouds and their effects on the radiation budget. Some of the processes that must be parameterized are convection, turbulence, radiation, and microphysics, all of which influence the complex cloud distributions that we observe, but which cannot be represented in detail in GCMs. The water mass of clouds, cloud fraction, and mixing processes, for example, have to be predicted, and these predictions include uncertainties.

There are many challenges associated with parameterization. In both the vertical and horizontal directions, the cloud cover can be thinner than what the resolution allows. When combining these two, it becomes necessary to make assumptions about how the horizontal coverage varies in the vertical direction. Adding more complexity, clouds within one horizontal grid layer might have varying densities and vertical positions. The assumptions made about cloud distribution and radiation effects have consequences for the overall output of the model.

Climate models are validated by comparing the results to satellite data (as well as in situ data). In GCMs, clouds and the feedback from clouds, resulting from changes in the temperature, are one of the greatest sources of uncertainty. This is due to the lack of fine-scale grid models that can properly represent the sub-grid-scale features that are required to accurately predict climate changes.

Aerosol models represent various processes, such as emissions that form secondary aerosols, chemistry, aerosol microphysics, transport, and dry and wet deposition. The uncertainty associated with these models comes partly from variability in the sources of aerosols, given that there are multiple aerosol species. In addition, the representation of the other processes differs from model to model. Validation of the chemical composition of the aerosols in the models cannot come from comparisons with satellite data or ground-based remote sensing data like that of AERONET, because the measured aerosol optical depth (AOD) is a composite of the effects of the different aerosol types we attempt to model. So while measurements of the global average aerosol optical depth can improve, this will not completely resolve the differences between various models and observations.

Convection also has large implications for the prediction of the vertical distribution of a particular aerosol. The impact of an aerosol such as black carbon varies depending on its vertical distribution. If the black carbon is located near the surface, it will tend to heat the surface, whereas if it is primarily located in the free troposphere, it will act to cool the surface.
The vertical distribution of the rest of the aerosols that make up the total AOD also affects their impact, since AOD can increase through aerosol water uptake, and the uptake will generally be larger if the aerosol is located in the boundary layer, where the relative humidity is generally higher than it is in the free troposphere. Thus the AOD and the effect of black carbon are dependent on how the vertical transport is treated. There is a significant difference in the vertical distribution of aerosols when aerosols from different GCMs are compared with each other. In addition, there are significant differences in AOD between different versions of the same GCM when different horizontal resolutions are compared. This is due to the changes in the predicted relative humidity in higher-resolution GCMs.

The sources of uncertainty associated with aerosol models come from emissions, wet/dry removal, chemical production, and vertical transport. Scientists must address the biases and potential errors connected to grid resolution and also to the shortcomings in modeling some specific processes. Fortunately, it may be possible to use the extensive satellite data to address some of these issues and to make improvements in modeling the processes that are not well parameterized.

DATA ASSIMILATION AS A HIERARCHICAL STATISTICAL PROCESS, INTERACTING DYNAMICALLY WITH MODELING

Christopher Wikle, University of Missouri

A Bayesian approach to climate models has the advantage of addressing processes that are non-Gaussian and nonlinear (i.e., geophysical processes in the climate system). From a statistical perspective, data assimilation is the combining of data with prior knowledge from sources such as mathematical models, other datasets, and expert opinion, among others, to gain an understanding of the true state of the system. A Bayesian perspective on retrospective data assimilation, combining information to create datasets for use in climate model initial conditions, is explored in this talk. For example, a wind dataset that can be replicated over time and space has uncertainty associated with the satellite observations, and these uncertainties influence the weighted combination of the prior mean and the mean of the data as well as the resulting posterior distributions. In order to minimize these uncertainties, knowledge of the physical properties of the system needs to be improved and applied to the statistical representation. The fundamental challenges in data assimilation techniques include model complexity, model uncertainty, state process dimensionality, and data volume, which are all interrelated.

The Bayesian hierarchical model methodology uses building blocks by separating the data and process variables (e.g., temperature, wind) in the model from the model parameters, which are quantities introduced in the model development, such as measurement error, to better understand the associated uncertainties. A critical question is how to quantify the uncertainty in the process model. A common solution is to include an additive error term, but this does not represent all model uncertainty.
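
The two ideas above, a process model with an additive error term and a posterior mean that is a weighted combination of the prior mean and the data, can be sketched in the simplest linear, Gaussian, scalar case. This is only an illustration under assumptions that are not from the talk: the function names, the linear `growth` factor standing in for the process model, and all numerical values are hypothetical, and the talk's emphasis is precisely on the nonlinear, non-Gaussian settings that this special case leaves out.

```python
def forecast(mean, var, growth=0.9, model_error_var=0.5):
    """Propagate the state one step: x_new = growth * x_old + eta,
    where eta ~ N(0, model_error_var) is the additive model-error term."""
    return growth * mean, growth ** 2 * var + model_error_var


def update(prior_mean, prior_var, obs, obs_var):
    """Combine a Gaussian prior with one Gaussian observation of the state."""
    gain = prior_var / (prior_var + obs_var)  # weight given to the observation
    post_mean = (1.0 - gain) * prior_mean + gain * obs
    post_var = (1.0 - gain) * prior_var
    return post_mean, post_var, gain


# Hypothetical wind-speed anomaly (m/s): forecast the prior, then assimilate one observation.
prior_mean, prior_var = forecast(mean=5.0, var=1.0)
post_mean, post_var, gain = update(prior_mean, prior_var, obs=6.0, obs_var=2.0)
print(f"forecast:  mean={prior_mean:.3f}, var={prior_var:.3f}")
print(f"posterior: mean={post_mean:.3f}, var={post_var:.3f}, gain={gain:.3f}")
```

The additive term only inflates the forecast variance, which shifts weight toward the observation. It says nothing about errors in the form of the process model itself, which is one way to read the remark that an additive term does not represent all model uncertainty.
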
Another option is to treat the model output as data to be used in a statistical model; however, in this approach the underlying dynamics of interacting state processes are ignored. A spectrum of models between deterministic and stochastic is the most useful, and the choice should be dependent on the goal of the analysis, the type of data, and the type of prior information that is available in the study. The advantage is that the models can accommodate more complex dynamics; however, the interpretability of the parameters suffers.

Data volume is increasingly becoming problematic for better understanding the physical processes that are interacting with the climate system and for quantifying the uncertainty in remotely sensed datasets. Datasets are large and, at times, of poor quality and may include unknown biases and errors. This challenge requires the statistics and earth sciences communities to work together in this era of large datasets. Given that earth scientists and statisticians could learn a great deal from each other, it is unfortunate that they face poor communication across the two disciplines. Finally, funding is not adequate to support the type of collaboration that is needed to foster growth and improvement in these disciplines.

THE PRACTICAL AND INSTITUTIONAL BARRIERS FOR MAKING PROGRESS ON DEVELOPING AND IMPROVING STATISTICAL TECHNIQUES FOR PROCESSING, VALIDATING, AND ANALYZING REMOTELY SENSED CLIMATE DATA

Doug Nychka, National Center for Atmospheric Research

A few significant obstacles that are most commonly felt by statisticians and geoscientists who deal with processing, validating, and analyzing remotely sensed climate data include resistance to new ideas by members of the community, lack of funding for analysis following a satellite launch, and the need for more work on combining models with observations through data assimilation. This talk addresses these barriers through case studies from the community.

A colleague working in the field of GPS meteorology noted that there are opportunities for statisticians and geoscientists to work together toward a common objective, but that community resistance can be problematic. As an example, an atmospheric sounding project encountered strong community resistance, which was not overcome until personal bridge-building was carried out between people who wanted to implement various outputs from the proposed data analysis and people who were able to build and launch the satellite. In this circumstance, the barrier was lifted and progress was made.

A second colleague, who works with the Measurements of Pollution in the Troposphere (MOPITT) instrument, raised a concern about the difficulty of obtaining sustained data-analysis efforts.
The analysis work must be re-justified every few years after the initial funding period. Although there can be funding available to build and launch a new satellite-based instrument, there is less support to analyze the resultant data or to integrate one set of satellite retrievals with other sets of observations and data products. In addition, the expertise of those in the field may not be used to the maximum extent possible when it comes to analysis, perhaps because of the shortage of funding for an analysis study. One way to improve the chances of obtaining funding could be a virtual simulation facility that would allow scientists to test new instrument designs with the conditions represented by various global atmospheric/ocean models and with the kinds of statistical methodology that would be used for retrievals. The most ambitious evaluation would use data assimilation to combine a numerical model with the synthetic satellite measurements and other more standard geophysical data to assess the added benefit of the new instrument. This would be an efficient way to verify the importance of new instruments because it would account for the data analysis of the measurements and their coincidence or uniqueness with other types of data.

A third scientist highlighted both opportunities and barriers that scientists face when they attempt to reconcile satellite data and model results. It is well known that climate is a long-term average of complicated geophysical processes. However, results from satellite data are on a much shorter time scale and are not suitable for directly determining climatological averages. When these data are compared to model results over a specific time period, the atmospheric component of a climate model is essentially being used to forecast weather. The use of weather forecasting techniques such as data assimilation to improve geophysical models is an emerging interdisciplinary approach that falls outside of traditional methods in climate science. This presents a conceptual challenge to scientists because the short-time-scale process information must be reconciled with the performance of the model in simulating long-term climate. They must also learn how to use a new and more complicated statistical tool. The ensemble data assimilation methods used in this case study also allow for characterizing the statistical uncertainty of the analysis and so add another layer to the interpretation of the scientific results. This can also be seen as an opportunity to make significant improvements to the models and the physics behind them.
Using data assimilation in this way creates an opportunity to use multiple instruments and parameters from remotely sensed data to improve upon model physics and dynamics. It essentially blurs the line between climate and weather models, which can be a beneficial way to improve both.
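
To make concrete how an ensemble data assimilation method characterizes the uncertainty of the analysis, here is a minimal sketch of one common variant, the perturbed-observation ensemble Kalman update, for a single directly observed quantity. The summary does not say which ensemble method the case study used, so this choice is an assumption, as are the function name `enkf_update`, the ensemble size, and every numerical value.

```python
import random
import statistics


def enkf_update(ensemble, obs, obs_sd, rng):
    """Perturbed-observation ensemble Kalman update for a directly observed scalar."""
    forecast_var = statistics.variance(ensemble)  # sample variance of the forecast
    gain = forecast_var / (forecast_var + obs_sd ** 2)
    # Each member assimilates its own noisy copy of the observation.
    return [x + gain * (obs + rng.gauss(0.0, obs_sd) - x) for x in ensemble]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Hypothetical forecast ensemble of a temperature (K) from perturbed model runs.
forecast_ens = [rng.gauss(280.0, 2.0) for _ in range(500)]
analysis_ens = enkf_update(forecast_ens, obs=283.0, obs_sd=1.0, rng=rng)

for name, ens in (("forecast", forecast_ens), ("analysis", analysis_ens)):
    print(f"{name}: mean={statistics.mean(ens):.2f}, sd={statistics.stdev(ens):.2f}")
```

The spread of the analysis ensemble serves as the uncertainty estimate: it is narrower than the forecast spread because the observation has added information, and it can be carried forward by running the model from each analysis member.
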