B

Modern Statistical Methods and Weather Modification Research

Christopher K. Wikle
Department of Statistics, University of Missouri-Columbia
July 24, 2003

INTRODUCTION

As discussed in the report, statistical science is important in the design, analysis, and verification of weather modification experiments. Given the complexity of the problem, the necessity to include statisticians in the planning and analysis of such experiments was recognized early in the history of weather modification. Indeed, many excellent and well-known statisticians have collaborated on such experiments over the years. In addition to improvements in deterministic modeling, fundamental science, and technology, there have been tremendous strides in the statistical sciences over the past two decades as well. Given the importance of statistics to weather modification experiments, this is indeed a significant and relevant development.

The aforementioned revolution in statistical methodology and computation has led to many new perspectives that were not available in past weather modification research programs. For example, one will never be able to "randomize" effectively all sources of uncontrollable bias in weather modification experiments. Consequently, sophisticated statistical models have to be considered to explore potential significant effects. That is, one can now compare "treatment" and "control" environments from a spatio-temporal perspective rather than some potentially inappropriate summary over space/time/variate. Complicated (realistic) spatio-temporal statistical methodologies were either not available or could not be implemented in realistic settings until the 1990s. A simple analogy is that R. A. Fisher was aware of the effects of spatial dependence in nearby field plots in agricultural experiments. The computational and modeling technology did not exist at the time to adequately model such effects.
Consequently, randomization was utilized to mitigate the effects of spatial correlation. However, just as blocking designs can improve efficiency over randomization, one can get more efficient estimates by modeling the spatial (and spatio-temporal) effects (e.g., see Cressie, 1993).

¹ This appendix was added by request of the Committee to supplement the statistical discussion in the main body of the report.
Statistical modeling theory has advanced significantly since the last major weather modification initiative. In particular, in addition to advancements in spatial and spatio-temporal approaches, methodologies such as generalized additive models and generalized linear mixed models have proven to be quite powerful and relevant. For example, the generalized linear mixed model framework allows for a broad class of data distributions (i.e., one is not restricted to normality) and considers some function of the expected mean response to be the sum of a deterministic (i.e., regression) component and a (correlated) random component, if needed. Thus, in addition to known covariate effects in the deterministic component, unknown spatial, temporal, or spatio-temporal effects can be considered explicitly as the random effects in this framework. This is critical, as discussed above, since weather modification experiments occur over space and time. Thus, this framework provides a natural way to incorporate the advancements in spatial statistics within a broader model-based analysis. Estimation for these models is performed by relatively computer-intensive approximate numerical procedures. For an overview see McCulloch and Searle (2001).

Perhaps an even more "revolutionary" development in statistics was the realization that Markov Chain Monte Carlo (MCMC) methods could be used to implement Bayesian statistical models. Inspired by the use of such methods in image analysis by Geman and Geman (1984), Gelfand and Smith (1990) realized that MCMC can be used as a general approach in which to implement Bayesian statistical models. This led to a dramatic increase in the types and complexity of problems that can be modeled in this context. For an overview of the approach see Robert and Casella (1999). This development is critical to the science of weather modification for a couple of reasons.
First, the Bayesian paradigm provides a natural statistical framework in which to explicitly account for ALL sources of uncertainty, be they data, model, or parameter uncertainties (e.g., Berliner, 1996). Second, such models can be used to incorporate very complicated spatial and temporal dependence in the generalized linear mixed model framework discussed above with relative ease (e.g., Diggle et al., 1998). Furthermore, one can include complicated physical insight (i.e., model physics) directly into this framework (Wikle et al., 2001). This methodology is outlined in greater detail in the following section.

HIERARCHICAL BAYESIAN MODELS

The use of Bayesian ideas in weather modification is not new (e.g., see Olsen, 1975), yet such ideas have not entered the mainstream of weather modification research. This is unfortunate, as the Bayesian paradigm is ideal for combining different sources of information (e.g., physics and data) and accounting for uncertainty. Common meteorological procedures such as those found in data assimilation have long been recognized as inherently Bayesian in nature (e.g., Lorenc and Hammon, 1988). In addition, it has recently been recognized that one of the fundamental approaches to characterizing uncertainty in climate change assessment is Bayesian (e.g., Berliner et al., 2000; Leroy, 1998). However, traditionally it has been difficult to model the full data, process, and parameter distributions in general from the Bayesian perspective. Recently, it has been shown that hierarchical approaches to such models provide an ideal framework in which
to account for all such uncertainties in geophysical processes (e.g., Royle et al., 1999; Wikle et al., 2001).

The hierarchical Bayesian statistical paradigm is based in probability theory (e.g., Berger, 1985; Bernardo and Smith, 1994). Assume we are interested in some process $Y$ and we have observational data for this process, denoted by $z$. Furthermore, there are parameters associated with our physical-statistical representation of the $Y$ process, as well as the statistical model for the observations. The collection of these parameters is denoted by $\theta$. A Bayesian hierarchical analysis develops a joint probability model for all these variables as the product of a sequence of distributions; formally,

$$[z, Y, \theta] = [z \mid Y, \theta]\,[Y \mid \theta]\,[\theta], \qquad (1)$$

where the brackets $[\;]$ denote probability distribution and vertical bars $\mid$ identify conditional dependencies for a given process upon other processes and/or parameters. For example, $[z \mid Y, \theta]$ denotes the distribution of the data $z$ conditional on the process $Y$ and parameters $\theta$. The process distribution is then given by $[Y \mid \theta]$ and the parameter distribution by $[\theta]$. Learning about the unknown quantities of interest (e.g., $Y$ and $\theta$) relies on the probability relationship (Bayes's Theorem):

$$[Y, \theta \mid z] \propto [z \mid Y, \theta]\,[Y \mid \theta]\,[\theta], \qquad (2)$$

where the constant of proportionality arises by integrating the right-hand side of (2) with respect to $Y$ and $\theta$. We can make use of physical relationships to aid in the specifications of the "prior distributions" $[Y \mid \theta]$ and $[\theta]$. Our ultimate interest is with the left-hand side (LHS) of (2), the so-called "posterior distribution." This distribution of the process and parameters given the data updates the prior formulations in light of the observed data.

For instance, as shown by Royle et al. (1999), if the process consists of winds $u$, $v$, and pressure $P$,
we can exploit the geostrophic relationship, which would allow us to write a stochastic model for the wind field given the pressure field, $[u, v \mid P, \theta]$. Note that this is a stochastic relationship (i.e., a distribution), which quantifies a source of variability with respect to deviations from the gradient relationship (e.g., $u \propto \partial P / \partial y$, $v \propto \partial P / \partial x$). We can model additional uncertainty by specifying distributions for the parameters $\theta$ as well. For example, the geostrophic model suggests a parameter (to be included as an element of the vector $\theta$) that is proportional to the inverse product of the density times the Coriolis term. One might specify this as the prior expected value. A variance about this expected value is then prescribed to generate a distribution for this parameter. The net result is that with relatively simple physical and stochastic representations in the sequence of conditional models (e.g., RHS of (2)), we can obtain a posterior distribution for $u$ and $v$ that has very complicated spatial structure; one that, through the quantification of uncertainty, can "adapt" to a wide variety of observations and our prior knowledge of the geophysical system.

Each stage of the hierarchical model (i.e., data, process, and parameter stages) can be further partitioned into subcomponents. This is critical in that it allows for inclusion of
many complications that are extremely difficult to account for in traditional statistical implementations. Each stage is further discussed below.

Data Models

Datasets commonly considered for atmospheric processes are complicated and usually exhibit substantial spatial, temporal, or spatio-temporal dependence. The major advantage of modeling the conditional distribution of the data given the true process is that substantial simplifications in model form are possible. For example, let $z_a$ be data observed for some process $Y$, and let $\theta_a$ be parameters. The data model is written $[z_a \mid Y, \theta_a]$. Usually, this conditional distribution is much simpler than the unconditional distribution of $[z_a]$, since most of the complicated structure comes from the process $Y$. Often, this model simply represents measurement error. Note that in this general framework the measurement error need not be additive. Furthermore, and perhaps more importantly, this framework can also accommodate data that are at a different resolution in space and/or time than the process.

This framework also provides a natural way to combine datasets. For example, assume that $z_a$ and $z_c$ represent data from two different sources (e.g., rain gauge and radar measurements of precipitation). Again, let $Y$ be the process of interest (e.g., the true precipitation process) and $\theta_a$, $\theta_c$ be parameters. In this case, the data model is often written

$$[z_a, z_c \mid Y, \theta_a, \theta_c] = [z_a \mid Y, \theta_a]\,[z_c \mid Y, \theta_c]. \qquad (3)$$

Thus, conditioned on the true process, the data are assumed to be independent. Of course, this does not suggest that the two datasets are unconditionally independent. Rather, the majority of the dependence among the datasets is due to the process, $Y$. This assumption of independence is exactly that, an assumption. Although often very reasonable, it must be assessed critically for each problem. The conditional partitioning of the datasets in (3) is often similarly applied to multivariate models.
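The two-source data model in (3) can be illustrated with a minimal sketch. It assumes the simplest possible setting, which the text does not require: a scalar process Y with a Gaussian prior, additive Gaussian measurement error for both sources, and known error variances. The function name `fuse_two_sources` and all numerical values are hypothetical, chosen only for illustration.

```python
import numpy as np


def fuse_two_sources(z_a, z_c, var_a, var_c, prior_mean, prior_var):
    """Posterior mean and variance of a scalar process Y.

    Assumed (hypothetical) model:
      data model:    z_a[i] | Y ~ N(Y, var_a),  z_c[j] | Y ~ N(Y, var_c),
                     independent given Y, as in equation (3)
      process model: Y ~ N(prior_mean, prior_var)
    """
    z_a = np.atleast_1d(np.asarray(z_a, dtype=float))
    z_c = np.atleast_1d(np.asarray(z_c, dtype=float))
    # Conditional independence makes the likelihood a product, so the
    # precisions (inverse variances) of the two sources and the prior add.
    precision = 1.0 / prior_var + z_a.size / var_a + z_c.size / var_c
    mean = (prior_mean / prior_var
            + z_a.sum() / var_a
            + z_c.sum() / var_c) / precision
    return mean, 1.0 / precision


# One gauge reading (assumed precise) and one radar estimate (assumed noisier).
post_mean, post_var = fuse_two_sources(
    z_a=[10.0], z_c=[14.0], var_a=1.0, var_c=4.0,
    prior_mean=12.0, prior_var=100.0,
)
print(post_mean, post_var)
```

Under these assumptions the posterior mean is a precision-weighted average, so it sits closer to the source with the smaller error variance, and the posterior variance is smaller than that of either source alone.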
For the multivariate case, suppose the processes of interest are denoted $Y_a$ and $Y_c$, with associated observations $z_a$ and $z_c$. One might write

$$[z_a, z_c \mid Y_a, Y_c, \theta_a, \theta_c] = [z_a \mid Y_a, \theta_a]\,[z_c \mid Y_c, \theta_c]. \qquad (4)$$

Again, this represents the assumption that given the true processes of interest, the datasets are independent. Such an assumption must be evaluated and is not required in hierarchical analysis, but it is often very reasonable and can lead to dramatic simplifications in the computations.

Process Models

It is usually the case that developing the process distribution is the most critical step in constructing the hierarchical model. This distribution is often further factored
hierarchically into a series of submodels. For example, assume the process of interest is composed of two subprocesses, $Y_a$ and $Y_c$. Perhaps $Y_a$ represents precipitation for a geographical region and $Y_c$ might represent the state of the atmospheric circulation over the same region. Furthermore, define parameters $\theta_Y = \{\theta_{Y_a}, \theta_{Y_c}\}$ that describe these two processes. One might consider the decomposition

$$[Y_a, Y_c \mid \theta_Y] = [Y_a \mid Y_c, \theta_Y]\,[Y_c \mid \theta_Y]. \qquad (5)$$

This is just a fact of probability theory and can always be written. However, it may be the case that one can assume the parameters are conditionally independent, in which case the right-hand side of (5) can be written as $[Y_a \mid Y_c, \theta_{Y_a}]\,[Y_c \mid \theta_{Y_c}]$. The challenge is the specification of these component distributions. Indeed, most of the effort in the development of hierarchical models is related to constructing these distributions. It is often the case, however, that there is very good scientific insight that can suggest appropriate conditioning order and possible models for the component distributions. For example, it is probably more reasonable to condition precipitation on the atmospheric circulation state variables, rather than the alternative. Similarly, $Y_a$ might represent the process of interest at time $t$ and $Y_c$ the same process at the previous time, $t - 1$. Natural deterministic models for process evolution could suggest the form of such models.

Parameter Models

The parameter distributions may require significant modeling effort. As is the case with the data and process models, the joint distribution of parameters is often partitioned into a product of marginal distributions. For example, consider the data model (4) and process model (5). One must specify the parameter distribution $[\theta_a, \theta_c, \theta_{Y_a}, \theta_{Y_c}]$. Often, one can make reasonable independence assumptions regarding this distribution, e.g., $[\theta_a, \theta_c, \theta_{Y_a}, \theta_{Y_c}] = [\theta_a]\,[\theta_c]\,[\theta_{Y_a}]\,[\theta_{Y_c}]$. Of course, this assumption must be justified.
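As a worked illustration (a sketch assembled from the pieces already stated, not an additional result from the source), substituting the data model (4), the factored process model (5) with conditionally independent process parameters, and independent parameter priors into Bayes's Theorem (2) gives the full posterior for the two-process example in the same bracket notation:

```latex
\begin{align*}
[Y_a, Y_c, \theta_a, \theta_c, \theta_{Y_a}, \theta_{Y_c} \mid z_a, z_c]
  \;\propto\;
  & \underbrace{[z_a \mid Y_a, \theta_a]\,[z_c \mid Y_c, \theta_c]}_{\text{data models, from (4)}} \\
  & \times \underbrace{[Y_a \mid Y_c, \theta_{Y_a}]\,[Y_c \mid \theta_{Y_c}]}_{\text{process models, from (5)}} \\
  & \times \underbrace{[\theta_a]\,[\theta_c]\,[\theta_{Y_a}]\,[\theta_{Y_c}]}_{\text{parameter models}}
\end{align*}
```

Each factor on the right is a comparatively simple conditional distribution, and every independence assumption used to obtain it is one that must be justified for the problem at hand.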
There are usually appropriate submodels for parameters as well, leading to other levels of the model hierarchy. In many cases, for complicated processes, there is substantial scientific insight that can go into developing the parameter models (e.g., Wikle et al., 2001). In other cases, one does not know much about the parameter distribution, suggesting that "vague priors" or data-based estimates be used. That is, it is often useful to think empirically at first and perform exploratory data analysis in order to develop understanding about the process. The emphasis in this case is on model building.

The development of parameter distributions has often been the focus of objections due to its implied subjectiveness. Of course, the formulation of the data and process models is quite subjective as well, but those choices have not generated as much concern, probably because such subjectiveness is just as much a part of classical model building as it is of the Bayesian approach. One must recognize that a strength of the hierarchical (Bayesian) approach is the quantification of such subjective judgment. Hierarchical models provide a coherent probabilistic framework in which to incorporate explicitly in the model the uncertainty related to judgment, scientific reasoning, subjective decisions, and experience.
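The three stages described above (data, process, and parameter models) and their implementation by MCMC can be sketched with a deliberately small example. The following is a minimal Gibbs sampler under strong assumptions that are not part of the source: every distribution is Gaussian, the variances are treated as known, and the only unknown parameter is a common process mean `mu`. The function name `gibbs_hierarchical` and all values are hypothetical.

```python
import numpy as np


def gibbs_hierarchical(z, data_var, process_var, prior_mean, prior_var,
                       n_iter=5000, burn_in=1000, seed=0):
    """Gibbs sampler for an assumed three-stage Gaussian model.

      data model:      z[i] | Y[i] ~ N(Y[i], data_var)
      process model:   Y[i] | mu   ~ N(mu, process_var)
      parameter model: mu          ~ N(prior_mean, prior_var)

    Returns post-burn-in draws of mu and of the process values Y.
    """
    rng = np.random.default_rng(seed)
    z = np.asarray(z, dtype=float)
    n = z.size
    mu = z.mean()  # starting value for the chain
    mu_draws = np.empty(n_iter)
    y_draws = np.empty((n_iter, n))
    for it in range(n_iter):
        # Full conditional [Y | z, mu]: precision-weighted blend of each
        # observation and the current process mean.
        y_prec = 1.0 / data_var + 1.0 / process_var
        y_mean = (z / data_var + mu / process_var) / y_prec
        y = rng.normal(y_mean, np.sqrt(1.0 / y_prec))
        # Full conditional [mu | Y]: blend of the prior mean and the
        # current process values.
        mu_prec = 1.0 / prior_var + n / process_var
        mu_mean = (prior_mean / prior_var + y.sum() / process_var) / mu_prec
        mu = rng.normal(mu_mean, np.sqrt(1.0 / mu_prec))
        mu_draws[it] = mu
        y_draws[it] = y
    return mu_draws[burn_in:], y_draws[burn_in:]


# Hypothetical observations, e.g. precipitation totals at eight sites.
z_obs = np.array([2.1, 3.4, 1.8, 2.9, 3.7, 2.2, 2.6, 3.1])
mu_samples, y_samples = gibbs_hierarchical(
    z_obs, data_var=1.0, process_var=1.0, prior_mean=0.0, prior_var=100.0)
print(mu_samples.mean(), mu_samples.std())
```

Because this toy model is fully Gaussian, the posterior of `mu` is also available in closed form, which makes it a convenient check on the sampler. Realistic spatio-temporal models lose that closed form, which is where MCMC becomes necessary.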
EXPERIMENTAL DESIGN

As indicated in the report, the proper statistical design of weather modification experiments is paramount. Advances in statistical modeling, some of which were outlined above, should be considered in this aspect of the problem as well. For example, there has been a significant amount of work considering the design of efficient monitoring networks in cases where the underlying process of interest is spatial. A nice recent review of such work can be found in Muller (2000). In addition, in the context of spatio-temporal processes, work has been done to consider how one might gain efficiency by allowing monitoring networks to be dynamic in time (e.g., Wikle and Royle, 1999). Finally, there has been recent work related to utilizing the advantages of the Bayesian paradigm in the context of experimental design (e.g., Besag and Higdon, 1999).

Weather modification research could benefit from these advances. For example, experimental data from past weather modification experiments could be used to develop understanding of spatio-temporal dependencies in the atmospheric variables and constituents of interest. This understanding (prior knowledge) could then be expressed formally in terms of a statistical model. At that point, one could utilize a decision theoretic framework to optimize specific objectives. For example, one might be interested in determining the optimal location for rain gauges in order to maximize the ability to detect a significant difference in seeded precipitation over a given spatial region. It may be, in this example, that such a network would be optimized by allowing some monitors to be fixed and others to vary location at different times, depending on the underlying dynamical environment. The underlying framework presented here would suggest the optimal locations for such monitors. In each phase of this analysis, modern model-based statistical methods could be used.
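The idea of choosing gauge locations from a fitted spatial model can be sketched as follows. This is an illustrative simplification and not the procedure of any cited work: it substitutes a simpler criterion (average posterior prediction variance over a region) for the detection objective described above, works on a one-dimensional transect, assumes a Gaussian process with an exponential covariance whose parameters are known, and uses a greedy search rather than a true optimum. All function names and values are hypothetical.

```python
import numpy as np


def exp_cov(x1, x2, sill=1.0, length_scale=2.0):
    """Exponential covariance between two sets of 1-D locations (assumed form)."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    dist = np.abs(x1[:, None] - x2[None, :])
    return sill * np.exp(-dist / length_scale)


def mean_prediction_variance(gauges, targets, noise_var=0.1,
                             sill=1.0, length_scale=2.0):
    """Average posterior variance of the process at the target locations,
    given noisy observations at the gauge locations."""
    gauges = np.asarray(gauges, dtype=float)
    targets = np.asarray(targets, dtype=float)
    k_gg = exp_cov(gauges, gauges, sill, length_scale)
    k_gg = k_gg + noise_var * np.eye(gauges.size)
    k_tg = exp_cov(targets, gauges, sill, length_scale)
    # Diagonal of k_tg @ inv(k_gg) @ k_tg.T without forming the full matrix.
    solved = np.linalg.solve(k_gg, k_tg.T)
    post_var = sill - np.sum(k_tg.T * solved, axis=0)
    return float(post_var.mean())


def greedy_gauge_design(candidates, targets, n_gauges):
    """Add one gauge at a time, each time picking the candidate site that
    most reduces the average prediction variance."""
    candidates = list(candidates)
    chosen = []  # indices into candidates
    for _ in range(n_gauges):
        remaining = [i for i in range(len(candidates)) if i not in chosen]
        best = min(
            remaining,
            key=lambda i: mean_prediction_variance(
                [candidates[j] for j in chosen + [i]], targets),
        )
        chosen.append(best)
    return [candidates[i] for i in chosen]


candidate_sites = np.linspace(0.0, 10.0, 11)  # possible gauge positions
target_points = np.linspace(0.0, 10.0, 51)    # region where precision matters
design = greedy_gauge_design(candidate_sites, target_points, n_gauges=3)
print(design, mean_prediction_variance(design, target_points))
```

A design aimed at detecting a seeding effect, or one with mobile monitors, would replace `mean_prediction_variance` with a utility tied to that objective and to the time-varying model; the greedy loop itself would be unchanged.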
Although such a model-based design perspective is advantageous, one could still use the model building and data analysis approach suggested here to analyze results from past experiments or from new experiments that were not designed from this perspective.

CONCLUSION

In addition to new technological advances in the atmospheric sciences, substantial advances also have occurred in the statistical sciences over the past three decades. These developments, which have not yet been applied to weather modification, could greatly improve the design, analysis, and verification of experiments. With the appropriate combination of statistical, computational, and scientific advances, many of the uncertainties in establishing the validity of weather modification research and operational results could be diminished.

REFERENCES

Berger, J. O. 1985. Statistical Decision Theory and Bayesian Analysis. New York: Springer-Verlag.
Berliner, L. M. 1996. Hierarchical Bayesian time series models. In Maximum Entropy and Bayesian Methods, K. Hanson and R. Silver (Eds.), Kluwer Academic Publishers, 15-22.
Berliner, L. M., R. A. Levine, and D. J. Shea. 2000. Bayesian climate change assessment. J. Climate 13:3805-3820.
Bernardo, J. M., and A. F. M. Smith. 1994. Bayesian Theory. New York: Wiley.
Besag, J., and D. Higdon. 1999. Bayesian analysis of agricultural field experiments (with discussion). J. R. Stat. Soc. B 61:691-746.
Cressie, N. A. C. 1993. Statistics for Spatial Data, Revised Edition. New York: Wiley.
Diggle, P. J., J. A. Tawn, and R. A. Moyeed. 1998. Model-based geostatistics (with discussion). Appl. Stat. 47:299-350.
Gelfand, A. E., and A. F. M. Smith. 1990. Sampling-based approaches to calculating marginal densities. J. Am. Stat. Assoc. 85:398-409.
Geman, S., and D. Geman. 1984. Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Trans. Pattern Anal. 6:721-741.
Leroy, S. S. 1998. Detecting climate signals: Some Bayesian aspects. J. Climate 11:640-651.
Lorenc, A., and O. Hammon. 1988. Objective quality control of observations using Bayesian methods. Theory and a practical implementation. Q. J. Roy. Meteorol. Soc. 114:515-543.
McCulloch, C. E., and S. R. Searle. 2001. Generalized, Linear, and Mixed Models. New York: Wiley.
Muller, W. G. 2000. Collecting Spatial Data, 2nd Ed. Physica-Verlag.
Olsen, A. R. 1975. Bayesian and classical statistical methods applied to randomized weather modification experiments. J. Appl. Meteorol. 14:970-973.
Robert, C. P., and G. Casella. 1999. Monte Carlo Statistical Methods. New York: Springer.
Royle, J. A., L. M. Berliner, C. K. Wikle, and R. Milliff. 1999. A hierarchical spatial model for constructing wind fields from scatterometer data in the Labrador Sea. In Case Studies in Bayesian Statistics, eds. C. Gatsonis et al., pp. 376-382. Springer-Verlag.
Wikle, C. K., and J. A. Royle. 1999. Space-time models and dynamic design of environmental monitoring networks. J. Agri. Biol. Environ. Stat. 4:489-507.
Wikle, C. K., R. F. Milliff, D. Nychka, and L. M. Berliner. 2001. Spatiotemporal hierarchical Bayesian modeling: Tropical ocean surface winds. J. Am. Stat. Assoc. 96:382-397.