5


Model Validation and Prediction

5.1 INTRODUCTION

From a mathematical perspective, validation is the process of assessing whether or not the quantity of interest (QOI) for a physical system is within some tolerance—determined by the intended use of the model—of the model prediction. Although “prediction” sometimes refers to situations where no data exist, in this report it refers to the output of the model in general.

In simple settings validation could be accomplished by directly comparing model results to physical measurements for the QOI and computing a confidence interval for the difference, or carrying out a hypothesis test of whether or not the difference is greater than the tolerance (see Oberkampf and Roy, 2010, Chapter 12). In other settings, a more complicated statistical modeling formulation may be required to combine simulation output, various kinds of physical observations, and expert judgment to produce a prediction with accompanying prediction uncertainty, which can then be used for the assessment. This more complicated formulation can also produce predictions for system behavior in new domains where no physical observations are available (see Bayarri et al., 2007a; Wang et al., 2009; or the case studies of this chapter).
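
As a rough illustration of the simple setting, the sketch below computes a confidence interval for the measurement-model difference and checks it against a tolerance. The data and tolerance are hypothetical, and the accept/reject rule shown is only one of several reasonable choices.

```python
import numpy as np
from scipy import stats

# Hypothetical paired values of the QOI: model predictions and physical measurements.
model_pred = np.array([4.46, 4.49, 4.52, 4.47, 4.50])
measured = np.array([4.55, 4.42, 4.58, 4.51, 4.60])
tolerance = 0.20  # accuracy requirement set by the intended use of the model

diff = measured - model_pred
n = diff.size
mean = diff.mean()
se = diff.std(ddof=1) / np.sqrt(n)
t_crit = stats.t.ppf(0.975, df=n - 1)
ci = (mean - t_crit * se, mean + t_crit * se)

print(f"95% CI for (measurement - model): [{ci[0]:.3f}, {ci[1]:.3f}]")
# One simple rule: the requirement is met if the entire interval lies within the tolerance.
print("within tolerance" if max(abs(ci[0]), abs(ci[1])) < tolerance else "not demonstrated")
```
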

Assessing prediction uncertainty is crucial for both validation (which involves comparison with measured data) and prediction of yet-unmeasured QOIs. This uncertainty typically comes from a number of sources, including:

•  Input uncertainty—lack of knowledge about parameters and other model inputs (initial conditions, forcings, boundary values, and so on);

•  Model discrepancy—the difference between model and reality (even at the best, or most correct, model input settings);

•  Limited evaluations of the computational model; and

•  Solution and coding errors.

In some cases, the verification effort can effectively eliminate the uncertainty due to solution and coding errors, leaving only the first three sources of uncertainty. Likewise, if the computational model runs very quickly, one could evaluate the model at any required input setting, eliminating the need to estimate what the model would have produced at an untried input setting.

The process of validation and prediction, explored in previous publications (e.g., Klein et al., 2006; NRC, 2007, Chapter 4), is described in this chapter from a more mathematical perspective.



The basic process includes identifying and representing key sources of uncertainty; identifying physical observations, experiments, or other information sources for the assessment; assessing prediction uncertainty; assessing the reliability or quality of the prediction; supplying information on how to improve the assessment; and communicating results.

Identifying and representing uncertainties typically involves sensitivity analysis to determine which features or inputs of the model affect key model outputs. Once they are identified, one must determine how best to represent these important contributors to uncertainty—parametric representations of input conditions, forcings, or physical modeling schemes (e.g., turbulent mixing of fluids). In addition to parametric forms, some analyses might assess the impact of alternative physical representations/schemes within the model. If solution errors or other sources of model discrepancy are likely to be important contributors to prediction uncertainty, their impact must also be captured in some way.

The available physical observations are key to any validation assessment. In some cases these data are observational, provided by nature (e.g., meteorological measurements, supernova luminosities); in other cases, data come from a carefully planned hierarchy of controlled experiments—e.g., the Predictive Engineering and Computational Sciences (PECOS) case study in Section 5.9. In addition to physical observations, information may come from the literature or expert judgment that may incorporate historical data or known physical behavior.

Estimating prediction uncertainty requires the combination of computational models, physical observations, and possibly other information sources. Exactly how this estimation is carried out can range from very direct, as in the weather forecasting example in Figure 5.1, to quite complicated, as described in the case studies in this chapter. In these examples, some physical observations are used to refine or constrain uncertainties that contribute to prediction uncertainty. Estimating prediction uncertainty is a vibrant research topic whose methods vary depending on the features of the problem at hand.

For any prediction, assessing the quality, or reliability, of the prediction is crucial. This concept of prediction reliability is more qualitative than is prediction uncertainty. It includes verifying the assumptions on which an estimate is based, examining the available physical measurements and the features of the computational model, and applying expert judgment. For example, well-designed sets of experiments can lead to stronger statements regarding the quality and reliability of more extrapolative predictions, as compared to observational data from a single source. Here the concept of "nearness" of the physical observations to the predictions of the intended use of the model becomes relevant, as does the notion of the domain of applicability for the prediction. However, while most practitioners recognize that this concept and notion are important, rigorous mathematical definitions and quantifications remain an unsolved problem.

FIGURE 5.1 Daily maximum temperatures for Norman, Oklahoma (left), and histograms of next-day prediction errors (right) using two prediction models. The top histogram shows residuals from the persistence model, predicting tomorrow's high temperature with today's high temperature. The bottom histogram shows residuals from the National Weather Service (NWS) forecast. Ninety percent of the actual temperatures are within ±14°F for the persistence-model forecasts and within ±6°F for the NWS forecasts. The greater accuracy of the NWS forecasts is due to NWS's use of computational models and additional meteorological information. The assessment of these two forecast methods is relatively straightforward because of the large number of comparisons of model forecast to measurement. SOURCE: Data from Brooks and Doswell (1996).
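
The assessment behind Figure 5.1 can be sketched in a few lines: form persistence-model residuals and report the symmetric bound containing 90 percent of them. The series below is synthetic (the Brooks and Doswell data are not reproduced here), so the numbers are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a daily maximum temperature record (degrees F).
n_days = 1000
day = np.arange(n_days)
temps = 60 + 25 * np.sin(2 * np.pi * day / 365.25) + rng.normal(0, 8, n_days)

# Persistence model: tomorrow's high is predicted by today's high.
residuals = temps[1:] - temps[:-1]

# Symmetric bound containing 90 percent of the next-day prediction errors.
bound = np.quantile(np.abs(residuals), 0.90)
print(f"90% of persistence-forecast errors fall within +/- {bound:.1f} F")
```
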

In some validation applications, an opportunity exists to carry out additional experiments to improve the prediction uncertainty and/or the reliability of the prediction. Estimating how different forms of additional information would improve predictions or the validation assessment can be an important component of the validation effort, guiding decisions about where to invest resources in order to maximize the reduction of uncertainty and/or the increase in reliability.

Communicating the results of the prediction or validation assessment includes both quantitative aspects (the predicted QOI and its uncertainty) and qualitative aspects (the strength of the assumptions on which the assessment is based). While the communication component is not fundamentally mathematical, effective communication may depend on mathematical aspects of the assessment.

The various tasks mentioned in the preceding paragraphs give a broad outline of validation and prediction. Exactly how these tasks are carried out depends on features of the specific application. The list below covers a number of important considerations that will have an impact on the methods and approaches for carrying out validation and prediction:

•  The amount and relevance of the available physical observations for the assessment,

•  The accuracy and uncertainty accompanying the physical observations,

•  The complexity of the physical system being modeled,

•  The degree of extrapolation required for the prediction relative to the available physical observations and the level of empiricism encoded in the model,

•  The computational demands (run time, computing infrastructure) of the computational model,

•  The accuracy of the computational model's solution relative to that of the mathematical model (numerical error),

•  The accuracy of the computational model's solution relative to that of the true, physical system (model discrepancy),

•  The existence of model parameters that require calibration using the available physical observations, and

•  The availability of alternative computational models to assess the impact of different modeling schemes or physics implementations on the prediction.

These considerations are discussed throughout this chapter, which describes key mathematical issues associated with validation and prediction, surveying approaches for constraining and estimating different sources of prediction uncertainty. Specifically, the chapter briefly describes issues regarding measurement uncertainty (Section 5.2), model calibration and parameter estimation (Section 5.3), model discrepancy (Section 5.4), and the quality of predictions (Section 5.5), focusing on their impact on prediction uncertainty. These concepts are illuminated by two simple examples (Boxes 5.1 and 5.2) that extend the ball-drop example in Chapter 1, and by two case studies (Sections 5.6 and 5.9). Leveraging multiple computational models (Section 5.7) and multiple sources of physical observations (Section 5.8) is also covered, as is the use of computational models as an aid in dealing with rare, high-consequence events (Section 5.10). The chapter concludes with a discussion of promising research directions to help address open problems.

5.1.1 Note Regarding Methodology

Most of the examples and case studies presented in this chapter use Bayesian methods (Gelman et al., 1996) to incorporate the various forms of uncertainty that contribute to the prediction uncertainty. Bayesian methods require a prior description of uncertainty for the uncertain components in a formulation. The resulting estimates of uncertainty—for parameters, model discrepancy, and predictions—will depend on the physical observations and the details of the model formulation, including the prior specification. This report does not go into such details but points to references on modeling and model checking from a Bayesian perspective (Gelman et al., 1996; Gelfand and Ghosh, 1998). While the Bayesian approach is prevalent in the VVUQ literature, effectively dealing with many issues discussed here, the use of these methods in the examples and case studies in this chapter should not be seen as an exclusive endorsement of Bayesian methods over other approaches for calculating with and representing uncertainty, such as likelihood (Berger and Wolpert, 1988), Dempster-Shafer theory (Shafer, 1976), possibility theory (Dubois et al., 1988), fuzzy logic (Klir and Yuan, 1995), probability bounds analysis (Ferson et al., 2003), and so on. The committee believes that the relevance of the main issues discussed in this chapter is not specific to the details of how uncertainty is represented.

Box 5.1 The Ball-Drop Experiment Using a Variety of Balls

In addition to the measurements of drop times for the bowling ball, we now have measurements for a basketball and baseball as well. The measured drop times are normally distributed about the true time, with a standard deviation of 0.1 seconds. The QOI is the drop time for the softball—an untested ball—at a height of 100 m. This QOI is an extrapolation in two ways: no drops over 60 m have been carried out; no measurements have been obtained for a softball.

The conceptual/mathematical model (Figure 5.1.1(b)) accounts for acceleration due to gravity g and air resistance using a standard model. Air resistance depends on the radius and density of the ball (Rball, ρball), as well as the density of the air (ρair). Figure 5.1.1(a) shows various balls and their position in radius-density space. It is assumed that air density is known. In addition to depending on the descriptors of the ball (Rball, ρball), the model also depends on two parameters—the acceleration of gravity g and a dimensionless friction coefficient CD—which need to be constrained with measurements. Initial ranges of 8 ≤ g ≤ 12 and 0.2 ≤ CD ≤ 2.0 are specified for the two model parameters.

Measured drop times from heights of 20, 40, and 60 m are obtained for the basketball and baseball; measured drop times from heights of 10, 20, . . . , 60 m are obtained for the bowling ball. These measurements constrain the uncertainty of the parameters to the ellipsoidal region shown in Figure 5.1.1(c). Figure 5.1.1(d) shows initial and constrained prediction uncertainties for the four different balls using the mathematical model in Figure 5.1.1(b). The light lines correspond to the parameter settings depicted by the points in Figure 5.1.1(c). The dark region shows prediction uncertainty induced by the constrained uncertainty for the parameters. A prediction (with uncertainty) for the softball is given by the spread of the dark region of the rightmost frame.

However, the model has never been tested against drops higher than 60 m. It has also never been directly compared to any softball drops. From Figure 5.1.1(a), one could argue that the softball is at the interior of the (Rball, ρball)-space spanned by the basketball, baseball, and bowling ball, leading one to trust the prediction (and uncertainty) for the softball at 40 m, or even 100 m. However, the softball differs from these other balls in more ways than just radius and density (e.g., surface smoothness). How should one modify predictions and uncertainties to account for these flavors of extrapolation? This is an open question in V&V and UQ research.

FIGURE 5.1.1 (a) Radius-density space showing the positions of various balls; (b) the mathematical model; (c) the constrained region for the parameters (g, CD); (d) initial and constrained prediction uncertainties for the different balls.

Box 5.2 Using an Emulator for Calibration and Prediction with Limited Model Runs

Figure 5.2.1(a) shows the physical measurements (black dots) and the prior prediction uncertainty (green lines) for the bowling-ball drop time as a function of height (as shown in Box 1.1). The experimentally measured drop times are for drops of 10, 20, . . . , 50 m; the uncertainty due to prior uncertainty for gravity g is also shown in the inset figure.

If the number of computer model runs were limited—perhaps due to computational constraints—then an ensemble of runs could be carried out at different (x, θ) input settings. Figure 5.2.1(b) shows model runs carried out over a statistical design of 20 input settings. Here x denotes height and θ denotes the model parameter g. The modeled drop times at these input settings are given by the height of the circle plotting symbols in Figures 5.2.1(a) and (b).

With these 20 computer model runs, a Gaussian process is used to produce a probabilistic prediction of the model output at untried input settings (x, θ), as shown in Figure 5.2.1(c). This emulator is used to facilitate the computations required to estimate the posterior distribution for θ, which is constrained by the physical observations. The Bayesian model formulation, with an emulator to assist with limited model runs, produces a posterior distribution for the unknown parameter θ (g, given by the blue lines of the inset in Figure 5.2.1(d)), which then can be propagated through the emulator to produce constrained, posterior prediction uncertainties (blue lines).

FIGURE 5.2.1 (a) Physical measurements and prior prediction uncertainty; (b) model runs over a design of 20 (x, θ = g) input settings; (c) emulator-based prediction at untried input settings; (d) posterior parameter and prediction uncertainties.
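
As a rough sketch of the emulator idea in Box 5.2, the code below fits a Gaussian-process emulator to a small design of model runs and predicts the output at an untried input setting. It assumes scikit-learn is available and uses a simple no-drag drop-time formula as a stand-in for the computational model; it is not the formulation used in the report.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

# Stand-in simulator: drop time eta(x, theta) for height x under gravity theta (no drag).
def eta(x, theta):
    return np.sqrt(2.0 * x / theta)

rng = np.random.default_rng(1)

# A design of 20 (x, theta) input settings covering the region of interest.
X_design = np.column_stack([rng.uniform(5.0, 80.0, 20),   # height x (m)
                            rng.uniform(8.0, 12.0, 20)])  # parameter theta = g
y_design = eta(X_design[:, 0], X_design[:, 1])

# Gaussian-process emulator fit to the 20 model runs.
gp = GaussianProcessRegressor(kernel=ConstantKernel() * RBF(length_scale=[20.0, 2.0]),
                              normalize_y=True)
gp.fit(X_design, y_design)

# Probabilistic prediction of the model output at an untried input setting.
mean, sd = gp.predict(np.array([[100.0, 9.8]]), return_std=True)
print(f"emulated drop time at 100 m with g = 9.8: {mean[0]:.2f} s +/- {sd[0]:.2f} s")
```
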

5.1.2 The Ball-Drop Example Revisited

To elaborate these ideas, an extension of the simple ball-drop example from Box 1.1 in Chapter 1 is used; the experiment here includes multiple types of balls (Box 5.1). Drop times for balls of various radii and densities are considered. The basic model that assumes only acceleration due to gravity is clearly insufficient when considering balls of various sizes and densities, suggesting the need for a model that explicitly accounts for drag due to air friction. This new model describes initial conditions for a single experiment, with the radius of the ball Rball and the density of the ball ρball. The model also has two parameters—the acceleration due to gravity, g, and a friction constant, CD—that can be further constrained, or calibrated, using experimental measurements. Of course, treating the acceleration due to gravity g as something uncertain may not be appropriate in a serious application, since this quantity has been determined experimentally with very high accuracy. The motivation for treating g as uncertain is to illustrate issues regarding uncertain physical constants, which are common in many applications.

Using measured drop times for three balls—a bowling ball, baseball, and basketball—the object is to predict the drop time for a softball at 100 meters (m). Hence the QOI is the drop time for a softball dropped from a height of 100 m. Drops are conducted from a 60 m tower. The required prediction is an extrapolation in two ways: no drops over 60 m have been carried out, and no drop-time measurements have been obtained for the softball. Section 5.5 looks more closely at how validation and UQ approaches depend on the availability of measurements and the degree of extrapolation associated with the prediction.

Initially, the uncertainty about the uncertain model parameters is that 8 ≤ g ≤ 12 and 0.2 ≤ CD ≤ 2.0 for the model given by the equation in Figure 5.1.1(b). Model predictions can be made using various (g, CD) values over this region (the dots in Figure 5.1.1(c)); the resulting drop-time predictions are given by the light lines in Figure 5.1.1(d). This uncertainty is obtained by simple forward propagation of the uncertainty in g and CD, as described in Section 4.2 in Chapter 4. If the validation assessment were a question of whether or not the model can predict the 100 m softball drop time to within ±2 seconds, or whether the drop time will be larger than 10 seconds, this preliminary assessment might be sufficient. If more accuracy is required, the uncertainty in the parameters (g, CD) can be further constrained using the observed drop times for the different balls, as given by the ellipse in Figure 5.1.1(c), showing a 95 percent probability range for (g, CD).
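
The forward-propagation step can be sketched as follows. This is a minimal Monte Carlo illustration that assumes a standard quadratic-drag model and illustrative softball descriptors; it is not the report's equation from Figure 5.1.1(b), and the numerical values are placeholders.

```python
import numpy as np

rng = np.random.default_rng(2)

def drop_time(height, g, cd, r_ball, rho_ball, rho_air=1.2, dt=1e-3):
    """Integrate a quadratic-drag fall model until the ball has fallen `height` meters."""
    k = 3.0 * rho_air * cd / (8.0 * rho_ball * r_ball)  # drag deceleration per v^2
    x = v = t = 0.0
    while x < height:
        v += (g - k * v * v) * dt
        x += v * dt
        t += dt
    return t

# Prior ranges for the uncertain parameters, as in Box 5.1.
g_samples = rng.uniform(8.0, 12.0, 200)
cd_samples = rng.uniform(0.2, 2.0, 200)

# Assumed softball descriptors (illustrative values, not from the report).
times = np.array([drop_time(100.0, g, cd, r_ball=0.048, rho_ball=410.0)
                  for g, cd in zip(g_samples, cd_samples)])

print(f"100 m softball drop time spans roughly {times.min():.1f} s to {times.max():.1f} s "
      f"under the prior parameter uncertainty ({times.size} forward runs)")
```
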
This process of constraining parameter uncertainties using experimental measurements is called model calibration, or parameter estimation, and is discussed in more detail in Section 5.3. Physical measurements are uncertain, each giving an imperfect interrogation of the physical system, and this uncertainty affects how tightly these measurements constrain parameter uncertainty. Measurement uncertainty also plays an important role in the comparison of model prediction to reality. This topic is discussed briefly in Section 5.2.

Although the ball-drop example used here does not show any evidence of a systematic discrepancy between model and reality, such discrepancies are common in practice. Once identified and quantified, systematic model discrepancy can be accounted for to improve the model-based predictions (e.g., a computational-model prediction that is systematically 10 percent too low for a given QOI can simply be adjusted up by 10 percent to predict reality more accurately). Section 5.4 discusses the related idea of making the best predictions that one can with an imperfect model (and quantifying their uncertainties), embedded within a statistical framework aided by subject-matter knowledge and available measurements.

The relevant body of knowledge in the ball-drop example consists of measurements from three basketball drops, three baseball drops, and six bowling-ball drops, along with the mathematical and computational models. The friction term in the model is an effective physics model, slowing the ball as it drops and attempting to capture small-scale effects of airflow around the ball. Experience suggests that the friction constant depends on the velocity and smoothness of the ball, as well as on properties of the air.

Ideally, part of the assessment of the uncertainty about the QOI (softball drop time from 100 m) will include at least a qualitative assessment of the appropriateness of using this form of friction model, with a single value for CD, for these drops. This notion of assessing the reliability, or quality, of a model-based prediction is discussed in Section 5.5.

More generally, the body of knowledge could include a variety of information sources, ranging from experimental measurements to expert judgment, to results from related studies. Some of these information sources may be used explicitly, constraining parameter uncertainties, estimating variances, or describing prediction uncertainties. Other information sources might lend evidence to support assumptions used in the analysis, such as the adequacy of the model for predictions that move away from the conditions in which experimental measurements are available.

Ideally, the domain of applicability for this model in predicting drop times of various balls will also be specified. For example, given the current body of knowledge, a conservative domain of applicability might include only basketballs, baseballs, and bowling balls dropped from heights between 10 m and 60 m. In this case, one would not be willing to use the model-based uncertainty given in Box 5.1 to characterize the drop time for a softball at 40 m, let alone 100 m. A more liberal definition of the domain of applicability might be any ball with a radius-density combination in the interior of the basketball-baseball-bowling ball triangle in Figure 5.1.1(a). Alternatively, one might also consider what perturbations of a basketball, say, would be included in this domain of applicability. Should predictions and uncertainties for a slightly smaller basketball be trusted? What about a slightly less dense basketball? At what density should the predictions and uncertainties no longer be trusted? Put differently, can we assess what perturbations of a basketball are sufficiently "near" to the tested basketball to result in accurate predictions and uncertainty estimates? Often, a sensitivity analysis (SA) can help address the question—in this example, by informing trust in model-based predictions and uncertainties for balls as density decreases. One might also consider conditions that are not accounted for in the model. For example, should the drop times of a rubber basketball differ from those of a leather one? Does ball texture affect drop time? Without additional experiments, such model-applicability issues must necessarily be addressed with expert judgment or other information sources. Quantifying the impact of such issues remains an unsolved problem.

In general, the domain of applicability describes the conditions over which the predictions and uncertainties derived from a computational model are reliable. This should include descriptors of the initial conditions that are accounted for in the model, as well as those that are not. It might also include descriptors of the geometric and/or physical complexity of the system for which the prediction is being made. Such considerations are crucial for designing a series of validation experiments to help map out this domain of applicability. Defining this domain of applicability depends on the available body of knowledge, including subject-matter expertise, and involves a number of qualitative features about the inference being made.

5.1.3 Model Validation Statement

In summary, validation is a process, involving measurements, computational modeling, and subject-matter expertise, for assessing how well a model represents reality for a specified QOI and domain of applicability. Although it is often possible to demonstrate that a model does not adequately reproduce reality, the generic term "validated model" does not make sense. There is at most a body of evidence that can be presented to suggest that the model will produce results that are consistent with reality (with a given uncertainty).

Finding: A simple declaration that a model is "validated" cannot be justified. Rather, a validation statement should specify the QOIs, accuracy, and domain of applicability for which it applies.

The body of knowledge that supports the appropriateness of a given model and its ability to predict the QOI in question, as well as the key assumptions used to make the prediction, is important information to include in the reporting of model results. Such information will allow decision makers to better understand the adequacy of the model, as well as the key assumptions and data sources on which the reported prediction and uncertainty rely. The degree to which available physical data are relevant to the prediction of interest is a key concept in the V&V literature (Easterling, 2001; Oberkampf et al., 2004; Klein et al., 2006).

How one uses the available body of knowledge to help define this domain of validity is part of how the argument for trust in model-based prediction is constructed. This topic is explored further in Section 5.5.

5.2 UNCERTAINTIES IN PHYSICAL MEASUREMENTS

Throughout this chapter, reference is continually made to learning about the computational model and its uncertainties through comparing the predictions of the computational model to available physical data relevant to the QOI. A complication that typically arises is that the physical measurements are themselves subject to uncertainties and possibly bias. In the ball-drop example in Box 5.1, for instance, there were multiple observations for each type of ball drop, and these were believed to be normally distributed, centered at the true drop time and with a standard deviation of 0.1 seconds. The uncertainty in the physical measurements was part of the reason that the parameters in the example were constrained only to the ellipse in Figure 5.1.1(c) and not to a smaller area.

Although the characterization of such measurement uncertainty is often a crucial part of a VVUQ analysis, the issue is not highlighted in this report because such characterization is the standard domain of statistics, and vast methodology and experience exist for characterizing such uncertainty (Youden, 1961, 1972; Rabinovich, 1995; Box et al., 2005). However, there are several issues that must be kept in mind when obtaining physical data for use in VVUQ analyses.

For experiments that have not yet been performed, the design of the experiment for collecting the physical data should be developed in cooperation with the VVUQ analyst and the decision maker to provide maximum VVUQ benefit when practical. Experimental data are often expensive (as when each data point arises from crashing a prototype vehicle, for instance) and should be chosen to provide optimal information from the perspective of the desired calibration, VVUQ analysis, and/or the prediction for the computational model.

One particularly relevant consideration in the context of VVUQ is the desirability of replications1 of the physical measurements—that is, of obtaining repeat measurements under the same conditions (same model-input values). This might seem counterintuitive from the perspective of the computational model; if the analyst is trying to judge how well the model predicts reality, observing reality at as many input values as possible would seem logical. When the physical data are subject to measurement error, however, the picture changes, because it is first crucial to learn how well the physical data represent reality. If the physical data do not constrain reality significantly at any input values, little has been learned that will help in judging the fidelity of the computational model with respect to reality. If the measurement error of the physical data and the variability of the physical system are known (e.g., the data have a known standard deviation) and are judged to be small enough to adequately constrain reality, then replicate observations are perhaps not needed. However, it is wise to view the presumption of known standard deviation with healthy skepticism. When the magnitude of the measurement error is derived from the properties of the measurement apparatus and theoretical considerations, it is common to miss important sources of variation and bias that are present in the measurement process. Hence, resources may be better spent obtaining replicate observations, rather than attempting to account for every possible source of uncertainty present in a single measurement/experiment. One may be able to afford only enough physical data with replications to adequately constrain reality at a few input values, but knowing reality, with accurately quantified uncertainty, at a few input values is often better than having a vague idea about reality at many input values.

One does not always have control over the process of obtaining physical measurements. They may have been based on historical experiments or observations, for which important details may be unknown. They may have arisen from auxiliary inverse-problem analyses (e.g., inferring a quantity such as temperature or contaminant concentration from remotely sensed signals). This inexactness can be problematic from a number of perspectives, including the possibility that uncertainties in the physical data may have been estimated poorly, or not given at all. In such cases it may be fruitful to include this auxiliary inverse problem as part of the validation and prediction process.

1 Here we mean genuine replicates as described in Box and Draper (1987, p. 71): "Replicate runs must be subject to all the usual setup errors, sampling errors, and analytical errors which affect runs made at different conditions. Failure to achieve this will typically cause underestimation of the error and will invalidate the analysis."
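
As a small illustration of how genuine replicates are used, the sketch below pools hypothetical replicate drop-time measurements at a few heights to estimate the measurement standard deviation; the numbers are invented for illustration.

```python
import numpy as np

# Hypothetical replicate drop-time measurements (seconds) at three heights (m).
replicates = {
    20.0: [2.11, 1.98, 2.05, 2.16],
    40.0: [2.93, 3.08, 2.99],
    60.0: [3.61, 3.74, 3.66, 3.58, 3.70],
}

# Pooled estimate of the measurement standard deviation across the replicate sets.
sum_squares, dof = 0.0, 0
for obs in replicates.values():
    y = np.asarray(obs)
    sum_squares += ((y - y.mean()) ** 2).sum()
    dof += y.size - 1
pooled_sd = np.sqrt(sum_squares / dof)
print(f"pooled measurement standard deviation: {pooled_sd:.3f} s ({dof} degrees of freedom)")
```
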

A significant issue that can arise is possible bias in the physical data, wherein a common error induces a similar effect on all of the measurements. In the ball-drop example, for instance, a bias in the physical observations would be present if the stopwatch used to time all of the drops were systematically slow. Similarly, if each ball were released with a slight downward velocity, then measured drop times would be systematically too short.

The methodological issue of how to incorporate uncertainty in the physical data into the UQ analysis is also important. Standard statistical techniques can allow one to summarize the physical data in terms of the constraints that they place on reality, but a VVUQ analysis requires interfacing this uncertainty with the computational model, especially if calibration is also being done based on the physical data. Bayesian analysis (discussed in Section 5.3) has the appeal of providing a direct methodology for such incorporation of uncertainty.

5.3 MODEL CALIBRATION AND INVERSE PROBLEMS

Many applications in VVUQ use physical measurements to constrain uncertain parameters in the computational model. A simple example is given in Figure 5.1.1(c), in which measured drop times are used to reduce the uncertainty in the two model parameters—g and CD. This basic task of model calibration is a standard problem in statistical inference. Model calibration applications may involve parameters ranging from one or two, as in Box 5.1, to thousands or millions, as is often the case when one is inferring heterogeneous fields (material properties, initial conditions, or source terms—e.g., Akçelik et al., 2005).

The problem of estimating from observations the uncertain parameters in a simulation model is fundamentally an inverse problem. The forward problem seeks to predict output observables (such as seismic ground motion at seismometer locations) given the parameters (such as the heterogeneous elastic-wave speeds and density throughout a region of interest) by solving the governing equations (such as the elastic-wave equations). The forward problem is usually well posed (the solution exists, is unique, and is stable to perturbations in inputs), causal (later-time solutions depend only on earlier-time solutions), and local (the forward operator includes derivatives that couple nearby solutions in space and time). The inverse problem reverses this relationship, however, by seeking to determine parameter values that are consistent with particular measurements. Solving inverse problems can be very challenging for the following reasons: (1) the mapping from observations (i.e., measurements) to parameters may not be one to one, particularly when the number of parameters is large and the number of measurements is small; (2) small changes in the measurement value may lead to changes in many or all parameters, particularly when the forward model is nonlinear; and (3) typically, all that is available to the analyst is a computational model that approximately solves the forward problem.

In simple model calibration, or inverse problems, post-calibration parameter uncertainty can be described by a "best estimate" together with a covariance matrix characterizing the variances and correlations of the parameter uncertainties. When the solution to the inverse problem is not unique, and/or when the measurement errors have a nonstandard form, determining even a best estimate can be problematic. The popular approach to obtaining a unique "solution" to the inverse problem in these circumstances is to formulate it as an optimization problem—minimize the sum of two terms: the first is a combination of the misfit between observed and predicted outputs in an appropriate norm, and the second is a regularization term that penalizes unwanted features of the parameters. This is often called Occam's approach—find the "simplest" set of parameters that is consistent with the measured data. The inverse problem thus leads to a nonlinear optimization problem in which the forward simulation model is embedded in the misfit term. When the forward model takes the form of partial differential equations (PDEs) or some other expensive model, the result is an optimization problem that may be extremely large scale in the state variables (displacements, temperatures, pressure, and so on), even when the number of inversion parameters is small. More generally, uncertain parameters can be taken from numbers on a continuum (such as initial or boundary conditions, heterogeneous material parameters, or heterogeneous sources) that, when discretized, result in an inverse problem that is very large scale in the inversion parameters as well.

An estimation of parameters using the regularization approach to inverse problems as described above will yield an estimate of the "best" parameter values that minimize the combined misfit and penalty function.
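
A minimal sketch of the misfit-plus-regularization (Occam) formulation for a linear, ill-posed problem is shown below; the forward operator and data are hypothetical, and in real applications the misfit term embeds an expensive forward simulation rather than a matrix product.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical ill-posed linear forward model: more parameters than observations,
# so the data alone do not determine the parameters uniquely.
n_obs, n_par = 10, 50
A = rng.normal(size=(n_obs, n_par))
theta_true = np.sin(np.linspace(0.0, np.pi, n_par))
y_obs = A @ theta_true + rng.normal(scale=0.05, size=n_obs)

# Occam's approach: minimize ||A theta - y||^2 + alpha * ||theta||^2 (Tikhonov penalty).
alpha = 0.1
theta_hat = np.linalg.solve(A.T @ A + alpha * np.eye(n_par), A.T @ y_obs)

print(f"misfit norm: {np.linalg.norm(A @ theta_hat - y_obs):.3f}, "
      f"solution norm: {np.linalg.norm(theta_hat):.3f}")
```
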

However, in UQ, the analyst is interested not just in point estimates of the best-fit parameters but also in a complete statistical description of all parameter values that are consistent with the data. The Bayesian approach does this by reformulating the inverse problem as a problem in statistical inference, incorporating uncertainties in the measurements, the forward model, and any prior information about the parameters. The solution of this inverse problem is the set of so-called posterior probability densities of the parameters, describing updated uncertainty in the model parameters (Kaipio and Somersalo, 2005; Tarantola, 2005). Thus the resulting uncertainty in the model parameters can be quantified, taking into account uncertainties in the data, uncertainties in the model, and prior information. The term parameter is used here in the broadest sense and includes initial and boundary conditions, sources, material properties and other coefficients of the model, and so on; indeed, Bayesian methods have been developed to infer uncertainties in the form of the model as well (so-called structural uncertainties, or model inadequacy, are discussed in Section 5.4).

The Bayesian solution of the inverse problem proceeds as follows. Let the relationship between model predictions of observable outputs y and uncertain input parameters θ be denoted by y = f(θ, e), where e represents noise due to measurement and/or modeling errors. In other words, given the parameters θ, the function f(θ) invokes the solution of the forward problem to yield y, the predictions of the observables. Suppose that the analyst has a prior probability density πpr(θ), which encodes the prior information about the unknown parameters (i.e., independent of information from the present observations). Suppose further that the analyst can build—using the computational model—the likelihood function π(yobs|θ), which describes the conditional probability that the parameters θ gave rise to the actual measurements yobs. Then Bayes's theorem expresses the posterior probability density of the parameters, πpost, given the data yobs, as the conditional probability

    πpost(θ) = π(θ|yobs) ∝ πpr(θ) π(yobs|θ)    (5.1)

The expression (5.1) provides the statistical solution of the inverse problem as a probability density for the model parameters θ.

Although it is easy to write down expressions for the posterior probability density such as expression (5.1), making use of these expressions poses a challenge owing to the high dimensionality of the posterior probability density (which is a surface of dimension equal to the number of parameters), and because the solution of the forward problem is required at each point on this surface. Straightforward grid-based sampling is out of the question for anything other than a few parameters and inexpensive forward simulations. Special sampling techniques, such as Markov chain Monte Carlo (MCMC) methods, have been developed to generate sample ensembles that typically require many fewer points than are required for grid-based sampling (Kaipio and Somersalo, 2005; Tarantola, 2005). Even so, MCMC approaches become intractable as the complexity of the forward simulations and the dimension of the parameter spaces increase. The combination of a high-dimensional parameter space and a forward model that takes hours to solve makes standard MCMC approaches computationally infeasible.

As discussed in Chapter 4, one of the keys to overcoming this computational bottleneck lies in examining the details of the forward model and effectively exploiting its structure in order to reduce, implicitly or explicitly, the dimension of both the parameter space and the state space. The motivation for doing so is that the data are often informative about just a fraction of the "modes" of the parameter field, because the inverse problem is ill-posed. Another way of saying this is that the Jacobian of the parameter-to-observable map is typically a compact operator and thus can be represented effectively using a low-rank approximation—that is, it is often sparse with respect to some basis (Flath et al., 2011). The remaining dimensions of parameter space, which cannot be inferred from the data, are typically informed by the prior; however, the prior does not require the solution of expensive forward problems and is thus usually much cheaper to compute. Compactness of the parameter-to-observable map suggests that the state space of the forward problem can be reduced as well. Note that although generic, regularizing priors (e.g., Besag et al., 1995; Kaipio et al., 2000; Oliver et al., 1997) make posterior exploration possible, giving useful point estimates, they may not adequately describe the uncertainty in the actual field.
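
A toy analog of sampling the posterior in expression (5.1) is sketched below: a random-walk Metropolis sampler for the gravity parameter in the no-drag ball-drop model, with synthetic observations, a uniform prior on [8, 12], and a 0.1-second measurement standard deviation. It is meant only to make the mechanics concrete; it is not the report's implementation.

```python
import numpy as np

rng = np.random.default_rng(4)

# Forward model: drop time for height x under gravity theta (no drag, as in Box 1.1).
def forward(x, theta):
    return np.sqrt(2.0 * x / theta)

heights = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
sigma = 0.1  # measurement standard deviation (s)
y_obs = forward(heights, 9.81) + rng.normal(0.0, sigma, heights.size)

def log_post(theta):
    if not 8.0 <= theta <= 12.0:      # uniform prior on [8, 12]
        return -np.inf
    resid = y_obs - forward(heights, theta)
    return -0.5 * np.sum((resid / sigma) ** 2)  # Gaussian log-likelihood

# Random-walk Metropolis sampling of the posterior density.
theta, lp = 10.0, log_post(10.0)
samples = []
for _ in range(20000):
    prop = theta + rng.normal(0.0, 0.1)
    lp_prop = log_post(prop)
    if np.log(rng.uniform()) < lp_prop - lp:
        theta, lp = prop, lp_prop
    samples.append(theta)

post = np.array(samples[5000:])  # discard burn-in
print(f"posterior mean g = {post.mean():.3f}, 95% interval "
      f"[{np.quantile(post, 0.025):.3f}, {np.quantile(post, 0.975):.3f}]")
```
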

This is common when the physical field exhibits roughness or discontinuities that are not allowed under the prior model used in the analysis. In such cases, the uncertainties produced from such an analysis will not be appropriate at small spatial scales. Such difficulties can be overcome by specifying more realistic priors.

A number of current approaches to model reduction for inverse problems show promise. These range from Gaussian process (GP) response-surface approximation of the parameter-to-observable map (Kennedy and O'Hagan, 2001); to projection-type forward-model reductions (Galbally et al., 2010; Lieberman et al., 2010); to polynomial chaos (PC) approximations of the stochastic forward problem (Badri Narayanan and Zabaras, 2004; Ghanem and Doostan, 2006; Marzouk and Najm, 2009); to low-rank approximation of the Hessian of the log-posterior (Flath et al., 2011; Martin et al., in preparation2). Approaches that exploit multiple model resolutions have also proven effective for speeding up MCMC in the presence of a computationally demanding forward model (Efendiev et al., 2009; Christen and Fox, 2005).

An alternative to applying standard MCMC methods to the computer model directly is to use an emulator (see Section 4.1.1, Computer Model Emulation) in its place. In many cases, this approach alleviates the computational bottleneck that arises when MCMC is applied to the computer model directly. Box 5.2 shows how an emulator can reduce the number of computer model runs for the bowling-ball drop application in Box 5.1. Here the measured drop times are governed by the unknown parameters, θ (the acceleration due to gravity g, for this example), and also by quantities, x, that can be measured or adjusted in the physical system. For this example x denotes drop height, but more generally x might describe system geometry, initial conditions, or boundary conditions. The relationship between observable outputs and uncertain input parameters θ, at a particular x, is now denoted by

    yobs = η(x, θ) + e    (5.2)

where e denotes the measurement error. The computer model is exercised at a limited number of input configurations (x, θ), shown by the dots in Figures 5.2.1(a), (b), and (c). Next, an emulator of the computational model can be constructed and used in place of the simulator (Figure 5.2.1(b)). Alternatively, the construction of the emulator and the estimation of θ can be done jointly using a hierarchical model that specifies, say, a GP model for η(·) and treats the estimation of θ as a missing-data problem. Inferences about the parameter θ, for example, can be made using its posterior probability distribution, usually sampled by means of MCMC (Higdon et al., 2005; Bayarri et al., 2007a). The physical observations and the computational model can be combined to estimate the parameter θ, thereby constraining the predictions of the computational model. Looking again at Figure 5.2.1(c), the probability density function (PDF) shown by the solid curve in the center shows the updated uncertainty for θ after combining the computational model with the physical observations. Clearly, the physical observations have greatly improved the knowledge of the unknown parameter, reducing the prediction uncertainty in the drop time for a bowling-ball drop of 100 m.

Finding: Bayesian methods can be used to estimate parameters and provide companion measures of uncertainty in a broad spectrum of model calibration and inverse problems. Methodological challenges remain in settings that include high-dimensional parameter spaces, expensive forward models, highly nonlinear or even discontinuous forward models, and high-dimensional observables, or in which small probabilities need to be estimated.

Recommendation: Researchers should understand both VVUQ methods and computational modeling to more effectively exploit synergies at their interface. Educational programs, including research programs with graduate-education components, should be designed to foster this understanding.

2 Martin, J., L.C. Wilcox, C. Burstedde, and O. Ghattas, A Stochastic Newton MCMC Method for Large Scale Statistical Inverse Problems with Application to Seismic Inversion. SIAM Journal on Scientific Computing, to appear.

Approaches that quantify prediction uncertainty using multimodel ensembles can deal with the relative paucity of physical observations and can capture key sources of uncertainty that may be missed using more traditional parametric variations within a single computational model, but are justified only under assumptions that are often not met in practice. Additional research will likely improve the state of the art in combining predictions from multimodel ensembles. Such research includes improved methods for constructing ensembles of models, analysis of interdependence among models, assessment of confidence in particular models and their predictive power, and use of information-theoretic and statistical means for developing robust and reliable methods for model comparison, selection, and averaging/pooling.

5.8 EXPLOITING MULTIPLE SOURCES OF PHYSICAL OBSERVATIONS

In many applications, multiple sources of physical observations may be available for the validation/prediction assessment. In engineering applications, the data sources might conform to a validation hierarchy (see Figure 5.7 in Section 5.9.5), whereas in other applications these different data sources might include different sensing modalities (e.g., infrared, visible, seismic) or different data sources (e.g., pressure measurements or well cores). It may also be appropriate to use output from high-quality simulations as surrogates for physical observations (e.g., direct numerical simulation of turbulent flow using resolved Navier-Stokes equations may inform about predictions using coarser, Reynolds-averaged Navier-Stokes simulations). There is the opportunity to make use of these various sources of physical observations to address key issues such as model calibration, model discrepancy, prediction uncertainty, and assessing the quality of the prediction. There is also the opportunity to use what is learned from such analyses to inform how to select additional observations or to design additional experiments.

For a given collection of physical observations, there is the question of how best to use these sources for validation and prediction. For example, should low-level experiments in a validation hierarchy be used for calibration, saving the more integrated experiments for assessing the model? Or should both calibration and assessment be done together? Different strategies will require different approaches, which may affect the quality of the predictions.

Multiple sources of physical observations provide an opportunity to assess a prediction and the accompanying prediction uncertainty. One way to exploit this opportunity is to identify collections of experiments, or observation sources, that can be used to assess the quality of a "surrogate" prediction that has important commonalities with the QOI prediction. The characteristics that define an appropriate surrogate, if they exist, will depend on features of the domain space. Does a candidate surrogate prediction depend on the physical process in a way similar to the QOI prediction? Does the surrogate have similar sensitivities to model inputs? Does the model discrepancy function (if there is one) adequately capture uncertainty for these predictions? Should the same model discrepancy function transfer to the QOI? Exactly how best to use multiple sources of physical data to improve the quality and accuracy of predictions is an active VVUQ research area.

In cases where the validation effort will call for additional experiments, the methodologies of validation and prediction can be used to help assess the value of additional experiments and might also suggest new types of experiments to address weaknesses in the assessment. Ideas from the design of experiments in statistics (Wu and Hamada, 2009) are relevant here, but the design of validation experiments involves additional complications that make this an open research topic. The computational demands of the computational model are a complicating factor, as is the issue of dealing with model discrepancies. Also, some of the key requirements for additional experiments—such as improving the reliability of the assessment or improving communication to stakeholders or decision makers—are not easily quantified. The experimental planning enterprise is considered from a broader perspective in Chapter 6.

5.9 PECOS CASE STUDY

5.9.1 Overview

The Center for Predictive Engineering and Computational Sciences, called the PECOS Center, at the University of Texas at Austin is part of the Predictive Science Academic Alliance Program (PSAAP) of the Department of Energy's National Nuclear Security Administration.

The PECOS Center is engaged in developing VVUQ processes to gain an understanding of the reentry of a space capsule (e.g., NASA's proposed Orion vehicle) into Earth's atmosphere. Of primary interest is the performance of the thermal protection system (TPS), which protects the vehicle from the extreme thermal environment arising from travel through the atmosphere at speeds of Mach 20 or higher, depending on the trajectory. Vehicles that use ablative heat shields (e.g., Orion and Apollo) are being simulated to predict the rate at which the ablator is being consumed. TPS consumption is a critical issue in the design and operation of a reentry vehicle—if the entire heat shield is consumed, the vehicle will burn up. TPS consumption is governed by a range of physical phenomena, including high-speed and turbulent fluid flow, high-temperature aero-thermo-chemistry, radiative heating, and the response of complex materials (the ablator). Thus, a numerical simulation of reentry vehicles requires models of these phenomena.

The reentry vehicle simulations share a number of complicating characteristics with many other high-consequence computational science applications. These complicating characteristics include the following:

•  The QOIs are not accessible for direct measurement under the conditions in which the predictions are to be made;

•  The predictions involve multiple interacting physical models;

•  Experimental data available for calibrating and validating models are difficult to obtain, include significant uncertainty, are sparse, and often describe physical conditions not directly related to the predictions; and

•  The best-available models for some of the physical phenomena are known to include sizable errors.

These characteristics greatly complicate the assessment of prediction reliability and the application of VVUQ techniques.

5.9.2 Verification

As described above, there are two components of the verification of a computer simulation: (1) ensuring that the computer code used in the simulation correctly implements the intended numerical discretization of the model (code verification) and (2) ensuring that the errors introduced by the numerical discretization are sufficiently small (solution verification).

5.9.3 Code Verification

There are many aspects of ensuring the correct implementation of a mathematical model in a computer code. Many of these are just good software engineering practices, such as exhaustive model development and user documentation, modern software design, configuration control, and continuous unit and regression testing. Commonly understood to be important but less commonly practiced, these processes are an integral part of the PECOS software environment.

To ensure that an implementation is actually producing correct solutions, one wants to compare results to known, preferably analytic, solutions. Unfortunately, analytic solutions are not generally available, which is the reason for the use of the method of manufactured solutions (MMS), in which source terms are added to the equations to make a prespecified "solution" exact (Steinberg and Roache, 1985; Roache, 1998; Knupp and Salari, 2003; Long et al., 2010; Oberkampf and Roy, 2010). Although MMS is a widely recognized approach, it is not commonly used. One reason is that it is much more difficult to implement for complex problems than it appears. First, even for systems of moderate complexity (e.g., three-dimensional compressible Navier-Stokes), there can be many hundreds of source terms, and it is clearly necessary that the evaluation of these terms be done with high reliability. Thus, constructing analytic solutions is itself a software engineering and reliability challenge. Second, the introduction of the source terms into the code being tested must be done with minimal (preferably no) changes to the code, so that the tests are relevant to the code as it will be used. Unfortunately, this introduction of the source terms may not be possible in codes that have not been designed for it. Finally, it is necessary that manufactured solutions have characteristics similar to those of the problems that the codes will be used to solve. This is important so that bugs are not masked by the fact that the terms in which they occur may be insignificant in a manufactured solution that is too simple.
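
To make the MMS idea concrete, the sketch below manufactures a solution for a simple one-dimensional heat equation and derives the required source term with symbolic software (sympy here, in place of Maple). It illustrates the general technique only; it is not the PECOS Center's tooling, which is described next.

```python
import sympy as sp

# Manufactured solution for the 1-D heat equation u_t = k * u_xx + s(x, t).
x, t, k = sp.symbols("x t k")
u_manufactured = sp.sin(sp.pi * x) * sp.exp(-t)  # prespecified "solution"

# Source term that makes u_manufactured an exact solution of the modified equation.
source = sp.diff(u_manufactured, t) - k * sp.diff(u_manufactured, x, 2)
print(sp.simplify(source))
# Prints an expression equivalent to (pi**2*k - 1)*exp(-t)*sin(pi*x); adding this
# source to the code's right-hand side yields an exact solution against which the
# discretization and its implementation can be tested.
```
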

FIGURE 5.6 Dependence of the L2 error in the Spalart-Allmaras (SA) turbulence model manufactured solution on the grid size, under uniform refinement. Shown are the original test, the test after the correction of a bug in the SA equations, and that after subsequent correction of a bug in the streamline-upwind/Petrov-Galerkin regularization. The theoretical convergence is second order.

At the PECOS Center, to make MMS useful for the verification of reentry vehicle codes, a highly reliable software library for implementing manufactured solutions (the Manufactured Analytic Solution Abstraction, or MASA) and a library of manufactured solutions, using symbolic manipulation software (e.g., Maple), have been developed. These manufactured solutions have been imported into MASA. MASA and associated solutions have been publicly released.4 Further, part of the PECOS Center software development process involves developing and documenting a verification plan (usually involving MMS) before development begins, so that codes are designed to enable MMS.

These efforts have paid off by exposing a number of subtle but important bugs in PECOS software. An example is shown in Figure 5.6, in which the convergence with grid refinement of the Spalart-Allmaras (SA) turbulence model5 equations to a manufactured solution is shown. In the initial test, the solution error did not converge to zero with uniform grid refinement, which led to the discovery of a bug in the implementation of the SA equations. When this bug was fixed, the error did reduce with refinement, but not at the theoretically expected rate of h². The slower rate was caused by a long-standing bug in the implementation of streamline-upwind/Petrov-Galerkin (SUPG) stabilization in the LibMesh finite-element infrastructure in which the model was implemented.

4 See https://red.ices.utexas.edu/projects/software/wiki/MASA. Accessed March 19, 2012.
5 For a definition and details of this model, see http://turbmodels.larc.nasa.gov/spalart.html. Accessed March 24, 2012.
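
The kind of grid-convergence evidence summarized in Figure 5.6 can be checked with a short calculation: given error norms against a manufactured solution on uniformly refined grids, compute the observed order of accuracy. The numbers below are illustrative, not the PECOS data.

```python
import numpy as np

# L2 errors against a manufactured solution on uniformly refined grids (illustrative).
h = np.array([0.08, 0.04, 0.02, 0.01])
l2_error = np.array([6.4e-3, 1.7e-3, 4.3e-4, 1.1e-4])

# Observed order of accuracy between successive grid pairs; for a correct
# second-order discretization these values should approach 2 under refinement.
observed_order = np.log(l2_error[:-1] / l2_error[1:]) / np.log(h[:-1] / h[1:])
print("observed orders:", np.round(observed_order, 2))
```
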

An acceptable numerical error tolerance depends on the circumstances. At the PECOS Center, since the numerical discretization errors are under the control of the analyst, the view is taken that they should be made sufficiently small to be negligible compared to other sources of uncertainty. This avoids the need to model the uncertainty arising from such errors. It is important to identify the QOIs for which predictions are being made, because the numerical discretization requirements for predicting some quantities (e.g., high-order derivatives) are much more stringent than for other quantities.

Solution verification, then, requires that the discretization error in the QOIs be estimated. The common practice of comparing solutions on two grids to check how much they differ is not sufficient. In simple situations it is possible to refine the discretization uniformly (e.g., halve the grid spacing everywhere) and then to apply Richardson extrapolation to develop an error estimate. A more general technique, and the one used at the PECOS Center, is adjoint-based a posteriori error estimation (Bangerth and Rannacher, 2003).

Once one has an estimate of the errors in the QOIs, it may be necessary to refine the discretization to reduce this error. Adjoint-based error estimators also provide an indicator of where (in space and/or time) the discretization errors are contributing most to errors in the QOIs. Goal-oriented adaptivity (Bangerth and Rannacher, 2003; Oden and Prudhomme, 1998; Prudhomme and Oden, 1999; Strouboules et al., 2000) uses this adjoint information to drive adaptive refinement of the discretization. At the PECOS Center, the simulation codes used to make predictions of the ablator consumption rate (the QOI) have been developed to perform adjoint-based error estimation and goal-oriented refinement. For example, a hypersonic flow code (FIN-S) supporting goal-oriented refinement was built on the LibMesh infrastructure (Kirk et al., 2006). Adaptivity is used to reduce the estimated error in the QOIs to below specified tolerances, thereby accomplishing solution verification.

5.9.5 Validation

Data and associated models of data uncertainty are critical to predictive simulation. They are needed for the calibration of physical models and inadequacy models and for the validation of these models. At the PECOS Center, the calibration, validation, and prediction processes are closely related, interdependent, and at the heart of uncertainty quantification in computational modeling.

A number of complications arise from the need to pursue validation in the context of a QOI. First, note that in most situations the QOI in the prediction scenario is not accessible for observation, since otherwise a prediction would generally not be needed. This inability to observe the QOI can arise for many reasons, such as legal or ethical restrictions, lack of instrumentation, limitations of laboratory facilities to reproduce the prediction scenario, cost, or the fact that the prediction is about the future. At the PECOS Center, the QOI is the consumption rate of an ablative heat shield at peak heating for a particular trajectory of a reentry vehicle. It is experimentally unobservable because the conditions are not accessible in the laboratory and because flight tests are expensive, making it impractical to test every trajectory of interest.

Validation tests are, of course, posed by comparing the outputs of the model for some observable quantity to observations.
The central challenge is to determine what the mismatch between observations and the model, together with the relevant prediction uncertainties, implies about predictions of unobserved QOIs. Because the QOIs cannot be observed, the only access that one has to them is through the model, and so this assessment can be done only in the context of the model.

Another complication arises when the system being modeled has many parts or encompasses many interacting physical phenomena. In this case, the validation process is commonly hierarchical, with validation tests of models for subcomponents or individual physical phenomena based on relatively simple (inexpensive) experiments. As an example, in the reentry vehicle problem being pursued at the PECOS Center, the individual physical phenomena include aero-chemistry, turbulence, thermal radiation, surface chemistry, and ablator material response. Combinations of subcomponents or physical phenomena are then tested against more complicated, less-abundant multiphysics experiments. Finally, in the best circumstances, one has some experimental observations available for the complete system, allowing a validation test for the complete model. The hierarchical validation process can be envisioned as the validation pyramid shown in Figure 5.7.

FIGURE 5.7 The prediction pyramid depicting the increasing complexity of the physical scenarios (Sc, Sv, and Sp) accompanied by the decreasing availability of data (dc and dv) for calibration and validation of complex multiphysics models, with the prediction quantity of interest (Qp) residing at the highest level of the pyramid.

The hierarchical nature of multiphysics validation poses further challenges. First, the QOIs are generally accessible only through the model of the full system, so that single-physics models do not have access to the QOI, making QOI-aware validation difficult. Generally, surrogate QOIs are devised for single-physics models, a surrogate QOI being as closely related to the full-system QOI as possible. For example, in the validation of boundary-layer turbulence models for the reentry vehicle simulations pursued at the PECOS Center, the turbulent wall heat flux is identified as a surrogate QOI, since it is directly related to, and is a driver for, the ablation rate. Multiphysics validation tests performed at higher levels of the pyramid are important because they generally test the models for the coupling between the single-physics models. But the fact that data are generally scarce at these higher levels means that these coupling models are commonly not as rigorously tested as the simpler models are, affecting the overall quality of the final prediction.

5.10 RARE, HIGH-CONSEQUENCE EVENTS

Large-scale computational models play a role in the assessment and mitigation of rare, high-consequence events. By definition, such events occur very infrequently, which means that there are few measured data from them. Thus, the issues that complicate extrapolative predictions are almost always present in predictions involving rare events. Still, computational models play a key role in safety assessments for nuclear reactors by the Nuclear Regulatory Commission (Mosleh et al., 1998) and in assessing safety risks in subsurface contaminant transport at Department of Energy facilities (Neuman and Wierenga, 2003). Computational models also play a role in characterizing the causes and consequences of potential natural disasters such as earthquakes, tsunamis, severe storms, avalanches, fires, or even meteor impacts. The behavior of engineered systems (e.g., bridges, buildings) under extreme conditions, or simply as a result of aging and normal wear and tear, can also fall under this heading of rare, high-consequence events.

In many cases, such as probabilistic risk assessment (Kumamoto and Henley, 1996) applied to nuclear reactor safety, computational models are used to evaluate the consequences of identified scenarios, helping to quantify the

risk—the product of the chance of an event and its consequences. This is also true of assessments of the risks from large meteor impacts, for which computer models simulate the consequences of impacts under different conditions (Furnish et al., 1995). Although it is difficult to assess confidence in such extrapolative predictions, their results can be integrated into a larger risk analysis to prioritize threats. In such analyses, it may be a more efficient use of resources to further scrutinize the model results only for the threats with highest priority.

Computational models can also be used to seek out combinations of initial conditions, forcings, and even parameter settings that give rise to extreme, or high-consequence, events. Assessing the chances of such events comes after their discovery. Many of the methods described in Chapters 3 and 4 are relevant to this task, but now with a focus on finding aberrant behavior rather than inferring settings that match measurements. This may involve exploring how a physical system can be "stretched" to produce (as yet) unseen, extreme behavior, perhaps induced by interactions among different processes. This is the opposite of designing, or engineering, a system to ensure that interactions among the various processes are minimized. Calculating such extreme behavior may tax a model to the point that its ability to reproduce reality is questionable. Developing methods for assessing and improving confidence in such model predictions remains a challenging and largely open problem, as it is for extrapolative predictions.

Once a high-consequence event is identified, computational models can be viable tools for assessing its probability. Such events are rare, and so standard approaches such as Monte Carlo simulation are infeasible because large numbers of model runs would be required to estimate these small probabilities. There are rich lines of current research in this area. Oakley and O'Hagan (2004) use a combination of emulation and importance sampling for assessing small probabilities in infrastructure management. Picard (2005) biases a particle-based code to produce more extreme events, statistically adjusting for this bias in producing estimates. In addition to response-surface approaches, one might also use a combination of high- and low-fidelity models to seek out and estimate rare-event probabilities. Another possible multifidelity strategy would be to use a low-fidelity model to seed promising boundary conditions to a high-fidelity, localized model (Sain et al., 2011). Embedding computational models in standard statistical approaches is another promising direction. For example, Cooley (2009) combines computer model output and extreme value theory from statistics to estimate the frequency of extreme rainfall events. Bayarri et al. (2009b) use a computer model to identify the catastrophic region of the input space for extreme pyroclastic volcanic flows, combined with statistical modeling of the input distributions, to compute the probability of such extreme events.
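To make the small-probability estimation problem concrete, the following toy sketch contrasts plain Monte Carlo with a simple importance-sampling estimate of a rare exceedance probability; the "model" is a cheap stand-in function, and the threshold and proposal distribution are arbitrary illustrative choices rather than anything drawn from the studies cited above.

```python
# Toy sketch: plain Monte Carlo versus importance sampling for a rare event.
# The "model" is a cheap stand-in for an expensive simulation (or an emulator).
import numpy as np

rng = np.random.default_rng(0)

def model(x):
    # Placeholder response function; a real application would call a simulator.
    return x**3 + x

threshold = 120.0                      # exceedance level defining the rare event

# Plain Monte Carlo: with P(event) on the order of 1e-6, almost no samples hit it.
x_mc = rng.standard_normal(100_000)
p_mc = np.mean(model(x_mc) > threshold)

# Importance sampling: draw from a proposal shifted toward the event region,
# then reweight each sample by the ratio of nominal to proposal densities.
mu = 5.0                               # proposal mean, chosen near the failure region
x_is = rng.normal(mu, 1.0, 100_000)
log_w = -0.5 * x_is**2 + 0.5 * (x_is - mu) ** 2   # log of N(0,1)/N(mu,1) density ratio
p_is = np.mean(np.exp(log_w) * (model(x_is) > threshold))

print(f"plain Monte Carlo estimate:   {p_mc:.2e}")
print(f"importance sampling estimate: {p_is:.2e}")
```

The same reweighting idea underlies the biased-sampling and emulator-assisted strategies cited above, with the expensive computational model (or a fast surrogate of it) taking the place of the stand-in function.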
A better understanding of complex dynamical systems could help in the search for precursors to extreme events or important changes in system dynamics (Scheffer et al., 2009). Computational models will likely have a role in such searches, even when the models are known to have shortcomings in their representation of such complex systems. Currently, computational models are being used to inform monitoring efforts, helping to provide early warnings of events ranging from groundwater contamination to a terrorist attack.

Finally, bounding and "worst-case" approaches, if not too conservative, can provide actionable information about rare, high-consequence events. Recent work by Lucas et al. (2009) uses concentration-of-measure inequalities to bound the probability of extreme outcomes, without having to specify fully the distribution of the input uncertainties. Also, more traditional decision-theoretic approaches (e.g., minimax decision rules [Berger, 1985]; worst-case priors [Evans and Stark, 2002]) may be useful for dealing with rare, high-consequence events. One could imagine embedding these ideas into a computational model, using a worst-case value for a reaction coefficient, a permeability field, a boundary condition, or even how a physical process is represented in the computational model.

5.11 CONCLUSION

This chapter discusses numerous tasks that contribute to validation and prediction from the perspective of mathematical foundations, pointing out areas of potentially fruitful research. As noted, details of these tasks depend substantially on the features of the application—the maturity, quality, and speed of the computational model; the available physical observations; and their relation to the QOI. The concept of embedding the computational model within a mathematical/statistical framework that can account for and model relevant uncertainties, including those caused by initial and boundary conditions, input parameters, and model discrepancy, is also described.

Some applications involve making predictions and uncertainty estimates in settings for which physical observations are plentiful. In even mildly extrapolative settings, however, obtaining these estimates and assessing their reliability remains an open problem. The NRC (2007) report on the use of models in environmental regulatory decision

making states, "When model results are to be extrapolated outside of conditions for which they have been evaluated, it is important that they have the strongest possible theoretical basis, explicitly representing the processes that will most affect outcomes in the new conditions to be modeled, and embodying the best possible parameter estimates" (p. 129). The findings and recommendation below relate to making extrapolative predictions.

Finding: Mathematical considerations alone cannot address the appropriateness of a model prediction in a new, untested setting. Quantifying uncertainties and assessing their reliability for a prediction require both statistical and subject-matter reasoning.

Finding: The idea of a domain of applicability is helpful for communicating the conditions for which predictions (with uncertainty) can be trusted. However, the mathematical foundations have not been established for defining such a domain or its boundaries.

Finding: Research and development on methods for assessing uncertainties of model-based predictions in new, untested conditions (i.e., "extrapolations") will likely require expertise from mathematics, statistics, computational modeling, and the science and engineering areas relevant to a given application. Specific needs in assessing uncertainties in prediction include:

• Approaches for specifying and estimating model discrepancy terms that leverage physical understanding, features of the application, and known strengths and deficiencies of the computational model for the application;
• Computational models developed with VVUQ in mind, which might include the need for availability of derivative information; a faster, lower-fidelity representation of the model (perhaps with specified discrepancy); or embedding physically motivated discrepancy terms within the model that can produce more reliable prediction uncertainties for the QOI and that can be calibrated with available physical observations;
• A framework for efficiently exploiting a hierarchy of available experiments—allocating experiments for calibration, assessing prediction accuracy, assessing the reliability of predictions, and suggesting new experiments within the hierarchy that would improve the quality of estimated prediction uncertainties;
• Guidelines for reporting predictions and accompanying prediction uncertainties, including disclosure of which sources of uncertainty are accounted for, which are not, what assumptions these estimates rely on, and the reliability or quality of these assumptions; and
• Compelling examples of VVUQ done well in problems with different degrees of complexity.

A similar conclusion was reached by the National Science Foundation (NSF) Division of Mathematics and Physical Sciences (MPS), which in its May 2010 advisory committee report recommended as follows:

MPS should encourage interdisciplinary interaction between domain scientists and mathematicians on the topic of uncertainty quantification, verification and validation, risk assessment, and decision making. (NSF, 2010)

The above ideas are particularly relevant to the modeling of complex systems where even a slight deviation from physically tested conditions may change features of the system in many ways, some of which are incorporated in the model and some of which are not. The field of VVUQ is still developing, making it too soon to offer any specific recommendations regarding particular methods and approaches.
However, a number of principles and accompanying best practices are listed below regarding validation and prediction from the perspective of mathematical foundations.

• Principle: A validation assessment is well-defined only in terms of specified QOIs and the accuracy needed for the intended use of the model.
—Best practice: Early in the validation process, specify the QOIs that will be addressed and the required accuracy.
—Best practice: Tailor the level of effort in assessment and estimation of prediction uncertainties to the needs of the application.

• Principle: A validation assessment provides direct information about model accuracy only in the domain of applicability that is "covered" by the physical observations employed in the assessment.
—Best practice: When quantifying or bounding model error for a QOI in the problem at hand, systematically assess the relevance of supporting data and validation assessments (which were based on data from different problems, often with different QOIs). Subject-matter expertise should inform this assessment of relevance (as discussed above and in Chapter 7).
—Best practice: If possible, use a broad range of physical observation sources so that the accuracy of a model can be checked under different conditions and at multiple levels of integration.
—Best practice: Use "holdout tests" to test validation and prediction methodologies. In such a test, some validation data are withheld from the validation process, the prediction machinery is employed to "predict" the withheld QOIs, with quantified uncertainties, and finally the predictions are compared to the withheld data (a simple sketch of such a test follows this list).
—Best practice: If the desired QOI was not observed for the physical systems used in the validation process, compare sensitivities of the available physical observations with those of the QOI.
—Best practice: Consider multiple metrics for comparing model outputs against physical observations.
• Principle: The efficiency and effectiveness of validation and prediction assessments are often improved by exploiting the hierarchical composition of computational and mathematical models, with assessments beginning on the lowest-level building blocks and proceeding to successively more complex levels.
—Best practice: Identify hierarchies in computational and mathematical models, seek measured data that facilitate hierarchical validation assessments, and exploit the hierarchical composition to the extent possible.
—Best practice: If possible, use physical observations, especially at more basic levels of the hierarchy, to constrain uncertainties in model inputs and parameters.
• Principle: Validation and prediction often involve specifying or calibrating model parameters.
—Best practice: Be explicit about what data/information sources are used to fix or constrain model parameters.
—Best practice: If possible, use a broad range of observations over carefully chosen conditions to produce more reliable parameter estimates and uncertainties, with less "trade-off" between different model parameters.
• Principle: The uncertainty in the prediction of a physical QOI must be aggregated from uncertainties and errors introduced by many sources, including discrepancies in the mathematical model, numerical and code errors in the computational model, and uncertainties in model inputs and parameters.
—Best practice: Document assumptions that go into the assessment of uncertainty in the predicted QOI, and also document any omitted factors. Record the justification for each assumption and omission.
—Best practice: Assess the sensitivity of the predicted QOI and its associated uncertainties to each source of uncertainty as well as to key assumptions and omissions.
—Best practice: Document key judgments—including those regarding the relevance of validation studies to the problem at hand—and assess the sensitivity of the predicted QOI and its associated uncertainties to reasonable variations in these judgments.
—Best practice: The methodology used to estimate uncertainty in the prediction of a physical QOI should also be equipped to identify paths for reducing uncertainty.
• Principle: Validation assessments must take into account the uncertainties and errors in physical observations (measured data).
—Best practice: Identify all important sources of uncertainty/error in validation data—including instrument calibration, uncontrolled variation in initial conditions, variability in measurement setup, and so on—and quantify the impact of each.
—Best practice: If possible, use replications to help estimate variability and measurement uncertainty.
—Remark: Assessing measurement uncertainties can be difficult when the "measured" quantity is actually the product of an auxiliary inverse problem—that is, when it is not measured directly but is inferred from other measured quantities.
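As a simple sketch of the "holdout test" best practice noted above, the following fragment withholds part of a synthetic validation data set, predicts the withheld QOIs with an uncertainty band, and checks the empirical coverage; the data, the least-squares predictor, and the 2-sigma bands are illustrative stand-ins for a real prediction machinery with quantified uncertainties.

```python
# Minimal holdout-test sketch: withhold validation data, predict the withheld
# QOIs with uncertainty, and check how many fall inside the predicted bands.
import numpy as np

rng = np.random.default_rng(1)

# Synthetic "physical observations": a QOI measured at several input settings.
x = np.linspace(0.0, 1.0, 30)
y = 2.0 + 1.5 * x + rng.normal(0.0, 0.1, x.size)

# Withhold one-third of the observations from the calibration step.
holdout = rng.choice(x.size, size=10, replace=False)
train = np.setdiff1d(np.arange(x.size), holdout)

# "Prediction machinery": here, least squares with a residual-based standard
# error; in practice this would be the calibrated model plus discrepancy term.
A = np.vstack([np.ones_like(x[train]), x[train]]).T
coef, *_ = np.linalg.lstsq(A, y[train], rcond=None)
sigma = (y[train] - A @ coef).std(ddof=2)

# Predict the withheld QOIs with +/- 2*sigma bands and check coverage.
y_pred = coef[0] + coef[1] * x[holdout]
inside = np.abs(y[holdout] - y_pred) <= 2.0 * sigma
print(f"{inside.mean():.0%} of withheld points fall inside the 2-sigma bands")
```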

Finally, it is worth pointing out that there is a fairly extensive literature in statistics focused on model assessment that may be helpful if adapted to the model validation process. Basic principles and tools such as model diagnostics (Gelman et al., 1996; Cook and Weisberg, 1999), visualization and graphical methods (Cleveland, 1984; Anselin, 1999), hypothesis testing and model selection (Raftery, 1996; Bayarri and Berger, 2000; Robins et al., 2000; Lehmann and Romano, 2005), and cross-validation and the use of holdout tests (Hastie et al., 2009) could play central roles in validation and prediction, as they do for statistical model checking.

5.12 REFERENCES

Akçelik, V., G. Biros, A. Draganescu, O. Ghattas, J. Hill, and B. Van Bloemen Waanders. 2005. Dynamic Data-Driven Inversion for Terascale Simulations: Real-Time Identification of Airborne Contaminants, in Proceedings of SC2005.
AIAA (American Institute of Aeronautics and Astronautics). 1998. Guide for the Verification and Validation of Computational Fluid Dynamics Simulations. Reston, Va.: AIAA.
Anselin, L. 1999. Interactive Techniques and Exploratory Spatial Data Analysis. Geographical Information Systems: Principles, Techniques, Management and Applications 1:251-264.
Badri Narayanan, V.A., and N. Zabaras. 2004. Stochastic Inverse Heat Conduction Using a Spectral Approach. International Journal for Numerical Methods in Engineering 60:1569-1593.
Bangerth, W., and R. Rannacher. 2003. Adaptive Finite Element Methods for Differential Equations. Basel, Switzerland: Birkhäuser Verlag.
Bayarri, M.J., and J.O. Berger. 2000. P Values for Composite Null Models. Journal of the American Statistical Association 95(452):1269-1276.
Bayarri, M.J., J.O. Berger, M.C. Kennedy, A. Kottas, R. Paulo, J. Sacks, J.A. Cafeo, C.H. Lin, and J. Tu. 2005. Bayesian Validation of a Computer Model for Vehicle Crashworthiness. Technical Report 163. Research Triangle Park, N.C.: National Institute of Statistical Sciences.
Bayarri, M.J., J. Berger, R. Paulo, J. Sacks, J. Cafeo, J. Cavendish, C. Lin, and J. Tu. 2007a. A Framework for Validation of Computer Models. Technometrics 49:138-154.
Bayarri, M.J., J. Berger, G. Garcia-Donato, F. Liu, J. Palomo, R. Paulo, J. Sacks, D. Walsh, J. Cafeo, and R. Parthasarathy. 2007b. Computer Model Validation with Functional Output. Annals of Statistics 35:1874-1906.
Bayarri, M.J., J.O. Berger, M.C. Kennedy, A. Kottas, R. Paulo, J. Sacks, J.A. Cafeo, C.H. Lin, and J. Tu. 2009a. Predicting Vehicle Crashworthiness: Validation of Computer Models for Functional and Hierarchical Data. Journal of the American Statistical Association 104:929-943.
Bayarri, M.J., J.O. Berger, E.S. Calder, K. Dalbey, S. Lunagomez, A.K. Patra, E.B. Pitman, E.T. Spiller, and R.L. Wolpert. 2009b. Using Statistical and Computer Models to Quantify Volcanic Hazards. Technometrics 51:402-413.
Berger, J.O. 1985. Statistical Decision Theory and Bayesian Analysis. New York: Springer.
Berger, J.O., and R.L. Wolpert. 1988. The Likelihood Principle. Lecture notes available at http://books.google.com/books?hl=en&lr=&id=7fz8JGLmWbgC&oi=fnd&pg=PA1&dq=berger+and+wolpert+the+likelihood+principle&ots=iTkq2Ekz_Z&sig=qKnLby2avTKEP_unAWSJ_BUI#v=onepage&q=berger%20and%20wolpert%20the%20likelihood%20principle&f=false. Accessed March 20, 2012.
Besag, J., P.J. Green, D.M. Higdon, and K. Mengersen. 1995. Bayesian Computation and Stochastic Systems. Statistical Science 10:3-66.
Box, G., and N. Draper. 1987. Empirical Model Building and Response Surfaces. New York: Wiley.
Box, G.E.P., J.S. Hunter, and W.G. Hunter. 2005. Statistics for Experimenters: Design, Innovation, and Discovery, Volume 2. New York: Wiley Online Library.
Brooks, H.E., and C.A. Doswell III. 1996. A Comparison of Measures-Oriented and Distributions-Oriented Approaches to Forecast Verification. Weather and Forecasting 11:288-303.
Buser, C.M., H.R. Kunsch, D. Luth, M. Wild, and C. Schar. 2009. Bayesian Multi-Model Projection of Climate: Bias Assumptions and Interannual Variability. Climate Dynamics 33(6):849-868.
Christen, J.A., and C. Fox. 2005. Markov Chain Monte Carlo Using an Approximation. Journal of Computational and Graphical Statistics 14(4):795-810.
Cleveland, W.S. 1984. Elements of Graphing Data. Belmont, Calif.: Wadsworth.
Cook, R.D., and S. Weisberg. 1999. Applied Regression Including Computing and Graphics. New York: Wiley Online Library.
Cooley, D. 2009. Extreme Value Analysis and the Study of Climate Change. Climatic Change 97(1):77-83.
Dubois, D., H. Prade, and E.F. Harding. 1988. Possibility Theory: An Approach to Computerized Processing of Uncertainty. New York: Plenum Press.
Easterling, R.G. 2001. Measuring the Predictive Capability of Computational Models: Principles and Methods, Issues and Illustrations. SAND2001-0243. Albuquerque, N.Mex.: Sandia National Laboratories.
Efendiev, Y., A. Datta-Gupta, X. Ma, and B. Mallick. 2009. Efficient Sampling Techniques for Uncertainty Quantification. In History Matching Using Nonlinear Error Models and Ensemble Level Upscaling Techniques. Washington, D.C.: Water Resources Research and American Geophysical Union.
Evans, S.N., and P.B. Stark. 2002. Inverse Problems as Statistics. Inverse Problems 18:R55.
Evensen, G. 2009. Data Assimilation: The Ensemble Kalman Filter. New York: Springer Verlag.
Ferson, S., V. Kreinovich, L. Ginzburg, D.S. Myers, and K. Sentz. 2003. Constructing Probability Boxes and Dempster-Shafer Structures. Albuquerque, N.Mex.: Sandia National Laboratories.

Flath, H.P., L.C. Wilcox, V. Akçelik, J. Hill, B. Van Bloemen Waanders, and O. Ghattas. 2011. Fast Algorithms for Bayesian Uncertainty Quantification in Large-Scale Linear Inverse Problems Based on Low-Rank Partial Hessian Approximations. SIAM Journal on Scientific Computing 33(1):407-432.
Fuentes, M., and A.E. Raftery. 2004. Model Validation and Spatial Interpolation by Combining Observations with Outputs from Numerical Models via Bayesian Melding. Journal of the American Statistical Association, Biometrics 6:36-45.
Furnish, M.D., M.B. Boslough, and G.T. Gray. 1995. Dynamical Properties Measurements for Asteroid, Comet and Meteorite Material Applicable to Impact Modeling and Mitigation Calculations. International Journal of Impact Engineering 17(3):53-59.
Galbally, D.K., K. Fidkowski, K. Willcox, and O. Ghattas. 2010. Nonlinear Model Reduction for Uncertainty Quantification in Large-Scale Inverse Problems. International Journal for Numerical Methods in Engineering 81:1581-1608.
Gelfand, A.E., and S.K. Ghosh. 1998. Model Choice: A Minimum Posterior Predictive Loss Approach. Biometrika 85(1):1-11.
Gelman, A., X.L. Meng, and H. Stern. 1996. Posterior Predictive Assessment of Model Fitness via Realized Discrepancies. Statistica Sinica 6:733-769.
Ghanem, R., and A. Doostan. 2006. On the Construction and Analysis of Stochastic Predictive Models: Characterization and Propagation of the Errors Associated with Limited Data. Journal of Computational Physics 217(1):63-81.
Gneiting, T., and A.E. Raftery. 2005. Weather Forecasting with Ensemble Methods. Science 310(5746):248-249.
Goldstein, M., and J.C. Rougier. 2004. Probabilistic Formulations for Transferring Inferences from Mathematical Models to Physical Systems. SIAM Journal on Scientific Computing 26(2):467-487.
Hastie, T., R. Tibshirani, and J.H. Friedman. 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. New York: Springer.
Higdon, D., M. Kennedy, J.C. Cavendish, J.A. Cafeo, and R.D. Ryne. 2005. Combining Field Data and Computer Simulations for Calibration and Prediction. SIAM Journal on Scientific Computing 26(2):448-466.
Higdon, D., J. Gattiker, B. Williams, and M. Rightley. 2008. Computer Model Calibration Using High-Dimensional Output. Journal of the American Statistical Association 103(482):570-583.
Hills, R., and T. Trucano. 2002. Statistical Validation of Engineering and Scientific Models: A Maximum Likelihood Based Metric. SAND2001-1789. Albuquerque, N.Mex.: Sandia National Laboratories.
Hills, R.G., K.J. Dowding, and L. Swiler. 2008. Thermal Challenge Problem: Summary. Computer Methods in Applied Mechanics and Engineering 197:2490-2495.
Hoeting, J.A., D. Madigan, A.E. Raftery, and C.T. Volinsky. 1999. Bayesian Model Averaging: A Tutorial. Statistical Science 15:382-401.
Kaipio, J.P., and E. Somersalo. 2005. Statistical and Computational Inverse Problems. New York: Springer.
Kaipio, J.P., V. Kolehmainen, E. Somersalo, and M. Vauhkonen. 2000. Statistical Inversion and Monte Carlo Sampling Methods in Electrical Impedance Tomography. Inverse Problems 16:1487.
Kennedy, M.C., and A. O'Hagan. 2001. Bayesian Calibration of Computer Models. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 63:425-464.
Kersting, A.B., D.W. Efurd, D.L. Finnegan, D.J. Rokop, D.K. Smith, and J.L. Thompson. 1999. Migration of Plutonium in Ground Water at the Nevada Test Site. Nature 397(6714):56-59.
Kirk, B., J. Peterson, R. Stogner, and G. Carey. 2006. A C++ Library for Parallel Adaptive Mesh Refinement/Coarsening Simulations. Engineering with Computers 22(3-4):237-254.
Klein, R., S. Doebling, F. Graziani, M. Pilch, and T. Trucano. 2006. ASC Predictive Science Academic Alliance Program Verification and Validation Whitepaper. UCRL-TR-220711. Livermore, Calif.: Lawrence Livermore National Laboratory.
Klir, G.J., and B. Yuan. 1995. Fuzzy Sets and Fuzzy Logic. Upper Saddle River, N.J.: Prentice Hall.
Knupp, P., and K. Salari. 2003. Verification of Computer Codes in Computational Science and Engineering. Boca Raton, Fla.: Chapman and Hall/CRC.
Knutti, R., R. Furrer, C. Tebaldi, J. Cermak, and G.A. Meehl. 2010. Challenges in Combining Projections in Multiple Climate Models. Journal of Climate 23(10):2739-2758.
Kumamoto, H., and E.J. Henley. 1996. Probabilistic Risk Assessment and Management for Engineers and Scientists. New York: IEEE Press.
Lehmann, E.L., and J.P. Romano. 2005. Testing Statistical Hypotheses. New York: Springer.
Lieberman, C., K. Willcox, and O. Ghattas. 2010. Parameter and State Model Reduction for Large-Scale Statistical Inverse Problems. SIAM Journal on Scientific Computing 32:2523-2542.
Loeppky, J., D. Bingham, and W.J. Welch. 2011. Computer Model Calibration or Tuning in Practice. Technometrics. Submitted for publication.
Long, K., R. Kirby, and B. Van Bloemen Waanders. 2010. Unified Embedded Parallel Finite Element Computations via Software-Based Frechet Differentiation. SIAM Journal on Scientific Computing 32(6):3323-3351.
Lorenc, A.C. 2003. The Potential of the Ensemble Kalman Filter for NWP—A Comparison with 4D-Var. Quarterly Journal of the Royal Meteorological Society 129:3183-3203.
Lucas, L.J., H. Owhadi, and M. Ortiz. 2009. Rigorous Verification, Validation, Uncertainty Quantification and Certification Through Concentration-of-Measure Inequalities. Computer Methods in Applied Mechanics and Engineering 57(51-52):4591-4609.
Marzouk, Y.M., and H.N. Najm. 2009. Dimensionality Reduction and Polynomial Chaos Acceleration of Bayesian Inference in Inverse Problems. Journal of Computational Physics 228:1862-1902.
Meehl, G.A., C. Covey, T. Delworth, M. Latif, B. McAvaney, J.F.B. Mitchell, B. Stouffer, and K.E. Taylor. 2007. The WCRP CMIP3 Multimodel Dataset. Bulletin of the American Meteorological Society 88:1388-1394.

Mosleh, A., D.M. Rasmuson, F.M. Marshall, and U.S. Nuclear Regulatory Commission. 1998. Guidelines on Modeling Common-Cause Failures in Probabilistic Risk Assessment. Washington, D.C.: Safety Programs Division, Office for Analysis and Evaluation of Operational Data, U.S. Nuclear Regulatory Commission.
Naevdal, G., L. Johnsen, S. Aanonsen, and D.E. Vefring. 2005. Reservoir Monitoring and Continuous Model Updating Using Ensemble Kalman Filter. Society of Petroleum Engineers Journal 10(1):66-74.
NRC (National Research Council). 2007. Models in Environmental Regulatory Decision Making. Washington, D.C.: National Academies Press.
NSF (National Science Foundation). 2010. Minutes of the Advisory Committee Meeting. April 1-2, 2010. Available at http://www.nsf.gov/attachments/117978/public/MPSAC_April_1-2_2010_Minutes_Final.pdf. Accessed March 20, 2012.
Neuman, S.P., and J.W. Wierenga. 2003. A Comprehensive Strategy of Hydrogeologic Modeling and Uncertainty Analysis for Nuclear Facilities and Sites. Washington, D.C.: U.S. Nuclear Regulatory Commission.
Oakley, J.E., and A. O'Hagan. 2004. Probabilistic Sensitivity Analysis of Complex Models: A Bayesian Approach. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 66(3):751-769.
Oberkampf, W.L., and C. Roy. 2010. Verification and Validation in Scientific Computing. Cambridge, U.K.: Cambridge University Press.
Oberkampf, W.L., and T.G. Trucano. 2000. Validation Methodology in Computational Fluid Dynamics. American Institute of Aeronautics and Astronautics, AIAA 2000-2549, Fluids 2000 Conference, Denver, Colo.
Oberkampf, W.L., T.G. Trucano, and C. Hirsch. 2004. Verification, Validation, and Predictive Capability in Computational Engineering and Physics. Applied Mechanics Reviews 57:345.
Oden, J.T., and S. Prudhomme. 1998. A Technique for A Posteriori Error Estimation of h-p Approximations of the Stokes Equations. Advances in Adaptive Computational Methods in Mechanics 47:43-63.
Oliver, D.S., B.C. Luciane, and A.C. Reynolds. 1997. Markov Chain Monte Carlo Methods for Conditioning a Permeability Field to Pressure Data. Mathematical Geology 29:61-91.
Picard, R.R. 2005. Importance Sampling for Simulation of Markovian Physical Processes. Technometrics 47(2):202-211.
Prudhomme, S., and J.T. Oden. 1999. On Goal-Oriented Error Estimation for Elliptic Problems: Application to Pointwise Errors. Computer Methods in Applied Mechanics and Engineering 176:313-331.
Rabinovich, S. 1995. Measurement Errors, Theory and Practice. New York: The American Institute of Physics.
Raftery, A.E. 1996. Hypothesis Testing and Model Selection via Posterior Simulation. Pp. 163-168 in Practical Markov Chain Monte Carlo. London, U.K.: Chapman and Hall.
Roache, P. 1998. Verification and Validation in Computational Science and Engineering. Socorro, N.Mex.: Hermosa Publishers.
Robins, J.M., A. van der Vaart, and V. Ventura. 2000. Asymptotic Distribution of P Values in Composite Null Models. Journal of the American Statistical Association 95(452):1143-1156.
Rougier, J., M. Goldstein, and L. House. 2010. Assessing Model Discrepancy Using a Multi-Model Ensemble. University of Bristol Statistics Department Technical Report #08:17. Bristol, U.K.: University of Bristol.
Sain, S.R., R. Furrer, and N. Cressie. 2011. A Spatial Analysis of Multivariate Output from Regional Climate Models. Annals of Applied Statistics 5(1):150-175.
Scheffer, M., J. Bascompte, W.A. Brock, V. Brovkin, S.R. Carpenter, V. Dakos, H. Held, E.H. Van Nes, M. Rietkerk, and G. Sugihara. 2009. Early-Warning Signals for Critical Transitions. Nature 461(7260):53-59.
Shafer, G. 1976. A Mathematical Theory of Evidence. Princeton, N.J.: Princeton University Press.
Smith, R.L., C. Tebaldi, D. Nychka, and L.O. Mearns. 2010. Bayesian Modeling of Uncertainty in Ensembles of Climate Models. Journal of the American Statistical Association 104(485):97-116.
Steinberg, S., and P. Roache. 1985. Symbolic Manipulation and Computational Fluid Dynamics. Journal of Computational Physics 57(2):251-284.
Strouboules, F., I. Babuska, D.K. Dalta, K. Copps, and S.K. Gangarai. 2000. A Posteriori Estimation and Adaptive Control of the Error in the Quantity of Interest. Part 1: A Posteriori Estimation of the Error in the Von Mises Stress and the Stress Intensity Factor. Computer Methods in Applied Mechanics and Engineering 181:261-294.
Tarantola, A. 2005. Inverse Problem Theory and Methods for Model Parameter Estimation. Philadelphia, Pa.: SIAM.
Tebaldi, C., and R. Knutti. 2007. The Use of the Multi-Model Ensemble in Probabilistic Climate Projections. Philosophical Transactions of the Royal Society, Series A 365:2053-2075.
Tebaldi, C., R.L. Smith, D. Nychka, and L.O. Mearns. 2005. Quantifying Uncertainty in Projections of Regional Climate: A Bayesian Approach to the Analysis of Multimodel Ensembles. Journal of Climate 18:1524-1540.
Thornton, J. 2011. No Testing Allowed: Nuclear Stockpile Stewardship Is a Simulation Challenge. Mechanical Engineering-CIME 133(5):38-41.
Tonkin, M., and J. Doherty. 2009. Calibration-Constrained Monte Carlo Analysis of Highly Parameterized Models Using Subspace Techniques. Water Resources Research 45(12):w00b10.
Wan, E.A., and R. Van Der Merwe. 2000. The Unscented Kalman Filter for Nonlinear Estimation. Pp. 153-158 in Adaptive Systems for Signal Processing, Communications, and Control Symposium 2000. AS-SPCC/IEEE, Lake Louise, Alta., Canada.
Wang, S., W. Chen, and K.L. Tsui. 2009. Bayesian Validation of Computer Models. Technometrics 51(4):439-451.
Welch, G., and G. Bishop. 1995. An Introduction to the Kalman Filter. Technical Report 95-041. Chapel Hill: University of North Carolina.
Wu, C.F.J., and M. Hamada. 2009. Experiments: Planning, Analysis, and Optimization. New York: Wiley.
Youden, W.J. 1961. Uncertainties in Calibration. Precision Measurement and Calibration: Statistical Concepts and Procedures 1:63.
Youden, W.J. 1972. Enduring Values. Technometrics 14(1):1-15.