Suggested Citation:"7 Next Steps in Practice, Research, and Education for Verification, Validation, and Uncertainty Quantification." National Research Council. 2012. Assessing the Reliability of Complex Models: Mathematical and Statistical Foundations of Verification, Validation, and Uncertainty Quantification. Washington, DC: The National Academies Press. doi: 10.17226/13395.

7

Next Steps in Practice, Research, and Education for Verification, Validation, and Uncertainty Quantification

The role of verification, validation, and uncertainty quantification (VVUQ) in computational science and engineering has increased significantly in recent years. As high-quality computational modeling becomes available in more application areas, the role played by VVUQ will continue to grow. Previous chapters have addressed VVUQ as it has evolved to date in the computational modeling of complex physical systems. In this chapter, the committee discusses next steps in the evolution of VVUQ. This summary of its responses to the statement of task includes the committee’s identification of principles and current best practices, its recommendations for VVUQ research and development, and its recommendations for educational changes.

7.1 VVUQ PRINCIPLES AND BEST PRACTICES

As was noted in Chapter 1, the committee has confined its considerations of principles and best practices to the mathematical science aspects of VVUQ. The principles and best practices presented here are, loosely, restricted to those aspects and do not emphasize nonmathematical issues of physical science, communication of results, and so forth. Historically, methodologies for VVUQ have evolved separately in different application areas and fields. As a result, different application areas can have different approaches. A number of recent workshops and conferences have assembled researchers from varied application areas and perspectives, aiming for a cross-fertilization of ideas and a better understanding of the connections, commonalities, and differences among the varied VVUQ practices. As time passes, the relationships among the various practices developed in different settings will become clearer, as will the understanding of best practices for different kinds of applications. However, it is premature to try to identify a single set of methods or algorithms that are the best tools to accomplish the best practices identified below. Today, it appears that some methods and algorithms are better for some applications and others are better for other applications. Therefore, the committee identifies principles and best practices but stops short of prescribing implementation methodologies.

This section begins with some overarching remarks, moves to principles and practices for verification, and then addresses principles and practices for validation and prediction. As previous chapters have emphasized, VVUQ analyses are not well defined unless the quantities of interest (QOIs) are well defined. Defining the QOIs from the start allows a VVUQ process to produce more meaningful results than will be produced if the focus is on the “solution” in general. For example, suppose a given model accurately captures the average, or large-scale, features of a physical system but not the small-scale features. If only large-scale features are important in the given application, the appropriately defined QOI should be sensitive to large-scale but not small-scale behavior. In this case the VVUQ analysis may find that the model is sufficiently accurate (e.g., uncertainties in the predicted QOI are sufficiently small) to provide actionable information. However, if small-scale details are important, the QOI should be defined accordingly, and the VVUQ analysis (of the same model applied to the same physical system) may find that the model is too inaccurate to be of value.
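To make the large-scale versus small-scale contrast concrete, the sketch below evaluates two QOIs for a hypothetical "true" field (a smooth shape plus a small ripple) and a hypothetical model that reproduces only the smooth shape. The functions, the ripple amplitude of 0.05, and the grid size are illustrative assumptions, not taken from the report. The averaged QOI is essentially unaffected by the missing ripple, while a QOI sensitive to local gradients is badly wrong.

```python
import math

# Hypothetical "true" field: smooth large-scale shape plus a small-scale ripple.
def truth(x):
    return math.sin(math.pi * x) + 0.05 * math.sin(40 * math.pi * x)

def truth_slope(x):
    # derivative of truth(x); 0.05 * 40 * pi = 2 * pi
    return math.pi * math.cos(math.pi * x) + 2.0 * math.pi * math.cos(40 * math.pi * x)

# Hypothetical model that reproduces only the large-scale shape.
def model(x):
    return math.sin(math.pi * x)

def model_slope(x):
    return math.pi * math.cos(math.pi * x)

N = 4000
XS = [(i + 0.5) / N for i in range(N)]  # midpoint-rule nodes on [0, 1]

def qoi_average(f):
    """Large-scale QOI: spatial average of the field (midpoint rule)."""
    return sum(f(x) for x in XS) / N

def qoi_max_slope(df):
    """Small-scale QOI: largest absolute gradient over the nodes."""
    return max(abs(df(x)) for x in XS)

def relative_error(approx, exact):
    return abs(approx - exact) / abs(exact)

err_avg = relative_error(qoi_average(model), qoi_average(truth))
err_slope = relative_error(qoi_max_slope(model_slope), qoi_max_slope(truth_slope))
print(f"average QOI:   relative model error {err_avg:.1e}")
print(f"max-slope QOI: relative model error {err_slope:.2f}")
```

The same model is thus "validated" or "invalidated" depending only on which QOI is chosen, which is why the QOI has to be fixed before the assessment begins.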

Leveraging work from previous VVUQ analyses should be done with caution. Since VVUQ results are specific to particular QOIs in particular settings, transferring results to new QOIs and settings can be difficult to justify. However, one can consider applying VVUQ to a model over a broad set of conditions and QOIs if physical data are available to support such wide-ranging assessments of model accuracy and there is a firm theoretical understanding of the physical phenomena being modeled. One arguable example of such a situation is the Monte Carlo N-Particle (MCNP) transport code,¹ which incorporates a large body of knowledge and has been tested against measurements derived from thousands of experiments spanning many particle types and a broad range of conditions.

Within the VVUQ enterprise, the level of rigor employed should be commensurate with the importance and needs of the application and decision context. Some applications involve high-consequence decisions and therefore require a substantial VVUQ effort; others do not.

7.1.1 Verification Principles and Best Practices

Here the committee summarizes key verification principles, along with best practices associated with each principle. Chapter 3 provides more detail.

• Principle: Solution verification is well defined only in terms of specified quantities of interest, which are usually functionals of the full computed solution.

—Best practice: Clearly define the QOIs for a given VVUQ analysis, including the solution verification task. Different QOIs will be affected differently by numerical errors.

—Best practice: Ensure that solution verification encompasses the full range of inputs that will be employed during UQ assessments.

• Principle: The efficiency and effectiveness of code and solution verification can often be enhanced by exploiting the hierarchical composition of codes and mathematical models, with verification performed first on the lowest-level building blocks and then on successively more complex levels.

—Best practice: Identify hierarchies in computational and mathematical models and exploit them for code and solution verification. It is often worthwhile to design the code with this approach in mind.

—Best practice: Include in the test suite problems that test all levels in the hierarchy.

• Principle: Verification is most effective when performed on software developed under appropriate software quality practices.

—Best practice: Use software configuration management and regression testing and strive to understand the degree of code coverage attained by the regression suite.

—Best practice: Understand that code-to-code comparisons can be helpful, especially for finding errors in the early stages of development, but that in general they do not by themselves constitute sufficient code or solution verification.

—Best practice: Compare against analytic solutions, including those created by the method of manufactured solutions—a technique that is helpful in the verification process.

• Principle: The goal of solution verification is to estimate, and control if possible, the error in each QOI for the problem at hand. (Ultimately, of course, one would want to use UQ to facilitate decision making in the face of uncertainty. So it is desirable to tailor UQ so that it helps identify ways to reduce uncertainty, bound it, or bypass the problem, all in the context of the decision at hand. The use of VVUQ for uncertainty management is discussed in Section 6.2, “Decisions Within VVUQ Activities”.)

_____________________

¹ See mcnp-green.lanl.gov. Accessed September 7, 2011.

—Best practice: When possible in solution verification, use goal-oriented a posteriori error estimates, which give numerical error estimates for specified QOIs. In the ideal case the fidelity of the simulation is chosen so that the estimated errors are small compared to the uncertainties arising from other sources.

—Best practice: If goal-oriented a posteriori error estimates are not available, try to perform self-convergence studies (in which QOIs are computed at different levels of refinement) on the problem at hand, which can provide helpful estimates of numerical error.

—Remark: In the absence of a posteriori or self-convergence results, the next best option may be to estimate numerical error in a given QOI in the problem at hand based on detailed assessments of numerical error in a similar QOI in a relevant reference problem. However, it is challenging to define reference problems that permit detailed assessments but are demonstrably relevant to the problem at hand. It can be risky to assume that numerical errors in the reference problem are representative of numerical errors in the problem at hand.
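The sketch below combines two of the practices above on a deliberately simple problem, assuming a 1D Poisson equation -u'' = f on (0, 1) with a manufactured solution u(x) = sin(πx) and a QOI equal to u(0.5). A self-convergence study on three grids yields an observed order of accuracy and a Richardson-style error estimate that needs no exact solution; because the solution was manufactured, that estimate can then be checked against the true error. The model problem, grids, and QOI are illustrative choices, not the report's.

```python
import math

def f_manufactured(x):
    # Manufactured solution u(x) = sin(pi x) gives f = -u'' = pi^2 sin(pi x).
    return math.pi ** 2 * math.sin(math.pi * x)

def solve_poisson(n):
    """Second-order finite differences for -u'' = f on (0, 1), u(0) = u(1) = 0,
    with n interior nodes; the tridiagonal system is solved by the Thomas algorithm."""
    h = 1.0 / (n + 1)
    diag = [2.0] * n
    rhs = [h * h * f_manufactured((i + 1) * h) for i in range(n)]
    # Forward elimination; all sub- and super-diagonal entries equal -1.
    for i in range(1, n):
        w = 1.0 / diag[i - 1]
        diag[i] -= w              # 2 - (-1)(-1) / diag[i-1]
        rhs[i] += w * rhs[i - 1]  # rhs_i - (-1 / diag[i-1]) * rhs_{i-1}
    # Back substitution.
    u = [0.0] * n
    u[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):
        u[i] = (rhs[i] + u[i + 1]) / diag[i]
    return u

def qoi_midpoint(n):
    """QOI: discrete solution at x = 0.5 (n must be odd so a node sits there)."""
    return solve_poisson(n)[(n + 1) // 2 - 1]

# Self-convergence study with refinement ratio r = 2 (h = 1/8, 1/16, 1/32).
q_coarse, q_medium, q_fine = (qoi_midpoint(n) for n in (7, 15, 31))
r = 2.0
p_observed = math.log(abs(q_coarse - q_medium) / abs(q_medium - q_fine)) / math.log(r)
# Richardson-style estimate of (exact QOI - computed QOI) on the finest grid.
error_estimate = (q_fine - q_medium) / (r ** p_observed - 1.0)

# Only because the solution was manufactured is the true error available: u(0.5) = 1.
true_error = 1.0 - q_fine
print(f"observed order {p_observed:.3f}")
print(f"estimated error {error_estimate:.3e}, true error {true_error:.3e}")
```

An observed order close to the scheme's formal order (2 here) is evidence that the code is correct and the grids are in the asymptotic range; a mismatch is a signal to investigate before trusting the error estimate.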

7.1.2 Validation and Prediction Principles and Best Practices

Although the questions involving solution verification are firmly grounded in mathematical and computational science, the questions that arise in validation and prediction require statistical and subject-matter (physics, chemistry, materials, etc.) expertise as well. They also require choices that involve judgment, for example in determining the relevance of validation studies to the prediction of a QOI in the problem at hand. This necessary application of judgment warrants a brief discussion here. The concept of a domain of applicability (a region of a domain space in which a validation assessment is judged to apply) is helpful in determining the relevance of a validation assessment to the prediction of a QOI in a given problem at hand. This concept treats features, or descriptors, that characterize the problem space (such as ball density, radius, and drop height in the ball-drop example) as axes that define a mathematical space. Each problem or experiment is associated with a point in the space; thus, the problems included in the validation assessment map to a collection of points in the domain space. The problem at hand maps to another point in the space. One can imagine basing a determination of relevance on the location of a particular problem point relative to the locations of the other points. For example, if the new point is surrounded by validation-problem points, the validation study might be judged to have high relevance.

This is an appealing notion, but any attempt to apply it with mathematical rigor must address significant complicating truths. If important features are omitted from the set that is chosen to form axes of the space, then two problems may look similar when they actually differ in important ways. This is illustrated by the “ball texture” in the ball-drop example in Section 1.6. However, if all potentially important features are included, the dimension of the space may become intractably large. In the ball-drop example, potential additional features could include ambient temperature, ambient pressure, ambient humidity, wind conditions, ball skin materials, ball interior structure, initial rotation applied to the ball as it is dropped, ball elasticity, ball coefficient of thermal expansion, and so on. Including a large set of features would help guard against the omission of those that may be important. However, this creates a high-dimensional domain space, which forces essentially any new problem to be “outside” the region enclosed by previous problems. This makes every prediction appear to be “extrapolative.” In oversimplified terms: if the domain space is low-dimensional, then subject-matter judgment is required to assess the impact of features that are not included, but if the domain space is high-dimensional, subject-matter expertise is required to assess the relevance of previous experience to an extrapolative prediction. Either way, subject-matter expertise must inform a judgment.
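The geometric notion above can be sketched as a convex-hull test, under the simplifying assumptions that "surrounded by validation points" means "inside their convex hull" and that each problem is described by features scaled to [0, 1]. The membership test uses a Frank-Wolfe (Gilbert-type) iteration toward the closest hull point and returns False as soon as it finds a separating hyperplane. The demonstration draws hypothetical random validation and prediction points and reports how often a new point lands inside the hull as the number of features grows, mirroring the observation that in high dimension nearly every prediction looks extrapolative.

```python
import random

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def in_convex_hull(x, points, tol=1e-4, max_iter=1000):
    """Frank-Wolfe test of whether x lies in the convex hull of `points`."""
    y = list(points[0])  # current iterate, always a convex combination of points
    for _ in range(max_iter):
        w = [xi - yi for xi, yi in zip(x, y)]  # residual direction from y to x
        if dot(w, w) <= tol * tol:
            return True  # x is within tol of a hull point
        p = max(points, key=lambda q: dot(w, q))  # vertex furthest along w
        if dot(w, p) < dot(w, x):
            return False  # hyperplane {z : w.z = w.x} separates x from every vertex
        d = [pi - yi for pi, yi in zip(p, y)]
        t = min(1.0, dot(w, d) / dot(d, d))  # exact line search on ||y + t d - x||^2
        y = [yi + t * di for yi, di in zip(y, d)]
    return True  # undecided after max_iter; treated as inside (a simplifying choice)

def fraction_inside(dim, n_validation=30, n_new=50, seed=0):
    """Share of hypothetical new problems that fall inside the validation hull."""
    rng = random.Random(seed)
    validation = [[rng.random() for _ in range(dim)] for _ in range(n_validation)]
    new_points = [[rng.random() for _ in range(dim)] for _ in range(n_new)]
    return sum(in_convex_hull(x, validation) for x in new_points) / n_new

for dim in (2, 5, 10):
    print(f"{dim:2d} features: {fraction_inside(dim):.0%} of new problems inside the hull")
```

The code can only report geometry; deciding which features deserve an axis, and how far outside the hull a prediction can still be trusted, remains the subject-matter judgment the text describes.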

This discussion is not intended to attack the concept of a domain of applicability or to downplay its utility. Rather, it is intended to illustrate that mathematics alone cannot determine the relevance of past experience to the problem at hand but that judgment informed by subject-matter expertise is a necessary ingredient in making this determination.

In spite of variations in validation and prediction practices across fields, the inherent role of expertise and judgment, and the rapid evolution of improved methodologies, some general principles and best practices in validation and prediction have emerged that the committee believes will stand the test of time. They are summarized below. Chapter 5 provides more detail.

• Principle: A validation assessment is well defined only in terms of specified QOIs.

—Best practice: Early in the validation process, specify the QOIs that will be addressed.

• Principle: A validation assessment provides direct information about model accuracy only in the domain of applicability that is “covered” by the physical observations employed in the assessment.

—Best practice: When quantifying or bounding model error for a QOI in the problem at hand, systematically assess the relevance of supporting validation assessments (which were based on data from different problems, often with different QOIs). Subject-matter expertise should inform this assessment of relevance (as discussed above and in Chapter 5).

—Best practice: If possible, use a broad range of physical observation sources so that the accuracy of a model can be checked under different conditions and at multiple levels of integration.

—Best practice: Use “holdout tests” to test validation and prediction methodologies. In such a test some validation data are withheld from the validation process, the prediction machinery is employed to “predict” the withheld QOIs, with quantified uncertainties, and finally the predictions are compared to the withheld data.

—Best practice: If the desired QOI was not observed for the physical systems used in the validation process, compare sensitivities of the available physical observations with those of the QOI.

—Best practice: Consider multiple metrics for comparing model outputs against physical observations.

• Principle: The efficiency and effectiveness of a validation assessment are often improved by exploiting the hierarchical composition of computational and mathematical models, with assessments beginning on the lowest-level building blocks and proceeding to successively more complex levels.

—Best practice: Identify hierarchies in computational and mathematical models, seek measured data that facilitate hierarchical validation assessments, and exploit the hierarchical composition to the extent possible.

—Best practice: If possible, use physical observations, especially at more basic levels of the hierarchy, to constrain uncertainties in model inputs and parameters.

• Principle: The uncertainty in the prediction of a physical QOI must be aggregated from uncertainties and errors introduced by many sources, including discrepancies in the mathematical model, numerical and code errors in the computational model, and uncertainties in model inputs and parameters.

—Best practice: Document assumptions that go into the assessment of uncertainty in the predicted QOI, and also document any omitted factors. Record the justification for each assumption and omission.

—Best practice: Assess the sensitivity of the predicted QOI and its associated uncertainties to each important source of uncertainty as well as to key assumptions and omissions.

—Best practice: Document key judgments, including those regarding the relevance of validation studies to the problem at hand, and assess the sensitivity of the predicted QOI and its associated uncertainties to reasonable variations in these judgments.

• Principle: Validation assessments must take into account the uncertainties and errors in physical observations (measured data).

—Best practice: Identify all important sources of uncertainty/error in validation data—including instrument calibration, uncontrolled variation in initial conditions, variability in measurement setup, and so on—and quantify the impact of each.

—Best practice: If possible, use replications to help estimate variability and measurement uncertainty.

—Remark: Assessing measurement uncertainties can be difficult when the “measured” quantity is actually the product of an auxiliary inverse problem—that is, when the quantity is not measured directly but is inferred from other measured quantities.
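As a minimal sketch of the aggregation principle, assume a QOI equal to the drop time of the ball-drop example, computed from the vacuum formula t = sqrt(2h/g), with three independent, hypothetical error sources: an uncertain drop height, a bounded numerical error (treated here as uniformly distributed, which is only one of several defensible choices), and a zero-mean Gaussian model discrepancy standing in for omitted air drag. All distributions and magnitudes are illustrative assumptions. Switching sources off one at a time gives a crude variance budget; for independent, additive sources like these the pieces sum to the total, whereas interacting sources would call for a variance-based (Sobol) analysis.

```python
import math
import random

G = 9.81  # gravitational acceleration [m/s^2]

def drop_time(h):
    """Hypothetical computational model: drop time in vacuum (no air drag)."""
    return math.sqrt(2.0 * h / G)

def sample_qoi(rng, height=True, numerical=True, discrepancy=True):
    h = rng.gauss(10.0, 0.05) if height else 10.0              # uncertain drop height [m]
    e_num = rng.uniform(-0.002, 0.002) if numerical else 0.0   # numerical error [s]
    delta = rng.gauss(0.0, 0.01) if discrepancy else 0.0       # model discrepancy [s]
    return drop_time(h) + e_num + delta

def variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def qoi_variance(n=50_000, seed=1, **switches):
    rng = random.Random(seed)
    return variance([sample_qoi(rng, **switches) for _ in range(n)])

total = qoi_variance()
budget = {
    "drop height": qoi_variance(numerical=False, discrepancy=False),
    "numerical error": qoi_variance(height=False, discrepancy=False),
    "model discrepancy": qoi_variance(height=False, numerical=False),
}
for source, var in budget.items():
    print(f"{source:18s} {var / total:6.1%} of total variance")
```

Recording the assumed distributions alongside such a budget is one concrete way to meet the documentation practices listed above, and rerunning it with altered assumptions gives the requested sensitivity to those assumptions.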

7.2 PRINCIPLES AND BEST PRACTICES IN RELATED AREAS

7.2.1 Transparency and Reporting

In the presentation of VVUQ results to stakeholders, including decision makers who may not be familiar with the analyses, it is important to state clearly the key underlying assumptions along with their potential impact on the predicted QOIs, their uncertainties, and other key outcomes. In particular, UQ analyses should state which uncertainties are accounted for and which are not and should give some assessment of the impact of those uncertainties not accounted for. It is also important that the presentation discuss a triage of assumptions, assessing which have the potential to alter the outcomes and assessing the sensitivity of key outcomes to these alternative assumptions. A good example of detailing model inadequacies that might affect the overall assessment of anthropogenic impact on climate change is given in Chapter 8 of an Intergovernmental Panel on Climate Change report (Randall et al., 2007).

The use of plain language, suitable for the application at hand, is most effective for presentations. The use of terminology that has specific meanings in mathematics, statistics, or VVUQ can often lead to misconceptions or misunderstandings. As Oreskes et al. (1994) point out, words such as “verification” and “validation” carry common meanings that can be inappropriately attached to computational-model assessment. It is also important not to confuse the mathematical, computational, and subject-matter science that went into building a large-scale computational model with the VVUQ effort that assesses the appropriateness and accuracy of the model-based predictions.

Holdout tests provide a direct demonstration of a model’s ability to predict under new conditions and can be an effective tool for communicating certain VVUQ concepts and results. Holdout tests use the model to predict experimental or observational outcomes that were not used in the model calibration process. Once a computational model has been calibrated with a particular set of physical measurements, the holdout test allows one to see how the model predicts the system behavior in a new setting. Of course, assessing the degree of extrapolation in a given holdout test is still an open question, as the committee has discussed above.
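A holdout test can be sketched with the ball-drop example, assuming the model t = c·sqrt(h) (that is, c = sqrt(2/g)) and hypothetical, synthetically generated drop times with Gaussian noise. The model is calibrated by least squares on drops of 20 m or less, and the drops above 20 m are withheld; each withheld observation is then compared with an approximate 95 percent prediction interval. The heights, noise level, and interval formula are illustrative assumptions. Because the withheld drops lie outside the calibration range, this particular test also probes mild extrapolation.

```python
import math
import random

G_SYNTHETIC = 9.81  # used only to generate hypothetical "measurements"

def synthetic_drop_times(heights, noise_sd, rng):
    return [math.sqrt(2.0 * h / G_SYNTHETIC) + rng.gauss(0.0, noise_sd) for h in heights]

def calibrate(heights, times):
    """Least-squares fit of t = c * sqrt(h); returns c, residual std, and sum of h."""
    sum_h = sum(heights)
    c = sum(t * math.sqrt(h) for h, t in zip(heights, times)) / sum_h
    resid = [t - c * math.sqrt(h) for h, t in zip(heights, times)]
    s = math.sqrt(sum(r * r for r in resid) / (len(resid) - 1))
    return c, s, sum_h

def prediction_interval(c, s, sum_h, h, z=1.96):
    """Approximate interval combining calibration uncertainty in c and measurement noise."""
    mean = c * math.sqrt(h)
    half_width = z * math.sqrt(s * s + (s * s / sum_h) * h)
    return mean - half_width, mean + half_width

rng = random.Random(42)
heights = [2.0 * k for k in range(1, 16)]  # 2 m to 30 m
times = synthetic_drop_times(heights, 0.01, rng)
train = [(h, t) for h, t in zip(heights, times) if h <= 20.0]
holdout = [(h, t) for h, t in zip(heights, times) if h > 20.0]

c, s, sum_h = calibrate([h for h, _ in train], [t for _, t in train])
hits = 0
for h, t_obs in holdout:
    lo, hi = prediction_interval(c, s, sum_h, h)
    hits += lo <= t_obs <= hi
    print(f"h = {h:4.1f} m: observed {t_obs:.3f} s, predicted [{lo:.3f}, {hi:.3f}] s")
coverage = hits / len(holdout)
g_hat = 2.0 / c ** 2
print(f"calibrated g = {g_hat:.2f} m/s^2; holdout coverage {coverage:.0%}")
```

Because the synthetic "reality" here matches the model form exactly, good coverage is expected; a real holdout test earns its value when reality contains effects (such as drag) that the model omits.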

7.2.2 Decision Making

Decision makers must have key information from the VVUQ process that is summarized and clearly communicated. This key information includes summaries of the body of knowledge behind the choice of models, evidence from the verification process, sensitivities of the calculated QOIs to uncertainties in key parameters, quantification (from validation studies) of the model’s ability to match relevant measured data, assessment of modeling challenges in the prediction problem relative to those in the validation problems, key assumptions behind the predictions and quantified uncertainties, sources of uncertainty that were neglected, and so on. If this information is summarized and communicated properly, results from the VVUQ process can play a unique and significant role in the efficient allocation of resources, management of the overall uncertainty budget, and generation of the soundest possible basis for high-consequence decisions in the presence of uncertainties.

The results of VVUQ analyses can also be used to make decisions regarding how to allocate resources for future VVUQ activities (computing hardware acquisition, experimental campaigns, model improvement efforts, and other efforts) to improve prediction accuracy or to improve confidence in model-based predictions. This decision task is made more difficult by the often high cost of running available computational models and by the inability of models to represent reality perfectly. A realistic assessment of model inadequacies/discrepancies is important for resource allocation because models can inform only about processes represented in the models. If better understanding of current model inadequacies is key to improving predictions, then additional validation data will likely be required. Because the value of such data cannot be computed from the models alone, approaches for resource allocation will necessarily require some form of qualitative assessment or judgment. Given the complexity of VVUQ activities, a carefully structured planning process can help to ensure that resources are used efficiently and that significant factors are addressed.

7.2.3 Software, Tools, and Repositories

Practitioners in VVUQ currently have only a limited set of software tools and repositories (for data, examples, and code) available to assist them. This is particularly true for the developing field of uncertainty quantification. A number of application-specific software projects have been developed; Dakota,² for engineering applications, and PEST,³ for environmental applications, are two notable examples. There is also software available to carry out specific computations involved in the VVUQ process (e.g., sensitivity analysis, response-surface modeling, and logical-error checking for code verification). A recently launched Department of Energy (DOE) effort is focused on developing software tools for UQ in the high-performance computing environment.

_____________________

² See Dakota.sandia.gov. Accessed September 7, 2011.

³ See pesthomepage.org. Accessed September 7, 2011.

Such software can benefit practitioners and users. The more established efforts have documentation and a user community to help with their use. Although the learning curve is steep, and the framework and tools imposed by a particular software package may not be ideal for the application at hand, many of the utilities in current and developing software would be of use in many VVUQ efforts. Packaging these utilities as separate, usable libraries of functions would allow them to be used internally by other software efforts, making them more broadly useful.

Nearly all of the available software treats the computational model as a black box that produces outputs for a given input setting. Such an approach has obvious advantages for general use—it requires no changes to existing computational models—but will be difficult to adapt to newer, intrusive approaches for UQ.

The VVUQ field would benefit from a collection of testbed examples that demonstrate software and VVUQ methods, provide examples of UQ analyses, and so on. Such a repository, perhaps managed by the Society for Industrial and Applied Mathematics, the American Statistical Association, or some other professional entity with a stake in VVUQ, would allow for the comparison and assessment of different methods and approaches so that practitioners could determine the most appropriate method(s) for their particular application. Such a repository would also help foster an understanding of the similarities in and differences among the various VVUQ approaches that have been developed in separate application areas.

7.3 RESEARCH FOR IMPROVED MATHEMATICAL FOUNDATIONS

This section discusses research directions that could improve the mathematical foundations of the VVUQ process. In the area of solution verification there is a need for methods that can accurately estimate numerical error in the computation of the problem at hand for mathematical models that are more complex than linear elliptic partial differential equations. In the area of validation and prediction, research needs are driven largely by (1) the computational burden presented by large-scale computational models, (2) the need to combine multiple sources of information, and (3) the challenges associated with assessing the quality of model-based predictions. In the area of uncertainty quantification, there is a need for improved methods for handling large numbers of uncertain inputs (the famous “curse of dimensionality”). There are promising directions for research at the interface of probabilistic/statistical modeling, computational modeling, high-performance computing, and application knowledge, suggesting that future research efforts in VVUQ should include collaborative interdisciplinary activities.
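The arithmetic behind the curse of dimensionality is easy to sketch. Assuming, purely for illustration, a full tensor-product design with 5 levels per uncertain input and a hypothetical cost of one hour per model run, the number of runs grows as 5^d. By contrast, the standard error of plain Monte Carlo scales as σ/√N regardless of the number of inputs, although σ itself depends on the problem.

```python
def tensor_grid_runs(levels_per_input, n_inputs):
    """Model runs needed for a full tensor-product design."""
    return levels_per_input ** n_inputs

HOURS_PER_YEAR = 24 * 365
for d in (2, 5, 10, 20):
    runs = tensor_grid_runs(5, d)
    print(f"{d:2d} inputs: {runs:,} runs, about {runs / HOURS_PER_YEAR:,.1f} years at 1 hour per run")
```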

7.3.1 Verification Research

The solution verification process aims to quantitatively estimate the impact of numerical error on a given QOI. “Goal-oriented” methods are of particular interest, because they seek to estimate the error not in some abstract mathematical norm of the solution but rather in a given, defined functional of the solution—a particular QOI. As is discussed in Chapter 3, methods exist for estimating tight two-sided bounds for numerical error in the solution of linear elliptic partial differential equations (PDEs), but research is needed to develop a similar level of maturity for estimating error given more complicated mathematical models. In particular, the following areas of research have the potential for important practical improvements in verification methods.

• Development of goal-oriented a posteriori error-estimation methods that can be applied to mathematical models that are more complicated than linear elliptic PDEs. There are many such models that are of significant practical interest, including features such as nonlinearities, multiple coupled physical phenomena, bridging of multiple scales, hyperbolic PDEs, and stochasticity.

• Development of theory that supports goal-oriented error estimates on complicated grids, including adaptive mesh grids.

• Development of algorithms for goal-oriented error estimates that scale well on massively parallel architectures, especially given complicated grids (including adaptive mesh grids).

• Development of adaptive algorithms that can control numerical error given the kinds of complex mathematical models described above.

• Development of algorithms and strategies that efficiently manage both discretization error and iteration error, given the kinds of complex mathematical models described above.

• Development of methods to estimate error bounds when meshes cannot resolve important scales. An example is turbulent fluid flow.

• Further development of reference solutions, including “manufactured” solutions, for the kinds of complex mathematical models described above.

• For computational models that are composed of simpler components, including hierarchical models: development of methods that use numerical-error estimates from the simpler components, along with information about how the components are coupled, to produce numerical-error estimates for the overall model.
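The core identity behind goal-oriented (adjoint-weighted residual) error estimation can be sketched in its simplest, linear-algebraic setting. Assume a linear system A u = f and a QOI J(u) = g·u. For any approximate solution u_h with residual r = f - A u_h, the adjoint solution z of Aᵀz = g satisfies J(u) - J(u_h) = g·(u - u_h) = z·A(u - u_h) = z·r exactly. The sketch applies this to the iteration error left by a few Jacobi sweeps on a small, hypothetical nonsymmetric tridiagonal matrix; extending such estimates to discretization error in nonlinear, coupled, multiscale, or hyperbolic problems is the research need identified above.

```python
def solve(A, b):
    """Dense Gaussian elimination with partial pivoting (small systems only)."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for k in range(n):
        piv = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[piv] = M[piv], M[k]
        for i in range(k + 1, n):
            m = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= m * M[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def jacobi(A, f, sweeps):
    """A few Jacobi sweeps from a zero initial guess (leaves iteration error)."""
    n = len(f)
    u = [0.0] * n
    for _ in range(sweeps):
        u = [(f[i] - sum(A[i][j] * u[j] for j in range(n) if j != i)) / A[i][i] for i in range(n)]
    return u

def entry(i, j):
    # Hypothetical nonsymmetric, diagonally dominant tridiagonal operator.
    if i == j:
        return 3.0
    if j == i - 1:
        return -1.5
    if j == i + 1:
        return -0.5
    return 0.0

n = 8
A = [[entry(i, j) for j in range(n)] for i in range(n)]
f = [1.0] * n
g = [0.0] * (n // 2) + [1.0 / (n // 2)] * (n // 2)  # QOI: average of the last half of u

u_h = jacobi(A, f, sweeps=5)                         # approximate solution
r = [fi - ai for fi, ai in zip(f, matvec(A, u_h))]   # residual f - A u_h
A_T = [list(col) for col in zip(*A)]
z = solve(A_T, g)                                    # adjoint solve: A^T z = g
estimate = dot(z, r)                                 # goal-oriented error estimate
actual = dot(g, solve(A, f)) - dot(g, u_h)
print(f"estimated QOI error {estimate:.6e}, actual {actual:.6e}")
```

The adjoint z weights each residual component by how strongly it influences the chosen QOI, which is why a different g (a different QOI) gives a different error estimate for the same approximate solution.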

7.3.2 UQ Research

Although continued effort in improving methodology for building response surfaces and reduced-order models will likely prove fruitful in VVUQ, new research directions that consider VVUQ issues from a broader perspective are likely to yield more substantial gains in efficiency and accuracy. For example, the response-surface methods mentioned in Chapter 4 may consider both probabilistic descriptions of the input and the form of the mathematical/computational model to describe output uncertainty, leading to efficiency gains over standard approaches.

Embedded, or intrusive, approaches, such as those that use adjoint information for verification, sensitivity analyses, or inverse problems, tackle the problem from a perspective that leverages computational modeling aspects of the application, often achieving substantial gains in computational efficiency. In large-scale problems some approaches have folded in considerations regarding the computing architecture as well. However, beyond these examples, there is little in the current literature on how to exploit capabilities of high-performance computing in the service of VVUQ. The committee expects that VVUQ methodological research, operating from this broader perspective, will continue to be fruitful in the future.

Some applications use a collection of hierarchically connected models. In some cases, outputs from one model serve as inputs to another. Examples include the modeling of nuclear systems and the reentry vehicle application described in Section 5.9. In other cases, a hierarchy of low- to high-fidelity computational models is available for modeling a particular system. An example is the modeling of radiative heat transfer using gray diffusion (low), multigroup diffusion (medium), or multigroup transport (high). In still other cases, an application uses models that span multiple scales. In materials science, for example, different models simulate phenomena at scales ranging from the molecular, to the mesoscale, to the large scale at which bulk properties such as strength emerge. In regional climate modeling, global and regional models are coupled to produce regional climate forecasts. In all of these cases there is an opportunity to develop efficient approaches for VVUQ analyses that take advantage of the hierarchical structure.
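One simple way to exploit a low-/high-fidelity hierarchy is a control-variate (multifidelity Monte Carlo) estimator: a few expensive high-fidelity runs are corrected using many cheap low-fidelity runs that are correlated with them. The sketch assumes a hypothetical high-fidelity model exp(x) and a low-fidelity model equal to its quadratic Taylor polynomial, with an uncertain input x uniform on [0, 1]; these stand-ins are illustrative only and are not the radiative-transfer models named above.

```python
import math
import random

def q_high(x):
    """Hypothetical expensive high-fidelity model."""
    return math.exp(x)

def q_low(x):
    """Hypothetical cheap low-fidelity model (quadratic Taylor approximation of exp)."""
    return 1.0 + x + 0.5 * x * x

def mean(v):
    return sum(v) / len(v)

def multifidelity_mean(n_high, n_low, seed=0):
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n_high)]              # inputs shared by both models
    hi = [q_high(x) for x in xs]
    lo = [q_low(x) for x in xs]
    lo_cheap = [q_low(rng.random()) for _ in range(n_low)]  # extra low-fidelity runs
    m_hi, m_lo = mean(hi), mean(lo)
    cov = sum((a - m_hi) * (b - m_lo) for a, b in zip(hi, lo)) / (n_high - 1)
    var_lo = sum((b - m_lo) ** 2 for b in lo) / (n_high - 1)
    alpha = cov / var_lo                                    # estimated control-variate weight
    plain = m_hi                                            # high-fidelity-only estimate
    corrected = m_hi + alpha * (mean(lo_cheap) - m_lo)
    return plain, corrected

plain, corrected = multifidelity_mean(n_high=20, n_low=10_000)
print(f"exact {math.e - 1:.4f}  high-fidelity only {plain:.4f}  multifidelity {corrected:.4f}")
```

The gain depends entirely on how strongly the two fidelity levels are correlated; a poorly correlated low-fidelity model adds cost without reducing variance.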

There are challenges in such approaches, however. Liu et al. (2009) point out some of the obstacles that arise in the routine application of methodologies to link models. Determining how best to allocate resources for VVUQ investigations—an optimization problem—is an important UQ-related task that could benefit from further research. Optimization may take place rather narrowly, as in determining the best initial conditions over which to carry out a sequence of experiments, or more broadly, as in deciding between improving a module of the computational model or carrying out a costly experiment for a large VVUQ effort. Any such question requires some form of optimization while accounting for many sources of uncertainty.

The preceding paragraphs discuss areas in which improvements are needed in UQ methodology, and more detail is provided in Chapter 4. Here the committee summarizes some research directions that have the potential to lead to significantly improved UQ methods.

• Development of scalable methods for constructing emulators that reproduce the high-fidelity model results at training points, accurately capture the uncertainty away from training points, and effectively exploit salient features of the response surface.

• Development of phenomena-aware emulators, which would incorporate knowledge about the phenomena being modeled and thereby enable better accuracy away from training points (e.g., Morris, 1991).

• Exploration of model reduction for optimization under uncertainty.

• Development of methods for characterizing rare events, for example by identifying input configurations for which the model predicts significant rare events, and estimating their probabilities.

• Development of methods for propagating and aggregating uncertainties and sensitivities across hierarchies of models. (For example, how to aggregate sensitivity analyses across microscale, mesoscale, and macroscale models to give accurate sensitivities for the combined model remains an open problem.)

• Research and development in the compound area of (1) extracting derivatives and other features from large-scale computational models and (2) developing UQ methods that efficiently use this information.

• Development of techniques to address high-dimensional spaces of uncertain inputs. An important subset of problems is characterized by a large number of uncertain inputs that are correlated through subscale physical phenomena that are not included in the mathematical model being studied (an example of which is interaction coefficients in models involving particle transport).

• Development of algorithms and strategies, across the spectrum of UQ-related tasks, that can efficiently use modern and future massively parallel computer architectures.

• Development of optimization methods that can guide resource allocation in VVUQ while accounting for myriad sources of uncertainty.
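Gaussian-process regression, in its simplest form, meets the first bullet's requirements: it reproduces results at training points and quantifies uncertainty away from them. The minimal one-dimensional sketch below makes these assumptions:

- a zero-mean prior;
- a squared-exponential kernel with a fixed, hand-picked length scale;
- a sine function standing in for the simulator.

Its cost grows cubically with the number of training runs, which is one reason scalable emulators remain a research topic.

```python
import numpy as np

def sq_exp_kernel(a, b, length=0.3, variance=1.0):
    # Squared-exponential covariance between 1-D input arrays a and b.
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length) ** 2)

def gp_emulator(x_train, y_train, x_new, jitter=1e-10):
    """Posterior mean and standard deviation of a zero-mean GP emulator."""
    K = sq_exp_kernel(x_train, x_train) + jitter * np.eye(len(x_train))
    L = np.linalg.cholesky(K)
    weights = np.linalg.solve(L.T, np.linalg.solve(L, y_train))  # K^{-1} y
    K_star = sq_exp_kernel(x_new, x_train)
    mean = K_star @ weights
    v = np.linalg.solve(L, K_star.T)
    var = np.diag(sq_exp_kernel(x_new, x_new)) - np.sum(v**2, axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))

# Stand-in for an expensive simulator evaluated at six training inputs.
x_train = np.linspace(0.0, 1.0, 6)
y_train = np.sin(2 * np.pi * x_train)

mean_tr, sd_tr = gp_emulator(x_train, y_train, x_train)
mean_new, sd_new = gp_emulator(x_train, y_train, np.array([0.1, 1.5]))
```

At the training inputs the posterior standard deviation is essentially zero. At x = 0.1, between two training points, it stays small. At x = 1.5, well outside the training range, it is much larger.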

7.3.3 Validation and Prediction Research

While many VVUQ tasks introduce questions that can be posed and answered (in principle) within the realm of mathematics, validation and prediction introduce questions whose answers require judgments from the realm of subject-matter expertise. It is challenging to quantify the effect of such judgments on VVUQ outcomes—that is, to translate them into the mathematical realm. This effort comes under the heading of assessing the quality of model-based predictions, which is a key research direction for improving the mathematical foundations of VVUQ.

For validation, “domain of applicability” is recognized as an important concept, but how one defines this domain remains an open question. For predictions, characterizing how a model differs from reality, particularly in extrapolative regimes, is a pressing need. The literature offers simple additive discrepancy models as well as embedded, physically motivated discrepancy models (as in Box 5.1); further advances in linking a model to reality will likely broaden the domain of applicability and improve confidence in extrapolative prediction.
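The simplest additive form writes an observation as y(x) = η(x) + δ(x) + ε, where η is the computational model, δ the discrepancy, and ε observation error. The sketch below assumes a linear δ, a known noise level, and hypothetical data on 0 ≤ x ≤ 1. It shows why extrapolation is the hard case: the uncertainty in the estimated discrepancy grows quickly outside the data range.

```python
import numpy as np

def model(x):
    # Hypothetical computational model eta(x).
    return 2.0 * x

def fit_linear_discrepancy(x, y_obs, noise_sd):
    """Least-squares fit of delta(x) = b0 + b1*x to the residuals y_obs - eta(x).

    Assumes independent Gaussian observation error with known sd.
    """
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y_obs - model(x), rcond=None)
    cov = noise_sd**2 * np.linalg.inv(X.T @ X)
    return beta, cov

def predict(x_new, beta, cov):
    """Discrepancy-corrected prediction and the sd of the estimated delta(x_new)."""
    X_new = np.column_stack([np.ones_like(x_new), x_new])
    mean = model(x_new) + X_new @ beta
    sd = np.sqrt(np.einsum("ij,jk,ik->i", X_new, cov, X_new))
    return mean, sd

# Hypothetical "reality" differs from the model; data cover only 0 <= x <= 1.
rng = np.random.default_rng(2)
x_obs = np.linspace(0.0, 1.0, 20)
noise_sd = 0.05
y_obs = 2.3 * x_obs + 0.1 + rng.normal(0.0, noise_sd, size=x_obs.size)

beta, cov = fit_linear_discrepancy(x_obs, y_obs, noise_sd)
mean, sd = predict(np.array([0.5, 3.0]), beta, cov)  # interpolation vs. extrapolation
```

Even the wider band at x = 3 rests on the assumption that δ stays linear beyond the data. Nothing in the data can confirm that assumption.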

Although multimodel ensembles offer an attractive pathway for assessing uncertainty due to model inadequacy, approaches to date have largely used ensembles of convenience, limiting their usefulness. While something is usually better than nothing, more rigorously constructed ensembles of models, designed so that reality is included within their spans, could ultimately provide a better foundation for assessing uncertainty. In a similar vein, some have advocated the use of more highly parameterized models to improve the chances of covering reality (Doherty and Welter, 2010), giving more realistic prediction uncertainties.
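The rank histogram, borrowed from ensemble weather forecasting, is a widely used diagnostic for whether an ensemble spans reality. For each case, count how many ensemble members fall below the observation. If reality behaves like one more ensemble member, the ranks are roughly uniform. A pile-up in the end bins signals that observations often fall outside the ensemble. The sketch assumes continuous outputs (no ties) and uses synthetic data.

```python
import numpy as np

def rank_histogram(ensemble, obs):
    """Counts of each observation's rank within its ensemble.

    ensemble: shape (n_cases, n_members); obs: shape (n_cases,).
    Rank k means k members fall below the observation (ties ignored).
    """
    ranks = np.sum(ensemble < obs[:, None], axis=1)
    return np.bincount(ranks, minlength=ensemble.shape[1] + 1)

def fraction_outside_span(ensemble, obs):
    """Fraction of cases where the observation lies outside the ensemble's range."""
    outside = (obs < ensemble.min(axis=1)) | (obs > ensemble.max(axis=1))
    return float(np.mean(outside))

# Hypothetical ensemble of convenience: 9 members that are all biased low
# relative to "reality" (obs), so obs tends to sit above all of them.
rng = np.random.default_rng(3)
ensemble = rng.normal(0.0, 1.0, size=(2000, 9))
obs = rng.normal(1.0, 1.0, size=2000)
print(rank_histogram(ensemble, obs), fraction_outside_span(ensemble, obs))
```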

The use of large-scale computational models to search out rare, high-consequence events, or to estimate their probability, is particularly susceptible to discrepancies between model and reality. In such situations, models are almost always used to extrapolate from available data, often far beyond it. Here, too, research is needed.
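The sampling side of rare-event estimation can be sketched with importance sampling. Assume, for illustration, that the model output is a standard normal variable and that the "event" is an exceedance of 4. Samples are drawn from a normal proposal shifted to the threshold and reweighted by the likelihood ratio. This addresses only the cost of estimating a small probability from the model. It does nothing about the model-reality discrepancy that the paragraph above identifies as the larger problem.

```python
import math
import numpy as np

def tail_prob_plain(threshold, n, rng):
    """Plain Monte Carlo estimate of P(X > threshold), X ~ N(0, 1)."""
    return float(np.mean(rng.normal(size=n) > threshold))

def tail_prob_importance(threshold, n, rng):
    """Importance-sampling estimate of P(X > threshold), X ~ N(0, 1),
    drawing from the shifted proposal N(threshold, 1)."""
    x = rng.normal(loc=threshold, scale=1.0, size=n)
    # Likelihood ratio phi(x) / phi(x - threshold) = exp(-threshold*x + threshold**2/2).
    weights = np.exp(-threshold * x + 0.5 * threshold**2)
    return float(np.mean((x > threshold) * weights))

rng = np.random.default_rng(4)
t = 4.0
exact = 0.5 * math.erfc(t / math.sqrt(2.0))   # about 3.2e-5
plain = tail_prob_plain(t, 10_000, rng)       # expected number of exceedances is about 0.3
importance = tail_prob_importance(t, 10_000, rng)
```

With the same 10,000 samples, plain sampling usually sees zero or one exceedance. The importance-sampling estimate is typically within a few percent of the exact value.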

The preceding paragraphs discuss areas in which improvements are needed in validation and prediction methodology, and more detail is provided in Chapter 5. Here the committee summarizes some research directions that have the potential to lead to significantly improved validation and prediction methods.

• Development of methods and strategies to quantify the effect of subject-matter judgments, which necessarily are involved in validation and prediction, on VVUQ outcomes.

• Development of methods that help to define the domain of applicability of a model, including methods that help to quantify the notions of near neighbors, interpolative predictions, and extrapolative predictions.

• Development of methods that incorporate mathematical, statistical, scientific, and engineering principles to produce estimates of uncertainty in “extrapolative” predictions.

• Development of methods or frameworks that help with the all-important problem of relating model-to-model differences, for models in an ensemble, to the discrepancy between models and reality.

• Development of methods to assess model discrepancy and other sources of uncertainty in the case of rare events, especially when validation data do not include such events.
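For the domain-of-applicability bullet, the sketch below computes two crude but common indicators of whether a new input setting is "near" the validation data:

- the distance to the nearest validation input;
- a comparison of the point's Mahalanobis distance against those of the training set.

Both are heuristics assumed for illustration, not definitions of a domain of applicability. The Mahalanobis check, for instance, treats the data as filling an ellipsoid and can miss points that lie just outside the convex hull of the inputs.

```python
import numpy as np

def applicability_indicators(x_train, x_new):
    """Simple distance-based indicators of whether predictions are interpolative.

    Returns, for each row of x_new, its nearest-neighbor distance to the
    training inputs (in standardized units) and a flag that is True when its
    Mahalanobis distance exceeds the largest one among the training inputs.
    """
    mu = x_train.mean(axis=0)
    sd = x_train.std(axis=0, ddof=1)
    z_train, z_new = (x_train - mu) / sd, (x_new - mu) / sd
    diffs = z_new[:, None, :] - z_train[None, :, :]
    nn_dist = np.linalg.norm(diffs, axis=2).min(axis=1)
    cov_inv = np.linalg.inv(np.cov(z_train, rowvar=False))

    def mahalanobis(z):
        return np.sqrt(np.einsum("ij,jk,ik->i", z, cov_inv, z))

    outside = mahalanobis(z_new) > mahalanobis(z_train).max()
    return nn_dist, outside

# Hypothetical 5 x 5 grid of validation experiments on [0, 1]^2.
g = np.linspace(0.0, 1.0, 5)
x_train = np.array(np.meshgrid(g, g)).reshape(2, -1).T
x_new = np.array([[0.5, 0.5], [0.6, 0.4], [2.0, 2.0]])
nn_dist, outside = applicability_indicators(x_train, x_new)
```

The first two query points sit inside the grid. The third is flagged as extrapolative.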

Much of the research in this area should be a joint venture between subject-matter experts, mathematical/statistical experts, and computational modelers. The committee believes that the traditional funding plan, in which funding is separated by field (mathematical sciences, computational science, basic science), is not ideal for making progress in the area of extrapolative predictions.

7.4 EDUCATION CHANGES FOR THE EFFECTIVE INTEGRATION OF VVUQ

The previous sections outline the current practices and future directions of VVUQ for large-scale computational simulations. Although scientists, engineers, and policy makers should of course use current best practices, there are important issues that have to be addressed to bring this about: (1) how to get the main concepts of VVUQ into the hands of those who need them so that the best practices become commonplace and (2) how to prepare the next generation of researchers. This section discusses educational changes in the mathematical sciences community that aim to integrate VVUQ and lay the foundation for improved methods and practices for the future.

As is discussed throughout this report, several broad tasks are included in VVUQ, and these tasks are likely to be performed by individuals with different areas of expertise. It is important that those involved understand these broad tasks and their implications. For instance, it is unlikely that a policy maker will carry out the task of code verification, but it is important that the person making the decisions understand the difference between a code that has gone through a VVUQ process and a code that has not. Conversely, it is equally important that computational modelers be cognizant of the potential uses of the computer code and that the predictive limitations of the computational model be clearly spelled out.

A report of the National Academy of Engineering (NAE), The Engineer of 2020: Visions of Engineering in the New Century, includes in its Executive Summary the vision of “improving our ability to predict risk and adapt systems” (NAE, 2004, p. 3). The same report describes the role of future engineers as continuing “to create solutions that minimize the risk of complete failure” (p. 24). In its report Vision 2025, the American Society of Civil Engineers (ASCE, 2006) describes civil engineers as (1) managers of risk and uncertainty caused by natural events, accidents, and other threats and (2) leaders in discussions and decisions shaping public environmental and infrastructure policy. It is reasonable to believe that similar characterizations enter the vision of other engineering and scientific disciplines.

A scientist or an engineer may have a part in several of the VVUQ tasks. Education plays an important role in making the best practices of VVUQ routine, and education and training should be targeted at the correct audiences, as is discussed further below.

7.4.1 VVUQ at the University

The development and implementation of VVUQ have been motivated by drivers similar to those underlying the NAE and ASCE visions. At the present time, topics in VVUQ are discussed at research conferences. Select topics come up in a few (usually graduate) engineering, statistics, and computer science courses, but a more encompassing view of VVUQ is not yet a standard part of the education of most undergraduate or graduate students. Because of the need to assess and manage risk and uncertainty within traditional mathematics-based modeling and to have confidence in the models for decision making, the educational objectives for VVUQ that could impact all undergraduate and graduate students in engineering, statistics, and the physical sciences should include the following:

• Probabilistic thinking,

• Science- and engineering-based modeling, and

• Numerical analysis and scientific computing.

Note that item 1 is included in some science and engineering programs, although it is often not required, and items 2 and 3 are not normally included in most probability and statistics or computer-science programs. With respect to items 1 and 2, it is necessary to identify mathematical tools relevant to applying probability and science together to address practical problems. With respect to items 2 and 3, it is necessary to understand how uncertainty can be introduced into deterministic physical laws and how evidence should be weighted to make model-based decisions.

It is reasonable to view VVUQ as motivating the need for an intellectual watershed merging items 1 through 3. VVUQ sits at the confluence of statistics, physics/engineering, and computing, which are themes that are usually discussed separately. To appreciate these distinctions, note that uncertainty is intimately associated with both observations and computational models. Uncertainty is not a property of physical processes themselves but rather of one’s interpretation of these processes (as embodied in mathematical models, assumptions, and uncertainty in data). (If a physical process is random, it introduces another kind of uncertainty, which can be addressed within the mathematical model.) This perspective also holds for more empirically based models from fields such as operations research, psychology, and economics. Computational science is relevant to the extent that it permits the exploration of more detailed models, which helps to make better inferences about the real processes.

At the present time, undergraduate students are typically taught models of reality without being introduced to the significance of the modeling process and without a critical assessment of the associated assumptions and uncertainties. In engineering design courses, for example, students are most often introduced to a fait accompli in which a lack of knowledge and other uncertainties have already been integrated into a collection of safety factors. Students sometimes take advanced science and engineering courses before they have gained exposure to first principles of probability and statistics. Moreover, probability and statistics courses for engineering undergraduate students deal largely with data analysis (computing means, point estimates, and confidence intervals) and do not introduce many concepts that are important to VVUQ.

A modern curriculum in UQ should equip its students with the foundation to reason about risks and uncertainty. This educational goal should include an understanding of the nature of risks associated with engineered and natural processes in an increasingly complex and interconnected world. Recent and ongoing events, including the nuclear reactor meltdown in Japan, the Deepwater Horizon blowout, the engine failure on an Airbus A380 superjumbo jet, and the accelerated melting of ice sheets, provide ready examples to motivate an understanding of the prevalence of risk. These problems are multifaceted and involve modeling themes from several traditional disciplines. A modern curriculum should foster an appreciation of the role that modeling and simulation could play in addressing such complex problems, providing clearer assessment of exposure, hazard, and risk and informing assessments of technical strategies for mitigating such hazards and risks. The curriculum should address effective communication of uncertainty and risk to decision makers, stakeholders, and UQ experts.

What might this mean for university programs? The required material to be integrated into an educational program will depend on the field. Students in engineering and science are routinely taught science- and engineering-based modeling and numerical methods and computing. Students of probability and statistics are taught probabilistic thinking and perhaps some numerical methods and computing. Decision makers (say, students of management) are likely to be introduced only to probabilistic thinking. The implications for different fields are briefly discussed below.

Recommendation: An effective VVUQ education should encourage students to confront and reflect on the ways that knowledge is acquired, used, and updated.

This recommendation can be implemented by incorporating relevant components of VVUQ, as a fundamental scientific process, into a minimal subset of core courses, sequenced in a manner that is conducive to the objective. Given the constraints of existing curricula, the alternative of adding one or more new courses may not be feasible.

• Engineering and science. Any proposed educational program should respect the need for a logical sequence in knowledge acquisition. One can propose a route that first introduces the ubiquity of uncertainty throughout science and engineering. For example, this approach can be facilitated by the development of a number of examples that explain uncertainties associated with natural phenomena and engineering systems (e.g., the ball-drop examples in Chapters 1 and 5). This step can be followed by an introduction to probabilistic thinking, including classical as well as Bayesian statistics. Many of these ideas can likely be integrated into existing courses rather than requiring the introduction of new courses into an already-crowded curriculum. The engineering design process, as embodied in capstone design courses, can then be presented as a decision process aimed at selecting from competing alternatives, subject to various constraints. This formulation of the design process has the added benefit of articulating to the student the scientific distinctions between various design paradigms or procedures (usually presented in the form of design recipes). It may be that some programs already have such approaches, but they are not common.

• It is important to teach students to regularly confront uncertainty in input data and the corresponding uncertainty in their stated answers. The committee encourages instructors in traditional courses to pose questions that include uncertainty in the input information.

• Probability and statistics. Like engineers and scientists, students of probability and statistics should acquire training in mathematical modeling as well as in computational and numerical methods. Again, the path to doing so should build on the logical sequence of discipline-specific core training. The key is understanding how probabilistic thinking fits into the scientific process (e.g., how probability fits together with mathematical modeling) and also understanding the limits of computation.
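A classroom exercise in the spirit of the ball-drop example might propagate uncertainty in a measured drop height into the predicted fall time. The numbers below are hypothetical: a 10 m drop known to ±0.1 m, g treated as exact, and no air resistance. The sketch compares Monte Carlo sampling with first-order (linearized) propagation.

```python
import numpy as np

G = 9.81  # m/s^2, treated as known exactly

def fall_time(h):
    # Drag-free fall from rest: h = g t^2 / 2.
    return np.sqrt(2.0 * h / G)

rng = np.random.default_rng(42)
h_mean, h_sd = 10.0, 0.1           # hypothetical measured drop height (m)
h = rng.normal(h_mean, h_sd, size=100_000)
t = fall_time(h)

mc_mean, mc_sd = t.mean(), t.std(ddof=1)
# First-order propagation: sd_t ~ |dt/dh| * sd_h = sd_h / sqrt(2 g h).
lin_sd = h_sd / np.sqrt(2.0 * G * h_mean)
print(f"t = {mc_mean:.4f} s +/- {mc_sd:.4f} s (linearized sd {lin_sd:.4f} s)")
```

For such a small input uncertainty the two approaches agree closely. Increasing `h_sd`, or adding an uncertain drag coefficient, is a natural next step that shows where linearization starts to break down.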

Recommendation: The elements of probabilistic thinking, physical-systems modeling, and numerical methods and computing should become standard parts of the respective core curricula for scientists, engineers, and statisticians.

• Programs in management sciences. The intellectual framework represented by VVUQ seeks to assess the uncertainty in answers to a problem with respect to uncertainties in the given information. It is unlikely that students who are being trained as policy makers are going to be routinely interested in computational modeling, but it is important that they be educated in assessing the quality and reliability of the information that they are using to make decisions and also in assessing the inferential limits of the information. In the VVUQ context, this can mean, for example, understanding whether or not to trust a model that has undergone a VVUQ process, or understanding the distinction between predictions that have been informed by observations and those that have not.

It will be a challenge for individual university departments to take the lead in integrating VVUQ into their curricula. An efficient way of doing so would be to share the load among the relevant units. A way forward is emerging as a result of the DOE’s Predictive Science Academic Alliance Program (PSAAP). For example, at the University of Michigan’s Center for Radiative Shock Hydrodynamics (CRASH), both graduate and undergraduate students are included in the fundamental VVUQ steps as part of CRASH’s core mission. More importantly in the current context, as a result of PSAAP, the university is initiating an interdisciplinary Ph.D. program in predictive science and engineering. Students in the program have a home department but also take courses and develop methodology relating to VVUQ. (A course in VVUQ has already been taught.) The computational science, engineering, and mathematics program at the Institute for Computational Engineering and Sciences at the University of Texas at Austin offers a similar graduate program. It is not hard to imagine a similar interdisciplinary program (perhaps a certificate program in predictive science) being rolled out to undergraduate students in engineering, physics, probability and statistics, and possibly management science.

Finding: Interdisciplinary programs incorporating VVUQ methodology are emerging as a result of investment by granting bodies.

Recommendation: Support for interdisciplinary programs in predictive science, including VVUQ, should be made available for education and training to produce personnel who are highly qualified in VVUQ methods.

7.4.2 Spreading the Word

VVUQ plays an important role, with far-reaching consequences, in making sense of the information provided by computational models, observations, and expert judgment. It is important to communicate the best practices of VVUQ to those creating and using computational models and also to instructors in university programs.

To this end, several activities could be undertaken. For example, to assist instructors, model problems and solutions should be made available. In this spirit, people with expertise in VVUQ could be encouraged to write an article or series of articles for an educational journal, in which problems are introduced and solutions are outlined.

Recommendation: Federal agencies should promote the dissemination of VVUQ materials and the offering of informative events for instructors and practitioners.

This type of contribution would go a long way toward sharing important ideas and suggesting how they might be implemented in a classroom setting. Along the same lines, the NAE could perhaps devote a special issue of its quarterly publication The Bridge to this type of initiative. It is also important to build on existing resources, such as the American Statistical Association’s Guidelines for Assessment and Instruction in Statistics Education, which addresses the statistical component of VVUQ and highlights the need for a good understanding of data modeling, data analysis, data interpretation, and decisions. For existing practitioners, educational activities should be routinely included at conferences and also through the mathematical sciences institutes (e.g., the Statistical and Applied Mathematical Sciences Institute in Research Triangle Park, North Carolina, and the Mathematical Sciences Research Institute in Berkeley, California).

7.5 CLOSING REMARKS

This chapter attempts to peer into the future of VVUQ and to summarize the committee’s responses to its tasking. It identifies key principles that the committee found to be helpful and identifies best practices that the committee has observed in the application of VVUQ to difficult problems in computational science and engineering. It identifies research areas that promise to improve the mathematical foundations that undergird VVUQ processes. Finally, it discusses changes in the education of professionals and dissemination of information that should enhance the ability of future VVUQ practitioners to improve and properly apply VVUQ methodologies to difficult problems, enhance the ability of VVUQ customers to understand VVUQ results and use them to make informed decisions, and enhance the ability of all VVUQ stakeholders to communicate with each other. These observations and recommendations are offered in the hope that they will help the VVUQ community as it continues to improve VVUQ processes and broaden their applications.

7.6 REFERENCES

ASCE (American Society of Civil Engineers). 2006. Vision 2025. Available at http://www.asce.org/uploadedFiles/Vision_2025. Accessed September 7, 2011.

Doherty, J., and D. Welter. 2010. A Short Exploration of Structural Noise. Water Resources Research 46:W05525.

Liu, F., M.J. Bayarri, and J. Berger. 2009. Modularization in Bayesian Analysis, with Emphasis on Analysis of Computer Models. Bayesian Analysis 4:119-150.

Morris, M. 1991. Factorial Sampling Plans for Preliminary Computational Experiments. Technometrics 33(2):161-174.

NAE (National Academy of Engineering). 2004. The Engineer of 2020: Visions of Engineering in the New Century. Washington, D.C.: The National Academies Press.

Oreskes, N., K. Shrader-Frechette, and K. Belitz. 1994. Verification, Validation, and Confirmation of Numerical Models in the Earth Sciences. Science 263(5147):641-646.

Randall, D.A., R.A. Wood, S. Bony, R. Colman, T. Fichefet, J. Fyfe, V. Kattsov, A. Pitman, J. Shukla, J. Srinivasan, R.J. Stouffer, A. Sumi, and K.E. Taylor. 2007. Climate Models and Their Evaluation. Pp. 591-648 in Climate Change 2007: The Physical Science Basis. Contribution of Working Group I to the Fourth Assessment Report of the Intergovernmental Panel on Climate Change. S.D. Solomon, D. Qin, M. Manning, Z. Chen, M.C. Marquis, K.B. Averyt, M. Tignor, and H.L. Miller (Eds.). Cambridge, U.K.: Cambridge University Press.

