**7 Next Steps in Practice, Research, and Education for Verification, Validation, and Uncertainty Quantification**

The role of verification, validation, and uncertainty quantification (VVUQ) in computational science and engineering has increased significantly in recent years. As high-quality computational modeling becomes available in more application areas, the role played by VVUQ will continue to grow. Previous chapters have addressed VVUQ as it has evolved to date in the computational modeling of complex physical systems. In this chapter, the committee discusses next steps in the evolution of VVUQ. This summary of its responses to the statement of task includes the committee’s identification of principles and current best practices and its recommendations for VVUQ research and development, as well as recommendations for educational changes.

**7.1 VVUQ PRINCIPLES AND BEST PRACTICES**

As was noted in Chapter 1, the committee has confined its considerations of principles and best practices to the mathematical science aspects of VVUQ. The principles and best practices presented here are, loosely, restricted to those aspects and do not emphasize nonmathematical issues of physical science, communication of results, and so forth. Historically, methodologies for VVUQ have evolved separately in different application areas and fields. As a result, different application areas can have different approaches. A number of recent workshops and conferences have assembled researchers from varied application areas and perspectives, aiming for a cross-fertilization of ideas and a better understanding of the connections, commonalities, and differences among the varied VVUQ practices. As time passes, the relationships among the various practices developed in different settings will become clearer, as will the understanding of best practices for different kinds of applications. However, it is premature to try to identify a single set of methods or algorithms that are the best tools to accomplish the best practices identified below. Today, it appears that some methods and algorithms are better for some applications and others are better for other applications. Therefore, the committee identifies principles and best practices but stops short of prescribing implementation methodologies.

This section begins with some overarching remarks, moves to principles and practices for verification, and then addresses principles and practices for validation and prediction. As previous chapters have emphasized, VVUQ analyses are not well defined unless the quantities of interest (QOIs) are well defined. Defining the QOIs from the start allows a VVUQ process to produce more meaningful results than will be produced if the focus is on the “solution” in general. For example, suppose a given model accurately captures the average, or large-scale, features of a physical system but not the small-scale features. If only large-scale features are important in the given application, the appropriately defined QOI should be sensitive to large-scale but not small-scale behavior. In this case the VVUQ analysis may find that the model is sufficiently accurate (e.g., uncertainties in the predicted QOI are sufficiently small) to provide actionable information. However, if small-scale details are important, the QOI should be defined accordingly, and the VVUQ analysis (of the same model applied to the same physical system) may find that the model is too inaccurate to be of value.
Leveraging work from previous VVUQ analyses should be done with caution. Since VVUQ results are
specific to particular QOIs in particular settings, transferring results to new QOIs and settings can be difficult to
justify. However, one can consider applying VVUQ to a model over a broad set of conditions and QOIs if physical
data are available to support such wide-ranging assessments of model accuracy and there is a firm theoretical
understanding of the physical phenomena being modeled. It can be argued that an example of such a situation is
the Monte Carlo N-Particle transport code,1 a particle-transport code that incorporates a large body of knowledge
and has been tested against measurements derived from thousands of experiments spanning many particle types
and a broad range of conditions.
Within the VVUQ enterprise, the level of rigor employed should be commensurate with the importance and
needs of the application and decision context. Some applications involve high-consequence decisions and therefore
require a substantial VVUQ effort; others do not.
**7.1.1 Verification Principles and Best Practices**
Here the committee summarizes key verification principles, along with best practices associated with each
principle. Chapter 3 provides more detail.
• Principle: Solution verification is well defined only in terms of specified quantities of interest, which are
usually functionals of the full computed solution.
—Best practice: Clearly define the QOIs for a given VVUQ analysis, including the solution verification
task. Different QOIs will be affected differently by numerical errors.
—Best practice: Ensure that solution verification encompasses the full range of inputs that will be
employed during UQ assessments.
• Principle: The efficiency and effectiveness of code and solution verification can often be enhanced by
exploiting the hierarchical composition of codes and mathematical models, with verification performed
first on the lowest-level building blocks and then on successively more complex levels.
—Best practice: Identify hierarchies in computational and mathematical models and exploit them for
code and solution verification. It is often worthwhile to design the code with this approach in mind.
—Best practice: Include in the test suite problems that test all levels in the hierarchy.
• Principle: Verification is most effective when performed on software developed under appropriate software
quality practices.
—Best practice: Use software configuration management and regression testing and strive to understand
the degree of code coverage attained by the regression suite.
—Best practice: Understand that code-to-code comparisons can be helpful, especially for finding errors
in the early stages of development but that in general they do not by themselves constitute sufficient
code or solution verification.
—Best practice: Compare against analytic solutions, including those created by the method of manufactured
solutions—a technique that is helpful in the verification process.
• Principle: The goal of solution verification is to estimate, and control if possible, the error in each QOI
for the problem at hand. (Ultimately, of course, one would want to use UQ to facilitate the making of
decisions in the face of uncertainty. So it is desirable for UQ to be tailored in a way to help identify ways
to reduce uncertainty, bound it, or bypass the problem, all in the context of the decision at hand. The use
of VVUQ for uncertainty management is discussed in Section 6.2, “Decisions Within VVUQ Activities”.)
1 See mcnp-green.lanl.gov. Accessed September 7, 2011.

—Best practice: When possible in solution verification, use goal-oriented a posteriori error estimates,
which give numerical error estimates for specified QOIs. In the ideal case the fidelity of the simulation
is chosen so that the estimated errors are small compared to the uncertainties arising from other sources.
—Best practice: If goal-oriented a posteriori error estimates are not available, try to perform self-convergence
studies (in which QOIs are computed at different levels of refinement) on the problem at hand,
which can provide helpful estimates of numerical error.
—Remark: In the absence of a posteriori or self-convergence results, the next best option may be to estimate
numerical error in a given QOI in the problem at hand based on detailed assessments of numerical
error in a similar QOI in a relevant reference problem. However, it is challenging to define reference
problems that permit detailed assessments but are demonstrably relevant to the problem at hand. It
can be risky to assume that numerical errors in the reference problem are representative of numerical
errors in the problem at hand.
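The self-convergence study described above can be sketched with invented numbers. Assuming a single dominant error term of the form C·h^p and a constant refinement ratio between grids, three computed QOI values suffice to estimate both the observed order p and the numerical error remaining in the finest result:

```python
import numpy as np

def self_convergence(q_coarse, q_medium, q_fine, r=2.0):
    """Given one QOI computed on three systematically refined grids
    (refinement ratio r between levels), estimate the observed order of
    convergence and the remaining numerical error in the finest value."""
    p = np.log(abs(q_coarse - q_medium) / abs(q_medium - q_fine)) / np.log(r)
    err_fine = (q_medium - q_fine) / (r**p - 1.0)  # estimates q_fine - q_exact (signed)
    return p, err_fine

# Invented QOI values from a second-order method whose exact value is 1.0:
# q(h) = 1.0 + 0.1 * h**2 sampled at h = 0.4, 0.2, 0.1
qs = [1.0 + 0.1 * h**2 for h in (0.4, 0.2, 0.1)]
p, err = self_convergence(*qs)
print(p, qs[2] - err)  # observed order and Richardson-extrapolated QOI
```

When the estimated order disagrees with the theoretical order, the error estimate should be treated with suspicion, since the grids may not yet be in the asymptotic regime this calculation assumes.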
**7.1.2 Validation and Prediction Principles and Best Practices**
Although the questions involving solution verification are firmly grounded in mathematical and computational
science, the questions that arise in validation and prediction require statistical and subject-matter (physics, chemistry,
materials, etc.) expertise as well. They also require choices that involve judgment, for example in determining
the relevance of validation studies to the prediction of a QOI in the problem at hand. This necessary application
of judgment warrants a brief discussion here. The concept of a domain of applicability—a region of a domain
space in which a validation assessment is judged to apply—is helpful in determining the relevance of a validation
assessment to the prediction of a QOI in a given problem at hand. This concept can include features, or descriptors,
that characterize the problem space (such as ball density, radius, and drop height in the ball-drop example)
as forming axes that define a mathematical space. Each problem or experiment is associated with a point in the
space; thus, the problems included in the validation assessment map to a collection of points in the domain space.
The problem at hand maps to another point in the space. One can imagine basing a determination of relevance
on the location of a particular problem point relative to the locations of the other points. For example, if the new
point is surrounded by validation-problem points, the validation study might be judged to have high relevance.
This is an appealing notion, but any attempt to apply it with mathematical rigor must address significant complicating
truths. If important features are omitted from the set that is chosen to form axes of the space, then two
problems may look similar when they actually differ in important ways. This is illustrated by the “ball texture” in
the ball-drop example in Section 1.6. However, if all potentially important features are included, the dimension of
the space may become intractably large. In the ball-drop example, potential additional features could include ambient
temperature, ambient pressure, ambient humidity, wind conditions, ball skin materials, ball interior structure,
initial rotation applied to the ball as it is dropped, ball elasticity, ball coefficient of thermal expansion, and so on.
Including a large set of features would help guard against the omission of those that may be important. However,
this creates a high-dimensional domain space, which forces essentially any new problem to be “outside” the region
enclosed by previous problems. This makes every prediction appear to be “extrapolative.” In oversimplified terms:
if the domain space is low-dimensional, then subject-matter judgment is required to assess the impact of features
that are not included, but if the domain space is high-dimensional, subject-matter expertise is required to assess
the relevance of previous experience to an extrapolative prediction. Either way, subject-matter expertise must
inform a judgment.
This discussion is not intended to attack the concept of a domain of applicability or to downplay its utility.
Rather, it is intended to illustrate that mathematics alone cannot determine the relevance of past experience to the
problem at hand but that judgment informed by subject-matter expertise is a necessary ingredient in making this
determination.
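A toy calculation, with invented features and numbers, makes the dimensionality trade-off concrete. The per-feature range check below is a deliberately crude stand-in for a real domain-of-applicability test (a convex hull or density estimate might be used instead), but it shows how the same new problem can look interpolative or extrapolative depending on which features are chosen to define the space:

```python
import numpy as np

rng = np.random.default_rng(0)

# 50 hypothetical validation problems, each described by up to 12 features
# (ball density, radius, drop height, ambient temperature, ...), scaled so
# that the validation campaign spans roughly [0.1, 0.9] in each feature.
validation = rng.uniform(0.1, 0.9, size=(50, 12))

# A hypothetical new problem: ordinary values in the first 8 features,
# but slightly outside past experience in the remaining 4.
new_problem = np.full(12, 0.5)
new_problem[8:] = 0.95

def inside_ranges(point, data):
    """Crude domain-of-applicability check: does the new problem fall within
    the range spanned by the validation problems in every included feature?"""
    return bool(np.all((point >= data.min(axis=0)) & (point <= data.max(axis=0))))

# Whether the prediction looks interpolative or extrapolative depends
# entirely on which features are included:
for d in (2, 4, 8, 12):
    print(d, inside_ranges(new_problem[:d], validation[:, :d]))
```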
In spite of variations in validation and prediction practices across fields, the inherent role of expertise and
judgment, and the rapid evolution of improved methodologies, some general principles and best practices in validation
and prediction have emerged that the committee believes will stand the test of time. They are summarized
below. Chapter 5 provides more detail.

• Principle: A validation assessment is well defined only in terms of specified QOIs.
—Best practice: Early in the validation process, specify the QOIs that will be addressed.
• Principle: A validation assessment provides direct information about model accuracy only in the domain
of applicability that is “covered” by the physical observations employed in the assessment.
—Best practice: When quantifying or bounding model error for a QOI in the problem at hand, systematically
assess the relevance of supporting validation assessments (which were based on data from different
problems, often with different QOIs). Subject-matter expertise should inform this assessment of
relevance (as discussed above and in Chapter 5).
—Best practice: If possible, use a broad range of physical observation sources so that the accuracy of a
model can be checked under different conditions and at multiple levels of integration.
—Best practice: Use “holdout tests” to test validation and prediction methodologies. In such a test some
validation data are withheld from the validation process, the prediction machinery is employed to
“predict” the withheld QOIs, with quantified uncertainties, and finally the predictions are compared
to the withheld data.
—Best practice: If the desired QOI was not observed for the physical systems used in the validation
process, compare sensitivities of the available physical observations with those of the QOI.
—Best practice: Consider multiple metrics for comparing model outputs against physical observations.
• Principle: The efficiency and effectiveness of a validation assessment are often improved by exploiting
the hierarchical composition of computational and mathematical models, with assessments beginning on
the lowest-level building blocks and proceeding to successively more complex levels.
—Best practice: Identify hierarchies in computational and mathematical models, seek measured data that
facilitate hierarchical validation assessments, and exploit the hierarchical composition to the extent possible.
—Best practice: If possible, use physical observations, especially at more basic levels of the hierarchy,
to constrain uncertainties in model inputs and parameters.
• Principle: The uncertainty in the prediction of a physical QOI must be aggregated from uncertainties and
errors introduced by many sources, including discrepancies in the mathematical model, numerical and
code errors in the computational model, and uncertainties in model inputs and parameters.
—Best practice: Document assumptions that go into the assessment of uncertainty in the predicted QOI,
and also document any omitted factors. Record the justification for each assumption and omission.
—Best practice: Assess the sensitivity of the predicted QOI and its associated uncertainties to each
important source of uncertainty as well as to key assumptions and omissions.
—Best practice: Document key judgments—including those regarding the relevance of validation studies
to the problem at hand—and assess the sensitivity of the predicted QOI and its associated uncertainties to
reasonable variations in these judgments.
• Principle: Validation assessments must take into account the uncertainties and errors in physical observa-
tions (measured data).
—Best practice: Identify all important sources of uncertainty/error in validation data—including instrument
calibration, uncontrolled variation in initial conditions, variability in measurement setup, and so
on—and quantify the impact of each.
—Best practice: If possible, use replications to help estimate variability and measurement uncertainty.
—Remark: Assessing measurement uncertainties can be difficult when the “measured” quantity is actually
the product of an auxiliary inverse problem—that is, when the quantity is not measured directly but is
inferred from other measured quantities.
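As a minimal sketch of the last two principles, with invented measurements and an assumed model uncertainty: replicate observations provide an estimate of measurement variability, which can then be combined with other (independent) uncertainty sources when judging a model-data discrepancy.

```python
import numpy as np

# Hypothetical replicate measurements of one QOI from repeated experiments
replicates = np.array([3.02, 2.95, 3.11, 2.98, 3.05, 2.89, 3.08])

y_bar = replicates.mean()          # best estimate of the measured QOI
s = replicates.std(ddof=1)         # replicate-to-replicate variability
se = s / np.sqrt(len(replicates))  # standard error of the mean

model_prediction = 3.20            # hypothetical computed QOI
sigma_model = 0.08                 # assumed model/numerical uncertainty (1-sigma)

# Compare the model-vs-data discrepancy against the combined uncertainty,
# treating the two sources as independent.
combined = np.sqrt(se**2 + sigma_model**2)
z = abs(model_prediction - y_bar) / combined
print(f"discrepancy = {model_prediction - y_bar:.3f}, z-score = {z:.2f}")
```

For these invented numbers the discrepancy is roughly twice the combined standard uncertainty, which would flag the model for closer scrutiny; whether that is acceptable depends on the decision context.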
**7.2 PRINCIPLES AND BEST PRACTICES IN RELATED AREAS**

**7.2.1 Transparency and Reporting**
In the presentation of VVUQ results to stakeholders, including decision makers who may not be familiar with the analyses, it is important to state clearly the key underlying assumptions along with their potential impact on the predicted QOIs, their uncertainties, and other key outcomes. In particular, UQ analyses should state which uncertainties are accounted for and which are not and should give some assessment of the impact of those uncertainties not accounted for. It is also important that the presentation discuss a triage of assumptions, assessing which have the potential to alter the outcomes and assessing the sensitivity of key outcomes to these alternative assumptions. A good example of detailing model inadequacies that might affect overall assessment of anthropogenic impact on climate change is given in Chapter 8 of an Intergovernmental Panel on Climate Change report (Randall et al., 2007).
The use of plain language, suitable for the application at hand, is most effective for presentations. The use
of terminology that has specific meanings in mathematics, statistics, or VVUQ can often lead to misconceptions
or misunderstandings. As Oreskes et al. (1994) point out, words such as “verification” and “validation” carry
common meanings that can be inappropriately attached to computational-model assessment. It is also important
not to confuse the mathematical, computational, and subject-matter science that went into building a large-scale
computational model with the VVUQ effort that assesses the appropriateness and accuracy of the model-based
predictions.
Holdout tests provide a direct demonstration of a model’s ability to predict under new conditions and can be
an effective tool for communicating certain VVUQ concepts and results. Holdout tests use the model to predict
experimental or observational outcomes that were not used in the model calibration process. Once a computational
model has been calibrated with a particular set of physical measurements, the holdout test allows one to see how
the model predicts the system behavior in a new setting. Of course, how to assess the degree of extrapolation in a
given holdout test remains an open question, as the committee has discussed above.
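A schematic holdout test, with a made-up linear surrogate and synthetic data standing in for a real computational model and experiments: calibrate on part of the data, predict the withheld points with quantified uncertainty, and check the empirical coverage.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical experimental campaign: inputs x, measured QOI y with noise
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * x + 0.5 + rng.normal(0.0, 0.05, size=x.size)

# Hold out every fourth observation from calibration
held = np.arange(x.size) % 4 == 0
x_cal, y_cal = x[~held], y[~held]
x_hold, y_hold = x[held], y[held]

# "Calibrate" a simple surrogate on the retained data
slope, intercept = np.polyfit(x_cal, y_cal, deg=1)
pred = slope * x_hold + intercept

# Quantify predictive uncertainty from the calibration residuals (1-sigma)
sigma = np.std(y_cal - (slope * x_cal + intercept), ddof=2)

# Holdout check: do roughly 95% of withheld points fall within +/- 2 sigma?
coverage = np.mean(np.abs(y_hold - pred) <= 2.0 * sigma)
print(f"holdout coverage at 2 sigma: {coverage:.2f}")
```

Systematically poor coverage in such a test would indicate that the prediction machinery understates its uncertainty, even if the calibration fit itself looks good.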
**7.2.2 Decision Making**
Decision makers must have key information from the VVUQ process that is summarized and clearly communicated.
This key information includes summaries of the body of knowledge behind the choice of models, evidence
from the verification process, sensitivities of the calculated QOIs to uncertainties in key parameters, quantification
(from validation studies) of the model’s ability to match relevant measured data, assessment of modeling challenges
in the prediction problem relative to those in the validation problems, key assumptions behind the predictions and
quantified uncertainties, sources of uncertainty that were neglected, and so on. If this information is summarized
and communicated properly, results from the VVUQ process can play a unique and significant role in the efficient
allocation of resources, management of the overall uncertainty budget, and generation of the soundest possible
basis for high-consequence decisions in the presence of uncertainties.
The results of VVUQ analyses can also be used to make decisions regarding how to allocate resources for
future VVUQ activities—computing hardware acquisition, experimental campaigns, model improvement efforts,
and other efforts—to improve prediction accuracy or to improve confidence in model-based predictions. This
decision task is made more difficult by the often high cost of employing available computational models and the
inability of models to perfectly represent reality. A realistic assessment of model inadequacies/discrepancies is
important for resource allocation because models can inform only about processes represented in the models. If
better understanding of current model inadequacies is key to improving predictions, then additional validation data
will likely be required. Hence approaches for resource allocation will necessarily require some form of qualitative
assessment or judgment. Given the complexity of VVUQ activities, a carefully structured planning process can
help to ensure that resources are used efficiently and that significant factors are addressed.
**7.2.3 Software, Tools, and Repositories**
Practitioners in VVUQ currently have available to assist them a limited set of software and repositories (for
data, examples, and code). This is particularly true for the developing field of uncertainty quantification. A number
of application-specific software projects have been developed; two notable examples are Dakota,2 for engineering applications, and PEST,3 for environmental applications.

2 See Dakota.sandia.gov. Accessed September 7, 2011.
3 See pesthomepage.org. Accessed September 7, 2011.

There is also software available to carry out specific computations involved in the VVUQ process (e.g., sensitivity analysis, response-surface modeling, logical-error checking for code verification, and so on). A recently launched Department of Energy (DOE) effort is focused on
developing software tools for UQ in the high-performance computing environment.
Such software as that described above can benefit practitioners and users. The more established efforts have
documentation and a user community to help with their use. Although the learning curve is steep, and the framework
and tools imposed by a particular software package may not be ideal for the application at hand, many of
the utilities in current and developing software would be of use in many VVUQ efforts. Packaging these functions
and utilities as separate, reusable libraries would allow them to be used within other software efforts, making them more broadly useful.
Nearly all of the available software treats the computational model as a black box that produces outputs for a
given input setting. Such an approach has obvious advantages for general use—it requires no changes to existing
computational models—but will be difficult to adapt to newer, intrusive approaches for UQ.
The VVUQ field would benefit from a collection of testbed examples that demonstrate software and VVUQ
methods, provide examples of UQ analyses, and so on. Such a repository, perhaps managed by the Society for
Industrial and Applied Mathematics, the American Statistical Association, or some other professional entity with
a stake in VVUQ, would allow for the comparison and assessment of different methods and approaches so that
practitioners could determine the most appropriate method(s) for their particular application. Such a repository
would also help foster an understanding of the similarities in and differences among the various VVUQ approaches
that have been developed in separate application areas.
**7.3 RESEARCH FOR IMPROVED MATHEMATICAL FOUNDATIONS**
This section discusses research directions that could improve the mathematical foundations of the VVUQ
process. In the area of solution verification there is a need for methods that can accurately estimate numerical
error in the computation of the problem at hand for mathematical models that are more complex than linear elliptic
partial differential equations. In the area of validation and prediction, research needs are driven largely by (1) the
computational burden presented by large-scale computational models, (2) the need to combine multiple sources of
information, and (3) the challenges associated with assessing the quality of model-based predictions. In the area
of uncertainty quantification, there is a need for improved methods for handling large numbers of uncertain inputs
(the famous “curse of dimensionality”). There are promising directions for research at the interface of probabilistic/
statistical modeling, computational modeling, high-performance computing, and application knowledge, suggesting
that future research efforts in VVUQ should include collaborative interdisciplinary activities.
**7.3.1 Verification Research**
The solution verification process aims to quantitatively estimate the impact of numerical error on a given QOI.
“Goal-oriented” methods are of particular interest, because they seek to estimate the error not in some abstract
mathematical norm of the solution but rather in a given, defined functional of the solution—a particular QOI. As
is discussed in Chapter 3, methods exist for estimating tight two-sided bounds for numerical error in the solution
of linear elliptic partial differential equations (PDEs), but research is needed to develop a similar level of maturity
for estimating error given more complicated mathematical models. In particular, the following areas of research
have the potential for important practical improvements in verification methods.
• Development of goal-oriented a posteriori error-estimation methods that can be applied to mathematical
models that are more complicated than linear elliptic PDEs. There are many such models that are of significant
practical interest, including features such as nonlinearities, multiple coupled physical phenomena,
bridging of multiple scales, hyperbolic PDEs, and stochasticity.
• Development of theory that supports goal-oriented error estimates on complicated grids, including adap-
tive mesh grids.
• Development of algorithms for goal-oriented error estimates that scale well on massively parallel archi-
tectures, especially given complicated grids (including adaptive mesh grids).

• Development of adaptive algorithms that can control numerical error given the kinds of complex math-
ematical models described above.
• Development of algorithms and strategies that efficiently manage both discretization error and iteration
error, given the kinds of complex mathematical models described above.
• Development of methods to estimate error bounds when meshes cannot resolve important scales. An
example is turbulent fluid flow.
• Further development of reference solutions, including “manufactured” solutions, for the kinds of complex
mathematical models described above.
• For computational models that are composed of simpler components, including hierarchical models:
development of methods that use numerical-error estimates from the simpler components, along with
information about how the components are coupled, to produce numerical-error estimates for the overall
model.
**7.3.2 UQ Research**
Although continued effort in improving methodology for building response surfaces and reduced-order models
will likely prove fruitful in VVUQ, new research directions that consider VVUQ issues from a broader perspective
are likely to yield more substantial gains in efficiency and accuracy. For example, response surface methods
mentioned in Chapter 4 may consider both probabilistic descriptions of the input and the form of the mathematical/
computational model to describe output uncertainty, leading to efficiency gains over standard approaches.
Embedded, or intrusive, approaches, such as those that use adjoint information for verification, sensitivity
analyses, or inverse problems, tackle the problem from a perspective that leverages computational modeling aspects
of the application, often achieving substantial gains in computational efficiency. In large-scale problems some
approaches have folded in considerations regarding the computing architecture as well. However, beyond these
examples, there is little in the current literature on how to exploit capabilities of high-performance computing in
the service of VVUQ. The committee expects that VVUQ methodological research, operating from this broader
perspective, will continue to be fruitful in the future.
Some applications use a collection of hierarchically connected models. In some cases, outputs from one model
serve as inputs to another. Examples include the modeling of nuclear systems, or the reentry vehicle application
described in Section 5.9. In other cases, a hierarchy of low- to high-fidelity computational models is available for
modeling a particular system. An example is the modeling of radiative heat transfer using gray diffusion (low),
multigroup diffusion (medium), or multigroup transport (high). In other cases, an application uses models that span
multiple scales. In materials science, for example, different models simulate phenomena at different scales,
ranging from molecular to mesoscale to large scale, where bulk properties such as strength emerge. In regional climate
modeling, global and regional models are coupled to produce regional climate forecasts. In all of these cases there
is opportunity to develop efficient approaches for VVUQ analyses that take advantage of a hierarchical structure.
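The simplest, non-intrusive way to exploit such a hierarchy is to sample the uncertain inputs and push each draw through the chain of coupled models. The following minimal Python sketch illustrates the idea; the two component models and all parameter values are invented placeholders, not applications discussed in this report:

```python
import numpy as np

rng = np.random.default_rng(0)

def microscale_model(k):
    # Hypothetical upstream model: maps a material parameter k
    # to an effective bulk property.
    return 2.0 * k + 0.5 * k**2

def system_model(prop, load):
    # Hypothetical downstream model: uses the bulk property
    # plus an applied load to predict a system response.
    return load / prop

# Uncertain inputs, described probabilistically (illustrative values).
k_samples = rng.normal(loc=1.0, scale=0.1, size=10_000)
load_samples = rng.normal(loc=5.0, scale=0.5, size=10_000)

# Non-intrusive propagation: push each joint draw through the
# coupled chain of models, output of one feeding the next.
prop_samples = microscale_model(k_samples)
response = system_model(prop_samples, load_samples)

print(f"mean response = {response.mean():.3f}")
print(f"95% interval  = ({np.percentile(response, 2.5):.3f}, "
      f"{np.percentile(response, 97.5):.3f})")
```

Hierarchical VVUQ approaches aim to do better than this brute-force sampling, for example by emulating the expensive upstream component, but the sketch shows the basic coupled structure that such methods must respect.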
There are challenges in such approaches, however. Liu et al. (2009) point out some of the obstacles that arise
in the routine application of methodologies to link models. Determining how best to allocate resources for VVUQ
investigations—an optimization problem—is an important UQ-related task that could benefit from further research.
Optimization may take place rather narrowly, as in determining the best initial conditions over which to carry out
a sequence of experiments, or more broadly, as in deciding between improving a module of the computational
model or carrying out a costly experiment for a large VVUQ effort. Any such question requires some form of
optimization while accounting for many sources of uncertainty.
The preceding paragraphs discuss areas in which improvements are needed in UQ methodology, and more
detail is provided in Chapter 4. Here the committee summarizes some research directions that have the potential
to lead to significantly improved UQ methods.
• Development of scalable methods for constructing emulators that reproduce the high-fidelity model results
at training points, accurately capture the uncertainty away from training points, and effectively exploit
salient features of the response surface.


102 ASSESSING THE RELIABILITY OF COMPLEX MODELS
• Development of phenomena-aware emulators, which would incorporate knowledge about the phenomena
being modeled and thereby enable better accuracy away from training points (e.g., Morris, 1991).
• Exploration of model reduction for optimization under uncertainty.
• Development of methods for characterizing rare events, for example by identifying input configurations
for which the model predicts significant rare events, and estimating their probabilities.
• Development of methods for propagating and aggregating uncertainties and sensitivities across hierar-
chies of models. (For example, how to aggregate sensitivity analyses across microscale, mesoscale, and
macroscale models to give accurate sensitivities for the combined model remains an open problem.)
• Research and development in the compound area of (1) extracting derivatives and other features from
large-scale computational models and (2) developing UQ methods that efficiently use this information.
• Development of techniques to address high-dimensional spaces of uncertain inputs. An important subset
of problems is characterized by a large number of uncertain inputs that are correlated through subscale
physical phenomena that are not included in the mathematical model being studied (an example of which
is interaction coefficients in models involving particle transport).
• Development of algorithms and strategies, across the spectrum of UQ-related tasks, that can efficiently
use modern and future massively parallel computer architectures.
• Development of optimization methods that can guide resource allocation in VVUQ while accounting for
myriad sources of uncertainty.
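As a concrete, hedged illustration of the first bullet, a Gaussian-process emulator is one standard construction that reproduces high-fidelity runs at the training points and reports growing uncertainty away from them. The test function, training design, and fixed hyperparameters below are invented for illustration; a practical emulator would estimate its hyperparameters from the runs and handle many input dimensions:

```python
import numpy as np

def rbf_kernel(A, B, length=0.5, var=1.0):
    # Squared-exponential covariance between two sets of 1-D points.
    d2 = (A[:, None] - B[None, :]) ** 2
    return var * np.exp(-0.5 * d2 / length**2)

def gp_fit_predict(x_train, y_train, x_new, noise=1e-8):
    # Standard zero-mean Gaussian-process regression equations.
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf_kernel(x_new, x_train)
    Kss = rbf_kernel(x_new, x_new)
    alpha = np.linalg.solve(K, y_train)
    mean = Ks @ alpha
    cov = Kss - Ks @ np.linalg.solve(K, Ks.T)
    return mean, np.sqrt(np.clip(np.diag(cov), 0.0, None))

# Pretend these are a few expensive high-fidelity model runs.
x_train = np.array([0.0, 0.3, 0.6, 1.0])
y_train = np.sin(2 * np.pi * x_train)

x_new = np.array([0.3, 0.45, 0.8])
mean, sd = gp_fit_predict(x_train, y_train, x_new)
print(mean)
print(sd)
```

Here `sd[0]` is essentially zero because x = 0.3 is a training point, while the standard deviations at 0.45 and 0.8 are larger, reflecting distance from the training data; scaling this behavior to thousands of training runs and high-dimensional inputs is the research challenge the bullet identifies.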
7.3.3 Validation and Prediction Research
While many VVUQ tasks introduce questions that can be posed and answered (in principle) within the realm
of mathematics, validation and prediction introduce questions whose answers require judgments from the realm
of subject-matter expertise. It is challenging to quantify the effect of such judgments on VVUQ outcomes—that
is, to translate them into the mathematical realm. This effort comes under the heading of assessing the quality of
model-based predictions, which is a key research direction for improving the mathematical foundations of VVUQ.
For validation, “domain of applicability” is recognized as an important concept, but how one defines this
domain remains an open question. For predictions, characterizing how a model differs from reality, particularly in
extrapolative regimes, is a pressing need. While the literature has offered simple additive discrepancy models, as
well as embedded, physically motivated discrepancy models (as in Box 5.1), advances in linking a model to reality
will likely broaden the domain of applicability and improve confidence in extrapolative prediction.
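A hedged sketch of the simple additive-discrepancy idea: observations are modeled as simulator output plus a fitted bias term, and predictions outside the observed domain lean entirely on the extrapolated bias. The "reality," the simulator, and the quadratic fitting form below are all synthetic choices made for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

def simulator(x):
    # Imperfect computational model of the process.
    return 1.5 * x

def reality(x):
    # True process (unknown in practice); the simulator
    # misses the quadratic term.
    return 1.5 * x + 0.8 * x**2

# Field observations, available only on the limited domain [0, 1].
x_obs = np.linspace(0.0, 1.0, 20)
y_obs = reality(x_obs) + rng.normal(0.0, 0.05, size=x_obs.size)

# Additive discrepancy: fit delta(x) = y_obs - simulator(x)
# with a low-order polynomial.
delta = y_obs - simulator(x_obs)
coeffs = np.polyfit(x_obs, delta, deg=2)

def corrected(x):
    # Simulator plus fitted discrepancy.
    return simulator(x) + np.polyval(coeffs, x)

# Interpolative prediction (x = 0.5) versus extrapolative (x = 2.0).
for x in (0.5, 2.0):
    print(x, corrected(x), reality(x))
```

Inside the data (x = 0.5) the corrected prediction tracks reality closely; at x = 2 the prediction depends entirely on whether the fitted form of the discrepancy continues to hold, which is precisely the extrapolation question raised above.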
Although multimodel ensembles offer an attractive pathway for assessing uncertainty due to model inadequacy,
approaches to date have largely used ensembles of convenience, limiting their usefulness. While something is
usually better than nothing, more rigorously constructed ensembles of models, designed so that reality is included
within their spans, could ultimately provide a better foundation for assessing uncertainty. In a similar vein, some
have advocated the use of more highly parameterized models to improve the chances of covering reality (Doherty
and Welter, 2010), giving more realistic prediction uncertainties.
The use of large-scale computational models in searching out rare, high-consequence events, or estimating
their probability, is particularly susceptible to discrepancies between model and reality. In such situations, models
are almost always an extrapolation from available data, often extreme extrapolations. Here, too, research is needed.
The preceding paragraphs discuss areas in which improvements are needed in validation and prediction methodology,
and more detail is provided in Chapter 5. Here the committee summarizes some research directions that
have the potential to lead to significantly improved validation and prediction methods.
• Development of methods and strategies to quantify the effect of subject-matter judgments, which neces-
sarily are involved in validation and prediction, on VVUQ outcomes.
• Development of methods that help to define the domain of applicability of a model, including methods
that help to quantify the notions of near neighbors, interpolative predictions, and extrapolative predictions.
• Development of methods that incorporate mathematical, statistical, scientific, and engineering principles
to produce estimates of uncertainty in “extrapolative” predictions.


103
NEXT STEPS IN PRACTICE, RESEARCH, AND EDUCATION FOR VVUQ
• Development of methods or frameworks that help with the all-important problem of relating model-to-
model differences, for models in an ensemble, to the discrepancy between models and reality.
• Development of methods to assess model discrepancy and other sources of uncertainty in the case of rare
events, especially when validation data do not include such events.
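The rare-event item can be made concrete with an importance-sampling sketch; the "model," input distribution, and failure threshold below are invented. Plain Monte Carlo would need millions of model runs to observe an event of probability near 6 × 10⁻⁵ even a handful of times, while sampling from a proposal shifted into the failure region and reweighting by the density ratio recovers the probability with far fewer runs:

```python
import numpy as np

rng = np.random.default_rng(2)

def model(x):
    # Stand-in for an expensive simulation; "failure" means the
    # response exceeds a high threshold.
    return x**2

threshold = 16.0  # failure iff model(x) > 16, i.e. |x| > 4

# Input uncertainty: x ~ N(0, 1), so failure is a far-tail event.
n = 200_000

# Importance sampling: draw from a proposal N(4, 1) centered in the
# right-hand failure region, then reweight by the density ratio
# p(x)/q(x) = exp(-x^2/2) / exp(-(x - 4)^2/2).
x_prop = rng.normal(loc=4.0, scale=1.0, size=n)
log_w = -0.5 * x_prop**2 + 0.5 * (x_prop - 4.0) ** 2
w = np.exp(log_w)

# Estimate the right-tail probability, then double it: the failure
# region |x| > 4 is symmetric under the N(0, 1) input distribution.
p_right = np.mean(w * (x_prop > 4.0))
p_fail = 2.0 * p_right

print(f"estimated failure probability ~ {p_fail:.2e}")
```

Choosing a good proposal generally requires knowing where the model fails, which is why the bullet couples identifying critical input configurations with estimating their probabilities.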
Much of the research in this area should be a joint venture between subject-matter experts, mathematical/
statistical experts, and computational modelers. The committee believes that the traditional funding plan in which
funding is separated by field (mathematical sciences, computational science, basic science) is not ideal for making
progress in the area of extrapolative predictions.
7.4 EDUCATION CHANGES FOR THE EFFECTIVE INTEGRATION OF VVUQ
The previous sections outline the current practices and future directions of VVUQ for large-scale computational
simulations. Although scientists, engineers, and policy makers should of course use current best practices, there
are important issues that have to be addressed to bring this about: (1) how to get the main concepts of VVUQ into
the hands of those who need them so that the best practices become commonplace and (2) how to prepare the next
generation of researchers. This section discusses educational changes in the mathematical sciences community that
aim to integrate VVUQ and lay the foundation for improved methods and practices for the future.
As is discussed throughout this report, several broad tasks are included in VVUQ, and these tasks are likely to
be performed by individuals with different areas of expertise. It is important that those involved understand these
broad tasks and their implications. For instance, it is unlikely that a policy maker will carry out the task of code
verification, but it is important that the person making the decisions understand the difference between a code that
has gone through a VVUQ process and a code that has not. Conversely, it is equally important that computational
modelers are cognizant of the potential uses of the computer code and that the predictive limitations of the computational
model are clearly spelled out.
A report of the National Academy of Engineering (NAE), The Engineer of 2020: Visions of Engineering
in the New Century, includes in its Executive Summary the vision of “improving our ability to predict risk and
adapt systems” (NAE, 2004, p. 3). The same report describes the role of future engineers as continuing “to create
solutions that minimize the risk of complete failure” (p. 24). In its report Vision 2025, the American Society of
Civil Engineers (ASCE, 2006) describes civil engineers as (1) managers of risk and uncertainty caused by natural
events, accidents, and other threats and (2) leaders in discussions and decisions shaping public environmental and
infrastructure policy. It is reasonable to believe that similar characterizations enter the vision of other engineering
and scientific disciplines.
A scientist or an engineer may have a part in several of the VVUQ tasks. Education plays an important role
in making the best practices of VVUQ routine, and education and training should be targeted at the correct audiences,
as is discussed further below.
7.4.1 VVUQ at the University
The development and implementation of VVUQ have been motivated by drivers similar to those underlying the
NAE and ASCE visions. At the present time, topics in VVUQ are discussed at research conferences. Select topics
come up in a few (usually graduate) engineering, statistics, and computer science courses, but a more encompassing
view of VVUQ is not yet a standard part of the education of most undergraduate or graduate students. Because
of the need to assess and manage risk and uncertainty within traditional mathematics-based modeling and to have
confidence in the models for decision making, the educational objectives for VVUQ that could impact all under-
graduate and graduate students in engineering, statistics, and the physical sciences should include the following:
1. Probabilistic thinking,
2. Science- and engineering-based modeling, and
3. Numerical analysis and scientific computing.

Note that item 1 is included in some science and engineering programs, although it is often not required, and
items 2 and 3 are not normally included in most probability and statistics or computer-science programs. With
respect to items 1 and 2, it is necessary to identify mathematical tools relevant to applying probability and science
together to address practical problems. With respect to items 2 and 3, it is necessary to understand how uncertainty
can be introduced into deterministic physical laws and how evidence should be weighted to make model-based
decisions.
It is reasonable to view VVUQ as motivating the need for an intellectual watershed merging items 1 through
3. VVUQ sits at the confluence of statistics, physics/engineering, and computing, which are themes that are
usually discussed separately. To appreciate these distinctions, note that uncertainty is intimately associated with
both observations and computational models. It is not about physical processes themselves but rather about one’s
interpretation (as embodied in mathematical models, assumptions, and uncertainty in data) of these processes. (If
a physical process is random, it introduces another kind of uncertainty, which can be addressed within the mathematical
model.) This perspective also holds for more empirically based models from fields such as operations
research, psychology, and economics. Computational science is relevant to the extent that it permits the exploration
of more detailed models, which helps to make better inferences about the real processes.
At the present time, undergraduate students are typically taught models of reality, often without being introduced
to the significance of the modeling process and without a critical assessment of associated assumptions and
uncertainties. In engineering design courses, for example, students are most often introduced to a fait accompli
in which a lack of knowledge and other uncertainties have already been integrated into a collection of safety factors.
Students sometimes take advanced science and engineering courses before they have gained exposure to first
principles of probability and statistics. Moreover, probability and statistics courses for engineering undergraduate
students deal largely with data analysis (computing means, averages, point estimates, and confidence intervals)
and do not introduce many concepts that are important to VVUQ.
A modern curriculum in UQ should equip its students with the foundation to reason about risks and uncertainty.
This educational goal should include an understanding of the nature of risks associated with engineered and
natural processes in an increasingly complex and interconnected world. Recent and ongoing events—including the
nuclear reactor meltdown in Japan, the Deepwater Horizon blowout, engine failure in an Airbus A380 superjumbo
jet, and the accelerated melting of ice sheets, among others—provide ready examples to motivate an
understanding of the prevalence of risk. These problems are multifaceted and involve modeling themes from
several traditional disciplines. A modern curriculum should foster an appreciation of the role that modeling and
simulation could play in addressing such complex problems, providing clearer assessment of exposure, hazard,
and risk and informing assessments of technical strategies for mitigating such hazards and risks. The curriculum
should address effective communication of uncertainty and risk to decision makers, stakeholders, and UQ experts.
What might this mean for university programs? The required material to be integrated into an educational
program will depend on the field. Students in engineering and science are routinely taught science- and engineering-based
modeling and numerical methods and computing. Students of probability and statistics are taught
probabilistic thinking and perhaps some numerical methods and computing. Decision makers (say, students of
management) are likely to be introduced only to probabilistic thinking. The implications for different fields are
briefly discussed below.
Recommendation: An effective VVUQ education should encourage students to confront and reflect on the ways
that knowledge is acquired, used, and updated.
This recommendation can be achieved by assimilating relevant components of VVUQ as a fundamental scientific
process into a minimal subset of core courses, sequenced in a manner that is conducive to the objective. Given
the constraints of existing curricula, the alternative of integrating one or more new courses may not be feasible.
• Engineering and science. Any proposed educational program should respect the need for a logical
sequence in knowledge acquisition. One can propose a route that first introduces the ubiquity of uncertainty

throughout science and engineering. For example, this approach can be facilitated by the development
of a number of examples that explain uncertainties associated with natural phenomena and engineering
systems (e.g., the ball-drop examples in Chapters 1 and 5). This step can be followed by an introduction
to probabilistic thinking, including classical as well as Bayesian statistics. Many of these ideas can likely
be integrated into existing courses rather than requiring the introduction of new courses into an already-
crowded curriculum. The engineering design process, as embodied in capstone design courses, can then
be presented as a decision process aimed at selecting from competing alternatives, subject to various
constraints. This formulation of the design process has the added benefit of articulating to the student the
scientific distinctions between various design paradigms or procedures (usually presented in the form of
design recipes). It may be that some programs already have such approaches, but they are not common.
• It is important to teach students to regularly confront uncertainty in input data and corresponding
uncertainty in their stated answers. The committee encourages instructors in traditional courses to pose
questions that include uncertainty in the input information.
• Similar to engineers and scientists, students of probability and statistics should acquire training in
mathematical modeling as well as in computational and numerical methods. Again, the path to doing so
should build on the logical sequence of discipline-specific core training. The key is understanding how
probabilistic thinking fits into the scientific process (e.g., how probability fits together with mathematical
modeling) and also understanding the limits of computation.
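In the spirit of the ball-drop examples cited above, a question of this kind can be as small as the following Python sketch (all measurement values and uncertainties are invented for illustration): the deterministic physical law is retained, but the inputs students feed it carry stated uncertainties, so the answer must be stated with uncertainty too:

```python
import numpy as np

rng = np.random.default_rng(3)

def drop_time(h, g):
    # Deterministic physical law: t = sqrt(2h/g), ignoring drag.
    return np.sqrt(2.0 * h / g)

# Inputs measured with error (illustrative values only).
h = rng.normal(loc=10.0, scale=0.05, size=50_000)   # height [m]
g = rng.normal(loc=9.81, scale=0.02, size=50_000)   # gravity [m/s^2]

# Propagate the input uncertainty through the deterministic law.
t = drop_time(h, g)

print(f"drop time: {t.mean():.4f} s +/- {t.std():.4f} s")
```

For a formula this simple, the spread can also be obtained by first-order error propagation, which makes a useful classroom comparison between analytical and sampling-based treatments of uncertainty.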
Recommendation: The elements of probabilistic thinking, physical-systems modeling, and numerical methods and
computing should become standard parts of the respective core curricula for scientists, engineers, and statisticians.
• Programs in management sciences. The intellectual framework represented by VVUQ seeks to assess the
uncertainty in answers to a problem with respect to uncertainties in the given information. It is unlikely
that students who are being trained as policy makers are going to be routinely interested in computational
modeling, but it is important that they be educated in assessing the quality and reliability of the information
that they are using to make decisions and also in assessing the inferential limits of the information.
In the VVUQ context, this can mean, for example, understanding whether or not to trust a model that has
undergone a VVUQ process, or understanding the distinction between predictions that have been informed
by observations and those that have not.
It will be a challenge for individual university departments to take the lead in integrating VVUQ into their
curricula. An efficient way of doing so would be to share the load among the relevant units. A way forward is
emerging as a result of the DOE’s Predictive Science Academic Alliance Program (PSAAP). For example, at the
University of Michigan’s Center for Radiative Shock Hydrodynamics (CRASH), both graduate and undergraduate
students are included in the fundamental VVUQ steps as part of CRASH’s core mission. More importantly in the
current context, as a result of PSAAP, the university is initiating an interdisciplinary Ph.D. program in predictive
science and engineering. Students in the program have a home department but will also take courses and develop
methodology relating to VVUQ. (A course in VVUQ has already been taught.) The computational science, engineering,
and mathematics program at the Institute for Computational Engineering and Sciences at the University
of Texas also has a similar graduate program. It is not hard to imagine a similar interdisciplinary program (perhaps
a certificate program in predictive science) being rolled out to undergraduate students in engineering, physics,
probability and statistics, and possibly management science.
Finding: Interdisciplinary programs incorporating VVUQ methodology are emerging as a result of investment
by granting bodies.
Recommendation: Support for interdisciplinary programs in predictive science, including VVUQ, should be made
available for education and training to produce personnel who are highly qualified in VVUQ methods.

7.4.2 Spreading the Word
VVUQ plays an important role, with far-reaching consequences, in making sense of the information provided
by computational models, observations, and expert judgment. It is important to communicate the best practices of
VVUQ to those creating and using computational models and also to instructors in university programs.
To this end, several activities could be undertaken. For example, to provide assistance to instructors, model
problems and solutions should be made available. In this spirit, people with expertise in the areas of VVUQ
can be encouraged to write an article or series of articles targeted to an educational journal, in which problems are
introduced and solutions are outlined.
Recommendation: Federal agencies should promote the dissemination of VVUQ materials and the offering of
informative events for instructors and practitioners.
This type of contribution would go a long way toward sharing important ideas and suggesting how they might
be implemented in a classroom setting. Along the same lines, the NAE could perhaps devote a special issue of
its quarterly publication The Bridge to this type of initiative. It is also important to build on existing resources,
such as the American Statistical Association’s Guidelines for Assessment and Instruction in Statistics Education,
which addresses the statistical component of VVUQ and highlights the need for a good understanding of data
modeling, data analysis, data interpretation, and decisions. For existing practitioners, educational activities should
be routinely included at conferences and also through the mathematical sciences institutes (e.g., the Statistical
and Applied Mathematical Sciences Institute in Research Triangle Park, North Carolina, and the Mathematical
Sciences Research Institute in Berkeley, California).
7.5 CLOSING REMARKS
This chapter attempts to peer into the future of VVUQ and to summarize the committee’s responses to its
tasking. It identifies key principles that we found to be helpful and identifies best practices that the committee has
observed in the application of VVUQ to difficult problems in computational science and engineering. It identifies
research areas that promise to improve the mathematical foundations that undergird VVUQ processes. Finally,
it discusses changes in the education of professionals and dissemination of information that should enhance the
ability of future VVUQ practitioners to improve and properly apply VVUQ methodologies to difficult problems,
enhance the ability of VVUQ customers to understand VVUQ results and use them to make informed decisions,
and enhance the ability of all VVUQ stakeholders to communicate with each other. These observations and recommendations
are offered in the hope that they will help the VVUQ community as it continues to improve VVUQ
processes and broaden their applications.
7.6 REFERENCES
ASCE (American Society of Civil Engineers). 2006. Vision 2025. Available at http://www.asce.org/uploadedFiles/Vision_2025. Accessed September 7, 2011.
Doherty, J., and D. Welter. 2010. A Short Exploration of Structural Noise. Water Resources Research 46:W05525.
Liu, F., M.J. Bayarri, and J. Berger. 2009. Modularization in Bayesian Analysis, with Emphasis on Analysis of Computer Models. Bayesian
Analysis 4:119-150.
Morris, M. 1991. Factorial Sampling Plans for Preliminary Computational Experiments. Technometrics 33(2):161-174.
NAE (National Academy of Engineering). 2004. The Engineer of 2020: Visions of Engineering in the New Century. Washington, D.C.: The
National Academies Press.
Oreskes, N., K. Shrader-Frechette, and K. Belitz. 1994. Verification, Validation, and Confirmation of Numerical Models in the Earth Sciences.
Science 263(5147):641-646.
Randall, D.A., R.A. Wood, S. Bony, R. Colman, T. Fichefet, J. Fyfe, V. Kattsov, A. Pitman, J. Shukla, J. Srinivasan, R.J. Stouffer, A. Sumi, and
K.E. Taylor. 2007. Climate Models and Their Evaluation. Pp. 591-648 in Climate Change 2007: The Physical Science Basis. Contribution
of Working Group I to the Fourth Assessment Report of the Intergovernmental Panel on Climate Change. S.D. Solomon, D. Qin, M. Manning,
Z. Chen, M.C. Marquis, K.B. Averyt, M. Tignor, and H.L. Miller (Eds.). Cambridge, U.K.: Cambridge University Press.