4
Model Evaluation

INTRODUCTION

How does one judge whether a model or a set of models and their results are adequate for supporting regulatory decision making? The essence of the problem is whether the behavior of a model matches the behavior of the (real) system sufficiently for the regulatory context. This issue has long been a matter of great interest, marked by many papers over the past several decades, but especially and distinctively by Caswell (1976), who observed that models are objects designed to fulfill clearly expressed tasks, just as hammers, screwdrivers, and other tools have been designed to serve identified or stated purposes. Although “model validation” became a common term for judging model performance, it has been argued persuasively (e.g., Oreskes et al. 1994) that complex computational models can never be truly validated, only “invalidated.” The contemporary phrase for what one seeks to achieve in resolving model performance with observation is “evaluation” (Oreskes 1998). Although it might seem strange for such a label to be important, earlier terms used for describing the process of judging model performance have provoked rather vigorous debate, during which it was proposed that the word “validation” be replaced, first by “history matching” (Konikow and Bredehoeft 1992) and later by “quality assurance” (Beck et al. 1997; Beck and Chen 2000). Some of these terms imply, innately or by their de facto use,
a one-time approval step. Evaluation emerged from this debate as the most appropriate descriptor and is characteristic of a life-cycle process.

Two decades ago, model “validation” (as it was referred to then) was defined as the assessment of a model’s predictive performance against a second set of (independent) field data given model parameter (coefficient) values identified or calibrated from a first set of data. In this restricted sense, “validation” is still a part of the common vocabulary of model builders.

The difficulty in finding a label for the process of judging whether a model is adequate and reliable for its task is described as follows. The terms “validation” and “assurance” prejudice expectations of the outcome of the procedure toward only the positive—the model is valid or its quality is assured—whereas evaluation is neutral in what might be expected of the outcome. Because awareness of environmental regulatory models has become so widespread in a more scientifically aware audience of stakeholders and the public, words used within the scientific enterprise can have meanings that are misleading in contexts outside the confines of the laboratory world. The public knows well that supposedly authoritative scientists can have diametrically opposed views on the benefits of proposed measures to protect the environment. When there is great uncertainty surrounding the science base of an issue, groups of stakeholders within society can take this issue as a license to assert utter confidence in their respective versions of the science, each of which contradicts those of the other groups. Great uncertainty can lead paradoxically to a situation of “contradictory certainties” (Thompson et al. 1986), or at least to a plurality of legitimate perspectives on the given issue, with each such perspective buttressed by a model proclaimed to be valid. Those developing models have found this situation disquieting (Bredehoeft and Konikow 1993) because, even though science thrives on the competition of ideas, when two different models yield clearly contradictory results, as a matter of logic, they cannot both be true. It matters greatly how science and society communicate with each other (Nowotny et al. 2001); hence, in part, scientists shunned the word “validation” in judging model performance.

Today, evaluation comprises more than merely a test of whether history has been matched. Evaluation should not be something of an afterthought but, indeed, a process encompassing the entire life cycle of the task. Furthermore, for models used in environmental regulatory activities, the model builder is not the only archetypal interested party holding a stake in the process but is also one among several key players,
including the model user, the decision maker or regulator, the regulated parties, and the affected members of the general public or representatives of nongovernmental organizations. Evaluation, in short, is an altogether much broader, more comprehensive affair than validation and encompasses more elements than simply the matching of observations to results. This is not merely a question of form, however.

In this chapter, where the committee describes the process of model evaluation, it adopts the perspective, discussed in Chapter 1 of this report, that a model is a “tool” designed to fulfill a task—providing scientific and technical support in the regulatory decision-making process—not a “truth-generating machine” (Janssen and Rotmans 1995; Beck et al. 1997). Furthermore, in sympathy with the Zeitgeist of contemporary environmental policy making, where the style of decision making has moved from that of a command-and-control technocracy to something of a more participatory, more open democracy (Darier et al. 1999), we must address the changing perception of what it takes to trust a model. This involves not only the elements of model evaluation but also who will have a legitimate right to say whether they can trust the model and the decisions emanating from its application. Achieving trust in the model among those stakeholders in the regulatory process is an objective to be pursued throughout the life of a model, from concept to application.

The committee’s goal in this chapter is to articulate the process of model evaluation used to inform regulation and policy making. We cover three key issues: the essential objectives for model evaluation, the elements of model evaluation, and the management and documentation of the evaluation process. To discuss the elements of model evaluation in more detail, we characterize the life stages of a model and the application of the elements of model evaluation at these different stages. We organized the discussion around four stages in the life cycle of a regulatory model—problem identification, conceptual model development, model construction, and model application (see Figure 4-1). The life-cycle concept broadens the view of what modeling entails and may strengthen the confidence that users have in models. Although this perspective is somewhat novel, the committee observed some existing and informative examples in which model evaluations effectively tracked the life cycle of a model. These examples are discussed later in this chapter.

We recognize that reducing a model’s life cycle to four stages is a simplified view, especially for models with long lives that go through
important changes from version to version. The MOBILE model for estimating atmospheric vehicle emissions, the UAM (urban airshed model) air quality model, and the QUAL2 water quality models are examples of models that have had multiple versions and major scientific modifications and extensions in over two decades of their existence (Scheffe and Morris 1993; Barnwell et al. 2004; EPA 1999c). The perspective of a four-stage life cycle is also simplified from the stages of model development discussed in Chapter 3. However, simplifying a model’s life cycle makes discussion of model evaluation more tractable.

FIGURE 4-1 Stages of a model’s life cycle.

Historically, the management of model quality has been inconsistent, due in part to the failure to recognize the impact of errors and omissions in the early stages of the life cycle of the model. At EPA (and other organizations), the model evaluation process traditionally has begun only at the model construction and model application stages. Yet formulating the wrong model questions, or even confronting the right questions with the wrong conceptual model, will result in serious quality problems in the use of a model. Limited empirical evidence in the groundwater modeling field suggests that 20-30% of model analyses confront new data that render the prevailing conceptual model invalid (Bredehoeft 2005). Such quality issues are difficult to discover and even more difficult to resolve (if discovered) when model evaluation is applied only at the late stages of the model life cycle.

ESSENTIAL OBJECTIVES FOR MODEL EVALUATION

Fundamental Questions To Be Addressed

In the transformation from simple “validation” to the more extensive process of model evaluation, it is important to identify the questions that are confronted in model evaluation. When viewing model evaluation as an ongoing process, several key questions emerge. Beck (2002b) suggests the following formulation:

Is the model based on generally accepted science and computational methods?
Does it work, that is, does it fulfill its designated task or serve its intended purpose?
Does its behavior approximate that observed in the system being modeled?

Responses to such questions will emerge and develop at various stages of model development and application, from the task description through the construction of the conceptual and computational models and eventually to the applications. The committee believes that answering these questions requires careful assessment of information obtained at each stage of a model’s life cycle.

Striving for Parsimony and Transparency

In the development and use of models, parsimony refers to the preference for the least complicated explanation for an observation. Transparency refers to the need for stakeholders and members of the public to comprehend the essential workings of the model and its outputs. Parsimony derives from Occam’s (or Ockham’s) razor, attributed to the 14th-century logician William of Occam, which states that “entities should not be multiplied unnecessarily.” Parsimony does not justify simplicity for its own sake. It instead demands that a model capture all essential processes for the system under consideration—but no more. It requires that models meet the difficult goal of being accurate representations of the system of interest while being reproducible, transparent, and useful for the regulatory decision at hand.

The need to move beyond simple validation exercises to a more extensive model evaluation leads to the need for EPA to explicitly assess the trade-offs that affect parsimony, transparency, and other considerations in the process of developing and applying models. These trade-offs are important to modelers, regulators, and stakeholders. The committee has identified three fundamental goals to be considered in making trade-offs, which are further discussed in Box 4-1:

The need to get the correct answer – This goal refers to the need to make a model capable of generating accurate as well as consistent and reproducible projections of future behavior or consistent assessments of current relationships.

The need to get the correct answer for the correct reason – This goal refers to the reproduction of the spatial and temporal detail of what scientists consider to be the essence of the system’s workings. Simple process and empirical models can be “trained” to mimic a system of interest for an initial set of observations, but if the model fails to capture all the important system processes, the model could fail to behave correctly for an observation outside the limited range of “training” observations (a risk illustrated by the sketch following this list). Such failure tends to drive models to be more detailed.

Transparency – This goal refers to the comprehension of the essential workings of the model by peer reviewers as well as informed but scientifically lay stakeholders and members of the public. This need drives models to be less detailed. Transparency can also be enhanced by ensuring that reviewers, stakeholders, and the public comprehend the processes followed in developing, evaluating, and applying a model, even if they do not fully understand the basic science behind the models.

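The risk noted under the second goal can be made concrete with a minimal sketch. Everything in it is assumed for illustration: the saturating “true” response, the parameter values, and the straight-line surrogate are not drawn from any EPA model. An empirical model fitted only to low-range “training” observations reproduces them well yet departs sharply from the real system outside that range.

    # Hypothetical "true" system: a saturating response, C(x) = cmax * x / (k + x).
    def true_system(x, cmax=10.0, k=5.0):
        return cmax * x / (k + x)

    # "Training" observations are available only over a narrow, low range of x.
    train_x = [0.5, 1.0, 1.5, 2.0]
    train_y = [true_system(x) for x in train_x]

    # Empirical surrogate: a straight line fitted to the training data by
    # ordinary least squares.
    n = len(train_x)
    mean_x = sum(train_x) / n
    mean_y = sum(train_y) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(train_x, train_y))
             / sum((x - mean_x) ** 2 for x in train_x))
    intercept = mean_y - slope * mean_x

    def empirical_model(x):
        return intercept + slope * x

    # Inside the training range the surrogate tracks the observations well ...
    for x in train_x:
        print(f"x = {x:4.1f}  true = {true_system(x):6.2f}  empirical = {empirical_model(x):6.2f}")

    # ... but outside it the fitted line keeps rising while the real system
    # saturates, so the "trained" model is wrong for reasons it cannot reveal.
    for x in (10.0, 20.0):
        print(f"x = {x:4.1f}  true = {true_system(x):6.2f}  empirical = {empirical_model(x):6.2f}")

The surrogate is not wrong because it was fitted poorly; it is wrong because it omits the saturation process, which matters only outside the range of the training observations.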

BOX 4-1 Attributes That Foster Accuracy, Precision, Parsimony, and Transparency in Models

Gets the Correct Result
    Model behavior closely approximates behavior of real system
        High predictive power on a case-by-case basis
        High predictive power on a statistical basis
    Model results insensitive to factors that should not affect them

Gets the Correct Result for the Right Reason
    Model accurately represents the real system
        Comprehensive
            Variables: inputs, outputs; exogenous, endogenous
            Relationships: functional, cause-effect, statistical
            Circumstances: input changes, assumption relaxation
            Resolutions: temporal, spatial
    Model is based on good science
        Accepted principles, theory, results
            From peer-reviewed sources
            Prestige of developer or lab
        Up-to-date
            Concepts and theory
            Algorithms, computational methods
            Empirical findings
    Appropriate data are available or feasible to acquire
        Estimates for model parameters
        Data for model calibration

Transparency
    Suits specific regulatory context or decisions
        Addresses the specific concern
        Usable by decision makers and implementers
        Understandable by decision makers, stakeholders, and implementers
    Model is seen to be appropriate for the specific system
        Application is within model limitations
            Resolution
            Parameter values
            Special system characteristics (for example, special weather characteristics or soil chemistry)
        Inputs available for the specific system
            Parameter estimates
            Calibration data
    Results/outputs are helpful
        Interpretable
        Relate to regulatory objectives of decision makers and stakeholders
        Are “actionable,” that is, they relate to decision variables or policy parameters understandable to decision makers, stakeholders, and the informed public

These three goals can result in competing objectives in model development and application. For example, if the primary task were to use a model as a repository of knowledge, its design might place priority on including sufficient detail to ensure that the result is correct for the correct reasons. On the other hand, to meet the task of the model as a communication device, the optimal model would minimize detail to ensure transparency. It is also of interest to consider when a regulatory task would be best served by having a model err on the side of getting accurate results without including sufficient detail to match scientific understanding. For example, when an exposure model can accurately characterize the relationship between a chemical release to surface water and the resulting concentration on the basis of a detailed mass balance, should the regulator consider an empirical model that has the same level of accuracy? Here, parsimony might give preference to the simpler empirical model, whereas transparency is best served by the mass-balance model that allows the model user to see how the release is transformed into a concentration. Moreover, in the regulatory context, the more-detailed model addresses the need to reveal to decision makers and stakeholders how different environmental processes can affect the link from emissions to concentration. Nevertheless, if the simpler empirical model provides both accurate and consistent results, it should have a role in the decision process, even if that role is to provide complementary support and evaluation for the more-detailed model.

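The trade-off in this example can be sketched in a few lines of code. The stream segment, load, flow, decay rate, and fitted slope below are hypothetical values chosen for illustration; neither function represents an actual EPA exposure model. The mass-balance version exposes how dilution and decay convert a release into a concentration, whereas the empirical version compresses the same relationship into a single calibrated slope.

    # Process-based alternative: steady-state, completely mixed reach with
    # first-order decay, C = W / (Q + k * V).
    #   W: chemical load to the reach (g/day)
    #   Q: flow through the reach (m3/day)
    #   k: first-order decay rate (1/day)
    #   V: reach volume (m3)
    def mass_balance_concentration(load_g_per_day, flow_m3_per_day,
                                   decay_per_day, volume_m3):
        return load_g_per_day / (flow_m3_per_day + decay_per_day * volume_m3)

    # Empirical alternative: a single regression slope relating release to
    # observed concentration, assumed here to have been fitted to past
    # monitoring data for the same reach.
    EMPIRICAL_SLOPE = 1.6e-5  # (g/m3) per (g/day); assumed value

    def empirical_concentration(load_g_per_day):
        return EMPIRICAL_SLOPE * load_g_per_day

    load = 5000.0     # g/day released to the reach
    flow = 50000.0    # m3/day
    decay = 0.2       # 1/day
    volume = 60000.0  # m3

    print("mass balance:", round(mass_balance_concentration(load, flow, decay, volume), 4), "g/m3")
    print("empirical   :", round(empirical_concentration(load), 4), "g/m3")

Both functions return about 0.08 g/m3 for these conditions, but only the mass balance shows what would change if, say, low-flow conditions halved Q; that visibility is the transparency the more detailed model offers decision makers and stakeholders.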

The committee finds that modelers may often err on the side of making models more detailed than necessary. The reasons for the increasing complexity are varied, but one regulatory modeler mentioned that it is not only modelers who strive to build a more complex model but also stakeholders, who wish to ensure that their issues or concerns are represented in the model even if addressing such concerns does not have an impact on model results (A. Gilliland, Model Evaluation and Applications Branch, Office of Research and Development, EPA, personal commun., May 19, 2006). Increasing the refinement of models introduces more model parameters with uncertain values while decreasing the model’s transparency to users and reviewers. Here, the problem is a model that accrues significant uncertainties when it contains more parameters than can be calibrated with the observations available to the model evaluation process. In spite of the drive to make their models more detailed, modelers often prefer to omit capabilities that do not substantially improve model performance—that is, its precision and accuracy for addressing a specific regulatory question.

ELEMENTS OF MODEL EVALUATION

The evidence used to judge the adequacy of a model for decision-making purposes comes from a variety of sources. They include studies that compare model results with known test cases or observations, comments from the peer review process, and the list of a model’s major assumptions. Box 4-2 lists those and other elements of model evaluation. Many of the elements might be repeated, eliminated, or added to the evaluation as a model’s life cycle moves from the problem identification stage to the model application stage. For example, peer review at the model development stage might focus on the translation of theory into mathematical algorithms and numerical solutions, whereas peer review at the model application stage might focus on the adequacy of the input parameters, model execution, and stakeholder involvement. Recognizing that model evaluation may occur separately during the early stages of a model’s life, as well as again during subsequent applications, helps to address issues that might arise when a model is applied by different groups and for different conditions than those for which the model was developed. The committee notes that, whereas the elements of model evaluation and the
questions to be answered throughout the evaluation process may be generic in nature, what comprises a high-quality evaluation of a model will be both task- and case-specific. As described in Chapter 2, the use of models in environmental regulatory activities varies widely, both in the effort involved and in the consequences of the regulatory efforts the models support. Thus, the model evaluation process and the resources devoted to it must be tailored to the specific context. Depending on the setting, model evaluation will not necessarily address all the elements listed in Box 4-2. In its guidance document on the use of models at the agency, EPA (2003d) recognized that model evaluation should follow a graded approach, reflecting the need for the evaluation to be adequate and appropriate for the decision at hand. The EPA Science Advisory Board (SAB), in its review of EPA’s guidance document on the use of models, recommended that the graded concept be expanded to include model development and application (EPA 2006d). The committee here recognizes that model evaluation must be tailored to the complexity and impacts at hand as well as to the life stage of the model and the model’s evaluation history.

MODEL EVALUATION AT THE PROBLEM IDENTIFICATION STAGE

There are many reasons why regulatory activities can be supported by environmental modeling. At the problem identification stage, decision makers together with model developers and other analysts must consider the regulatory decision at hand, the type of input the decision needs, and whether and how modeling can contribute to the decision-making process. For example, if a regulatory problem involves the assessment of the health risk of a chemical, considerations may include whether to focus narrowly on cancer risk or to include a broader spectrum of health risks. Another consideration might be whether the regulatory problem focuses on occupational exposures, acute exposures, chronic exposures, or exposures that occur to a susceptible subpopulation. The final consideration is whether a model might aid in the regulatory activity. If there is sufficient need for computational modeling, three questions must be addressed at the problem identification stage: (1) What types of decisions will the model support? (2) Who will use it? and (3) What data are available to support development, application, and evaluation of a model? Addressing these questions is important

BOX 4-2 Individual Elements of Model Evaluation

Scientific basis – The scientific theories that form the basis for models.

Computational infrastructure – The mathematical algorithms and approaches used in the execution of the model computations.

Assumptions and limitations – The detailing of important assumptions used in the development or application of a computational model, as well as the resulting limitations in the model that will affect the model’s applicability.

Peer review – The documented critical review of a model or its application conducted by qualified individuals who are independent of those who performed the work but who are collectively at least equivalent in technical expertise (i.e., peers) to those who performed the original work. Peer review attempts to ensure that the model is technically adequate, competently performed, properly documented, and satisfies established quality requirements through the review of assumptions, calculations, extrapolations, alternate interpretations, methodology, acceptance criteria, and/or conclusions pertaining to a model or its application (modified from EPA 2006a).

Quality assurance and quality control (QA/QC) – A system of management activities involving planning, implementation, documentation, assessment, reporting, and improvement to ensure that a model and its component parts are of the type needed and expected for its task and that they meet all required performance standards.

Data availability and quality – The availability and quality of monitoring and laboratory data that can be used for both developing model input parameters and assessing model results.

Test cases – Basic model runs where an analytical solution is available or an empirical solution is known with a high degree of confidence, used to ensure that algorithms and computational processes are implemented correctly (a minimal illustration follows this box).

Corroboration of model results with observations – Comparison of model results with data collected in the field or laboratory to assess the accuracy and improve the performance of the model.

Benchmarking against other models – Comparison of model results with those of other similar models.

Sensitivity and uncertainty analysis – Investigation of which parameters or processes are driving model results, as well as the effects of lack of knowledge and other potential sources of error in the model.

Model resolution capabilities – The level of disaggregation of processes and results in the model compared with the resolution needs identified in the problem statement or model application. The resolution includes the level of spatial, temporal, demographic, or other types of disaggregation.

Transparency – The need for individuals and groups outside modeling activities to comprehend either the processes followed in evaluation or the essential workings of the model and its outputs.

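As a minimal illustration of the “test cases” element, the sketch below checks a simple numerical integration routine against the known analytical solution of first-order decay. The decay problem, step size, and acceptance tolerance are assumptions chosen for illustration; they are not part of any EPA protocol.

    import math

    # Test case: first-order decay dC/dt = -k*C has the analytical solution
    # C(t) = C0 * exp(-k*t).  A numerical implementation should reproduce it
    # to within a stated tolerance.
    def numerical_decay(c0, k, t_end, dt):
        """Explicit Euler integration of dC/dt = -k*C."""
        steps = int(round(t_end / dt))
        c = c0
        for _ in range(steps):
            c += -k * c * dt
        return c

    def analytical_decay(c0, k, t):
        return c0 * math.exp(-k * t)

    c0, k, t_end = 100.0, 0.3, 10.0
    numeric = numerical_decay(c0, k, t_end, dt=0.001)
    exact = analytical_decay(c0, k, t_end)
    relative_error = abs(numeric - exact) / exact

    print(f"numerical = {numeric:.4f}, analytical = {exact:.4f}, "
          f"relative error = {relative_error:.2e}")
    assert relative_error < 1e-2, "test case failed: check algorithm or implementation"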

What EPA terms a “dynamic evaluation” study focuses on a rule issued by EPA in 1998 that required 22 states and the District of Columbia to submit State Implementation Plans providing NOx emission reductions to mitigate ozone transport in the eastern United States. This rule, known as the NOx SIP Call, requires emission reductions from the utility sector and large industrial boilers in the eastern and midwestern United States by 2004. Because these sources are equipped with continuous emission monitoring systems, the NOx SIP Call represents a special opportunity to directly measure the emission changes and incorporate them into model simulations with reasonable confidence. Air quality model simulations were developed for the summers of 2002 and 2004 using the CMAQ model, and the resulting ozone predictions were compared with observed ozone concentrations. Two series of CMAQ simulations were developed to test two different chemical mechanisms in CMAQ, to account for the model uncertainty associated with the representation of chemistry in the model. Given that regulatory applications use the model’s prediction of the relative change in pollutant concentrations, dynamic evaluations such as these are particularly relevant to the way the model is used.

Groundwater models are critical for regulatory applications, such as assessing contaminant transport from hazardous waste sites and assessing the long-term performance of high-level nuclear waste disposal sites. Bredehoeft (2003, 2005) summarizes a series of post hoc studies in which later observations were used to evaluate how well earlier groundwater modeling did in predicting future conditions. Besides errors in conceptual models of the system, which are discussed in the body of this report, Bredehoeft identified insufficient observations for specifying input parameters and boundary conditions as another critical reason why model predictions did not match observations. An additional issue cited was that, in some instances, the assumed environmental management actions that were modeled turned out to be very different from the actions actually taken. It is important to note that, although the set of studies discussed in Bredehoeft (2003, 2005) was extensive, the modeling resources involved were not. Instead, the insights were developed by having an experienced modeler look across a number of applications for overarching conclusions. This observation is important when considering the resource needs and scope of retrospective analysis.

In his experience, Bredehoeft noted, alternative conceptual models are not carried into an analysis. However, such an approach has been applied in the health risk assessment area. Distinctly different conceptual models for health risks from sulfur oxides in air were discussed in several papers by Morgan and colleagues (Morgan et al. 1978, 1984). These papers described alternative conceptualizations of the health risks that are incompatible with each other but that, at the time of the analyses, were supported by some data. In his 2003 paper, Bredehoeft described the following difficulties with conceptual models:

Modelers tend to regard their conceptual models as immutable.
Time and again, errors in prediction revolve around a poor choice of the conceptual model.
More often than not, data will fit more than one conceptual model equally well.
Good calibration of a model does not ensure a correct conceptual model.
Probabilistic sampling of the parameter sets does not compensate for uncertainties in the appropriate conceptual models or for wrong or incomplete models.

The point of this list is that models with conceptual problems cannot be improved by enhanced efforts at calibration or management of uncertainties. The best chance for identifying and correcting conceptual errors is through an ongoing evaluation of the model against data, especially data taken under novel conditions.

The question that should be explored is whether other classes of models share a common weakness. For example, as a class, what weaknesses would be identified by an evaluation of air dispersion, transport, and atmospheric chemistry models, or of structure-activity relationships? Identifying systemic weaknesses would focus attention on the most productive priorities for improvement. With a long-term perspective, there will be cases in which it is possible to compare model results with data that were not available when the models were built. A key benefit of retrospective evaluations of individual models and of model classes is the identification of priorities for improving models. Efforts to add processes and features of diminishing importance to current models may be of much lower benefit than revisions based on priorities derived from retrospective analyses. The committee did not identify a solid technical basis for deciding whether specific models should be revised other than to address the perception that a specific model was incomplete.

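The value of such comparisons against later data, and the observation above that data often fit more than one conceptual model equally well, can be illustrated with a small, entirely hypothetical sketch: two candidate conceptual models, one linear in time and one growing with the square root of time, are calibrated to the same early observations, fit them almost equally well, and are separated only by an observation made long after calibration. The synthetic record, the two model forms, and the time horizon are assumptions made for illustration only.

    import math

    # Hypothetical "calibration" record: the real system responds like 2*sqrt(t)
    # (for example, a diffusion-controlled process), sampled only at early times.
    calib_t = [1.0, 2.0, 3.0, 4.0]
    calib_obs = [2.0 * math.sqrt(t) for t in calib_t]

    # Conceptual model A: response is linear in time, y = b0 + b1*t,
    # fitted to the calibration record by ordinary least squares.
    n = len(calib_t)
    mt = sum(calib_t) / n
    my = sum(calib_obs) / n
    b1 = (sum((t - mt) * (y - my) for t, y in zip(calib_t, calib_obs))
          / sum((t - mt) ** 2 for t in calib_t))
    b0 = my - b1 * mt

    # Conceptual model B: response grows with sqrt(t), y = a*sqrt(t); its single
    # coefficient is likewise estimated from the calibration record.
    a = sum(math.sqrt(t) * y for t, y in zip(calib_t, calib_obs)) / sum(calib_t)

    def linear_model(t):
        return b0 + b1 * t

    def sqrt_model(t):
        return a * math.sqrt(t)

    def rmse(model, times, obs):
        return math.sqrt(sum((model(t) - o) ** 2 for t, o in zip(times, obs)) / len(obs))

    # Both conceptual models calibrate almost equally well (the sqrt model is
    # exact only because the synthetic record is noise-free) ...
    print("calibration RMSE, linear model:", round(rmse(linear_model, calib_t, calib_obs), 3))
    print("calibration RMSE, sqrt model  :", round(rmse(sqrt_model, calib_t, calib_obs), 3))

    # ... but an observation made much later clearly separates them.
    t_later = 25.0
    print("later observation       :", 2.0 * math.sqrt(t_later))          # 10.0
    print("linear model prediction :", round(linear_model(t_later), 1))   # about 18
    print("sqrt model prediction   :", round(sqrt_model(t_later), 1))     # 10.0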

RECOMMENDATIONS

The committee offers several recommendations based on the discussion in this chapter. They deal with life-cycle model evaluation, peer review, uncertainty analysis, retrospective analysis, and managing the model evaluation process.

Life-Cycle Model Evaluation

Models begin their life cycle with the identification of a need and the development of a conceptual approach, and they proceed through the building of a computational model and subsequent applications. Models also can evolve through multiple versions that reflect new scientific findings, acquisition of data, and improved algorithms. Model evaluation is the process of deciding whether and when a model is suitable for its intended purpose. This process is not a strict verification procedure but is one that builds confidence in model applications and increases the understanding of model strengths and limitations. Model evaluation is a multifaceted activity involving peer review, corroboration of results with data and other information, quality assurance and quality control checks, uncertainty and sensitivity analyses, and other activities. Even when a model has been thoroughly evaluated, new scientific findings may raise unanticipated questions, or new applications may not be scientifically consistent with the model’s intended purpose.

Recommendations

Evaluation of a regulatory model should continue throughout the life of the model. In particular, model evaluation should not stop with the evaluation activities that often occur before the public release of a model but should continue throughout regulatory applications and revisions to the model.

For all models used in the regulatory process, the agency should begin by developing a life-cycle model evaluation plan commensurate with the regulatory application of the model (for example, the scientific complexity, the precedent-setting potential of the modeling approach or application, the extent to which previous evaluations are still applicable, and the projected impacts of the associated regulatory decision). Some plans may be brief, whereas other plans would be extensive. At a minimum, each plan should

Describe the model and its intended uses.
Describe the relationship of the model to data, including the data for both inputs and corroboration.
Describe how such data and other sources of information will be used to assess the ability of the model to meet its intended task.
Describe all the elements of the evaluation plan by using an outline or diagram showing how the elements relate to the model’s life cycle.
Describe the factors or events that might trigger the need for major model revisions or the circumstances that might prompt users to seek an alternative model. These could be fairly broad and qualitative.
Identify responsibilities, accountabilities, and resources needed to ensure implementation of the evaluation plan.

It is essential that the agency be committed to the concept that model evaluation continues throughout a model’s life. Model evaluation should not be an end unto itself but a means to an end, namely, a model fitted to its purpose. EPA should develop a mechanism that audits the evaluation process to ensure that an evaluation plan is developed, that resources are committed to carry it out, and that modelers respond to what is learned. Although the committee does not make organizational recommendations or recommendations on the level of effort that should be expended on any particular type of evaluation, it recognizes that the resource implications for implementing life-cycle model evaluation are potentially substantial. However, given the importance of modeling activities in the regulatory process, such investments are critical to enable environmental regulatory modeling to meet challenges now and in the future.

Peer Review

Peer review is an important tool for improving the quality of scientific products and is basic to all stages of model evaluation. One-time reviews, of the kind used for research articles published in the literature, are insufficient for many of the models used in the environmental regulatory process. More time, effort, and variety of expertise are required to conduct and respond to peer review at different stages of the life cycle, especially for complex models.

Recommendations

Peer review should be considered, but not necessarily performed, at each stage in a model’s life cycle. Some simple, uncontroversial models
might not require any peer review, whereas others might merit peer review at several stages. Appropriate peer review requires an effort commensurate with the complexity and significance of the model application. When a model peer review is undertaken, EPA should allow sufficient time, resources, and structure to assure an adequate review. Reviewers should receive not only copies of the model and its documentation but also documentation of its origin and history.

Peer review for some regulatory models should involve comparing the model results with known test cases, reviewing the model code and documentation, and running the model for several types of problems for which the model might be used. Reviewing model documentation and results is not sufficient peer review for many regulatory models.

Because many stakeholders and others interested in the regulatory process do not have the capability or resources for a scientific peer review, they need to be able to have confidence in the evaluation process. This need requires a transparent peer review process and continued adherence to criteria provided in EPA’s guidance on peer review. Documentation of all peer reviews, as well as evidence of the agency’s consideration of comments in developing revisions, should be part of the model origin and history.

Quantifying and Communicating Uncertainty

There are two critical but distinct issues in uncertainty analysis for regulatory environmental modeling: what kinds of analyses should be done to quantify uncertainty, and how these uncertainties should be communicated to policy makers.

Quantifying Uncertainty

A wide range of possibilities is available for performing model uncertainty analysis. At one extreme, all model uncertainties could be represented probabilistically, and the probability distribution of any model outcome of interest could be calculated. However, in assessing environmental regulatory issues, these analyses generally would be quite complicated to carry out convincingly, especially when some of the uncertainties in critical parameters have broad ranges or when the parameter uncertainties are difficult to quantify. Thus, although probabilistic
uncertainty analysis is an important tool, requiring EPA to do complete probabilistic regulatory analyses on a routine basis would probably result in superficial treatments of many sources of uncertainty. The practical problems of performing a complete probabilistic analysis stem from models that have large numbers of parameters whose uncertainties must be estimated in a cursory fashion. Such problems are compounded when models are linked into a highly complex system, for example, when emissions and meteorological model results are used as inputs to an air quality model.

At the other extreme, scenario assessment and/or sensitivity analysis could be used. Neither one in its simplest form makes explicit use of probability. For example, a scenario assessment might consider model results for a relatively small number of plausible cases (for example, “pessimistic,” “neutral,” and “optimistic” scenarios). Such a deterministic approach is easy to implement and understand. However, scenario assessment does not typically convey information about conditions not included in the assessment or about each scenario’s likelihood.

It is not necessary to choose between purely probabilistic approaches and deterministic approaches. Hybrid analyses combining aspects of probabilistic and deterministic approaches might provide the best solution for quantifying uncertainties, given the finite resources available for any analysis. For example, a sensitivity analysis might be used to determine which model parameters are most likely to have the largest impacts on the conclusions, and a probabilistic analysis could then be used to quantify bounds on the conclusions due to uncertainties in those parameters. In another example, probabilistic methods might be chosen to quantify uncertainties in environmental characteristics and expected human health impacts, and several plausible scenarios might be used to describe the monetization of the health benefits. Questions about which of several plausible models to use can sometimes be the dominant source of uncertainty and, in principle, can be handled probabilistically. However, a scenario assessment approach is particularly appropriate for showing how different models yield differing results.

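One way to picture such a hybrid analysis is the short script below. It is purely illustrative: the three-parameter exposure expression, the parameter ranges, the triangular distributions, and the sample size are assumptions rather than an EPA procedure. A one-at-a-time sensitivity screen first identifies the inputs that move the output most, and a Monte Carlo simulation then propagates distributions for only those inputs while the remaining inputs stay at their best estimates.

    import random

    # Illustrative model: a simple exposure expression with three uncertain inputs,
    #   dose = emission * transfer / body_weight
    def model(emission, transfer, body_weight):
        return emission * transfer / body_weight

    # Plausible range for each parameter: (low, best estimate, high).
    ranges = {
        "emission":    (50.0, 100.0, 200.0),
        "transfer":    (0.001, 0.002, 0.004),
        "body_weight": (60.0, 70.0, 80.0),
    }
    best = {name: mid for name, (lo, mid, hi) in ranges.items()}

    # Step 1: one-at-a-time sensitivity screen -- swing each parameter across its
    # range while holding the others at their best estimates.
    swings = {}
    for name, (lo, mid, hi) in ranges.items():
        low_case = model(**{**best, name: lo})
        high_case = model(**{**best, name: hi})
        swings[name] = abs(high_case - low_case)

    # Keep the two most influential parameters (cutoff chosen arbitrarily here).
    influential = sorted(swings, key=swings.get, reverse=True)[:2]
    print("output swing by parameter:", {k: round(v, 5) for k, v in swings.items()})
    print("parameters treated probabilistically:", influential)

    # Step 2: Monte Carlo on the influential parameters only, using triangular
    # distributions over the assumed ranges; the rest stay at best estimates.
    random.seed(1)
    samples = []
    for _ in range(10_000):
        draw = dict(best)
        for name in influential:
            lo, mid, hi = ranges[name]
            draw[name] = random.triangular(lo, hi, mid)
        samples.append(model(**draw))

    samples.sort()
    print("approximate 5th-95th percentile dose:",
          round(samples[500], 5), "to", round(samples[9500], 5))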

Communicating Uncertainties

Effective decision making will require providing policy makers with more than a single probability distribution for a model result (and certainly more than just a single number, such as the expected net benefit, with no indication of uncertainty). Such summaries obscure the sensitivities of the outcome to individual sources of uncertainty, thus undermining the ability of policy makers to make informed decisions and constraining the efforts of stakeholders to understand the basis for the decisions.

Recommendations

Quantifying Uncertainty

In some cases, presenting results from a small number of model scenarios will provide an adequate uncertainty analysis (for example, cases in which the stakes are low, modeling resources are limited, or insufficient information is available). In many instances, however, probabilistic methods will be necessary to characterize properly at least some uncertainties and to communicate clearly the overall uncertainties. Although a full Bayesian analysis that incorporates all sources of information is desirable in principle, in practice, it will be necessary to make strategic choices about which sources of uncertainty justify such treatment and which sources are better handled through less formal means, such as consideration of how model outputs change as an input varies through a range of plausible values. In some applications, the main sources of uncertainty will be among models rather than within models, and it will often be critical to address these sources of uncertainty.

Communicating Uncertainty

Probabilistic uncertainty analysis should not be viewed as a means to turn uncertain model outputs into policy recommendations that can be made with certitude. Whether or not a complete probabilistic uncertainty analysis has been done, the committee recommends that various approaches be used to communicate the results of the analysis. These include hybrid approaches in which some unknown quantities are treated probabilistically and others are explored in scenario-assessment mode by decision makers through a range of plausible values. Effective uncertainty communication requires a high level of interaction with the
relevant decision makers to ensure that they have the necessary information about the nature and sources of uncertainty and their consequences. Thus, performing uncertainty analysis for environmental regulatory activities requires extensive discussion between analysts and decision makers.

Retrospective Analysis of Models

EPA has been involved in the development and application of computational models for environmental regulatory purposes for as long as the agency has been in existence. Its reliance on models has only increased over time. However, attempts to learn from prior experiences with models and to apply these lessons have been insufficient.

Recommendations

The committee recommends that EPA conduct and document the results of retrospective reviews of regulatory models, not only of single models but also at the scale of model classes, such as models of groundwater flow and models of health risks. The goal of such retrospective evaluations should be the identification of priorities for improving regulatory models. One objective of this analysis would be to investigate systematic strengths and weaknesses that are characteristic of various types of models. A second important objective would be to study the processes (for example, approaches to model development and evaluation) that led to successful models and model applications. In carrying out a retrospective analysis, it might be helpful to use models or categories of models that are old by current modeling standards, because the older models could present the best opportunities to assess actual model performance quantitatively by using subsequent advances in modeling and in new observations.

Models and Rule-makings

The sometimes contentious setting in which regulatory models are used may impede EPA’s ability to implement some of the recommendations in this report, including the life-cycle evaluation process. Even
high-quality models are filled with components that are incomplete and must be updated as new knowledge arises. Yet, those attributes may provide stakeholders with opportunities to mount formal challenges against models that produce outputs that they find undesirable. Requirements such as those in the Information Quality Act may increase the susceptibility of models to challenges because outside parties may file a correction request for information disseminated by agencies. When a model that informs a regulatory decision has undergone the multilayered review and comment processes, the model tends to remain in place for some time. This inertia is not always ideal: the cumbersome regulatory procedures and the finality of the rules that survive them may be at odds with the dynamic nature of modeling and the goal of improving models in response to experience and scientific advances.

In such an adversarial environment, EPA might perceive that a rigorous life-cycle model evaluation is ill-advised from a legal standpoint. Engaging in this type of rigorous review may expose the model to a greater risk of challenges, at least insofar as the agency’s review is made public, because the agency is documenting features of its models that need to be improved. Moreover, revising a model can trigger lengthy administrative notice and comment processes. However, an improved model is less likely to generate erroneous results that could lead to additional challenges, and it better serves the public interest.

Recommendations

It is important that EPA institute best practice standards for the evaluation of regulatory models. Best evaluation practices may be much easier for EPA to implement if its resulting rigorous life-cycle evaluation process is perceived as satisfying regulatory requirements, such as those of the Information Quality Act. However, for an evaluation process to meet the spirit and intent of the Information Quality Act, EPA’s evaluation process must include a mechanism for any person to submit information or corrections to a model. Rather than requiring a response within 60 days, as the Information Quality Act does, the evaluation process would involve consideration of that information and response at the appropriate time in the model evaluation process.

To further encourage evaluation of models that support federal rule-makings, alternative means of soliciting public comment on model revisions need to be devised over the life cycle of the model. For example,
EPA could promulgate a separate rule-making that establishes an agency-wide process for the evaluation and adjustment of models used in its rules. Such a programmatic process would allow the agency to provide adequate opportunities for meaningful public comment at important stages of the evaluation and revision of an individual model, without triggering the need for a separate rule-making for each revision. Finally, more rigorous and formalized evaluation processes for models may result in greater deference to agency models by interested parties and by reviewing courts. Such a response could decrease the extent of model challenges through adversarial processes.

Model Origin and History

Models are developed and applied over many years by participants who enter and exit the process over time. The model origin and history can be lost when individual experiences with a model are not documented and archived. Without an adequate record, a model might be incorrectly applied, or developers might be unable to adapt the model for a new application. Poor historical documentation could also frustrate stakeholders who are interested in understanding a model. Finally, without adequate documentation, EPA might be limited in its ability to justify decisions that were critical to model design, development, or model selection.

Recommendations

As part of the evaluation plan, a documented history of important events regarding the model should be maintained, especially after public release. Each model’s documentation should record its origin, with such key elements as the identity of the model developer and institution, the decisions on critical model design and development, and the records of software version releases. The model documentation also should have elements in “plain English” to communicate with nontechnical evaluators. An understandable description of the model itself, its justifications and limitations, and key peer reviews are especially important for building trust.

The committee recognizes that information relevant to model origins and histories is already being collected by CREM and stored in its model database, which is available on the CREM web site.
CREM’s database includes over 100 models, although updating of this site has declined in recent years. It provides information on obtaining and running the models and on the models’ conceptual bases, scientific details, and results of evaluation studies. One possible way to implement the recommendation for developing and maintaining the model history may be to expand CREM’s efforts in this direction. The EPA Science Advisory Board review of CREM contains additional recommendations with regard to specific improvements in CREM’s database.