Uncertainty Analysis of Health Risk Estimates
Prepared by Christian Seigneur and Elpida Constantinou, ENSR Consulting and Engineering, 1320 Harbor Bay Parkway, Alameda, California 94501, and Thomas Permutt, Department of Epidemiology and Preventive Medicine, University of Maryland, 655 West Baltimore Street, Baltimore, Maryland 21202 for Leonard Levin, Electric Power Research Institute, 3412 Hillview Avenue, Palo Alto, California 94303. November 1992. Document Number 24600-009-510.
Large uncertainties are typically associated with the estimation of public health risks associated with air toxic emissions. Such uncertainties arise in: (1) the formulation of the models used to simulate the fate and transport of chemicals in the environment and the foodchain, public exposure, dose, and health risks; and (2) the estimation of the parameter values used as input values to these models.
The uncertainty due to model formulation can be reduced to some extent by using models that provide a more comprehensive treatment of the relevant physico-chemical processes. (It should be noted, however, that as a model becomes more comprehensive, the input data requirements may increase substantially; and while the uncertainty associated with the model formulation decreases, the uncertainties associated with the input parameters may increase). Seigneur et al. (1990) provide some guidance on the selection of mathematical models for health risk assessment with various levels of accuracy in their formulation. We focus here on the uncertainties due to the input parameters for a given health risk assessment model.
Uncertainties in parameter values arise for three reasons. First, the value may have been measured, in which case some imprecision is associated with the process of measurement. In the context of this report, however, errors of measurements are likely to be insignificant compared to other kinds of uncertainty. Second, the value may have been measured, but under circumstances other than those for which it must be applied. In this case, additional uncertainty arises from the variation of the parameter in time and space. Third, the value may not have been measured at all, but estimated from relationships with other quantities that are known or measured. In this case, uncertainty in the parameter of interest arises form both uncertainty in the quantities that are measured and from uncertainty about the estimating relationship.
By characterizing the uncertainties in the input parameters of a model and studying the effects of variation in these parameters on the model predictions, we can estimate the part of the uncertainty in the predictions that is due to uncertainty in the inputs.
Uncertainty can be characterized by a probability distribution. That is, the value of a parameter is not known exactly, but, for example, it might be thought to lie between 90 and 100 with probability 0.5, between 85 and 120 with probability 0.75, and so on. Sometimes such probability distributions can be usefully summarized by a few parameters, such as the mean and standard deviation. Uncertainties in the input parameters propagate through the model to produce probability distributions on the output parameters. Figure 1 presents a schematic description of the uncertainty propagation through a model.
We present here a structured methodology for the parameter uncertainty analysis of health risk estimates. The methodology involves: (1) a sensitivity analysis of the model used to perform the health risk calculations, (2) the determination of probability distributions for a number of selected input parameters (i.e., the ones identified as the most influential to the output variable); and (3) the propagation of the uncertainties through the model.
This methodology is applied here to the uncertainty analysis of the carcinogenic health risks estimated as due to the emissions of a coal-fired plant.
Uncertainty Analysis Methodology
A health risk assessment model combines a number of models to simulate the transport and fate of chemicals in air, surface water, surface soil, groundwater and the foodchain. Concentrations calculated by the fate and transport models
are used by exposure-dose models to calculate the doses to exposed individuals, which are then used to calculate health risks.
For the description of the uncertainty analysis methodology, we consider each individual model component as a function Y (dependent variable) of a number of parameters X1, X2,…Xn (independent variables). In summary, we view this methodology as consisting of the following 5 steps:
A more detailed description of each individual step of the methodology is presented in the following paragraphs.
Mathematical models describing physical phenomena are often composed of relatively complex sets of equations involving a large number of input parameters. However, some of these parameters do not have any significant influence on the health risk calculated by the model; i.e., the model output is not sensitive to the values of these input parameters. Therefore, such parameters that do not affect the health risk values significantly, do not need to be known with great accuracy and the uncertainty analysis should focus on those parameters to which the calculated health risks are most sensitive.
The sensitivity analysis allows us to determine the parameters to which the model is most sensitive. These parameters will be called the influential parameters.
When dealing with a complex model, such as a multimedia health risk assessment model, sensitivity analysis should be performed for each individual model component as well as for the overall model.
The sensitivity of the model output (i.e., the dependent variable) to a model input parameter can be measured by the ratio of the change in the model output to the perturbation in the input parameter. We define this ratio as the sensitivity index, SI. For parameter i:
where Xi is the perturbation in the input parameter, and Y is the corresponding change in the model output. In order to compare the sensitivity index for various input parameters, it is appropriate to use a dimensionless representation of the sensitivity index:
where ¯Xi and ¯Y are the mean or some other reference values of the variables, Xi and Y, respectively; and Y* and Xi* refer to normalized perturbations.
Two characteristics of the sensitivity index must be noted:
Even though the sensitivity index, as defined above, sufficiently describes the effect on the model result for a given change in the input parameter, it does not provide a measure of the range of variation in the model output, given the expected range of variation of the input parameter. In other words, a parameter that has a high sensitivity index, may have little effect on the model output if that parameter can only have a very small variation. The height of a power plant stack is an example of a parameter that has significant effect on atmospheric ground-level concentrations but has a small uncertainty. For the case of the power plant studied here, a 100% change in stack height caused a 73% change in the resulting concentrations. Since the uncertainty in stack height, however, can only be due to measurement error, it is not expected to be more than ±2%. Consequently, the actual influence of this parameter on the model result is very small.
We define the uncertainty index as a measure of uncertainty associated with a parameter Xi. Although several definitions of this uncertainty index are possible,
we select one that is objective in a statistical sense by using the standard deviation and the mean of the parameter. The uncertainty index is defined as follows:
where: i is the standard deviation of the parameter distribution ¯X is the mean value of parameter Xi
The uncertainty index, consequently provides a measure of the expected variation of parameter Xi over its range of probable values.
The combination of the model sensitivity to a parameter, and the uncertainty in that parameter provides the information required to assess which parameters need to be included in the uncertainty analysis. We define a sensitivity/uncertainty index as follows:
The sensitivity/uncertainty index, therefore constitutes a representative measure of the influence that a parameter has on the model results and can, thus, be used to select the model influential parameters to be included in the uncertainty analysis. Figure 2 presents a schematic description of the steps followed in the sensitivity analysis procedure.
Even though the concept of the standard deviation of a parameter was used in the definitions of the uncertainty and sensitivity/uncertainty indexes, it is rather unlikely that actual standard deviations will be available for all the parameters examined in the sensitivity analysis. Since the sensitivity analysis is a screening procedure whose goal is to minimize the number of parameters included in the final uncertainty analysis, it is generally appropriate to use other measures that are more readily available to characterize the variability of a parameter. For example, the expected range of variation can be used instead of an actual standard deviation.
Parameterization of the ModeResponse Surface Construction
A multimedia health risk assessment model typically involves a large number of input parameters and comprises several individual models for simulating fate and transport, exposure, dose, and health effects. Such a model can be computationally very demanding and performing an uncertainty analysis for a large number of parameters may, therefore, not be feasible. It is, therefore, necessary to parameterize the various model components in order to reduce the magnitude of
the computations. This model parameterization can be achieved by constructing response surfaces.
A response surface is a simplified version of the actual model which can be used efficiently in the uncertainty analysis as a replacement of the real model. In the case of simple analytical models in the form of a single equation (e.g., dose models) only minor simplifications need to be made for the construction of the response surface. Such simplifications can be accomplished by factoring out of each of the terms of the equation the selected influential parameters, and representing the remaining part of the term by a lumped parameter calculated from previous results of the model. So, the response surface can be of the following form:
where: m = Number of terms in dependent variable expression
Ki = Number of independent variables included in term i
Ai = Calculated lumped parameter of term i
In the case of complex models (e.g., environmental transport models) the response surface can be developed using the following procedure: For all influential parameters of a model component select a number K of parameter sets Xi = (Xil.…Xln) i = 1, K and perform experimental runs of the actual complex model. Then use the pairs of parameter sets Xl…Xk and corresponding model results Yl.…Yk to construct the response surface.
A simple example of a response surface can be that of the atmospheric transport model, in which the air concentration, Ca can be expressed in terms of four independent influential parameters and six constant parameters as follows:
where: Ca = Chemical concentration in air
Qe = Chemical emission rate
Vs = Stack exit velocity
Ts = Stack exit temperature
Ta = Ambient air temperature
Ai = Calculated constant parameters (functions of meteorological data, source characteristics and environmental setting)
A response surface is a parameterization of the model that allows one to calculate the model results with considerably less computations. However, a response
surface is typically specific to a given model application. That is, some of the case study characteristics (e.g., meteorology, hydrology) are implicitly included in the constant parameters of the response surfaces.
Selection of the Probability Distributions for the Input Parameters
Once the influential parameters have been identified, and the response surfaces for each model component constructed, probability distributions must be selected to represent each one of the parameters.
As was mentioned previously, a parameter value can be either directly measured or indirectly estimated through an estimation procedure which usually involves fitting of a curve through a set of experimental points.
In the case of a directly measured parameter, the uncertainty results from uncertainty in the measurement process, and this can sometimes be estimated from repeated measurements. Often, however, the amount of available data is not enough to produce meaningful histograms or probability plots. What is usually available is a range of values within which the true value of a parameter is expected to lie, and possibly a most likely, or range of most likely values for the parameter. In this case, it is left to our judgment and experience to decide what probability distribution is appropriate.
In the case of parameters estimated indirectly through curve fitting (e.g., bioconcentration factors, and cancer potency factors) uncertainty results from both statistical errors in fitting the curve, which can be estimated by statistical procedures, and uncertainty about the form of the curve, which is a matter of judgment.
The development of parameter probability distributions through a combination of a priori expert judgment along with current information in the form of available direct or indirect measurements is known in statistical theory as the Bayesian method. If the measurements are direct, precise, and numerous enough to sufficiently describe the variation pattern of the parameters, then the a priori judgment may have little or no influence on the resulting probability distributions. Conversely, if the measurements are indirect and imprecise, then the a priori judgment may be of great importance.
Propagation of the Model Uncertainties
At this step, the response surfaces developed for the different model components can be combined in a single spreadsheet that performs the function of the overall risk assessment model in a simplified fashion for the case study considered. Several techniques exist to develop a probability distribution in the model output
given probability distributions in the model input parameters. Monte-Carlo and Latin hypercube simulations are standard examples of such techniques. In the special cases where the probability distributions are similar and simple (e.g., normal), the probability distribution in the model output can be calculated analytically. For the general case where no simple analytical approach can be used, however, the spreadsheet model can then be coupled to one of several commercial software packages (e.g., @Risk; Palisade Corp., 1991), which uses the specified probability distributions of the parameters together with the spreadsheet calculations to generate a set of synthetic model results.
Analysis of the Probability Distribution of the Model Health Risk Estimates
If the number of replications for the probabilistic synthetic simulations is large enough, the synthetic results can be statistically analyzed to yield a reasonably reliable probability distribution of the dependent variable (i.e., health risk). If the uncertainty analysis procedure was performed correctly, this probability distribution should represent a more complete and realistic characterization of the anticipated health risks, as it provides a range of possible values accompanied by their corresponding likelihoods instead of a single, deterministic point estimate.
Description Of The Multimedia Health Risk Assessment Model
In this section, we present a description of the multimedia health risk assessment model which was used in the application. This model was developed by combining a number of individual models that handle the fate and transport of chemicals in air, surface water, surface soil, groundwater, and the foodchain, into an integrated multimedia model. A brief description of its individual components, and its overall structure are provided in the following paragraphs. A detailed description is provided by Constantinou and Seigneur (1992).
The model consists of the following nine distinct components:
The multimedia health risk assessment model combines all the model components into a single computer program that takes as input emission stream characteristics and environmental physical parameters, and calculates the resulting carcinogenic and noncarcinogenic health effects. Figure 3 represents the general calculation steps that lead to the final model results.
Deterministic Health Risk Assessment
The boiler of the power plant studied in this application is a 680 MW unit which burns high-sulfur bituminous coal. Four chemicals with listed carcinogenic effect were sampled from the 200m high stack of the facility. Stack air emissions were the only emissions considered in the analysis. Liquid and solid waste discharges were ignored.
The study area examined in the present application was defined to be the area within a 50 km radius of the power plant. This area was divided into 40 subregions by a concentric grid. The major surface water bodies included in the area include a river and a large lake. For the health effect calculations, all public water supply was considered to come from the river and all fish supply was considered to come from the lake.
Carcinogenic and noncarcinogenic health effects were calculated in each of the subregions considered in the study area. The results subject to the uncertainty analysis presented in this report correspond to the carcinogenic health effects in the subregion of maximum risk. Noncarcinogenic risks are not addressed here.
The carcinogenic chemicals detected in the stack air emissions were chromium, arsenic, cadmium, and benzene. The corresponding chemical emission rates were estimated to be 1.08 × 10-2, 4.4 × 10-4, 5.39 × 10-4, and 1.4 × 10-2 g/s, respectively. Since chemical speciation for chromium was not available, the corresponding health effect calculations were performed based on the assumption that total chromium emissions consisted of 5% Cr(VI) and 95% Cr(III). The cumulative carcinogenic lifetime risk from all chemicals and pathways in the subregion of maximum risk was calculated to be 2.2 × 10-8.
Chromium (VI) and arsenic were calculated to be the two major contributors to carcinogenic risk with contributions of 59 and 32%, respectively. Cadmium contributed 8%, and the contribution of benzene was 1%. Among the three exposure pathways considered in the analysis, inhalation was calculated to be the major contributor, with a contribution of 85%. Ingestion ranked second with 15%, and dermal absorption had an insignificant contribution of 0.4%.
Produce was calculated to be the foodchain component which contributed the
most to the ingestion risk, with a contribution of 92%. Fish and soil ingestion had small contributions of 5 and 3%, respectively, and drinking water had an insignificant contribution of 0.3%.
It should be noted that among the four carcinogenic chemicals included in the analysis, only arsenic and benzene are considered to be carcinogenic through noninhalation pathways. Since benzene's contribution was very small, benzene was not included in the uncertainty analysis. Even though arsenic is considered carcinogenic through the ingestion pathway, no cancer potency value is currently tabulated (October, 1992) for this pathway in the Integrated Risk Information System (IRIS) database. The value used for this parameter in the deterministic health risk assessment was the most recent value listed in IRIS.
Sensitivity analysis of the individual model components as well as the overall multimedia health risk assessment model was performed to help identify the influential parameters. A total of 49 parameters were examined, and 22 were selected to be included in the final uncertainty analysis, based on their calculated sensitivity/uncertainty indexes. Table 1 provides a list of all parameters examined, together with their corresponding symbols and units.
Sensitivity/uncertainty indexes of the input parameters were derived for each of the individual model components as well as for the overall risk assessment model. The resulting indexes for the three chemicals included in this analysis are summarized in Table 2.
Using the influential parameters selected, response surfaces were constructed for each of the multimedia health risk assessment model components. In the case of simple models such as the foodchain, exposure-dose, and risk models, the response surfaces were constructed manually by factoring out the influential parameters, and representing the remaining parts of the equations by constants calculated based on the model results. In the case of the more complex environmental transport models, additional sensitivity runs were performed by varying the influential parameters within their assumed range of variation.
In the case of the atmospheric transport model, ISC-LT, four influential parameters were identified: the chemical emission rate (Qe), the stack exit velocity (Vs), the stack exit temperature (Ts), and the ambient air temperature (Ta). The influ-
ences of Ts and Ta were found to be correlated, as what affected the results was the difference between the stack and the ambient temperature, (Ts - Ta) and not their absolute values. Consequently, the two parameters were treated together as one, the temperature difference (Ts - Ta).
Multiple runs were performed by varying Vs and (Ts - Ta) within their assumed range of variation to identify their individual and combined effect on the resulting maximum ground level chemical concentration. Both parameters were found to affect the model results in an exponentially decaying way (i.e., resulting concentrations decreased exponentially for higher values of Vs and (Ts - Ta)) and their variation patterns were fit at the reference point (i.e., parameter values used in the deterministic calculations) by second degree polynomials.
The combined response surface for both parameters was derived by combining their individual curves. It should be noted that this approach is only valid for the range of perturbations considered in this analysis. For larger ranges of parameter variation the model response surface for two or more influential parameters should be derived through multiple regression, where the variation of the model results is examined simultaneously for all parameters. Figure 4 provides a graphical presentation of the derived ISC-LT response surface.
The complete set of equations of the simplified multimedia health risk assessment model is presented below:
where: Ca = Ground-level air concentration; Qe = Chemical emission rate; = Chemical speciation fraction (applies only to chromium case); Vs = Stack exit velocity; Ts = Stack exit temperature; Ta = Ambient temperature; A1j = Constant j for model component 1
where: DR = Chemical deposition rate; Vd = Dry deposition velocity; Vw = Wet deposition velocity; A2j = Constant j for model component 2
where: Lss = Surface soil chemical load; ORf = Fraction of deposited chemical attributed to overland runoff; A3j = Constant j for model component 3
where: Cs = Surface soil concentration; ds = Surface soil depth; EST = Exposure starting time; ED = Exposure duration; pb = Soil bulk density; A4j = Constant j for model component 4
where: BCFp = Soil-to-plant bioconcentration factor
where: D1 = Inhalation dose; IR = Inhalation rate; BW = Body Weight; A6j = Constant j for model component 6
where: D2 = Ingestion dose; INRp = Plant ingestion rate
where: R = Total Carcinogenic risk; CPF1 = Inhalation cancer potency factor; CPF2 = Ingestion cancer potency factor; A7j = Constant j for model component 7
It should be noted that the full set of equations presented above applies only to the arsenic case. Cadmium and chromium are not considered carcinogenic through noninhalation pathways. Consequently, only the atmospheric transport, inhalation dose, and inhalation risk equations apply to their case.
Probability Distribution Selection
Evaluation of the probability distributions of the 22 influential parameters of the model was performed on the basis of available statistical data, literature value ranges, and personal judgment. The selected probability distributions, and the information which the distribution types and parameters where based on are summarized in Table 3.
In a health risk assessment, the uncertainty associated with the health effect parameters (i.e., cancer potency factors in the case of the present application) is of major importance. The EPA recommended values for these parameters are usually derived based on limited animal or epidemiological studies the conditions of which may differ significantly from the conditions for which these values will be applied in a risk assessment.
In the case of epidemiological studies, uncertainty is associated with high-to-low dose extrapolation and factors related to secondary exposures, diet, and hygiene of the population under study. The data are fitted by an assumed model, and the maximum lifetime estimate (MLE) is usually recommended for use by EPA.
In the case of animal studies, uncertainty is associated with interspecies as well as high-to-low dose extrapolations. Due to the additional uncertainty of interspecies extrapolation in this case an upper bound value (95th percentile) is usually recommended for use by EPA.
The cancer potency factors for the three chemicals included in this application were all derived based on epidemiological studies.
In the case of arsenic inhalation, the CPF derivation was based on two separate U.S. smelter worker populations (EPA, 1984a). The data collected were analyzed by five different investigators who derived five different CPF values, using a linear nonthreshold model. The EPA recommended value was derived by obtaining the geometric mean of the individually derived CPFs. In this application we chose to represent the uncertainty of the arsenic inhalation CPF by a uniform distribution extending over the range of values provided by the above mentioned five investigators.
In the case of arsenic ingestion, the CPF derivation was based on an epidemiological study of a Taiwanese population exposed to high arsenic concentrations in drinking water (EPA, 1984a). Analysis of these data resulted in a CPF which was later deemed overconservative by EPA, after comparison with the results of limited U.S. studies. Its high value was attributed to underestimation of the exposure of the Taiwanese population, and the lack of consideration of the pop-
ulation's poor diet and hygiene in the analysis. The value was then scaled by EPA to a lower value that would be more representative for a U.S. population. Due to the uncertainties associated with its derivation, this CPF was recently removed from IRIS until a more reliable value could be derived. In this application we chose to represent the uncertainty of the arsenic ingestion CPF by a triangular distribution with lower bound equal to zero, most likely value equal to the scaled value most recently listed in IRIS (September, 1991), and upper bound equal to the value originally derived from the Taiwanese population data.
In the case of cadmium inhalation, the CPF derivation was based on epidemiological data of a U.S. smelter worker population (EPA, 1985). The EPA-recommended value was determined by fitting a linear nonthreshold model to these data. A 90% confidence interval based only on statistical consideration was constructed around the MLE. In this application we chose to represent uncertainty associated with the cadmium inhalation CPF by a normal probability distribution with mean equal to the maximum likelihood estimate and standard deviation calculated from the estimated confidence interval.
In the case of chromium (VI) inhalation, the CPF was based on a U.S. population of chromate plant workers (EPA, 1984b). The EPA-recommended value was derived by fitting a two-stage model to the data. A lower bound and an upper bound were constructed around the MLE to account for the possibility of underestimation or overestimation of the exposure due to lack of consideration of poor hygiene, smoking habits, and chemical speciation in the analysis. In this application, we chose to represent the uncertainty associated with the chromium (VI) CPF by a triangular distribution defined by the best estimate, upper, and lower bounds.
Monte Carlo AnalysisHealth Risk Probability Distribution
The derived response surfaces were combined in a simplified spreadsheet model which was coupled to the software package @RISK (Palisade Corporation, 1991) which performed the propagation of the input parameter uncertainties through the model. A Monte Carlo analysis with 5000 iterations of the simplified model was performed to produce a synthetic set of carcinogenic health risks associated with the studied coal-fired power plant. Statistical analysis of the synthetic results yielded a probability distribution for the risk. The risk value calculated in the deterministic risk assessment (2.2 × 10-8) was estimated to be at the 83rd percentile of the derived probability distribution. The statistical parameters of this distribution are summarized below:
The derived probability density plot is presented in Figure 5.
A general methodology for the performance of sensitivity/uncertainty analysis was presented. The methodology was applied to a multimedia health risk assessment model. A case study of a coal-fired power plant was used as the basis of this application. The uncertainty of the carcinogenic risk associated with the power plant emissions was examined. The results indicated that the deterministic risk value calculated in the original risk assessment study was a conservative estimate, corresponding to a higher risk percentile on the estimated risk probability distribution.
This work was performed under contract No 3081-1 with the Electric Power Research Institute, Palo Alto, California. The authors gratefully acknowledge the continuous support of the EPRI project manager, Dr. Leonard Levin.
Constantinou, E. and C. Seigneur. A Mathematical Model for Multimedia Health Risk Assessment. Submitted for publication in Environmental Software, 1992.
Palisade Corporation. Risk Analysis and Simulation Add-In for Microsoft Excel, Windows or Apple Macintosh Version, Release 1.02 User's Guide, March, 1991.
Seigneur, C., C. Whipple, et al. Formulation of Modular Mathematical Model for Multimedia Health Risk Assessment, American Institute of Chemical Engineers Summer Meeting, San Diego, California, August, 1990.
U.S. Environmental Protection Agency. Updated Mutagenicity and Carcinogenicity Assessment of Cadmium. Addendum to the Health Assessment Document for Cadmium (May 1981). EPA/600/8-81/023, June, 1985.
U.S. Environmental Protection Agency. Health Assessment Document for Inorganic Arsenic. Environmental Criteria and Assessment Office, Research Triangle Park, NC. EPA/600/8-83-021F, 1984a.
U.S Environmental Protection Agency. Health Assessment for Chromium - Final Report. EPA/600/8-83-014F, August, 1984b.
(Table continued on following page.)