Models, Methods, and Data
Health risk assessment is a multifaceted process that relies on an assortment of methods, data, and models. The overall accuracy of a risk assessment hinges on the validity of the various methods and models chosen, which in turn are governed by the scope and quality of data. The degree of confidence that one can place in a risk assessment depends on the reliability of the models chosen and their input parameters (i.e., variables) and on how well the boundaries of uncertainty have been quantified for the input parameters, for the models as a whole, and for the entire risk-assessment process.
Quantitative assessment of data quality, verification of method, and validation of model performance are paramount for securing confidence in their use in risk assessment. Before a data base is used, the validity of its use must be established for its intended application. Such validation generally encompasses both the characterization and documentation of data quality and the procedures used to develop the data. Some characteristics of data quality are overall robustness, the scope of coverage, spatial and temporal representativeness, and the quality-control and quality-assurance protocols implemented during data collection. More specific considerations include the definition and display of the accuracy and precision of measurements, the treatment of missing information, and the identification and analysis of outliers. Those and similar issues are critical in delineating the scope and limitations of a data set for an intended application.
The performance of methods and models, like that of data bases, must be characterized and verified to establish their credibility. Evaluation and valida-
tion procedures for a model might include sensitivity testing to identify the parameters having the greatest influence on the output values and assessment of its accuracy, precision, and predictive power. Validation of a model also requires an appropriate data base.
This chapter discusses the evaluation and validation of data and models used in risk assessment. In cases where there has been an insufficient assessment of performance or quality, research recommendations are made. Although in this chapter we consider validation issues sequentially, according to each of the stages in the (modified) Red Book paradigm, our goal here is to make the assessment of data and model quality an iterative, interactive component of the entire risk-assessment and risk-characterization process.
As described in Chapter 3, emissions are characterized on the basis of emission factors, material balance, engineering calculations, established Environmental Protection Agency (EPA) protocols, and measurement. In each case, the characterization takes one of three structural forms: a linearly additive process (i.e., emissions equal feedstock – [product + accumulation]), a multiplicative model (i.e., emissions equal [emission factor][process rate]), or an exponential relationship (e.g., emissions equal intercept + [emission factor][measurement]^exponent).
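The three structural forms can be sketched in code. This is an illustration only, not an EPA protocol; the function names and the example numbers are invented for the sketch.

```python
# Illustrative sketch of the three structural forms of emission
# characterization. All numbers below are hypothetical.

def additive_emissions(feedstock, product, accumulation):
    """Mass-balance (additive) form: emissions are what enters minus
    what leaves as product or accumulates."""
    return feedstock - (product + accumulation)

def multiplicative_emissions(emission_factor, process_rate):
    """Multiplicative form: a per-unit emission factor scaled by activity."""
    return emission_factor * process_rate

def exponential_emissions(intercept, emission_factor, measurement, exponent):
    """Exponential form: a fitted power-law relationship on a measurement."""
    return intercept + emission_factor * measurement ** exponent

# Hypothetical numbers, chosen only to show the arithmetic:
print(additive_emissions(feedstock=1000.0, product=990.0, accumulation=5.0))  # 5.0
print(multiplicative_emissions(emission_factor=0.02, process_rate=500.0))     # 10.0
print(exponential_emissions(intercept=0.0, emission_factor=2.0,
                            measurement=3.0, exponent=2.0))                   # 18.0
```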
The additive form is based on the mass-balance concept. An estimate is made by measuring the feedstock and product to determine an equipment-specific or process-specific transfer coefficient. This coefficient is used to estimate emissions to the atmosphere. The measurements available for the additive form are often not sufficiently precise and accurate to yield complete information on inputs and outputs (NRC, 1990a). For example, an NRC committee (NRC, 1990a) considered a plant that produced 5 million pounds of ethylene per day and used more than 200 monitoring points to report production with a measurement accuracy of 1%, equivalent to 50,000 lb of ethylene per day. The uncertainty in this estimate (50,000 lb) greatly exceeded a separate estimate of emissions, 191 lb, which was calculated by the plant and was confirmed by monitoring of the emission points. Thus, despite the apparently good precision of estimates within 1%, the additive method was not reliable. This seems to be generally true for complicated processes or multiple processing steps.
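The ethylene arithmetic above can be reproduced directly; the production, accuracy, and emission figures are those quoted from NRC (1990a).

```python
# Reproducing the committee's ethylene example: a 1% measurement accuracy
# on 5 million lb/day of throughput swamps a 191-lb/day emission estimate.

production_lb_per_day = 5_000_000
measurement_accuracy = 0.01           # 1% accuracy on mass-balance terms
uncertainty_lb = production_lb_per_day * measurement_accuracy
emission_estimate_lb = 191            # plant's monitored estimate

print(uncertainty_lb)                           # 50000.0
print(uncertainty_lb / emission_estimate_lb)    # uncertainty is roughly 260x the emissions
```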
The other forms are based on exponential and multiplicative models. Each may be deterministic or stochastic. For example, emissions from a well-defined sample of similar sources may be tested to develop an emission factor that is meant to be representative of the whole population of sources. A general difficulty with fits that use these functional forms (linear or one of several nonlinear forms) is that the choice of form may be critical but hard to validate. In addition, it must be assumed that data from the sources tested are directly applicable to the sources being estimated—in other words, that the process design and the management and maintenance approaches of the organizations that run them are the same in all cases.
An example of an exponential form of an emission calculation is shown in Figure 7-1, which shows the correlation between screening value (the measurement) and leak rate (the emission rate) for fugitive emissions from a valve. The screening value is determined by measuring the hydrocarbons emitted by a piece of equipment (in this case, a valve in gas service) with an instrument such as an OVA (organic-vapor analyzer). The leak rate (i.e., emission) is then determined by reading the value on the y axis corresponding to that screening value. Note that the plot is on a log-log scale: a "3" on the x axis indicates a 1,000-ppm screening value, which corresponds to a "-3.4" on the y axis, or about 0.0004 lb/hr for each valve in gas service at that screening value. The observations here are based on an analysis conducted for 24 synthetic organic chemical manufacturing industry (SOCMI) units representing a cross-section of this industry (EPA, 1981a).
As part of this analysis, a six-unit maintenance study (EPA, 1981a) was used to determine the effect of equipment monitoring and maintenance with an OVA instrument on emission reduction. The equation derived for valve emissions in gas service explains only 44% (the square of the correlation coefficient) of the variance in the points shown in Figure 7-1. Similar results were obtained for other possible emission points.
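The log-log relationship of Figure 7-1 can be sketched as follows. The slope and intercept here are hypothetical, chosen only so that the example point in the text (a 10^3-ppm screening value giving a log10 leak rate of -3.4) is reproduced; the actual fitted coefficients are given in EPA (1981a).

```python
# Sketch of the exponential (log-log) screening-value correlation.
# SLOPE is assumed; INTERCEPT is back-calculated so that a screening
# value of 1,000 ppm yields log10(leak rate) = -3.4, as in the text.
import math

SLOPE = 0.7                      # hypothetical slope of the log-log fit
INTERCEPT = -3.4 - SLOPE * 3.0   # forces the fit through (3, -3.4)

def leak_rate_lb_per_hr(screening_ppm):
    """Predict a valve's leak rate (lb/hr) from its OVA screening value (ppm)."""
    log_leak = INTERCEPT + SLOPE * math.log10(screening_ppm)
    return 10 ** log_leak

print(math.log10(leak_rate_lb_per_hr(1000)))  # -3.4, i.e., ~0.0004 lb/hr
```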
The facilities in this SOCMI study could reduce the estimate of their emissions by 29-99% by determining plant-specific emission factors, which indicates the difficulty of using industry-wide averages to represent the behavior of a specific plant.
The multiplicative form improves on the emission-factor approach in that it incorporates more features of the process, attempting to accommodate the types of equipment being used, the physical properties of the chemical, and the activity of the equipment as a whole. The deterministic form of the multiplicative model is based on the chemical and physical laws that determine the emission rate. The variables measured (vapor pressure, molecular weight, temperature, etc.) are chemical and physical properties that are related to the emission rate. The multiplicative form thus provides some scientific basis for the estimate beyond simple curve-fitting. However, it has difficulties, because some of the properties are not constant. For example, the ambient air temperature, one factor in determining the emission rate, can vary quite widely within a day. The average temperature for a given period, such as a month, is used for ease of calculation, but this practice introduces some error. EPA might want to consider a more detailed analysis in which the emissions that occur during the period are stratified into groups with smaller variations in variables such as ambient temperature. The emissions in the strata could then be estimated and a weighted sum calculated to provide a better estimate.
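The suggested stratification can be illustrated with a sketch. The emission-temperature relationship and the strata below are hypothetical, chosen only to show that when the relationship is nonlinear, the single-average-temperature shortcut biases the estimate.

```python
# Sketch: stratify a period by temperature, estimate emissions within
# each stratum, and combine with time weights, rather than evaluating
# the model once at the period-average temperature.
import math

def emissions_at_temp(temp_c):
    # Hypothetical nonlinear (exponential) temperature dependence.
    return math.exp(0.05 * temp_c)

# Strata: (fraction of the period, representative temperature, deg C)
strata = [(0.3, 10.0), (0.5, 20.0), (0.2, 30.0)]

stratified = sum(frac * emissions_at_temp(t) for frac, t in strata)
mean_temp = sum(frac * t for frac, t in strata)        # 19 deg C
unstratified = emissions_at_temp(mean_temp)

# With a nonlinear dependence, the average-temperature shortcut
# underestimates the period total here.
print(stratified, unstratified)
```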
Probably the most accurate procedure is to use none of those "forms" to determine emissions, but rather to sample stack and vent emissions at each source. However, such sampling can be quite expensive, and the costs could overburden owners of small sources. Apart from costs, the primary difficulty with this procedure is that it yields an estimate for one site on one occasion; emissions could change because of a variety of factors. An alternative to testing is to estimate emissions from monitoring data. Continuous emission monitors (CEMs), which are available for a small number of chemicals, are placed in stacks or near fugitive-emission points to measure the concentration of a chemical being released; concentrations can then be converted to amounts. However, CEMs can be expensive and difficult to maintain, and they may produce incomplete or inaccurate measurements. When direct testing is conducted, however, the results may show that other kinds of estimates are seriously in error. For example, one study (Amoco/EPA, 1992) compared emissions estimated primarily from emission factors with those determined during testing. The overall estimate based on measurement was more than twice as high as the Toxics Release Inventory (TRI) estimate, for a variety of reasons, including identification of new sources, overestimation or underestimation of the importance of some sources, and the lack of a requirement to report some source emissions under a particular regulation.
Evaluation of EPA Practice

EPA has worked diligently to help members of the public who are required to provide emission estimates for regulatory purposes. This 20-year effort has produced documents that are used to estimate air-pollutant emissions throughout the world. However, in some cases, EPA has had to provide emission-estimation factors based on very little information about the process involved; it is then difficult to check the assumption that the process for which the calculation is being used is similar to the process that was tested in the development of the emission factor.
There are two basic difficulties with the way EPA applies its emission estimation techniques. First, most estimates are made by using the emission factors or by fitting the linear or exponential forms. As discussed previously, the accuracy of emission estimates using these techniques might not be high.
Second, the information is generated in such a way that only point estimates are presented. Although it is clear from the earlier discussion that there can be substantial uncertainty in the estimates, EPA provides only qualitative ratings of the accuracy of each emission method, and the ratings are based not on the variance in the estimate but only on the number of emission points used to generate the data. Yet EPA has extensive files on how the emission factors were determined, and this information presumably contains enough points to generate a distribution of emissions rather than just a point estimate. If there are enough points to generate an emission factor, it is possible to estimate the distribution of emission factors, from which an estimate can be chosen to suit a particular exposure-risk estimation problem.
However, the emission factors are given only a "grade" from A (best) to E, reflecting the quality and amount of data on which the estimates are based. An emission factor based on 10 or more plants would be likely to get an "A" grade, whereas a factor based on a single observation of questionable quality, or one extrapolated from another factor for a similar process, would probably get a D or E. The grades are subjective and do not consider the variance in the data used to calculate the factors. According to EPA (1988e), the grades should "be used only as approximations, to infer error bounds or confidence intervals about each emission factor. At most, a [grade] should be considered an indicator of the accuracy and precision of a given factor used to estimate emissions from a large number of sources." The uncertainty in the estimates is such that EPA is not comfortable with the A-E system and is developing a new qualitative system to indicate uncertainty. EPA is attempting to generate estimation factors for hazardous air pollutants industry by industry, but it is still hesitant to ascribe any sort of uncertainty to emission factors.
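The alternative suggested here can be sketched: if the plant-level measurements underlying an emission factor were retained, a distribution could be reported instead of a letter grade. The ten plant factors below are hypothetical.

```python
# Sketch: summarize hypothetical plant-level emission factors as a
# distribution (mean, spread, quartiles) rather than a single graded
# point estimate.
import statistics

# Hypothetical plant-specific factors (e.g., lb emitted per ton processed):
plant_factors = [0.8, 1.1, 0.9, 1.4, 0.7, 1.0, 1.2, 2.5, 0.6, 1.3]

mean = statistics.mean(plant_factors)
stdev = statistics.stdev(plant_factors)
quartiles = statistics.quantiles(plant_factors, n=4)

print(round(mean, 2), round(stdev, 2))
# An analyst could then choose, say, the upper quartile for a
# conservative screening estimate, and say so explicitly.
print(quartiles)
```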
A single disruption in operation of a plant can increase the release rate for some interval (hour or day). An extreme example is the dioxin release from a manufacturing plant in Seveso, Italy. Such disruptions are not incorporated into any of the emission characterizations, except for the few cases where emission monitoring is available. However, in those cases, emissions might be so high
that they exceed the maximum reading of a monitor and thereby lead to just a lower bound (if this problem is recognized) or even to a serious underestimate of the actual emission. Furthermore, the frequency and duration of such episodes are unpredictable.
Therefore, EPA should also attempt to make quantitative estimates of the variability of measured emissions among sources within a category and of the uncertainty in its overall emission estimates for individual sources and for the source category as a whole. This issue is discussed in more depth in Chapter 10, but it could involve analyzing four kinds of circumstances, as appropriate for a particular source type: routine operation, regular maintenance, upsets and breakdowns, and rare catastrophic failures. EPA could also note the implications of the dynamics of causation of different effects for emission estimation, and the resulting need for estimates of exposure and exposure variability over different averaging times.
The itemization of emissions by chemical constituent also raises problems. Emission characterization methods often provide only the amount of VOCs (volatile organic compounds) that is emitted. The amounts of particular compounds (benzene, toluene, xylene, etc.) within these VOC emissions are often not individually reported. Without the emission data on particular compounds, it is impossible to provide the information needed for exposure modeling in the risk-assessment process.
EPA does not appear to be making major strides toward improving the methods used to evaluate emissions. Although EPA is making extensive efforts to distribute the emission factors it has generated, the committee has found insufficient effort either to evaluate the accuracy of the underlying method used to derive the emission estimates or to portray the uncertainty in the emission factors. The primary exception is a joint effort of the Chemical Manufacturers Association (CMA) and EPA on fugitive emissions called Plant Organization Software System for Emission Estimation or POSSEE (CMA, 1989). In this case, companies are testing fugitive emissions within plants and collecting data on chemical and physical variables to derive emission estimates based on deterministic models (which use physical and chemical properties), rather than stochastic models. There have been efforts to increase the scientific justification of estimates of emissions from storage tanks: the American Petroleum Institute has developed data that have been used for developing the estimation method shown in the multiplicative form described above. The question then arises as to how to approach emission estimates in exposure assessments and risk assessments. The uncertainty in the mass-balance approach (additive form) can be so large that its use should be discouraged for any purposes other than for a very general screening. It is unlikely that an emission estimate derived with this method would be appropriate for risk assessment.
The linear emission-factor approach could be used as a general screening tool in an exposure assessment. As indicated by EPA in response to a question from this committee:
While emission factor-based estimates can be useful in providing a general picture of emissions across an entire industrial category, use of such factors to provide inputs to a site-specific risk assessment may introduce a great deal of uncertainty into that assessment.
If such an approach is used for an entire industrial category, then at least the uncertainty of each emission factor should be determined. If there is enough information to derive an emission factor, then a probability distribution can be calculated. There may then be disagreement about where on the probability distribution the emission estimate should be chosen; however, it is better to make the choice explicitly, as discussed in Chapter 9. The same holds for emissions estimated with the exponential and multiplicative approaches. EPA should include a probability distribution with all its emission estimates.
One method to determine the uncertainty in an emission estimate more easily would be to require each person submitting an emission estimate (for SARA 313 requirements, permitting, etc.) to include an evaluation of the uncertainty in the estimate. EPA could then evaluate the uncertainty in the estimation methods to determine whether the estimation was done properly. Although that might increase the costs of developing submissions slightly, the organization submitting the estimate might benefit from the results. Small sources unable to afford such analysis could instead define a range that is consistent with known or readily determined factors in their operation (e.g., for a dry cleaner, the pounds of clothes per week and gallons of solvent purchased each month).
EPA is reviewing, revising, and developing emission-estimation methods for sources of the 189 listed chemicals. It is focusing on adding data rather than on evaluating its basic approach: the use of a descriptive model, instead of a model based on processes, for emission estimation. It appears from the examples given above that the uncertainties in emissions can dominate an exposure assessment and that a concerted effort to improve emission estimation could substantially reduce the uncertainty in many risk estimates. Combined industry efforts to improve the techniques used to estimate fugitive emissions on the basis of physical and chemical properties (not just curve-fitting) should be encouraged.
Once an emission characterization is developed, it becomes one of the inputs into an air-quality model to determine the amount of a pollutant in ambient air at a given location. A population-exposure model is then used to determine how much of a pollutant reaches people at that location.
The size of the population that might be exposed to an emission must be determined. Population data have been collected, published, and scrutinized for
centuries. Many such data refer to entire populations or subpopulations, so questions of representation and statistical aspects of sampling do not arise in their usual form. Even where sampling is used, a large background of technique and experience allows complex estimation and other kinds of modeling to proceed without the large uncertainties inherent in, for example, extrapolation from high to low doses of toxic agents or from rodents to humans.
Population data are almost always affected to some degree by nonsampling error (bias), but this is well categorized and understood and is not a serious problem in the context of risk assessment. For example, terminal-digit preference (e.g., a tendency to report ages that end in zero or five) has been minimal since the attainment of nearly universal literacy and especially since the adoption of birth certification. Attainment of advanced ages (i.e., over 80 years) is still overstated, but this is not quantitatively serious in age estimation for purposes of risk assessment (because EPA still assumes that 70 years is the upper-bound length of a lifetime). Population undercounts in the U.S. census of 1990 averaged about 2.1% and were substantially higher for some subgroups, perhaps up to 30%; however, even 30% uncertainty is smaller than many other sources of error encountered in risk assessment. The largest proportionate uncertainty seems to be in the number of homeless persons in the United States; even there, the estimated uncertainty is less than a factor of 10.
Estimation of characteristics in groups or subgroups not examined directly is subject to additional uncertainty. For example, a postcensal population (say, that of 1993) is not directly counted; standard techniques are used to extrapolate from the census of 1990, which was a nearly complete count of the population. Investigators have found such estimates for earlier years to be generally quite accurate, whether the extrapolations were strictly mathematical (e.g., based on linear extrapolation) or demographic (based on aging the counted population by 3 years, with adjustments for births, deaths, and net migration). The problems are greater for states and smaller areas, because data on migration (including internal migration) are not generally available.
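The demographic extrapolation described above is simple bookkeeping. In the sketch below, the 1990 census count is the published figure, but the birth, death, and net-migration figures are hypothetical placeholders.

```python
# Cohort-component sketch: roll a census count forward by adding births
# and net in-migrants and subtracting deaths over the intercensal period.

def extrapolate_population(census_count, births, deaths, net_migration):
    """Return an estimated postcensal population from a census base."""
    return census_count + births - deaths + net_migration

pop_1990 = 248_709_873          # published 1990 U.S. census count
est_1993 = extrapolate_population(pop_1990,
                                  births=12_000_000,      # hypothetical
                                  deaths=6_500_000,       # hypothetical
                                  net_migration=2_500_000)  # hypothetical
print(est_1993)
```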
Error tends to increase as subgroups get smaller, partly because statistical variability increases (i.e., small sample size leads to less precision in the estimate of the central tendency with any distributed measurement), but also because individual small segments are not as well characterized and as well understood as larger aggregates and because population data are generally collected according to a single nationwide protocol that allows for little deviation to accommodate special problems.
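The statistical part of this effect can be made concrete: the standard error of an estimated mean scales as sigma divided by the square root of n, so shrinking a subgroup inflates imprecision predictably. The standard deviation below is an assumed value.

```python
# Sketch: precision of a subgroup mean degrades as the subgroup shrinks.
import math

def standard_error(sigma, n):
    """Standard error of a sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 10.0  # assumed within-group standard deviation of the measurement
for n in (10_000, 100, 25):
    print(n, standard_error(sigma, n))
# Going from a subgroup of 10,000 to one of 25 inflates the standard
# error 20-fold (0.1 -> 2.0), before any of the non-statistical problems
# noted in the text are considered.
```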
The committee is comfortable about using published population data for nearly all population characteristics and subgroups. Where adjustment to reduce errors is feasible, it should be used; but in the overall context of risk assessment, error in population assessment contributes little to uncertainty.
In some cases, a research study must define and identify its own population without help from official censuses and surveys. An example is a long-term follow-up study of workers employed in a specific manufacturing plant. When such studies are done by skilled epidemiologists, total counts, ages, and other demographic items tend to be accurate to within a factor of 2 or 3. The largest uncertainties are likely to be in the estimation of exposure to some toxic agent; these are often dealt with by the use of rough categories (high, medium, and low exposure) or surrogate measures (e.g., years employed in a plant, rather than magnitude of exposure). Errors in such work are of great concern, but they tend to be peculiar to each study and hence lead to study-specific remedies in design, performance, or analysis. They tend to be smaller than other kinds of uncertainties, but can still be of concern if a putative effect is also small.
As indicated, population data derived from a census and fortified with estimation methods are regarded as accurate and valid, and the uncertainties they introduce into risk assessment are relatively small. There is a need, however, for information on additional population characteristics that are not included in the census. In particular, there is a paucity of activity-pattern information, so population-exposure and personal-exposure models, which use people's activities to estimate their exposure to chemicals in air, have not been adequately tested or validated. Only a few small efforts have been undertaken to develop such a data base, namely, EPA's Total Exposure Assessment Methodology (TEAM) program and the California EPA's State Activity Pattern Study. Those programs have acquired information about activities that cause the emission of air pollutants or that place people in microenvironments containing air pollutants and thus potentially lead to exposure. There is a need to develop a national data base on activity patterns that can be used to validate models that estimate personal exposure to airborne toxic chemicals. Accurately described activity patterns coupled with demographic characteristics (e.g., socioeconomic status) can be used in making a risk assessment and in assessing the environmental equity of risk across socioeconomic groups and races.
When exposure-characterization models are developed for use in risk assessment, the bias and uncertainty that they introduce into exposure estimates should be clearly defined and stated, regardless of whether activity patterns are included. The choice of an appropriate model from an array of possibilities should then be based on, but not necessarily limited to, its quantitative measure of performance; the rationale for the choice should be included, with a statement of the criteria for its selection.
Air-Quality Model Evaluation
Air-quality models are powerful tools for relating pollutant emissions to ambient air quality. Most air-quality models used in assessing exposure to toxic air pollutants have been extensively evaluated with specific data sets, and their underlying mathematical formulations have been critically reviewed. Relative
to some of the other models for risk assessment of air pollutants, air-quality models probably enjoy the longest history of model evaluation, refinement, and re-evaluation. For example, the original Gaussian-plume models were formulated and tested in the 1950s. That does not mean, however, that model evaluation is complete or that it can be dismissed in assessing air-pollutant exposure; in fact, previous studies have shown the benefits of model evaluation in every application.
Evaluation of the air-quality models and other components of air-pollutant risk assessment is intended to determine accuracy for providing the details required in a given application and to provide confidence in the results. In air-quality modeling, that is particularly important. A Gaussian-plume model, when used with the input data generally available, might not correctly predict where maximal concentrations will be realized (e.g., because winds at the nearest station, such as an airport, might differ in direction from winds near the source of interest), but should provide a reasonable estimate of the distribution of pollutant concentrations around the site. That might be sufficient for some applications, but not others. Model evaluation can also add insight as to whether a tool is "conservative" or the opposite, and it can provide a quantitative estimate of uncertainty.
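A minimal version of the steady-state Gaussian-plume formula discussed here can be sketched as follows. The source parameters are hypothetical, and the dispersion coefficients are supplied directly rather than derived from stability class and downwind distance, as real applications would require.

```python
# Sketch of a ground-reflecting, steady-state Gaussian-plume
# concentration estimate. All source parameters are hypothetical.
import math

def gaussian_plume(Q, u, sigma_y, sigma_z, y, z, H):
    """Concentration (g/m^3) at crosswind offset y (m) and height z (m).
    Q: emission rate (g/s); u: wind speed (m/s); H: effective stack
    height (m); sigma_y, sigma_z: dispersion coefficients (m)."""
    lateral = math.exp(-y**2 / (2 * sigma_y**2))
    # Image-source term reflects the plume at the ground surface.
    vertical = (math.exp(-(z - H)**2 / (2 * sigma_z**2)) +
                math.exp(-(z + H)**2 / (2 * sigma_z**2)))
    return Q / (2 * math.pi * u * sigma_y * sigma_z) * lateral * vertical

# Hypothetical source: 100 g/s, 5 m/s wind, 50-m effective stack height,
# receptor at ground level on the plume centerline.
c = gaussian_plume(Q=100.0, u=5.0, sigma_y=60.0, sigma_z=30.0,
                   y=0.0, z=0.0, H=50.0)
print(c)  # centerline ground-level concentration, g/m^3
```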
Of particular concern are the more demanding applications of models, such as in areas of complex terrain (e.g., hills, valleys, mountains, and over water), when deposition is important, and when atmospheric transformation occurs. As discussed below, it is difficult enough to use models in the simple situations for which they were specifically designed. One should always try to ascertain the level of accuracy that can be expected from a given model in a given application. Sufficient studies have been performed on most air-quality models to address that question.
Zannetti (1990) reviews evaluations of many air-quality models, including Gaussian-plume models. Evaluation procedures have recently been reviewed for photochemical air-quality models (NRC, 1991a), and similar procedures are applicable to other models. In essence, models should be pushed to their limits, both to define the range in which potential errors in the models themselves or in their inputs still lead to acceptable model performance and to identify compensating errors in the models and their inputs (e.g., meteorology, emissions, population distributions, and routes of exposure). That should lead to a quantitative assessment of model uncertainties and key weaknesses. As pointed out in the NRC (1991a) report, model evaluation includes evaluation of input data. The greatest limitation in many cases is the availability and integrity of the input data; for the most part, many models can give acceptable results when good-quality input data are available.
A key motivation in model evaluation is to achieve a high degree of confidence in the eventual risk assessment. Pollutant-transport model evaluation, as it pertains to estimating air-pollutant emissions, has been somewhat neglected and
is used without adequate discussion and analysis. For example, the modeling of emissions from the ASARCO smelter (EPA, 1985b) showed significant bias, but the reasons for the bias and errors were not fully identified. A major plume-model validation study was mounted in the early 1980s with the support of the Electric Power Research Institute (EPRI); it studied a large coal-fired power plant situated in relatively simple terrain. The study compared three Gaussian-plume models, three first-order closure numerical (stochastic) models, and an experimental second-order closure model; ground-level concentrations were obtained with both routine and intensive measurement programs (Bowne and Londergan, 1983). (First-order closure and second-order closure refer to how the effects of turbulence are treated.) Predictions and observed pollutant concentrations often differed by factors of 2-10. It is clear from the study (in which there was no effect of complex terrain, heat islands, or other complicating factors) that the dispersion models had serious deficiencies. Dispersion models have been developed further since then, but they require continued development and improvement, and they warrant evaluation when applied to new locations or periods.
Larger-scale urban air-quality models perform better in predicting concentrations of secondary species (such as ozone, nitrogen dioxide, and formaldehyde), even though the complex chemical reactions might seem to make the task harder. Prediction accuracy, on the average, is usually within about 10% (NRC, 1990a). This performance is due in part to the coarser spatial resolution used by such models, to chemical-transformation times that allow dispersion from the original sources, and to the greater spatial separation of the sources. The lower spatial resolution, with increased chemical detail and performance, leads back to a consideration of model choice and evaluation: what type of detail is required from a particular model application, and what level of performance can be expected?
In summary, model evaluation is an integral part of any risk assessment and is crucial for providing confidence in models. Evaluation procedures have been developed for various classes of air-quality models. Studies have shown that air-quality models can give reasonable predictions, but do not always (or often) do so. Results of a model evaluation can be used in an uncertainty analysis of predicted risk.
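One common quantitative evaluation statistic, the fraction of paired predictions within a factor of 2 of the observations (often called FAC2), can be sketched as follows; the paired values are hypothetical.

```python
# Sketch of a factor-of-2 model-evaluation statistic (FAC2): the share
# of predictions falling within a factor of 2 of the paired observations.

def fac2(predictions, observations):
    """Fraction of prediction/observation pairs with 0.5 <= p/o <= 2."""
    within = sum(1 for p, o in zip(predictions, observations)
                 if o > 0 and 0.5 <= p / o <= 2.0)
    return within / len(observations)

# Hypothetical paired concentrations (e.g., ug/m^3):
obs  = [1.0, 2.0, 4.0, 0.5, 3.0]
pred = [1.5, 1.1, 9.0, 0.6, 2.0]
print(fac2(pred, obs))  # 0.8: four of five predictions within a factor of 2
```

A score near 1 suggests the model is usually within a factor of 2; the factors of 2-10 cited above for the EPRI plume study would translate into a much lower score.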
Evaluation of EPA Practice

The validity of the population-exposure models used by EPA remains largely untested. Ott et al. (1988) used data from EPA's TEAM studies of carbon monoxide (CO) in Denver and Washington, D.C., to examine the validity of the SHAPE model, comparing the estimated CO-exposure distribution based on the SHAPE model with the distribution based on direct measurement (personal monitoring). They found the estimated average exposures to be similar with the two approaches, but the ranges of the estimated exposure distributions were quite different: the SHAPE model predicted median values well, but there were substantial discrepancies in the tails of the distribution.
Duan (1991), also using data from EPA's TEAM study of carbon monoxide in Washington, D.C., found that the concentrations and time intervals were independent and tested the effectiveness of a "variance-components" exposure model in comparison with SHAPE. Both the long-term average concentrations and the short-term fluctuations in concentration were important in predicting exposure. Duan (1988) and Thomas (1988) examined several statistical parameters for several microenvironments and found the time-invariant component (i.e., a component that does not vary with time, often taken as a background level) to be dominant. Thus, there has been some effort to validate the exposure models developed for research purposes.
There have been no systematic attempts, however, to validate either of the exposure models used for regulatory purposes, the Human Exposure Model (HEM) and the National Ambient Air Quality Standard Exposure Model (NEM). The dispersion-model portion of HEM was compared with other simple Gaussian-plume models, and the results were similar. However, neither actual airborne concentrations nor measured integrated exposures to any airborne constituent were compared with the model results to test its utility in estimating individual or population exposures. Comparison of the site-specific model used to evaluate the health impact of arsenic from the ASARCO smelter in Tacoma, Washington, with the few available data showed low accuracy, and arsenic in urine samples from exposed people did not correlate well with estimated exposures, as discussed in Chapter 3. Thus, the effectiveness of these models is essentially unknown, although it will be important to understand their strengths and limitations, including prediction accuracy and the associated uncertainty, when residual risk must be estimated after installation of Maximum Achievable Control Technology (MACT).
When EPA conducts a risk assessment of a hazardous air pollutant, it generally relies on Gaussian-plume models. Gaussian-plume models are inadequately formulated, so inaccuracies appear in predicted pollutant concentrations (e.g., Gaussian-plume models generally are not applicable for nonlinear chemistry or particle dynamics). Furthermore, the inputs to these models are often inaccurate and not directly appropriate for a given application. In practice, application of Gaussian-plume models has not been adequately evaluated, and some evaluations have shown substantial discrepancies. More comprehensive and robust pollutant-transport models (i.e., those more directly applicable to a wider variety of situations) are available, including stochastic Lagrangian and photochemical models, and evaluations have shown good agreement with direct observations. In specific applications, model evaluation (via pollutant monitoring and assessment of model inputs and theory) should be undertaken and ranges of applicability determined. Demonstrations should include, but not be restricted to, showing that the model assumptions reasonably represent physical-chemical behavior of the contaminant, source configuration, and atmospheric dispersion. For environmental conditions for which the performance of Gaussian-plume models is demonstrated to be unsatisfactory, more comprehensive models should be considered; however, their superior performance should be documented and clearly evident when they are considered as an alternative in a risk assessment.
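The core steady-state relationship underlying Gaussian-plume models can be sketched as follows. This is a minimal textbook illustration with the standard ground-reflection (image-source) term, not EPA's implementation; in practice the dispersion coefficients sigma_y and sigma_z would be obtained from downwind distance and atmospheric-stability class, and the function name and argument choices here are illustrative:

```python
import math

def gaussian_plume(q, u, y, z, h, sigma_y, sigma_z):
    """Steady-state concentration (g/m^3) from a continuous point source.

    q                -- emission rate (g/s)
    u                -- mean wind speed at release height (m/s)
    y, z             -- crosswind and vertical receptor coordinates (m)
    h                -- effective stack height (m)
    sigma_y, sigma_z -- lateral and vertical dispersion coefficients (m),
                        in practice functions of downwind distance and
                        stability class
    Includes the ground-reflection (image-source) term, which assumes
    the pollutant is neither deposited nor transformed at the surface.
    """
    lateral = math.exp(-y ** 2 / (2.0 * sigma_y ** 2))
    vertical = (math.exp(-(z - h) ** 2 / (2.0 * sigma_z ** 2))
                + math.exp(-(z + h) ** 2 / (2.0 * sigma_z ** 2)))
    return q * lateral * vertical / (2.0 * math.pi * u * sigma_y * sigma_z)
```

The formula makes the model's limitations concrete: concentration is strictly proportional to the emission rate, symmetric about the plume centerline, and contains no terms for chemical transformation or particle dynamics, which is why reactive pollutants fall outside its range of applicability.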
EPA has generally not included population activity, mobility, and demographics in its exposure assessments and has not adequately evaluated the use of population averages (as used by default in HEM). Exposure models, such as NEM and SHAPE, have been developed to account for personal activity. Population-activity models should be used in exposure assessments; however, their accuracy should be clearly demonstrated before they are considered as alternatives to the default approach. Demographics might also play a role in determining risk. Further evaluation of some simple methods (e.g., use of population centroids), compared with more comprehensive tools (e.g., NEM and SHAPE), is warranted before they are considered in lieu of the default option.
EPA currently uses HEM to screen exposure associated with HAP releases from stationary sources. The HEM-II model uses a standardized EPA Gaussian-plume dispersion model and assumes nonmobile populations residing outdoors at specific locations. The HEM construct is not designed to provide accurate estimates of exposure in specific locations and for specific sources and contaminants when conditions are not represented by the simplified exposure- and dispersion-model assumptions inherent in the standard HEM components. Alternative models for transport and for personal activity and mobility can be adopted in an exposure-modeling system to provide more accurate, scientifically founded, and robust estimates of pollutant-exposure distributions (including variability, uncertainty, and demographic information). Those models can be linked to geographic data bases to provide both geographic and demographic information for exposure-modeling systems.
Application of HEM generally does not include noninhalation exposures to hazardous air pollutants (HAPs) (e.g., dermal exposure), but these routes can be important. Modeling systems similar to extensions of HEM have been developed to account for the other pathways. Unless there is good evidence to the contrary, the contribution of alternative pathways of exposure to HAPs should be considered explicitly and quantified in a risk assessment.
Relatively simple models for exposure assessments, such as HEM, can provide valuable information for setting priorities and determining what additional data should be developed. However, exposure estimates based on this model can have large uncertainties (e.g., a factor of 2-10 due to the Gaussian-plume dispersion model used in HEM alone). Furthermore, Gaussian-plume models in general have not been validated for pollutants that are reactive and easily transformed to other chemicals, such as organic gases (e.g., formaldehyde), particles, and acids (e.g., nitric and sulfuric acids). Multiple exposure routes can add still more uncertainty as to actual exposure. Because HEM is based on very simplified descriptions of pollutant dynamics and was designed as a screening tool for estimating human exposure via inhalation, uncertainty analysis can serve as a tool for assessing the performance of a model like HEM.
The predictive accuracy and uncertainty associated with the use of HEM should be clearly stated with each exposure assessment. The underlying assumption that the calculated exposure estimate is a conservative one should be reaffirmed; if it cannot be, alternative models whose performance has been demonstrated to be superior should be used in exposure assessment.
Assessment of Toxicity
The first step in assessing human toxicity based on animal experiments is the extrapolation of observations from studies in rats, mice, monkeys, and other laboratory animals to humans. The extrapolation procedure used in risk assessment to assess the toxicity of a substance is both an intellectual exercise and a tool for making practical decisions. It is based on two assumptions: that the biological response to an external stimulus in one species will occur in a different species that is subjected to the same stimulus and that the biological response is proportional to the size of the stimulus (except that a very small stimulus will often result in only a transient response or no immediate response at all). Those two assumptions are invoked whenever extrapolation from animals to humans
and from high doses to low doses is performed. Cancer and other end points are discussed separately here because considerations related to extrapolation can differ.
Cancer, defined as abnormal and uncontrolled growth, is ubiquitous among higher organisms; it occurs in plants, animals, and humans. In some cases, carcinogens can be identified as physical or chemical agents or self-replicating infectious agents. Many epidemiological studies have documented an association between exposure to particular chemicals and an increased incidence of particular malignancies in humans (Doll and Peto, 1981). Examples are cancers related to exposure to industrial agents (such as aniline dyes, mustard gas, some metal compounds, and vinyl chloride) and, in the general population, tobacco and tobacco smoke. Perhaps most convincing in this context is the repeated observation that cessation of exposure to a given chemical (e.g., cessation of smoking or introduction of appropriate mitigation or hygienic measures) results in a decrease in cancer incidence. When tested in animal studies, almost all known human carcinogens have been found to produce cancer in other mammals. There are a few exceptions to that rule, e.g., tobacco smoke in laboratory animals. Recent advances in the understanding of basic mechanisms of carcinogenesis, often very similar in laboratory animals and humans, lend credibility to a relationship between animal carcinogenesis and human carcinogenesis, particularly when mutagenicity is involved (OSTP, 1985; Barbacid, 1986; Bishop, 1987); in other cases, advances in the understanding of species-specific mechanisms of carcinogenesis do not support a relationship between humans and specific laboratory animals studied to date (Ellwein and Cohen, 1992). Current long-term carcinogenicity bioassays are conducted with rodents using, among other doses, the highest dose that does not reduce survival as a result of causes other than cancer, known as the maximum tolerated dose (MTD).
Information acquired from rodent bioassays conducted at the MTD might yield information on whether a chemical can produce tumors in humans, but it generally cannot provide information on whether it produces tumors through generalized, indirect mechanisms or directly as a result of its specific properties. Mechanistic data could resolve the question of whether it is valid to extrapolate the results of a bioassay to humans (see NRC, 1993b). Current regulatory practice takes the view that in the absence of information to the contrary, animal carcinogens are human carcinogens; however, the data base supporting this assumption is not complete.
Obtaining more information on the biological mechanisms of carcinogenesis, their dose dependence, and their interspecies relevance will permit better and
more valid qualitative and quantitative extrapolations. For example, there is a tendency to give more weight to an observation when it relates chemical exposure to development of malignant tumors and to place less emphasis on an observation that suggests that a given chemical induces benign tumors. It might be an oversimplification to consider one category of abnormal growth as invariably detrimental and another as comparatively harmless. Tumor biology is much more complicated. Most, if not all, bronchial adenocarcinomas will kill when they run their course, whereas subcutaneous lipomas will not; however, excision of a malignant basal cell skin tumor is considered a cure, whereas a benign tumor of the VIIIth cranial nerve or of the pituitary gland can be lethal. Available knowledge on causes of cancer and on the biological behavior of tumors does not permit us to ascertain whether a compound that produces a benign tumor in laboratory animals would be either capable or incapable of producing a malignant tumor in humans. In the absence of information to the contrary, the conservative view equates abnormal growth with carcinogenicity. Circumstances that produce benign tumors in animal systems might have the potential for producing abnormal growth in humans, depending on the mechanism involved. Many benign tumors are most easily produced in animal strains that already have an inherently high spontaneous incidence of such tumors (e.g., liver and lung adenomas in mice and mammary tumors in rats). Studies of the genetic, biochemical, hormonal, and other factors that determine development of such tumors might improve the validity of human risk assessments based on animal studies, and should be pursued more vigorously.
The assumption that the organ or tissue affected by a chemical in animals is also the site of greatest risk in humans should also be made cautiously. It is likely that the site of tumor formation is related to the route of exposure and to numerous pharmacokinetic and pharmacodynamic factors. Each route of exposure might result in carcinogenicity and should be considered separately. It probably is reasonable to assume that in some cases, animal models of carcinogenesis can be used to predict the development of human tumors at specific sites, provided that conditions of exposure are comparable. However, if exposure conditions are not similar, that might not be true. For example, it might well be incorrect to assume that agents that produce sarcomas in laboratory animals after subcutaneous injection will induce sarcomas in humans after inhalation. Animal models can be used to detect potential carcinogenicity; however, extrapolating from animal models to particular human organs is not valid without a great deal of additional mechanistic information, such as information on the effects of exposure route, dose, and many other factors, including the metabolism of the agent in question.
Evaluation of EPA Practice Experience has shown that, in a broad sense, extrapolation from species to species is justifiable (Allen et al., 1988; Crump, 1989; Dedrick and Morrison, 1992). It is prudent to assume that agents that
cause abnormal growth of tissue components in laboratory animals will do so in humans. The animal species (mice and rats) most commonly used in the National Toxicology Program (NTP) to make predictions about human carcinogenesis were selected for convenience, not because they have been demonstrated to predict human risks accurately. For example, the risk of inhaled particles for humans might be underestimated in animal assays that use rats and mice, which are obligatory nose breathers and thus might filter out much of the coarser dust. Conversely, some believe that rodents might overpredict human risk when mechanisms of carcinogenesis that are operative in rodents do not occur in humans (Cohen et al., 1992). It appears that NTP has not seriously explored alternatives to rats and mice in carcinogenesis testing, except perhaps for the use of hamsters in inhalation studies.
In principle, selection of data for estimation of carcinogenic potential from the most sensitive strain or species of animals tested is designed to be conservative; whether it is actually conservative and accurate is unknown. This default assumption contributes to the uncertainty in risk assessment, and research designed to investigate the biological mechanisms of carcinogenesis in both rodents and humans should be vigorously pursued so that more accurate risk assessments can be conducted.
Key terms in quantitative cancer risk characterization are unit cancer risk and potency. As currently estimated by EPA, potency is a statistical upper bound on the slope of the linear portion of a dose-response curve at low doses as calculated with a mathematical dose-response model. The unit cancer risk is based on potency and is an upper-bound estimate of the probability of cancer development due to continuous lifetime exposure to one unit of carcinogen. For airborne agents, that unit is commonly defined as exposure to 1 µg of agent per cubic meter of air over a 70-year lifetime.
Cancer potencies are generally based on dose-response relationships generated from cancer bioassays performed with rodents exposed to doses that are several orders of magnitude greater than those for which risk must be estimated. Bioassays typically include two, and to a lesser extent three or more, doses in addition to controls, and are rarely repeated. Often, positive results are obtained at only one dose. Therefore, for most carcinogens, few unequivocal data points are available for potency calculation. In addition, several assumptions often enter into calculations of potency, such as considerations related to tissue dosimetry, in which metabolism data obtained from different experimental systems and used in PBPK modeling might be used in place of bioassay exposure levels. It is not unusual for potency estimates based on the same bioassay data to vary substantially from one risk assessment to another, depending on these additional assumptions and the dose-response model used. Accordingly, potency values
are often fraught with as much uncertainty as other aspects of quantitative risk assessment.
To estimate cancer potencies, EPA currently uses the linearized multistage model (EPA, 1987a). This model uses what is essentially an empirical curve-fitting procedure to describe the relationship between bioassay dose and response and to extrapolate the relationship to exposures below the experimental range. A statistical upper bound on the slope of the low-dose linear portion of the curve is considered to represent an upper bound on a chemical's carcinogenic potency. The multistage model is based on a theory of carcinogenic mechanism proposed in the early 1950s by Armitage and Doll. In essence, normal cells in a target organ are envisioned as undergoing a sequence of irreversible genetic transformations culminating in malignancy. Each transformation to a new stage is assumed to occur at some nonzero background rate. Exposure to a carcinogen is presumed simply to increase one or more of the transformation rates in proportion to the magnitude of the exposure (technically, the dose at the target site), although actual exposure circumstances are more complicated than this brief description suggests. No other potential effects of exposure or alternative mechanisms of carcinogenesis, such as induced cell proliferation or receptor-mediated alterations in gene expression, are included in the Armitage-Doll model. One important consequence of this assumption about how exposure influences transformations is the linearity of risk at low doses, i.e., risk increases or decreases in direct proportion to the delivered dose. That result arises in part because the model assumes that the number of cells at risk of undergoing the first transformation (the susceptible target-cell population) is constant and independent of age, magnitude of exposure, and exposure duration. Thus, the normal processes of cell division, differentiation, and death are not taken into account by the model.
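The formulation just described can be stated compactly. In the standard textbook statement of the multistage model (given here for illustration, not as a transcription of any particular EPA assessment), the lifetime tumor probability at dose $d$ is

```latex
P(d) \;=\; 1 - \exp\!\bigl[-\bigl(q_0 + q_1 d + q_2 d^2 + \cdots + q_k d^k\bigr)\bigr],
\qquad q_i \ge 0 ,
```

so that the extra risk over background,

```latex
A(d) \;=\; \frac{P(d) - P(0)}{1 - P(0)}
\;=\; 1 - \exp\!\bigl[-\bigl(q_1 d + \cdots + q_k d^k\bigr)\bigr]
\;\approx\; q_1 d \quad \text{for small } d ,
```

is linear at low doses. The "linearized" procedure takes a statistical upper confidence bound $q_1^*$ on the linear coefficient $q_1$ as the upper bound on carcinogenic potency.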
Another cancer dose-response model that has been developed to estimate cancer potencies for risk assessment, but that is not used routinely for regulatory purposes, is the two-stage model. The two-stage model was developed by Moolgavkar, Venzon, and Knudson (Moolgavkar and Venzon, 1979; Moolgavkar and Knudson, 1981; Moolgavkar, 1988; Moolgavkar et al., 1988; Moolgavkar and Luebeck, 1990) and postulates that two critical mutations are required to produce a cancer cell. The model presupposes three cell compartments: normal stem cells, intermediate cells that have been altered by one genetic event, and malignant cells that have been altered by two genetic events. The size of each compartment is affected by cell birth, death, and differentiation processes and by the rates of transition between cell compartments. The model can accommodate some current concepts regarding the roles of inactivated tumor-suppressor genes and activated oncogenes in carcinogenesis. Unlike the Armitage-Doll model, it can explicitly account for many processes considered important in carcinogenesis, including cell division, mutation, differentiation, and death and the clonal expansion of populations of cells. Some knowledge of a chemical's mechanism
of action and dose-response data for that mechanism are required to apply the two-stage model, however, and such data on most chemicals are scanty.
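In schematic form (a textbook sketch of the Moolgavkar-Venzon-Knudson formulation, with symbols chosen here for illustration), the three compartments and transition rates are

```latex
\text{Normal } (N) \;\xrightarrow{\;\mu_1\;}\; \text{Intermediate } (I)
\;\xrightarrow{\;\mu_2\;}\; \text{Malignant } (M) ,
```

with intermediate cells dividing at rate $\alpha$ and dying or differentiating at rate $\beta$. For constant parameters, a commonly used approximation to the cancer hazard at age $t$ is

```latex
h(t) \;\approx\; \frac{\mu_1\,\mu_2\,N}{\alpha - \beta}
\Bigl[e^{(\alpha-\beta)t} - 1\Bigr] ,
```

which makes explicit that net clonal expansion of intermediate cells ($\alpha > \beta$), and not only the mutation rates, drives the hazard. A chemical can therefore increase risk by raising $\mu_1$ or $\mu_2$ (genotoxic action) or by altering $\alpha$ or $\beta$ (promotion or mitogenesis), and these pathways can have quite different dose-response shapes.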
Potency estimates are generally based on the assumption that exposure to a particular agent occurs over a 70-year lifetime under constant conditions. That assumption is not likely to apply to the entire exposed population, however, and might produce a conservative estimate of risk. Use of a single potency number implies that the biological response of concern, such as carcinogenesis, depends only on total dose and therefore is independent of dose rate (the quantity of the agent received per unit time). This assumption might be invalid in some cases; for example, studies of carcinogenesis by low-linear-energy-transfer (low-LET) radiation show that low-dose-rate exposures are less effective than high-dose-rate exposures (NRC, 1990b). Other studies of radiation have had differing results.
Potency estimates provide a means for comparing animal data with human data and for ranking potential carcinogens. Analysis of data available for some 20 known human carcinogens has shown that, in general, potency values derived from carcinogenicity bioassays in animals agree reasonably well with values calculated for humans from epidemiological studies (Allen et al., 1988). However, ranking of chemicals according to potency should not necessarily be taken to imply a corresponding ranking of hazards or risks. Only the multiplication of potency (unit risk) by exposure (dose) yields an estimate of risk. Where there is no exposure, there might be little practical need for information on potency.
Evaluation of EPA Practice The selection by EPA of a mathematical model to estimate potency is a critical step in quantitative risk assessment, in which alternative assumptions can lead to large differences in estimated risks. Such a model provides explicit, objective rules for extrapolating from the risks observed in controlled, high-dose laboratory experiments to those associated with the far lower doses that people might receive through inhalation. However, all dose-response models are simplified characterizations of the underlying biological reality. That is due, in part, to the incomplete scientific understanding of toxic mechanisms and to the requirement that the models be usable in a broad array of cases.
The challenge for EPA is to incorporate the expanding knowledge of mechanisms into the design of extrapolation models. The models would then depict more accurately the dose-response relationship at the low doses that are of concern to regulators, but are too low for toxic effects to be directly observed in whole animal studies or, often, any feasible human studies. The challenge can be illustrated by examining the simplified mechanistic assumptions that are included in the multistage model used by EPA in light of new understanding of mechanisms, which is not included in that model.
As long as exposure to a chemical has no substantial effect on cell processes other than genetic change, one would not expect the exclusion of these processes
from the multistage model to compromise the resulting cancer risk estimates. The model would likely be appropriate for "direct-acting" carcinogens: agents, such as radiation, that act by directly attacking cellular DNA and thereby causing genetic transformation. In recent years, however, it has become apparent that many substances alter the pharmacodynamics of cells and can be carcinogenic by mechanisms that do not involve direct covalent interaction with DNA at all, but involve indirectly caused alterations in gene expression. One consequence of such a change could be altered cellular dynamics in the target organ. Because genetic transformations can occur spontaneously, many target organs contain a background of continuing steps in the multistep carcinogenic process. Exposure to a chemical could augment those background carcinogenic processes by simply increasing the pool of cells that are susceptible to further transformation. Such augmentation might occur as a regenerative response to cellular injury among surviving cells or to the cell-killing that occurs after exposure to highly toxic substances. The augmentation of background carcinogenic processes could also occur as an indirect response to alterations in hormonal balances induced by exposure or as a response to a directly mitogenic substance, i.e., one that stimulates normal cell division. By increasing the rate of cell division, such substances can increase the overall probability of generating a mutation, even though they have no direct effect on the transformation probability per cell division.
Similarly, exposure to substances classified as nongenotoxic carcinogens or "promoters" can create physiologic conditions within a target organ that favor the growth of "initiated" cells, i.e., cells that have already sustained at least one irreversible change from normal cells. Clonal expansions of initiated cell populations can be induced by exposure to promoters, thus increasing the probability of cell transformation and malignancy without directly affecting DNA.
Critical to effective regulatory use of biologically based models such as the two-stage model is accurate determination of the dose-response and time-response relationships for agent-induced cell death, differentiation, transformation, and division, if any, in target tissues. Those processes might exhibit threshold-like dose-response relationships, in contrast to the presumed low-dose linear response of conventional multistage-model transformation rates. Conversely, better understanding might reveal supralinear relationships. Thus, use of a two-stage pharmacodynamic model might predict low-dose risks that are lower or higher than those predicted by the linearized multistage model.
Successful use of biologically based models in the risk assessment process will require a greater variety and amount of information on and understanding of carcinogenic mechanisms than is typically available for most chemicals. In the near term, such a data-intensive approach might be applied only to substances that have great economic value. In the long run, as knowledge and experience accrue, the use of models that incorporate relevant pharmacodynamic data should become more routine. Those models, used in conjunction with pharmacokinetic
models for determining delivered doses, will increase the accuracy of quantitative risk assessment. For that reason, EPA should intensify their incorporation into the cancer-risk assessment process. For more information on two-stage models, see the NRC (1993c) report on this topic.
As noted in Chapter 4 (Table 4-1), EPA, following the lead of the International Agency for Research on Cancer (IARC), provides an evaluation of the available evidence of carcinogenicity of individual substances. The direction and strength of evidence are summarized by a letter: A, B1, B2, C, D, or E (see Table 4-1). The assignment of a substance to a class (actually, the assignment of available evidence to a class) depends almost entirely on epidemiological evidence and evidence derived from animal studies. The evidence for each of these is classified by EPA as "sufficient," "inadequate," or "limited." Some other types of experimental evidence (e.g., on genotoxicity) might sometimes play a role in the classification, but the epidemiological and bioassay data are generally of overriding importance.
The EPA classification scheme is intended to provide information on hazard, not to provide information about potential human risk; the latter cannot be assessed without the additional evaluation of dose-response and exposure information. The assignment of evidence to a class is intended by EPA only to suggest how convinced we should be that a substance poses a carcinogenic hazard to people. The classification is thus meant to depict the state of our knowledge regarding human carcinogenic hazard.
The difference between hazard and risk needs to be further emphasized here. As conceived in EPA's current four-step approach, identifying a substance as a possible, probable, or known carcinogenic hazard to humans means only that, under some unspecified conditions, the substance could cause excess cancers to occur in people. Evaluation of potency and of the exposures incurred by specific populations provides the information needed to assess the probability (risk) that the substance will cause cancer in the specified population. EPA developed the categorization scheme because it believes that, in addition to the risk estimate, decision-makers should have some sense of the strength of the evidence supporting identification of a substance as a carcinogen. There has been some confusion regarding the terms strength of evidence, as used by EPA, and weight of evidence. Some interpret strength to describe only the degree of positive evidence and weight to apply when all evidence (positive, negative, and evidence on relevance to humans) is considered. The committee adopts those uses of the terms. In many cases, substances for which the evidence of human carcinogenicity is strong (classification A) will, in specific circumstances, pose relatively small risks (because of low potency or low exposure), whereas substances for which the evidence of human carcinogenicity is much less
convincing (classification B2, for example) are likely to pose large risks (because of high potency or exposure). The typical question faced by a decision-maker is whether, for example, more restrictive controls should be placed on substances in class A that pose relatively small risks or on substances in lower classes that pose equal or greater risks. Stated in other terms, the issue concerns the justification for placing different degrees of regulatory restriction on substances that pose equal risks but which are differently classified. Should we control more carefully substances for which the state of our knowledge regarding human carcinogenicity is highly certain than we do substances for which the state of our knowledge is relatively weak? Although EPA includes a strength-of-evidence classification with each risk characterization, there is no clear indication of whether and how the classification influences ultimate agency decision-making.
Evaluation of EPA Practice Does EPA's approach accurately portray the state of knowledge regarding human carcinogenic hazard? The state of scientific knowledge regarding the potential for various substances to contribute to the development of human cancers certainly varies widely among substances, and it seems reasonable that risk assessors should have available a means to express that knowledge in a relatively simple way. For those reasons, any such scheme should be examined carefully to ensure that it expresses as closely as possible what it is intended to express and that it summarizes all the relevant and appropriate findings derived from data, with no extraneous material.
Because two conclusions (that the substance might pose a carcinogenic hazard to humans under some conditions of exposure and that animal data can be unconditionally extrapolated to humans) are implicitly contained in the current EPA classification system, it could be misleading in some cases in which the scientific evidence does not support one or more of the typical default assumptions (for example, on route-to-route, high-dose-to-low-dose, or animal-to-human extrapolation). Such a situation could arise when, for example, data show clearly and convincingly that some types of animal tumors are not likely to be produced in humans or when mechanistic data show that results obtained at high doses are not relevant to low doses. Although different in kind, classification of substances at EPA's D or E level could also be misleading. If, for example, a substance were classified at level E on the basis of negative bioassays in two species, but additional data suggested that neither animal species metabolized the substance in the way humans do, then the absence of potential human hazard would be improperly inferred.
The present EPA system might also be misleading because it is too susceptible to "accidents of fate." The carcinogenicity of a substance that happens to cause very rare tumors in humans (e.g., vinyl chloride, which causes angiosarcoma of the liver) is much easier to detect in epidemiological studies than is the carcinogenicity of a substance that causes very common human cancers, such as colorectal carcinoma. Although the available animal data on the latter substance
might be very convincing with respect to carcinogenicity and there might be every reason to believe that it will be as hazardous to humans as the former (i.e., the "known" human, category A carcinogen), it will usually end up in category B, which may be interpreted as suggesting a lesser likelihood of hazard. Such a distinction might be due only to differences in our ability to detect the carcinogenic properties of substances that produce different types of cancers, and not to any true differences in human hazard.
Possible Improvements in EPA Practice Before turning to the issue of improvements in EPA's carcinogen classification scheme, the committee first considered whether any such scheme should be used at all. As noted above, the current scheme can easily be misinterpreted: unfamiliar users might be led to believe that all substances in a specific category are equally hazardous or nonhazardous. Moreover, it is impossible to capture in any simple categorization scheme the completeness and complexity of the information that supports scientific judgments about the nature of a human carcinogenic hazard and the conditions under which it can exist. The quality, nature, and extent of such information vary greatly among carcinogens, and it is not an exaggeration to state that every substance is unique with respect to the scientific evidence bearing on its hazards.
It is for these reasons that the committee strongly recommends that EPA include in each hazard-identification portion of a risk assessment a narrative evaluation of the evidence of carcinogenicity. Such a narrative should contain at least the following:
Such a narrative seems to be the best way to describe the type of information typically available to evaluate carcinogenic hazards and should be used by EPA when it undertakes full-scale risk assessments.
Although the committee agreed that such narrative descriptions are the preferred way to express scientific evidence, it also recognized that there are important practical needs for some type of simple categorization of evidence. The committee recognized, for example, that many regulatory actions or plans for action require, for practical reasons, the creation of lists of carcinogens and that narrative statements are not likely to be included in such lists. Without some simple categorization scheme, such lists are likely to be completely undiscriminating with respect to the potential human hazards of the substances on them. When any such lists are used, for example, to create priorities for full risk assessment or for some type of regulation, the results could be seriously misleading to decision-makers and the public.
As already noted, however, the committee believes that the current EPA categorization scheme is inadequate. Substantial improvements could be made if the scheme incorporated not only "strength-of-evidence" information, but also some of the information we have called for in the narrative description.
It will not be easy to create a categorization scheme for carcinogens that incorporates both strength of evidence and the two "relevance" considerations. Moreover, EPA is not the only agency for which such a categorization scheme is useful. Indeed, there is a strong need for international agreement on a single classification. It would be highly desirable for EPA to convene a workshop on the matter and involve other agencies of federal and state governments, IARC, and other national and international bodies to develop a scheme that would have worldwide acceptance. IARC has recently moved to include information on mechanisms of carcinogenic action in its evaluation of carcinogens. Such an effort seems essential to eliminating the deficiencies of current schemes and the confusion that exists because of differences in approaches to categorization around the globe.
The committee suggests the scheme in Table 7-1 as a draft or prototype to avoid the difficulties of the current EPA scheme. The proposal in this table incorporates both strength-of-evidence considerations (as in the current EPA and IARC schemes) and "relevance" information, as specified in the two points mentioned above. The example also reduces the susceptibility of current classification schemes to the "accidents of fate" that can artificially influence the availability of evidence for different substances.
The classification in Table 7-1 takes place in two steps. In Step 1, a classification is made (into Categories I-IV) according to the two relevance criteria mentioned above. Note also that Category I is used for all substances on which positive carcinogenicity data are available and on which there are no substantive data to support conclusions that would place them in Category II or III; i.e., Category I is the default option that applies when data related to relevance are weak or absent. Step 2 of the classification involves evaluation of the strength of the available evidence.
Such a categorization scheme can provide guidance on priorities for both risk assessment and a variety of regulatory efforts. Substances placed in Category I, for example, would generally receive greater attention with respect to their carcinogenic properties than those in Category II; and within Category I, the nature of the attention received might be further influenced by the strength of available evidence (i.e., Ia > Ib > Ic > Id). A Ia substance, for example, might be a prime candidate for immediate and stringent regulation, whereas a Id substance might be a prime candidate for high-priority information-gathering.
Placement of a substance in Category II does not mean that regulatory efforts should not be undertaken. For example, there might be reason to determine whether potentially risky conditions of exposure exist in any situations. The categories do not influence ultimate actions, but only priorities and the relative, inherent degrees of concern associated with different substances.
Although the committee recommends that any categorization scheme adopted by EPA include the elements associated with the above example, it also recognizes that there might be other ways to capture and express the same information. Some members suggested, for example, that substances listed as carcinogens simply be accompanied by a set of codes that specify both the strength of supporting evidence and the conditions and limitations, if any, that might pertain to the interpretation of that evidence (e.g., an asterisk next to a chemical might mean "assumed to be carcinogenic in humans only when inhaled").
Other End Points of Toxicity
The standard approach to regulating chemicals that are associated with non-cancer end points of toxicity has been based on the theory of homeostasis. According to that theory, biological processes that maintain homeostasis exist in an interdependent web of adaptive responses that automatically react to and compensate for stimuli that alter optimal conditions. An optimal condition is maintained as long as none of the stimuli that regulate it is pushed beyond some limit or "threshold." For the purposes of regulation, end points of toxicity other than cancer are lumped together under a toxicological paradigm that presumes a dose threshold for any chemical capable of inducing an adverse effect: there is an exposure below which the adverse effect would not be expected to occur. The current approach, based on a no-observed-adverse-effect level (NOAEL) and uncertainty factors, is only a semiquantitative method designed to prevent exposures that are likely to result in an adverse effect, not a mechanistically based quantitative method for assessing the likely incidence and severity of effects in an exposed population. Moving beyond the current simplistic regulatory method will require, as is the case for carcinogenesis, a greater understanding of the mechanisms of disease causation, of pharmacokinetics, and of interindividual variation in each. Such improved understanding will permit final abandonment of the obsolete "threshold versus nonthreshold" paradigm for regulating carcinogens and noncarcinogens.
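The arithmetic of the NOAEL-uncertainty-factor approach is simple enough to sketch directly. The following is a minimal illustration, not any actual EPA calculation; the NOAEL and the choice of factors are hypothetical, and real assessments may apply additional factors (e.g., for subchronic-to-chronic extrapolation or data-base deficiencies).

```python
def reference_dose(noael_mg_kg_day, uf_interspecies=10.0,
                   uf_intraspecies=10.0, uf_other=1.0):
    """Divide the NOAEL by the product of uncertainty factors.

    The conventional 10-fold factors account for animal-to-human
    extrapolation and for variability among humans.
    """
    return noael_mg_kg_day / (uf_interspecies * uf_intraspecies * uf_other)

# Hypothetical NOAEL of 50 mg/kg-day from a chronic animal study,
# divided by two 10-fold factors:
rfd = reference_dose(50.0)
print(rfd)  # 0.5 mg/kg-day
```

The point of the sketch is that the result is a screening threshold, not a prediction of incidence or severity: halving the NOAEL simply halves the reference dose, with no statement about what happens to an exposed population above or below it.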
Evaluation of EPA Practice

The methodology now used by EPA to regulate human exposure to noncarcinogens is in a state of flux. That used by EPA in the past was not sufficiently rigorous. It was not based on evaluations of biological mechanisms of action or on differences in susceptibility between and within exposed populations. In addition, it incorporated risk management, not scientifically based risk-assessment techniques; and it did not permit incorporation of newer and better scientific information as it was obtained. The NOAEL-uncertainty factor approach might be adequate for the immediate future as a screening technique and for setting priorities, but its empirical and scientific basis is meager. EPA appears to be continuing to pursue simplistic, empirical techniques by adding to the list of uncertainty factors in use.
Impact of Pharmacokinetic Information in Risk Assessment
One of the critical steps in risk assessment is the selection of the measure of exposure to be used in defining the dose-response relationship. It is common today to calculate exposure on the basis of the "administered dose" of a chemical: the dose or amount fed to animals in toxicity studies or ingested by humans in food or water or inhaled in air. That dose can usually be accurately measured.
The dose that is of interest for risk assessment, however, is the amount of the biologically active form of a substance that reaches specific target tissues. This target-tissue dose is the "delivered dose," and its biologically active derivative, if any, is the "biologically active dose." The biologically active dose causes the events that culminate in toxicity to target cells and organs, and ideally it is used as the basis for defining the dose-response relationship and for assessing risk. The science of pharmacokinetics seeks to replace the current operating
assumption (that administered dose and delivered dose are always directly proportional, and that the administered dose is therefore an appropriate basis for risk assessment) with direct, accurate information about the delivered or biologically active dose.
Pharmacokinetic models are used to study the quantitative relationship between administered and delivered or biologically active doses. The relationship reflects the spectrum of biological responses to exposure, from physiological responses of a whole organism to biochemical responses within specific cells of a target organ. Pharmacokinetic models explicitly characterize biologic processes and permit accurate predictions of the doses of an agent's active metabolites that reach target tissues in exposed humans. As a consequence, the use of pharmacokinetic models to provide inputs to dose-response models reduces the uncertainty associated with the dose parameter and can result in more accurate estimates of potential cancer risks in humans.
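The simplest member of this family of models is a one-compartment description with first-order elimination. The sketch below is purely illustrative; the physiologically based models discussed in the text link many compartments (blood, liver, fat, target tissues) with chemical-specific partition coefficients and metabolic rate constants, and all parameter values here are hypothetical.

```python
import math

def plasma_concentration(dose_mg, volume_l, k_elim_per_hr, t_hr):
    """Concentration after an instantaneous (bolus) administered dose,
    assuming a single well-mixed compartment and first-order
    elimination: C(t) = (D / V) * exp(-k * t)."""
    return (dose_mg / volume_l) * math.exp(-k_elim_per_hr * t_hr)

# Hypothetical parameters: 100 mg administered dose, 40 L distribution
# volume, elimination rate constant 0.2 per hour.
c_initial = plasma_concentration(100.0, 40.0, 0.2, 0.0)  # 2.5 mg/L
c_later = plasma_concentration(100.0, 40.0, 0.2, 4.0)    # lower at 4 h
```

Even this toy model makes the text's point concrete: the quantity relevant to toxicity is a time-varying internal concentration, not the administered dose itself, and predicting it requires explicit biological parameters.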
The relationship between administered and delivered doses often differs among individuals: because of such differences, some people might be acutely sensitive and others insensitive to the same administered dose. The relationship between administered and delivered doses can also differ between large and small exposures and between continuous and intermittent exposures, and it can differ among species, some species being more or less efficient than humans in the transport of an administered dose to tissues or in its metabolism to a biologically active or inactive derivative. Those differences in the relationship between administered and delivered or biologically active doses can dramatically affect the validity of the predictions of dose-response models; failure to incorporate the difference into the models contributes to the uncertainty in risk assessment.
Differences between administered and biologically active doses occur because specialized organ systems intervene to modulate the body's responses to inhaled, ingested, or otherwise absorbed toxic materials. For example, the liver can detoxify materials circulating in the blood by producing enzymes to accelerate chemical reactions that break the materials down into harmless components (metabolic deactivation, or "detoxification"). Conversely, some substances can be activated by metabolism into more toxic reaction products. Activation and detoxification might occur at the same time and can occur in the same or different organ systems.
Furthermore, the rates at which activation and detoxification take place might have natural limits. Metabolic deactivation might thus be overwhelmed by high exposure concentrations, as seems to be the case with formaldehyde: the biologically active dose and the risk of nasal-tumor development rise rapidly in exposed rats only at high airborne concentrations. The assumption of a simple linear relationship between administered and biologically active doses of formaldehyde is believed by many to result in exaggerated estimates of cancer risk at low exposure concentrations. In contrast, metabolic activation of vinyl chloride occurs more and more slowly with increasing administered dose, because a critical enzyme system becomes overloaded; the biologically active dose and the resulting liver-tumor response increase more and more slowly as the administered dose increases. The assumption of a linear relationship between administered and delivered doses in the case of vinyl chloride could result in underestimation of the cancer risk associated with low doses. These examples illustrate how using pharmacokinetic models can reduce the uncertainty in risk estimation by modifying the dose values used in dose-response modeling to reflect the nonlinearity of metabolism.
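The saturation behavior behind the vinyl chloride example can be sketched with a Michaelis-Menten term. The parameter values below are hypothetical, chosen only to show the shape of the nonlinearity, not to represent any real chemical.

```python
def metabolized_rate(administered, v_max=10.0, k_m=5.0):
    """Saturable metabolism: rate = Vmax * d / (Km + d).

    At doses well below Km the rate is nearly proportional to dose;
    near and above Km the enzyme system is overloaded and the rate
    approaches the ceiling Vmax.
    """
    return v_max * administered / (k_m + administered)

# Fraction of the administered dose converted per unit dose:
yield_low = metabolized_rate(1.0) / 1.0        # low administered dose
yield_high = metabolized_rate(100.0) / 100.0   # high administered dose
print(yield_low > yield_high)  # True
```

Because the per-unit yield of activated material is higher at low doses than at high doses, extrapolating linearly from high-dose animal data would understate the biologically active dose (and hence the risk) at low doses, which is the vinyl chloride pattern described above; the formaldehyde case runs in the opposite direction, with deactivation rather than activation saturating.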
Although most pharmacokinetic models are derived from laboratory-animal data, they provide a biological framework that is useful for extrapolating to human pharmacokinetic behavior. Anatomical and physiological differences among species are well documented and easily scaled by altering model parameters for the species in question. This aspect of pharmacokinetic modeling reduces the uncertainty associated with extrapolating from animal experiments to human cancer risk. For example, considerable effort has been devoted to the development of pharmacokinetic models for methylene chloride, which is considered a rodent carcinogen. The model was initially developed on the basis of rat data, then scaled to predict human behavior. Predictions in humans were compared with published data and with the results of experiments in human volunteers. The model was shown to predict accurately the pharmacokinetic behavior of inhaled methylene chloride and its metabolite carbon monoxide in both species (Andersen et al., 1991). Use of a particular pharmacokinetic model for methylene chloride in cancer risk assessment reduces human risk estimates for exposure to methylene chloride in drinking water by a factor of 50-210, compared with estimates derived by conventional linear extrapolation and body surface-area conversions (Andersen et al., 1987). Other analyses show different results (Portier and Kaplan, 1989). What pharmacokinetic models for methylene chloride do not predict, however, is whether methylene chloride is a human carcinogen. Thus, although use of the model might improve confidence in dose estimation by replacing the conventional scaling-factor approach, it cannot predict the outcome of exposure in humans.
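The "body surface-area conversion" that pharmacokinetic modeling replaces can itself be sketched in a few lines. The (body weight)^(2/3) convention used here is one common choice; the exponent (2/3 versus 3/4) has long been debated, and all weights and doses below are hypothetical.

```python
def scale_dose_to_human(animal_dose_mg_kg, animal_bw_kg, human_bw_kg=70.0):
    """Convert an animal dose (mg/kg) to a human-equivalent dose (mg/kg)
    by surface-area scaling. Since surface area scales roughly as
    BW**(2/3), equivalent mg/kg doses scale as (BW_animal/BW_human)**(1/3).
    """
    return animal_dose_mg_kg * (animal_bw_kg / human_bw_kg) ** (1.0 / 3.0)

# A 10 mg/kg dose in a hypothetical 0.35-kg rat corresponds to a much
# smaller human-equivalent dose on a mg/kg basis:
hed = scale_dose_to_human(10.0, 0.35)
print(round(hed, 2))  # 1.71
```

The contrast with the methylene chloride example is the point: this generic allometric rule applies the same conversion to every chemical, whereas a pharmacokinetic model replaces it with chemical-specific physiology, which is why the two approaches can differ by the 50-210-fold factor cited above.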
Another way to reduce uncertainty would be to use pharmacokinetic models to extrapolate between exposure routes. If information on the disposition of an agent were available only as a result of its inhalation in the workplace, for example, and a risk assessment were required for its consumption in drinking water, appropriate models could be constructed to relate the delivered dose after inhalation to that expected after ingestion. To the committee's knowledge, pharmacokinetic models have not yet been used in a risk assessment for such regulatory purposes.
Failure to include pharmacokinetic considerations in dose-response modeling contributes to the overall uncertainty in a risk assessment, but uncertainty is associated with their use as well. This uncertainty comes from several sources. First, uncertainty is associated with the pharmacokinetic model parameters themselves. Parameter values are usually estimated from animal data and can come from a variety of experimental sources and conditions. Quantities can be measured indirectly, they can be measured in vitro, and they can vary among individuals. Different data sets might be available to estimate values of the same parameters. Hattis et al. (1990) evaluated seven pharmacokinetic models for tetrachloroethylene (perchloroethylene) metabolism and found that their predictions varied considerably, primarily because of differences in the choice of data sets used to estimate values of model parameters. Moreover, analogous parameter values are also needed for humans; although some values, such as organ weights, are amenable to direct measurement and do not vary widely among humans, others, such as rate constants for enzymatic detoxification and activation, are both difficult to measure and highly variable.
Second, there is uncertainty in the selection of the appropriate tissue dose available to model. For example, information might be available on the blood concentration of an agent, on its concentration in a tissue, or on the concentrations of its metabolites in the tissue. Tissue concentrations of one metabolite might be inappropriate if another metabolite is responsible for the biologic effects. Total tissue concentrations might not accurately reflect the biologically active dose if only one type of cell within the tissue is affected.
Choice of an appropriate measure of tissue dose can have an effect on cancer risk estimates. Farrar et al. (1989) considered three measures of tissue dose for tetrachloroethylene: tetrachloroethylene in liver, tetrachloroethylene metabolites in liver, and tetrachloroethylene in arterial blood. Using EPA's pharmacokinetic model for tetrachloroethylene and cancer bioassay data in mice, they found that human cancer risk estimates varied by a factor of about 10,000, depending on the dose surrogate used. Interestingly, the estimates bracketed the estimate obtained in the absence of any pharmacokinetic transformation of dose, as shown in Table 7-2.
This example illustrates the variation in dose and risk estimates that can be obtained under different assumptions, but it does not help to evaluate the
validity of any of the estimates in the absence of knowledge of the biologic mechanism of action of tetrachloroethylene as a rodent carcinogen and in the absence of knowledge of whether it is a human carcinogen. Although the dose of metabolites to the liver appears to be the most appropriate choice of dose surrogate, there is a high degree of nonlinearity between this dose and the tumor incidence in mice. The nonlinearity indicates either that this dose surrogate does not represent the actual biologically active dose for the particular sex-species combination analyzed by these authors or that the model does not adequately describe tetrachloroethylene pharmacokinetics.
The science of pharmacokinetics seeks to gain a clear understanding of all the biological processes that affect the disposition of a substance once it enters the body. It includes the study of many active biological processes, such as absorption, distribution, metabolism (whether activation or deactivation), and excretion. Accurate prediction of delivered and biologically active doses requires comprehensive, physiologically based computer models of those linked processes. Because the science of pharmacokinetics aims to replace general assumptions with a more refined model based on the specific relationship between administered and delivered or biologically active doses, its use in risk assessment will help to reduce the uncertainties in the process and the related bias in risk estimation. Advances will come slowly and at considerable cost, because detailed knowledge of the biologically active dose of many materials must be acquired before generalizations can be confidently exploited. Nevertheless, EPA increasingly incorporates pharmacokinetic data into the risk-assessment process, and its use represents one of the clearest opportunities for improving the accuracy of risk assessments.
Developing improved methods for assessing the long-term health impacts of chemicals will depend on improved understanding of the underlying science and on more effective coordination, validation, and integration of the relevant environmental, clinical, epidemiological, and laboratory data, each of which is limited by various kinds of error and uncertainty. Goodman and Wilson (1991) have demonstrated that, for 18 of 22 chemicals studied, there is good agreement between risk estimates based on rodent data and on epidemiologic studies. Their quantitative assessment, which can be compared to the Ennever et al. (1987) qualitative evaluation of the same issue, provides stronger evidence that current risk-assessment strategies produce reasonable estimates of human experience for known human carcinogens (Allen et al., 1988).
The reliability of a given health-risk assessment can be determined only by evaluating both the validity of the overall assessment and the validity of its components. Because the validity of a risk assessment depends on how well it predicts health effects in the human population, epidemiologic data are required
for testing the predictions. To the extent that the requisite data are not already available, epidemiologic research will be necessary. An example is the study in which the New York Department of Health conducted biological monitoring for arsenic in schoolchildren (New York Department of Health, 1987). The researchers compared their findings with the arsenic concentrations predicted by the risk assessment conducted by EPA. The good agreement between the estimates and actual urinary arsenic concentrations in the children provided support for the EPA risk model.
The committee believes that substantial research is warranted to validate methods, models, and data that are used in risk assessment. In some instances the magnitude of uncertainty is not well understood, because information on the accuracy of the prediction process for each model used in risk assessment is insufficient. We also note that the uncertainties tend to vary considerably; for example, uncertainties are relatively low for estimation of population characteristics, compared with those associated with extrapolation from rodents to human beings.
The quality of risk analysis will improve as the quality of input improves. As we learn more about biology, chemistry, physics, and demography, we can make progressively better assessments of the risks involved. Risk assessment evolves continually, with re-evaluation as new models and data become available. In many cases, new information confirms previous assessments; in others, it necessitates changes, sometimes large. In either case, public confidence in the process demands that EPA make the best judgments possible. That an estimate of risk is subject to change is not a criticism of the process or of the assessors. Rather, it is a natural consequence of increasing knowledge and understanding. Re-evaluating risk assessments and making changes should be expected, embraced, and applauded, rather than criticized.
Findings And Recommendations
The following is a compilation of findings and recommendations related to evaluation of methods, data, and models for risk assessment.
Predictive Accuracy and Uncertainty of Models
Various methods and models are available to EPA and other organizations for conducting emission characterization, exposure assessment, and toxicity assessment. They include those used as default options and their corresponding alternatives, which represent deviations from the defaults. The predictive accuracy and uncertainty of the methods and models used for risk assessment are not clearly understood or fully disclosed in all cases.
EPA does not have a set of guidelines for emission characterization to be used in risk assessment.
EPA does not adequately evaluate the uncertainty in the emission estimates used in risk assessments.
EPA has worked with outside parties to design emission characterization studies that have moved the agency from crude to more refined emission characterization.
In its regulatory practice, EPA has relied on Gaussian-plume models to estimate the concentrations of hazardous pollutants to which people are exposed. However, Gaussian-plume models are crude representations of airborne transport processes; because they are not always accurate, they lead to either underestimation or overestimation of concentrations. Stochastic Lagrangian and photochemical models exist, and evaluations have shown good agreement with
observations. Also, EPA has typically evaluated its Gaussian-plume models for release and dispersion of criteria pollutants from plants with good dispersion characteristics (i.e., high thermal buoyancy, high exit velocity, and tall stacks). EPA has not fully evaluated the Gaussian-plume models for hazardous air pollutants with realistic plant parameters and locations; thus, their potential for underestimation or overestimation has not been fully disclosed.
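The textbook form of the Gaussian-plume representation criticized here is compact enough to show directly. The sketch below is a generic formulation with ground reflection, not any particular EPA regulatory model, and all source and meteorological parameters are hypothetical; the dispersion parameters sigma_y and sigma_z would in practice be functions of downwind distance and atmospheric stability.

```python
import math

def plume_concentration(q_g_s, u_m_s, sigma_y, sigma_z, y, z, h_stack):
    """Steady-state Gaussian-plume concentration (g/m^3) at a receptor
    offset y (m) crosswind and height z (m), for emission rate q (g/s),
    wind speed u (m/s), dispersion parameters sigma_y and sigma_z (m),
    and effective stack height h_stack (m). The second vertical term
    reflects the plume off the ground."""
    lateral = math.exp(-y**2 / (2.0 * sigma_y**2))
    vertical = (math.exp(-(z - h_stack)**2 / (2.0 * sigma_z**2))
                + math.exp(-(z + h_stack)**2 / (2.0 * sigma_z**2)))
    return q_g_s / (2.0 * math.pi * u_m_s * sigma_y * sigma_z) * lateral * vertical

# Ground-level centerline concentration for a hypothetical source:
c_centerline = plume_concentration(q_g_s=100.0, u_m_s=4.0,
                                   sigma_y=80.0, sigma_z=40.0,
                                   y=0.0, z=0.0, h_stack=50.0)
```

The formula's limitations are visible in its structure: steady wind, flat terrain, and fixed dispersion coefficients are built in, which is why such models can misestimate concentrations for the low, poorly buoyant releases typical of many hazardous-air-pollutant sources.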
EPA has not adequately evaluated HEM-II for estimation of exposures, and prior evaluations of exposure models have shown substantial discrepancies between measured and predicted exposures, i.e., underprediction of exposures.
EPA has not previously used population activity, population mobility, and demographics in modeling exposure to hazardous air pollutants and has not adequately evaluated the effects of assuming that the population of a census enumeration district is all at the location of the district's population center.
EPA uses the Human-Exposure Model (HEM) to evaluate exposure associated with hazardous air-pollutant releases from stationary sources. This model generally uses a standardized EPA Gaussian-plume dispersion model and assumes nonmobile populations residing outdoors at specific locations. The HEM construct will not provide accurate estimates of exposure in specific locations and for specific sources and contaminants where conditions do not match the simplified exposure and dispersion-model assumptions inherent in the standard HEM components.
Assessment of Toxicity
Extrapolation from Animal Data for Carcinogens
EPA uses laboratory-animal tumor induction data, as well as human data, for predicting the carcinogenicity of chemicals in humans. It is prudent and reasonable to use animal models to predict potential carcinogenicity; however, additional information would enhance the quantitative extrapolation from animal models to human risks.
The location of tumor formation in humans is related to route of exposure, chemical properties, and pharmacokinetic and pharmacodynamic factors, including systemic distribution of chemicals throughout the body. Thus, tumors might be found at different sites in humans and laboratory animals exposed to the same chemical. EPA has accepted evidence of carcinogenicity in tissues of laboratory animals as evidence of human carcinogenicity without necessarily assuming correspondence on a tumor-type or tissue-of-origin basis. EPA has extrapolated evidence of tumorigenicity by one route to another route where route-specific characteristics of disposition of the chemical are taken into account. EPA has traditionally treated almost all chemicals that induce cancer in a similar manner, using a linearized multistage nonthreshold model to extrapolate from large exposures and associated measured responses in laboratory animals to small exposures and low estimated rates of cancer in humans.
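The linearized multistage model referred to above has the general form P(d) = 1 - exp(-(q0 + q1*d + q2*d^2 + ...)), and at low doses the extra risk over background is approximately q1*d. The sketch below shows this form with hypothetical coefficients; it is not a fitting procedure, and actual use involves estimating the q values (with an upper confidence bound on q1) from bioassay data.

```python
import math

def lms_lifetime_risk(dose, q):
    """Lifetime tumor probability under the multistage model for dose d
    and polynomial coefficients q[0], q[1], ..., q[k]."""
    poly = sum(qi * dose**i for i, qi in enumerate(q))
    return 1.0 - math.exp(-poly)

def extra_risk(dose, q):
    """Risk above background, P(d) - P(0), the quantity extrapolated
    to low doses."""
    return lms_lifetime_risk(dose, q) - lms_lifetime_risk(0.0, q)

q = [0.01, 0.05, 0.002]      # hypothetical q0, q1, q2
low_dose = 0.001
print(extra_risk(low_dose, q))  # approximately q1 * d = 5e-5
```

The approximation is what makes the model "linearized": whatever curvature the data show at high doses, the extrapolated low-dose risk is dominated by the linear coefficient, which is the nonthreshold behavior the text describes.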
Extrapolation of Animal Data on Noncarcinogens
EPA uses a semiquantitative NOAEL-uncertainty factor approach to regulating human exposure to noncarcinogens.
Classification of Evidence of Carcinogenicity
EPA's narrative descriptions of the evidence of carcinogenic hazards are appropriate, but a simple classification scheme is also needed for decision-making purposes. The current EPA classification scheme does not capture information regarding the relevance to humans of animal data, any limitations regarding the applicability of observations, or any limitations regarding the range of carcinogenicity outside the range of observation. The current system might thus understate or overstate the degree of hazard for some substances.
EPA uses estimates of a chemical's potency, derived from the slope of the dose-response curve, as a single value in the risk-assessment process.
Although EPA routinely cites available human evidence, it does not always rigorously compare the quantitative risk-assessment model based on rodent data with available information on molecular mechanisms of carcinogenesis or with available human evidence from epidemiological studies.