A recurring issue in quantitative risk assessment and quantitative risk characterization is the aggregation (and disaggregation) of separate but related causes and effects of risk. Questions about the aggregation of causes or agents differ somewhat from questions about the aggregation of effects or end points, but the similarities are great enough for us to treat them together in this chapter. For example, people may be exposed to mixtures of compounds from a single stack, and each compound may be associated with an increase in the degree or probability of occurrence of one or more toxic end points; the situation can be further complicated by questions about synergy. In contrast, dose-response data are often available only on single end points in response to doses of single agents. How should we characterize and estimate the potential aggregate toxicity posed by exposure to a mixture of toxic agents?
The aggregation problem is simplified when all end points of concern are believed to have dose-response thresholds or no-adverse-effect levels. Under this restriction, "acceptable," "allowable," or "reference" doses are typically calculated by dividing empirically determined threshold estimates (such as no-observed-adverse-effect levels, NOAELs) by appropriate safety or uncertainty factors (Dourson and Stara, 1983; Layton et al., 1987; Barnes and Dourson, 1988; Lu, 1988; Shoaf, 1991). The risk-management goal for mixed exposures is generally to avoid exposures that exceed any of the relevant thresholds, while taking into account the possible joint effects of multiple agents. One strategy that has been implemented in environmental and occupational settings is to ensure that the sum of all the ratios of incurred dose to acceptable dose relevant to a given end point totals less than 1 (NRC, 1972a, 1989; OSHA, 1983; ACGIH, 1977, 1988; EPA, 1987a, 1988g; Calabrese, 1991; Pierson et al., 1991). That approach is based on an assumption that doses of different agents can be treated as roughly additive with regard to inducing the end point; this assumption is reasonably consistent with much of the experimental evidence on the joint actions of chemicals in mixtures.
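The sum-of-ratios (hazard-index) strategy described above can be sketched as follows; the incurred and acceptable doses below are hypothetical values chosen purely for illustration, not agency-derived numbers.

```python
# Hazard-index screening for a mixture of agents affecting a common end
# point: the sum of (incurred dose / acceptable dose) ratios should total
# less than 1. All dose values below are hypothetical.

def hazard_index(incurred_doses, acceptable_doses):
    """Sum of dose ratios for agents relevant to a given end point."""
    return sum(d / a for d, a in zip(incurred_doses, acceptable_doses))

# Hypothetical incurred and acceptable (reference) doses, mg/kg-day
incurred = [0.02, 0.005, 0.01]
acceptable = [0.1, 0.05, 0.02]

hi = hazard_index(incurred, acceptable)
print(f"Hazard index: {hi:.2f}")  # below 1 -> exposure considered acceptable
```

Note that under this screening rule a mixture can fail even though each component individually lies below its own acceptable dose, which is precisely the point of treating the doses as additive.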
Among the key problems associated with the general strategy is that the procedures currently used for defining acceptable exposures to systemic toxicants are rather crude. Proposals to incorporate more quantitative treatment of data and to focus on risk prediction without reference to thresholds (e.g., Crump, 1984; Dourson et al., 1985; Dourson, 1986) have not been widely adopted. The additivity assumption for systemic toxicants further complicates the crude approaches taken to identifying safe intakes of the components of a complex mixture. As an Environmental Protection Agency (EPA) technical support document (EPA, 1988g) comments, this use of the additivity assumption implies that,
as the acceptable level is approached or exceeded, the level of concern increases linearly … and in the same manner for all mixtures [which is incorrect, because the estimates used to derive such recommended acceptable levels] do not have equal accuracy or precision, and are not based on the same severity of toxic effect. Moreover, slopes of dose-response curves in excess of [such levels] in theory are expected to differ widely. The determinations of accuracy, precision or slope are exceedingly difficult because of the general lack of toxicity data.
Despite its drawbacks, the crude additivity approach to the problem of aggregation of potential threshold effects has had relatively straightforward and uncontroversial regulatory applications.
Much more debate has focused on quantitative risk-assessment methods for end points assumed not to have threshold dose-response relationships, such as cancer. Particularly with regard to environmental exposures to multiple chemicals, risk-management decisions (e.g., cleanup criteria) tend to be driven by the estimated low-dose risk associated with exposure to materials that lead to assumed nonthreshold end points. This chapter focuses on aggregation of different risks and different types of risk attributable to integrated, multiroute exposure to multiple chemicals that are assumed to have nonthreshold effects.
Any comprehensive assessment of health risk associated with environmental exposure to any particular compound must consider all possible routes by which people might be exposed to that compound, even if expected applications in risk management are limited to some particular medium, such as air, or particular source generator or category, such as a coke-oven facility. That is because compounds present in one environmental medium might be transferred to another at any time before exposure. The major routes of exposure are inhalation, ingestion, and dermal absorption. In the context of environmental exposures, inhalation pertains to uptake of compounds present in respired air during rest or activity both indoors and outdoors; ingestion refers to gastrointestinal absorption of compounds that are intentionally or unintentionally present in any ingested material, including water, liquid foods, mother's milk, solid foods (including crops and game), and soil; and dermal absorption refers to percutaneous uptake of compounds deposited on skin, including those present in water during showering, bathing, or recreational swimming. Assessments of exposure to a substance from a given source must account for all potentially important routes by which the substance might come into contact with people (or environmental biota, if an ecological impact assessment is being undertaken). For example, mercury emitted into air from an industrial smoke stack might be inhaled by nearby residents, but might pose an even greater health risk through the ingestion of bioconcentrated mercury in fish that are caught locally after mercury from the stack plume has been deposited onto lake water.
EPA has given the issue of integrated multiroute exposure considerable attention in the context of risk-assessment guidance for Superfund-related regulatory compliance (EPA, 1989a). For example, EPA suggested that assessment of the environmental fate and transport of compounds in ambient air address a range of issues as diverse as volatilization and occurrence in wild game (EPA, 1988h, 1989a,c,d,e). Additional information on multimedia transport and multiroute exposure assessments is available (Neely, 1980; Neely and Blau, 1985; Cohen, 1986; McKone and Layton, 1986; Allen et al., 1989; Cohen et al., 1990; McKone and Daniels, 1991; McKone, 1991, 1992).
Quantitative environmental risk assessment is often needed for exposure to multiple toxic agents, for example, in the context of hazardous-waste, drinking-water, and air-pollution control. The 1990 Amendments to the Clean Air Act in particular list 189 airborne pollutants of immediate regulatory concern that can be emitted singly or in combination from a variety of specified emission-source categories.
Over the last 2 decades, environmental remediation involving complex chemical mixtures has required general reviews of issues and cases of potential toxicity associated with concurrent exposure to multiple chemical agents (e.g., NRC, 1972a, 1980a,b, 1988a, 1989; EPA, 1988i; Goldstein et al., 1990; Calabrese, 1991). The earlier reviews supported the concept that toxicity predicted by dose additivity or concentration additivity was reasonably consistent with data on the joint action of acute toxicants (NRC, 1972a, 1980a,b; ACGIH, 1977;
EPA, 1987a). Although some cases of supra-additivity for acute toxicants are known, such as the synergistic interaction of organophosphate pesticide combinations in which one compound inhibits the detoxification of another compound, additivity has nevertheless been viewed as a reasonable expectation at the low doses at which detoxification enzymes are not expected to be saturated (NRC, 1988b; Calabrese, 1991).
The EPA Database on Toxic Interactions, as of 1988, covered 331 studies involving roughly 600 chemicals (EPA, 1988g). Most of the studies focused on the effects of two-compound mixtures on acute lethality; fewer than 10% examined chronic or lifetime toxicity. Fewer than 3% of all the studies reported clear evidence of a synergistic interaction, i.e., a "response to a mixture of toxic chemicals that is greater than that suggested by the component toxicities" (EPA, 1988g). However, EPA also concluded that in only one of 32 studies chosen as a 10% random sample of the 331 studies was the design and use of statistics "appropriate with the conclusion justified" (EPA, 1988g). As a consequence, EPA has asserted that
given the quality and quantity of the available data on chemical interactions, few generalizations can be made concerning the likelihood, nature or magnitude of interactions. Most interactions that have been quantified are within a factor of 10 of the expected activity based on the assumption of dose addition (EPA, 1988g).
Results of the few detailed comparative studies in which Salmonella-mutation assays were applied to complex mixtures (kerosene-combustion particles, coal-hydrogenation material, and heterocyclic amines from cooked food) are also generally consistent with approximate additivity of mutagenic potencies of constituents within complex mutagenic mixtures (Thilly et al., 1983; Felton et al., 1984; Schoeny et al., 1986).
Epidemiological evidence concerning the synergistic potential of human carcinogens (usually involving long-term cigarette-smoking) has been extensively reviewed (Saracci, 1977; Steenland and Thun, 1986; EPA, 1988g; NRC, 1988a,b; Kaldor and L'Abbé, 1990; Pershagen, 1990; Calabrese, 1991). Although no single mathematical expression is likely to give an accurate representation of joint effects, especially given the heterogeneity of human responses, the discussion here has often focused on whether responses are more clearly additive or multiplicative. The best-studied interactions (such as in joint exposure to tobacco and radon or tobacco and asbestos) suggest that a strictly additive model within the dose ranges studied may underestimate the true joint effects by a factor of 3-10. Results of epidemiological studies of joint exposure to radon progeny and cigarette smoke, for example, have been interpreted as showing an additive or possibly multiplicative interaction of the two agents with respect to the number of cancers induced and a synergistic decrease in latency period for tumor induction (NCRP, 1984; NRC, 1988a). The NRC (1988b) BEIR IV committee concluded that results of epidemiological studies of smoking and nonsmoking uranium miners exposed to radon gas, particularly the large study by Whittemore and McMillan (1983), were consistent with a multiplicative effect of the combined agents.
The effects of asbestos exposure among workers who have a history of cigarette-smoking have been described (NRC, 1988a) as "one of the most current and well-recognized examples [based on epidemiological data] of how two distinct agents administered together can produce an increased incidence of [lung] cancer that is greater than that predicted from the administration of either agent alone [and that] is considered multiplicative by most investigators who have studied the problem." A study not cited by NRC of more than 1,600 British asbestos workers suggests an additive, rather than multiplicative, increase in relative risk after joint tobacco and asbestos exposure (Berry et al., 1985). Other investigators have also concluded that the overall evidence of multiplicative interaction of these agents is questionable (Saracci, 1977; Steenland and Thun, 1986).
Epidemiological detection of possible multiplicative action among human carcinogens is not surprising, given the large amount of experimental data on the action of cancer promoters in animals, including clear examples of supra-additive interaction (EPA, 1988g; Calabrese, 1991; Krewski and Thomas, 1992). Highly nonlinear, supra-additive synergistic interaction of some types of nongenotoxic cancer promoters with genotoxic agents is predicted by "biomechanistic" multistage models of carcinogenesis. In those models, increased cell replication can play a pivotal role either by directly increasing the rates of production of premalignant or malignant lesions, by amplifying the incidence of malignant lesions through stimulated growth of spontaneously occurring premalignant lesions, or both (Armitage and Doll, 1957; Moolgavkar and Knudson, 1981; Moolgavkar, 1983; Bogen, 1989; Cohen and Ellwein, 1990a,b; 1991; Ames and Gold, 1990a,b; Preston-Martin et al., 1990). From that mechanistic perspective, several nongenotoxic compounds are now thought to be capable of promoting carcinogenesis, both spontaneous and experimentally chemically induced, solely by increasing target-cell replication, a phenomenon that might have a threshold-like dose-response relation (Weisburger and Williams, 1983; Weisburger, 1988; Butterworth, 1989, 1990; Bogen, 1990b; IARC, 1991; Flamm and Lehman-McKeeman, 1991). EPA is considering formal recognition of such threshold carcinogens from the mechanistic perspective (e.g., EPA 1988g, 1991d), although these cases remain awkward to accommodate within EPA's currently-used 1986 general scheme for classifying potential chemical carcinogenicity (EPA, 1987a).
In general, both biological and statistical considerations make it difficult to rule out a nonthreshold mutation-related component of chemically induced carcinogenesis, and this effect might be dominant at low environmental exposures (Portier, 1987; Portier and Edler, 1990; Kopp-Schneider and Portier, 1991; Weinstein, 1991). For example, an increase in target-cell replication induced by some
nongenotoxic chemicals might have a low-dose, linear, nonthreshold dose-response relation. Alternatively, a broad distribution of thresholds within a highly heterogeneous human population might give rise to practical quasilinearity or superlinearity for low-dose promotional effects. Therefore, low-dose linearity has been recommended as a reasonable default assumption, even for agents known to increase cancer risk through nongenotoxic promotional mechanisms, in the absence of data establishing a pertinent, clearly defined, generally applicable threshold dose-response relation (Lutz, 1990; Perera, 1991). Under this default assumption, the mechanistic type of cancer-risk model and the classical multistage cancer-risk model both predict that small amounts of increased risk will be approximately linearly proportional to the risk associated with small combined doses of genotoxic or nongenotoxic carcinogens, or both, and that their joint action will be approximately additive (Gibb and Chen, 1986; NRC, 1988a; Brown and Chu, 1989; Krewski et al., 1989; Kodell et al., 1991b).
The general assumption of low-dose linearity for a presumed nonthreshold quantal end point (i.e., an end point observed only as present or absent), such as cancer occurrence before age 70, is equivalent to assuming P = p + qD, where P is the risk of such occurrence after a lifetime exposure at dose rate D, p is the background cancer risk by age 70, and q is the potency (increased risk per unit dose) for small values of D. Of interest is the aggregate increased probability P of cancer occurrence due to exposure to a low-dose environmental mixture of nonthreshold toxic agents. If the linear model is assumed for each of two such agents, and if an additional independent-action assumption is made that the agents act through statistically independent events to increase risk, it follows that P ≈ q1D1 + q2D2 for very small D1 and D2 (NRC, 1980b, 1988b; Berenbaum, 1989). A more general sum of potency-dose products has been used by EPA for approximating P in cases of exposure to a mixture of carcinogens (EPA, 1987a, 1988g). Appendix I-1 shows that the same general assumptions imply that a similar sum-of-products relation may be used to approximate the risk associated with mixtures of agents, each having one or more different end-point-specific effective dose rates. Multiple nonthreshold end points can be of interest in quantitative risk assessment, as discussed in more detail below.
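The sum-of-products approximation just described can be sketched in a few lines; the potencies and dose rates below are hypothetical values used only to show the arithmetic.

```python
# Low-dose linear, independent-action approximation for a mixture:
# aggregate increased risk ~ sum of potency * dose products, i.e.
# P ~ q1*D1 + q2*D2 + ... for very small doses. All values hypothetical.

def mixture_increased_risk(potencies, dose_rates):
    """Approximate aggregate increased risk as sum(q_i * D_i)."""
    return sum(q * d for q, d in zip(potencies, dose_rates))

q = [0.05, 0.2]    # hypothetical potencies (increased risk per unit dose)
D = [1e-4, 5e-5]   # hypothetical lifetime-average dose rates

increased = mixture_increased_risk(q, D)
print(increased)   # 1.5e-05
```

The approximation holds only while every q_i * D_i is small; at higher doses the component dose-response relations need not remain linear, and interaction terms can matter.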
Types of Nonthreshold Risk
Quantitative risk assessment can involve multiple toxic end points, as well as multiple toxic agents. In particular, toxic end points other than cancer might at some point also be assumed to have nonthreshold dose-response relations for public-health regulatory purposes. Furthermore, cancer is not a single disease, but a variety of neoplastic disorders with different characteristics that occur in different tissues of animals and humans at different times in the life history. Aggregate human cancer risk is often estimated from animal bioassay data that indicate statistically significant increases in dose-related risk of more than a
single tumor type (e.g., cancer of the lung and cancer of the kidney). Similarly, genetic, reproductive, and developmental risks can arise in multiple forms that are measured separately in toxicity assays (e.g., reduced fertility and incomplete ossification of some bone). The issues of aggregating risk of both multiple end points and multiple types of a given end point are discussed below. Both these aggregation problems can be addressed simultaneously by using Expression 6 in Appendix I-1, if independent actions and effects are assumed.
The issue of how to use bioassay data that indicate dose-related effects for multiple tumor types is addressed by the EPA (1987a) cancer-risk guidelines as follows:
To obtain a total estimate of carcinogenic risk, animals with one or more [histologically distinct] tumor sites or types showing significantly elevated … incidence should be pooled and used for [risk] extrapolation. The pooled estimates will generally be used in preference to risk estimates based on single types or sites.
If different tumor types observed to have increased incidences are known to occur in a statistically independent fashion within and among the bioassay animals tested, this EPA-recommended procedure leads to inconsistently biased estimates of aggregate potency or risk because, under the independence assumption, the pooled tumor-incidence data may randomly exclude relevant information (Bogen, 1990a). For potency estimates based on classical multistage models, that statistical problem is avoided if aggregate potency is estimated as the sum of tumor-type-specific potencies (Bogen, 1990a). If the latter approach is used, then the aggregate increased risk P of incurring one or more tumor types at a very low dose can be estimated from Expression 7 in Appendix I-1 (for one carcinogen). The type-specific potencies are uncertain quantities (one reason is that they are generally estimated from bioassay data), so appropriate procedures must be used for summation.
This alternative (Expression 7 in Appendix I-1) to EPA's procedure for estimating aggregate cancer potency depends on the validity of the assumption that different tumor types occur independently within individual bioassay animals. If substantial interanimal heterogeneity exists in susceptibility to cancer, or if tumor types are positively correlated, the occurrence of multiple tumor types would be expected to cluster in the more susceptible individuals. Although some significant tumor-type associations have been identified in some species, they have tended to involve a relatively small number of tumor types (see Appendix I-2).
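Under the independence assumption just discussed, type-specific risks combine by the standard independent-combination formula, which at very low risks is close to the simple sum of type-specific terms. A minimal sketch, with hypothetical type-specific low-dose risks (Expression 7 itself is not reproduced here):

```python
# Probability of incurring one or more tumor types, assuming tumor types
# occur independently within individual animals: P = 1 - product(1 - p_i).
# At very low type-specific risks this is close to sum(p_i).
# The risks below are hypothetical.

def combined_risk(type_risks):
    """P(at least one tumor type) under statistical independence."""
    none = 1.0
    for p in type_risks:
        none *= (1.0 - p)  # probability of avoiding each type
    return 1.0 - none

risks = [1e-5, 4e-6, 2e-6]   # hypothetical type-specific low-dose risks
total = combined_risk(risks)
print(total)                  # slightly below sum(risks) = 1.6e-05
```

The small difference between this value and the simple sum of the type-specific risks is the cross-product correction, which becomes negligible as the individual risks shrink.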
Appendix I-2 summarizes an investigation of independence in interanimal tumor-type occurrence in a subset of the National Toxicology Program (NTP) 2-
year cancer-bioassay data, which has been used by EPA as the basis for quantifying the potency of most chemical carcinogens. Separate analyses were conducted for four sex-species combinations (male and female mice, male and female rats) by using control-animal data from 61 rat studies and 62 mouse studies and treated-animal data from a subset of studies in which there were significant increases in multiple tumor types. Correlations in the occurrence of pairs of tumor types in individual animals were evaluated. Little evidence was found of tumor-type correlation for most of the tumor-type pairs in control and treated mice and rats. Some tumor-type pairs were statistically significantly (and generally negatively) correlated, but in no case was the correlation large. These findings indicate that a general assumption of statistical independence of tumor-type occurrences within animals is not likely to introduce substantial error in assessing carcinogenic potency from NTP rodent-bioassay data.
Other Nonthreshold End Points
Two major categories of possible nonthreshold toxicity other than cancer that may often be relevant in quantitative risk assessment are genetic mutation (which might be caused by material that reaches and damages gonadal DNA) and developmental and reproductive toxicity (such as developmental neurotoxicity of lead). In general, however, if both dose-response linearity at low doses and independent dose induction of these effects are assumed, then they may also be incorporated with cancer into the general additive strategy already discussed. The extent to which those assumptions might apply to genetic toxicity and reproductive and developmental toxicity is considered below.
Mutagenic agents can cause detrimental inherited effects with an important genetic component, such as clinically autosomal dominant and recessive mutations, X-linked mutations, congenital birth defects, chromosomal anomalies, and multifactorial disorders of complex origin. Inherited genetic effects other than complex multifactorial effects have been found to occur spontaneously in roughly 2% of all liveborn people, appearing either at birth or thereafter; about 40-80% involve chromosomal anomalies or dominant or X-linked mutations ("CADXMs") (Mohrenweiser, 1991). In addition, more than 25% of all spontaneous abortions are thought to be due to genetic defects, the majority involving CADXMs (Mohrenweiser, 1991). Rates of those genetic effects are known to be increased in animals by exposure to environmental agents, such as ionizing radiation (which also causes cancer); furthermore, the risks of both genetic and cancer end points associated with low doses of ionizing radiation are currently modeled as being increased above background in linear proportion to dose (NRC, 1972b, 1980c, 1990b; NCRP, 1989; Favor, 1989; Sobels, 1989; Vogel, 1992).
Exposure of experimental animals to mutagenic chemicals can also cause some of these genetic effects, although specific characteristics of chemically induced genetic damage appear to differ in some ways from those induced by irradiation, e.g., in the fraction of dominant versus recessive specific-locus effects (Ehling and Neuhauser, 1979; Lyon, 1985; Favor, 1989; Rhomberg et al., 1990).
Experimental data are not all consistent with a linear nonthreshold dose-response relation for genetic end points induced by either chemicals or ionizing radiation (ICPEMC, 1983a; Sobels, 1989). Chemical mutagenesis, in particular, involves many potentially nonlinear and threshold processes, such as transport of reactants, metabolic activation and deactivation, DNA repair, and chemically induced functional change and lethality (ICPEMC, 1983a). However, it is difficult (if not impossible) to show experimentally that a complex, inherently statistical biological response does not differ from background (ICPEMC, 1983a). In light of such complexities, several National Research Council committees (NRC, 1975, 1977, 1983b) have concluded that the linear nonthreshold dose-response assumption used for ionizing radiation is also a reasonable default hypothesis for mutagenic chemicals. That conclusion reflects the fact that "if an effect can be caused by a single hit, a single molecule, or a single unit of exposure, then the effect in question cannot have a threshold in the dose-response relationship, no matter how unlikely it is that the single hit or event will produce the effect." It has been similarly concluded that a linear nonthreshold dose-response relation is a reasonable default assumption for chemical mutagens (Ehling and Neuhauser, 1979; ICPEMC, 1983a,b; Lyon, 1985; Ehling, 1988; Favor, 1989; Sobels, 1989; Rhomberg et al., 1990).
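The single-hit reasoning quoted above corresponds to the classical one-hit dose-response model, which has no threshold and is approximately linear at low doses. A minimal numerical illustration, with a hypothetical potency q:

```python
import math

# One-hit (single-event) dose-response model: P(effect) = 1 - exp(-q*D).
# There is no threshold, and at low doses P is approximately q*D, which is
# the low-dose-linearity default discussed in the text. The potency q and
# the doses below are hypothetical.

q = 0.02
results = []
for D in (1e-4, 1e-2, 1.0):
    exact = 1.0 - math.exp(-q * D)
    linear = q * D
    results.append((D, exact, linear))
    print(f"D={D:g}: exact={exact:.6g}, linear approx={linear:.6g}")
```

At the smallest dose the exact and linear values agree to many digits; at larger doses the linear approximation overstates the exact one-hit risk, since 1 - exp(-x) is always less than x.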
Such support of a default assumption of nonthreshold linearity in induced genetic risk has highlighted the uncertainty that exists in quantitative assessment of the total genetic risk to humans associated with exposure to ionizing radiation or genotoxic chemicals. That uncertainty, due particularly to problems in estimating possible increases in rates of human genetic disease, has led some to conclude that realistic assessment of total genetic risk associated with environmental exposure will not soon be possible (NRC, 1990b; Mohrenweiser, 1991; Vogel, 1992). The degree of uncertainty varies greatly among different end points, but dose-response data for mutations in mice, supplemented by corresponding estimates of human spontaneous incidence rates, appear to provide a basis for reasonable quantitative risk assessment for some genetically simple and straightforward end points, such as those involving CADXMs (NRC, 1990b; Mohrenweiser, 1991; Vogel, 1992).
In 1986, EPA adopted guidelines for mutagenicity risk assessment that do not specifically endorse a linear nonthreshold default assumption. Rather, they state that EPA "will strive to use the most appropriate extrapolation models for risk analysis" and "will consider all relevant models for gene and chromosomal mutations in performing low-dose extrapolations and will choose the most appropriate model" (EPA, 1987a). The 1986 guidelines committed EPA to "assess
risks associated with all genetic end points" to the greatest extent possible when data are available, with risk to be "expressed in terms of the estimated increase of genetic disease per generation, or the fractional increase in the assumed background spontaneous mutation rate of humans." In pursuit of methods to implement the goals of its guidelines, EPA sponsored a major effort concerning genetic-risk assessment for the direct-acting mutagen ethylene oxide (Dellarco and Farland, 1990; Dellarco et al., 1990; Rhomberg et al., 1990). But EPA does not now routinely perform quantitative assessments of genetic risk posed by chemical mutagens in the environment as part of any of its regulatory programs.
EPA's 1986 guidelines are nonspecific not only regarding particular methods to be used by the agency for estimating mutagenic risk, but also regarding how such risk might be aggregated with risks estimated for other end points, such as cancer. The suggested measures of genetic risk in the guidelines cannot readily be aggregated with EPA's commonly used measures of increased cancer risk to individuals or populations. However, individual genetic risk could be expressed as increased lifetime risk of expression of a serious inherited genetic end point in a person whose parents were both exposed from birth to a given relevant compound at a given effective dose rate. And addition of such a predicted risk to a corresponding magnitude of predicted somatic (cancer) risk would be appropriate under assumptions of low-dose linearity and independence as discussed above and in Appendix I.
Risk assessments of ionizing radiation provide precedents for the simple addition of quantitative estimates of genetic and cancer risk (e.g., Anspaugh and Robison, 1968; ICRP, 1977a,b, 1984, 1985). However, EPA has made no systematic effort to consider the combination of mutagenic and cancer risks. In the context of setting radiological National Emission Standards for Hazardous Air Pollutants (NESHAPs), the agency's Office of Radiation Programs made a substantial effort to describe quantitative risk estimates for both cancer and genetic end points (EPA, 1989b). However, the genetic risk factors were not used later in EPA's corresponding quantitative radiologic-risk assessments for radioactive air contaminants (EPA, 1989b), nor are they considered in current EPA guidance on how to calculate preliminary Superfund remediation goals for radionuclides at hazardous-waste sites (EPA, 1991f).
The importance of considering a quantitative combination of genetic and cancer end points depends on the ratio of genetic-to-cancer potency of any given chemical. If the ratio is much less than 1, genetic-risk assessment of the chemical is probably unwarranted, because it is likely to have little impact on regulatory action. For example, the upper-bound estimate of the potency of ethylene oxide (ETO) to produce heritable translocations (HTs) in children of exposed men was recently estimated to be equivalent to 0.00066 per part of ETO per million parts of air continuously inhaled. This estimate was based on an EPA analysis that applied a linearized multistage extrapolation model to dose-response data on HT induction in mice; a 21-day critical exposure period was assumed to
be potentially damaging to human males (Rhomberg et al., 1990). In contrast, EPA had previously estimated ETO's cancer potency to be 0.19 per part of ETO per million parts of air continuously inhaled over a lifetime, a value almost 290 times its estimated HT potency (EPA, 1985c). The genetic risk associated with ETO could not therefore constitute a substantial fraction of the genetic-plus-cancer risk unless HT represented a very small fraction (e.g., less than 1/290) of all reasonably quantifiable ETO-induced genetic end points. This appears to be unlikely, given that HTs constitute between about 5% and 10% of CADXMs (ICPEMC, 1983b).
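The potency comparison in the preceding paragraph can be checked directly from the two cited estimates; only the two potency values come from the text, and the "share" calculation is an illustrative simplification that assumes equal effective doses for both end points.

```python
# Worked check of the ETO potency comparison: cited estimates of cancer
# potency (0.19) and heritable-translocation (HT) potency (0.00066), each
# per part of ETO per million parts of air continuously inhaled.

cancer_potency = 0.19
ht_potency = 0.00066

ratio = cancer_potency / ht_potency
print(f"cancer/HT potency ratio: {ratio:.0f}")  # ~288, i.e. "almost 290"

# Illustrative HT share of combined genetic-plus-cancer risk, assuming
# equal effective dose for both end points (a simplifying assumption):
ht_share = ht_potency / (ht_potency + cancer_potency)
print(f"HT share of combined risk: {ht_share:.4f}")
```

Even if HT were only 5% of all quantifiable genetic end points, total genetic potency would be about 20 times the HT potency, or roughly 0.013, still well under a tenth of the cited cancer potency, which is the sense in which the genetic contribution is argued to be modest.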
There are continuing concerns about the adequacy of current approaches (threshold, linear, nonlinear, BD, etc., described in Chapter 4) to characterize the risks associated with potential reproductive and developmental hazards (Barnes and Dourson, 1988; Mattison, 1991). Particular questions remain regarding thresholds. Although threshold mechanisms might seem plausible, the estimation of an upper limit to ensure that doses are safe depends heavily on available methods of study and measurement and our knowledge of organ- and tissue-specific repair mechanisms. The issue merits continued consideration. This issue is also discussed in the NRC report entitled Seafood Safety (NRC, 1991b).
The current and proposed EPA guidelines concerning reproductive- and developmental-toxicity risk assessment are based on the controversial assumption that chemical induction of reproductive or developmental toxicity generally has a true or practical threshold dose-response relationship. As noted by EPA (1991a), such thresholds might differ among exposed people, and EPA has traditionally accommodated such interindividual variability by using an extra uncertainty factor or safety factor of 10, whose adequacy remains to be established.
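The uncertainty-factor approach referred to above, including the factor of 10 traditionally applied for interindividual variability, reduces to a simple division; the NOAEL and the particular factors below are hypothetical illustrations, not values from any EPA assessment.

```python
# Sketch of the threshold/uncertainty-factor approach: an acceptable
# ("reference") dose obtained by dividing a NOAEL by a product of factors.
# All numeric values below are hypothetical.

noael = 5.0            # hypothetical NOAEL, mg/kg-day
uf_interspecies = 10   # animal-to-human extrapolation factor
uf_intraspecies = 10   # human interindividual-variability factor (see text)

reference_dose = noael / (uf_interspecies * uf_intraspecies)
print(reference_dose)  # 0.05 mg/kg-day
```

The text's point is that the adequacy of the fixed factor of 10 for interindividual variability is an empirical question, not something this arithmetic can establish.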
Measures And Characteristics Of Risk
Overall Characterization Goals
An essential component of risk characterization is the aggregation of different measures and characteristics of risk; the risk assessor must communicate measures and characteristics of predicted risk in ways that are useful in risk management. The technical aspects of risk aggregation and characterization cannot and should not be separated from the design of useful, politically responsible, and legally tenable criteria of risk acceptability, because such criteria must generally be based on risk characterizations that follow some standard format, and the format must accommodate the criteria. Although new, more sophisticated approaches to risk assessment and characterization have been proposed, such as the incorporation of integrated uncertainty and variability analysis, the corresponding more complicated criteria for risk acceptability have not been agreed on. It is therefore appropriate to establish as an interim goal of risk characterization the adoption of a format that includes a summary of predicted risk that is accurate, comprehensive, easily understood, and responsive to a wide array of public concerns about risk. The format should include the magnitude and uncertainty of estimated population risk (that is, predicted incidence) as well as individual risk, the uncertainty of estimates of costs and competing risks inherent in alternative risk-management options, the degree to which estimated risks might vary among exposed individuals, and the time frame of risks imposed.
Consistency in Characterization: Example of Aggregation of Uncertainty
To the extent that a given aggregated characteristic of a risk assessment, such as uncertainty, is addressed in an overall characterization of predicted risk, it should be determined with a consistent approach to estimates of the magnitudes of the components considered (e.g., ambient concentration, uptake, and potency). In the case of uncertainty aggregation, such consistency will come about through a rigorous, fully quantitative approach (see Chapter 9). But such a fully quantitative approach might be deemed impractical; for example, quantification of subjective probability judgments in the assessment might be considered difficult or misleading. A screening-level alternative to a fully quantitative approach to uncertainty aggregation is to use a qualitative or categorical approach that describes, in narrative or tabular form, the impact of each component of the analysis on each aspect of predicted risk. However, an exclusively qualitative, categorical approach is generally impractical because it fails to communicate effectively the fundamental quantitative conclusions of the risk analysis in terms that are of direct use to risk managers.
Thus, the approach to uncertainty aggregation most often used has been a semiquantitative approach incorporating specific key assumptions whose merits and impact are discussed verbally. The difficulty with this approach lies in ensuring that the resulting semiquantitative characteristics are properly interpreted and communicated. For example, it would be illogical and potentially misleading to characterize a final risk estimate as a "plausible upper bound" on risk if it were derived by aggregating component-specific point estimates that represent a mixture of best estimates and statistical upper confidence limits. That is particularly true if the components for which best estimates are used are also the components known to be the most uncertain among those considered. When, for example, risk is modeled as a simple product of estimated quantities (such as concentration and potency), a great deal of conservatism is lost whenever a best estimate is used in place of a far larger corresponding upper-bound value (and little conservatism is gained by using an upper-bound value if it is close to the corresponding best estimate). Thus, if a semiquantitative approach is to be used,
the only way to obtain a meaningful "upper-bound" point estimate of risk from component-specific point estimates would be to base the "upper-bound" point estimate entirely on "upper-bound" estimates of all the component quantities. This point is illustrated by the following example involving EPA's cancer-risk guidelines.
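The arithmetic behind this point can be sketched numerically. In the toy calculation below, risk is modeled as a product of three components; all numbers are invented for illustration and do not come from any actual EPA assessment.

```python
# Hypothetical illustration: risk modeled as a product of three components,
# each with a (best estimate, upper-bound estimate) pair.  All values invented.

concentration = (1.0, 2.0)     # ug/m^3: modestly uncertain
uptake        = (20.0, 25.0)   # m^3/day: well characterized
potency       = (1e-6, 1e-4)   # risk per (ug/m^3): the most uncertain component

def risk(c, u, p):
    # simple multiplicative risk model
    return c * u * p

# Upper-bound estimates used for every component
all_upper = risk(concentration[1], uptake[1], potency[1])

# Best estimate substituted for the *most uncertain* component only
mixed = risk(concentration[1], uptake[1], potency[0])

print(f"all upper bounds:                         {all_upper:.2e}")
print(f"best estimate for most uncertain term:    {mixed:.2e}")
# Substituting a best estimate for the most uncertain component shrinks the
# nominal result by a factor of 100 here, so the mixed result can no longer
# honestly be labeled a "plausible upper bound."
```

The mixed calculation gives a number two orders of magnitude below the all-upper-bound result, which is why a mixture of best estimates and upper confidence limits cannot be reported as an upper bound.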
The EPA guidelines for cancer-risk assessment characterize the estimate produced by following the guidelines as a "plausible upper bound" on increased cancer risk. Such a risk estimate will generally involve a pertinent set of animal-bioassay data, an animal-cancer potency estimate, and an interspecies dose-scaling factor. According to the 1986 guidelines, the risk assessment is to be based on the data showing the most sensitive response (i.e., those that give the highest estimated potency value or set of related values), and the animal-cancer potency value used is a statistical upper confidence limit of potency estimated from the animal-bioassay data set selected. The guidelines specify a dose-scaling factor based on what EPA intended to be a deliberately conservative assumption: that carcinogenic doses are equivalent between species if they are expressed as daily mass per unit of body surface area. Recently, EPA (1992e) proposed adopting a new scaling factor that is somewhat less conservative, because the new factor appears to be close to a "best" estimate of what the factor might actually be. However, EPA (1992e) noted that
Although scaling doses by [the newly proposed factor] characterizes the trend [relating epidemiologically based human-cancer potencies with corresponding experimentally determined ones for animals] fairly well, individual chemicals may deviate from this overall pattern by two orders of magnitude or more in either direction. … The proposed scaling [approach] … represents a best guess … surrounded by an envelope of considerable uncertainty. … [It] is intended to be…an unbiased projection; i.e., it is to be thought of as a "best" estimate rather than one with some conservatism built in … [such] as a "safety factor" or other intentional bias designed to "err on the side of safety."
A similarly large degree of uncertainty associated with interspecies dose scaling was also indicated in a recent reassessment of uncertainty pertaining to interspecies extrapolation of acute toxicity (Watanabe et al., 1992). Other studies (Raabe et al., 1983; Kaldor et al., 1988; Dedrick and Morrison, 1992) provide evidence that a milligram-per-kilogram-per-lifetime dose metric may be roughly equivalent across species. These studies compare human carcinogenicity and animal carcinogenicity for alkylating or radioactive agents (administered for therapeutic purposes in the case of humans). Dose-scaling uncertainty may thus be far greater than that associated with parameter-estimation error for cancer potency in bioassay animals and at least as great as that associated with the selection of a bioassay data set for analysis. EPA's proposed dose-scaling policy would therefore be an exception to its reasonably consistent practice of using component-specific upper bounds when semiquantitative aggregation of uncertainty is used to derive a "plausible upper bound" on increased risk. The most
straightforward way to obtain such an upper-bound dose-scaling factor would be to calculate it directly from the best available relevant empirical data that relate epidemiologically based human-cancer potencies to corresponding experimentally determined animal-cancer potencies (e.g., Raabe et al., 1983; Allen et al., 1988; Kaldor et al., 1988; Dedrick and Morrison, 1992). An uncertainty distribution for the scaling factor could also readily be developed from these data, and an appropriate summary statistic chosen explicitly from this distribution, rather than by fiat and without reference to uncertainty (see, for example, Watanabe et al., 1992).
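A calculation of this general shape can be sketched as follows. The human-to-animal potency ratios below are invented placeholders, not values from Raabe, Allen, Kaldor, or Dedrick and Morrison; the sketch only shows how an explicit percentile of an empirically fitted uncertainty distribution could replace a factor chosen by fiat.

```python
# Sketch: derive an uncertainty distribution for an interspecies scaling
# factor from paired potency estimates, then pick a summary statistic
# explicitly.  The ratios below are hypothetical, for illustration only.
import math
import statistics

# Hypothetical (human potency / animal potency) ratios for several agents
ratios = [0.03, 0.2, 0.8, 1.5, 6.0, 30.0]

# Work in log space, consistent with an approximately lognormal spread
logs = [math.log10(r) for r in ratios]
mu = statistics.mean(logs)       # central tendency of the scaling factor
sigma = statistics.stdev(logs)   # spread: the scaling-factor uncertainty

best_estimate = 10 ** mu
# 95th percentile of the fitted lognormal: an explicit upper-bound choice
upper_bound = 10 ** (mu + 1.645 * sigma)

print(f"best-estimate scaling factor:   {best_estimate:.2f}")
print(f"95th-percentile scaling factor: {upper_bound:.1f}")
```

The point of the sketch is that both the "best" value and any chosen upper percentile come from the same empirical distribution, so the degree of conservatism in the scaling factor is stated rather than implicit.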
Uncertainty and Variability
We have deliberately treated these two concepts separately up to this point in the report, because we view them as conceptually quite different even though they share much of the same terminology (e.g., "upper confidence limit," "standard deviation"). Indeed, as emphasized in Chapters 9 and 10, the realms of uncertainty and variability have fundamentally different ramifications for science and judgment: uncertainty forces decision-makers to judge how probable it is that risks will be overestimated or underestimated for every member of the exposed population, whereas variability forces them to cope with the certainty that different individuals will be subjected to risks both above and below any reference point one chooses.1
Thus, any criticism that EPA has assessed or managed a risk too "conservatively" needs to consider and explain which type of conservatism is being decried. The use of a plausible but highly conservative scientific model, if it imposes large costs on society or the regulated community, can throw into question whether it is wise to be "better safe than sorry." The attempt to provide protection to persons at the "conservative" end of a distribution of exposure or risk, in contrast, determines who ends up with what degree of safety and thus requires a different decision calculus. In particular situations, either uncertainty or variability (or perhaps both) might be handled "conservatively." For example, society might in one case determine that the marginal costs of protecting individuals with truly unusual hypersusceptibility were too large relative to the costs of protecting only the majority, but might still choose to assess the risk to each group in a highly conservative manner. In another case, society might view the central tendency of an uncertain risk as an appropriate summary statistic, yet deem it important to extend protection to individuals whose risks are far above the central tendency with respect to the varied risks across the population.
On the other hand, this risk management distinction between uncertainty and variability should not blind people to a central fact of environmental health risk assessment: that in general, risks are both uncertain and variable simultaneously. In the prototypical hazardous air pollutant risk assessment case, one can think of the source exposing each nearby resident to a different ambient
concentration of each emitted pollutant; each of these concentration values is made still more variable by the unique activity patterns, uptake parameters, and susceptibility of each individual. Simultaneously, each of these "individualized" parameters is either hard to measure or impossible to model with certainty (or both), and all of the "generalized" parameters (such as the inherent carcinogenic potency of each substance) are also surrounded by uncertainty. In sum, the source does not impose "a risk"; it imposes a spectrum of individual risks, each of which can only be completely described as a probability distribution rather than a single number.
Elsewhere in the report, we have commented on two aspects of the challenge of assessing variable and uncertain risks: communicating them correctly and comprehensively (see the findings and recommendations for this chapter), and describing how to relate variability to uncertainty in order to explicitly target risk management to the desired members of the population (average, "high-end," maximally at risk, etc.) in light of the uncertainty (again, see the findings and recommendations for this chapter).
Here, we briefly mention two additional complications that arise because uncertainty and variability work in tandem. We make no specific recommendations regarding either issue, because we feel EPA analysts and other risk assessors need flexibility to account for these technical problems as they gradually improve their treatment of the separate phenomena of uncertainty and variability. Nevertheless, it is important to keep in mind two other relationships between these phenomena:
In sum, EPA should realize that estimates of variability may themselves be too large or too small; if "conservatism" is crucial, it may make sense to take account of this imprecision in the variability estimate as well as of the variability itself. For example, if fish consumption is deemed to be lognormal with a standard deviation known only to lie within a range, it might be appropriate to use an upper confidence limit for fish consumption that is in turn based on the larger of the candidate estimates of variability.
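The fish-consumption point can be sketched numerically. The median and the two candidate log-space standard deviations below are hypothetical, chosen only to show how the percentile used for protection shifts with the variability estimate.

```python
# Sketch: when the variability parameter (sigma) is itself imprecise, a
# conservative percentile can be based on the larger candidate estimate.
# All parameter values are hypothetical.
import math

median = 30.0                       # g/day, hypothetical median fish intake
sigma_low, sigma_high = 0.4, 0.7    # candidate log-space standard deviations
z95 = 1.645                         # standard-normal 95th percentile

def p95(sigma):
    # 95th percentile of a lognormal with the given median and log-space sigma
    return median * math.exp(z95 * sigma)

print(f"95th percentile with low  sigma: {p95(sigma_low):.1f} g/day")
print(f"95th percentile with high sigma: {p95(sigma_high):.1f} g/day")

# Accounting for imprecision in the variability estimate means using the
# larger of the two as the conservative value.
conservative = max(p95(sigma_low), p95(sigma_high))
```

The two percentiles differ substantially, which is exactly the imprecision in variability that a "conservative" analysis would need to acknowledge.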
Aggregation of Uncertainty and Variability
To the extent that both uncertainty and interindividual variability (that is, heterogeneity or differences among people at risk) are addressed quantitatively with separate input components (e.g., ambient concentration, uptake, and potency) for aggregation into an assessment of risk, the distinction between uncertainty and variability ought to be maintained rigorously throughout the analytic process, so that uncertainty and variability can be distinctly reflected in calculated risk. If no distinction were made between uncertainty-related and heterogeneity-related distributions associated with inputs to a given risk calculation, then whatever distribution might be obtained as a characteristic of risk would necessarily reflect risk to an individual selected at random from the exposed population (Bogen and Spear, 1987). This restricted result would render such analyses less useful for environmental regulatory purposes, in light of the tendency to focus substantial regulatory attention on increased risk to highly sensitive or highly exposed members of the population.
Another advantage of distinguishing between uncertainty and variability is that it permits one to estimate the uncertainty in the risk to the individual who is "average" with respect to all characteristics that are heterogeneous among indi-
viduals at risk, and the latter risk may be used to estimate uncertainty in predicted population risk or number of cases (Bogen and Spear, 1987). Technical issues that arise in aggregating uncertainty and interindividual variability for the purpose of calculating estimated individual and population risk are described in Appendix I-3.
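A minimal two-dimensional Monte Carlo calculation, in the spirit of the separation described above (Bogen and Spear, 1987), can be sketched as follows. All distributions and parameter values are hypothetical; the point is only the structure, in which uncertain quantities vary across the outer loop and heterogeneous quantities vary across individuals in the inner loop.

```python
# Sketch: keep uncertainty (outer loop) and interindividual variability
# (inner loop) separate so the spread of the result reflects uncertainty
# in the population-average risk.  All inputs are hypothetical.
import math
import random
import statistics

random.seed(1)

N_KNOWLEDGE = 200   # outer loop: alternative resolutions of uncertainty
N_PEOPLE = 500      # inner loop: interindividual variability in dose

avg_risks = []      # uncertainty distribution of the population-average risk
for _ in range(N_KNOWLEDGE):
    # Uncertain, population-wide quantity: one potency value per "world"
    potency = random.lognormvariate(math.log(1e-6), 0.8)
    # Variable quantity: each individual's dose differs
    doses = [random.lognormvariate(math.log(10.0), 0.5) for _ in range(N_PEOPLE)]
    avg_risks.append(potency * statistics.mean(doses))

avg_risks.sort()
# Because the two dimensions were kept separate, the spread of avg_risks
# reflects uncertainty in the risk to the "average" individual, which can
# in turn be used to bound predicted population incidence.
median_est = avg_risks[len(avg_risks) // 2]
p95_est = avg_risks[int(0.95 * len(avg_risks))]
print(f"median estimate of average individual risk: {median_est:.2e}")
print(f"95th-percentile estimate:                   {p95_est:.2e}")
```

Collapsing both loops into one would instead yield only the distribution of risk to a randomly selected individual, which is the restricted result noted above.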
Findings And Recommendations
Multiple Routes of Exposure
Although the Clean Air Act Amendments of 1990 do not specifically refer to multiple exposure pathways, EPA has routinely considered multiple exposure routes in regulatory contexts, such as Superfund, in which source-specific pollutants might plausibly transfer to other media before human exposure.
Multiple Compounds and End Points
When aggregating cancer risk associated with exposures to multiple compounds, EPA adds the risk related to each compound in developing its risk estimate. That is appropriate when the only risk characterization desired is a point estimate used for screening-level analysis. However, if a quantitative uncertainty characterization is desired, simple addition of upper confidence limits may not be appropriate.
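Why simple addition of upper confidence limits can be misleading is easy to show with a small simulation: for independent risks, the 95th percentile of the sum falls below the sum of the 95th percentiles. The three "unit risk" distributions below are hypothetical.

```python
# Sketch: sum of per-compound 95th percentiles vs. 95th percentile of the
# summed risk, for three independent hypothetical uncertainty distributions.
import random

random.seed(0)
N = 20_000

def percentile(values, q):
    # empirical q-th quantile of a sample
    values = sorted(values)
    return values[int(q * len(values))]

# Uncertain risks for three compounds, sampled independently
draws = [[random.lognormvariate(0.0, 1.0) for _ in range(N)] for _ in range(3)]

sum_of_ucls = sum(percentile(d, 0.95) for d in draws)
ucl_of_sum = percentile([a + b + c for a, b, c in zip(*draws)], 0.95)

print(f"sum of per-compound 95th percentiles: {sum_of_ucls:.2f}")
print(f"95th percentile of the summed risk:   {ucl_of_sum:.2f}")
```

Adding the compound-specific limits is acceptable as a conservative screening number, but the result should not be reported as the 95th percentile of the aggregate risk, which is the smaller second value.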
EPA currently uses a specific procedure when analyzing animal-bioassay data involving the occurrence of multiple tumor types (e.g., lung and stomach) to estimate the total cancer risk associated with exposure to a single compound. In this procedure, EPA adds the numbers of animals with tumor types that are significantly increased above control levels, such that an animal with multiple tumor types counts the same as one with a single tumor type. This procedure does not allow full use of the data available and can overestimate or underestimate total cancer risk.
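The counting rule described above can be sketched with invented per-animal records; the tumor types and tallies are hypothetical, intended only to show how counting animals differs from pooling tumor-type counts.

```python
# Sketch of the counting rule: an animal bearing several elevated tumor
# types contributes once, the same as an animal with one type.
# All data are invented for illustration.

# Each animal's record: the set of tumor types observed in that animal
animals = [
    {"lung"}, {"lung", "stomach"}, set(), {"stomach"},
    {"lung"}, set(), {"lung", "stomach"}, set(),
]

elevated = {"lung", "stomach"}   # types significantly above control levels

# Tally of animals with at least one elevated tumor type (each counts once)
responders = sum(1 for a in animals if a & elevated)

# Naive pooled tally that counts multi-tumor animals more than once
pooled = sum(len(a & elevated) for a in animals)

print(f"animals with any elevated tumor type: {responders}")
print(f"pooled tumor-type count:              {pooled}")
```

The two tallies differ whenever any animal bears more than one elevated tumor type, which is why a procedure based on either one alone does not use all of the information in the data.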
Current EPA guidelines do not clearly state a default option of nonthreshold low-dose linearity for genetic effects that can be reasonably estimated for quantitative risk assessment.
Reproductive and Developmental Toxicants
While EPA is increasing its use of the benchmark dose, it still uses a threshold model in its proposal for regulation for reproductive and developmental toxicants. Although the threshold model is generally accepted for these toxicants, it is not known how accurately it predicts human risk. Current evidence on some toxicants, most notably lead and alcohol, does not unequivocally demonstrate any "safe" threshold and thus has raised concerns that the threshold model might only reflect the limits of current scientific knowledge, rather than the limits of safety.
"Upper-Bound Estimates" versus "Best Estimates"
In a screening-level or semiquantitative risk characterization, component uncertainties associated with predicted cancer risk are not generally aggregated in a rigorous quantitative fashion. In such cases, it is practical to calculate an "upper-bound" point estimate of risk by combining similarly "upper-bound" (and not "best") point estimates of the component quantities involved, particularly for
quantities (such as the dose-scaling factor) that are highly uncertain. For screening-level analyses, the EPA (1992d) proposal to adopt a new interspecies dose-equivalence factor is inconsistent with the 1986 guideline stipulation that risk estimated under the guidelines represents a "plausible upper bound" on increased cancer risk, and it is inconsistent with the corresponding stipulation that "upper-bound" or health-conservative assumptions are to be used at each point in cancer-potency assessment that involves substantial scientific uncertainty.
Uncertainty versus Variability
A distinction between uncertainty (i.e., degree of potential error) and interindividual variability (i.e., population heterogeneity) is generally required if the resulting quantitative risk characterization is to be optimally useful for regulatory purposes, particularly insofar as risk characterizations are treated quantitatively.
1. For example, in the 1980s the Consumer Products Safety Commission (CPSC) had to issue a standard regarding how close together manufacturers had to place the vertical slats in cribs used by infants, with the aim of minimizing the number of accidental strangulations nationwide. Presumably, there was virtually no uncertainty about the diameter of an average infant's head, but there was significant variability in head size among infants. CPSC thus had to make a decision about which estimate of head size to peg the standard to: an "average" estimate, a "reasonable worst case," the smallest (i.e., most conservative) plausible value, etc. We suggest that the phrase "better safe than sorry" does not apply to this kind of reasoning, because uncertainty is not at work here. Rather, deciding whether to be conservative in the face of variability rests on a policy judgment about how far to extend the attempt to provide safety.