This chapter discusses the quantity, quality, and availability of data needed for conducting an adequate risk assessment in the context of the Clean Air Act Amendments of 1990 (CAAA-90). It begins by discussing the need for a priority-setting process, and the need for an iterative data-collection process. It then indicates the proper prioritization for data collection and the availability of data in each of the key risk-assessment steps. It concludes with a discussion of how data should be managed.
Context Of Data Needs
Most would agree that, given the best available model, additional relevant data will lead to a more accurate and precise risk assessment. The quality of the data is critical, no matter how excellent the model chosen, to avoid the classic "garbage in, garbage out" problem. In the gathering of data, tradeoffs must often be made among data that are necessary, data that are desirable, and data that are affordable. Desirability must be defined in the context of the risk-management goals to be achieved, which might be the development of regulations, the setting of standards, or the screening of chemicals to set priorities.
The more precisely the risk manager frames the questions to be addressed by the risk assessment at the outset, the less ambiguity there will be as to what data are required to answer them, the less need there will be for judgment in data gathering, and the lower the likelihood that inappropriate or insufficient data will be gathered. As a corollary, public input into the framing of goals and questions can help to avoid public criticism and distrust of the process of risk assessment, including the gathering of exposure and toxicity data. Public confidence that risk managers are addressing real concerns, as opposed to going through a process perfunctorily, is critical to the future of risk assessment as an activity capable of improving the quality of life. Risk managers need to articulate clearly from the beginning who is to be protected from what, when and where, and at what cost (including how much effort and funds are to be expended to collect appropriate data), so that risk assessors can provide relevant information.
Implications For Priority-Setting
It is not necessary, nor would it be cost-effective, to collect all the data needed for a complete health-hazard assessment on all the 189 chemicals (or mixtures) listed in CAAA-90. It is important, however, that the entire list be examined to identify chemicals that are potentially hazardous and that the later full-scale evaluation of each chemical selected for further scrutiny proceed as effectively as possible. An overall strategy is essential for setting priorities among the steps in the information-gathering process and for determining the extent of assessment needed.
Because risk is a function of exposure as well as toxicity, determining both that a chemical is of low toxicity to all humans and that all humans have only small exposures to it would lead to an overall low priority for a full-scale risk assessment. Conversely, high toxicity combined with large exposures would lead to an overall high priority for such assessment and argue for collection of a complete data set in all categories of exposure and toxicity. Between these extremes lie various intermediate levels of overall priority.
In the absence of pertinent human data, toxicological evaluation should begin with the simplest, most rapid, and most economical tests and proceed to more complex, time-consuming, and more expensive tests only as warranted by the initial steps. Similarly, emission, transport, and exposure data might be used to rank chemicals for testing, from those with relatively large exposure potential down to those with a very low likelihood of significant exposure, either for the population at large or for any substantial subset of the population. What is "substantial" in this context will of course depend on concurrent assessments of toxicity. Ordering can then be based on an evaluation of a relatively modest or limited data set.
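The stepwise escalation of toxicological testing described above can be sketched as a loop that runs the cheapest test first and moves to a costlier tier only while the results give cause for concern. This is a minimal illustration, not a prescribed protocol: the tier names, the relative costs, and the `tiered_evaluation` and `shows_concern` names are all hypothetical.

```python
from typing import Callable, Sequence


def tiered_evaluation(
    tiers: Sequence[tuple[str, float]],
    shows_concern: Callable[[str], bool],
) -> tuple[list[str], float]:
    """Run tests from cheapest to most expensive, escalating only on concern.

    Returns the tiers actually run and their combined (relative) cost.
    """
    run: list[str] = []
    cost = 0.0
    for name, tier_cost in tiers:
        run.append(name)
        cost += tier_cost
        if not shows_concern(name):
            break  # no cause for concern: stop before the costlier tiers
    return run, cost


# Hypothetical tiers and relative costs, cheapest first.
TIERS = [
    ("structure-activity analysis", 1.0),
    ("short-term in vitro tests", 10.0),
    ("subchronic animal study", 100.0),
    ("chronic animal bioassay", 1000.0),
]

# Suppose only the structure-activity analysis raises a flag.
run, cost = tiered_evaluation(TIERS, lambda name: name == "structure-activity analysis")
print(run)   # ['structure-activity analysis', 'short-term in vitro tests']
print(cost)  # 11.0
```

Because escalation stops at the first tier that shows no concern, most of the cost of the expensive tiers is incurred only for the chemicals that keep raising flags.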
To assess whether there is a potential for exposure, and to gauge the magnitude and duration of exposure, one needs to know:
If the chemical is not emitted or is so unstable that it breaks down into innocuous products before reaching a population, no further data need be collected and further risk assessment is not warranted. But if it is emitted and can be transported to a population, one needs to ask:
In an iterative data-collection process, one works through the data related to questions 1-4, first collecting the most critical data within each category and then judging the need for more data within that category before moving to the next category. The process is repeated until sufficient information has been gathered to draw a conclusion, e.g., on a potential threat to public health.
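The iterative process can be sketched as a loop over the four data categories, collecting the most critical item first, checking after each item whether a conclusion can already be drawn, and otherwise asking whether more data in the current category are needed. The category and item names are hypothetical placeholders, and the `collect`, `need_more`, and `conclude` callbacks stand in for the expert judgment the text describes; they are assumptions of this sketch, not an EPA procedure.

```python
from typing import Callable, Optional

Findings = dict[str, list[str]]


def iterative_collection(
    plan: dict[str, list[str]],
    collect: Callable[[str], str],
    need_more: Callable[[str, list[str]], bool],
    conclude: Callable[[Findings], Optional[str]],
) -> tuple[Optional[str], Findings]:
    """Walk the categories in order, most critical item first.

    Stops as soon as `conclude` returns a conclusion; within a category,
    moves on early when `need_more` judges further data unnecessary.
    """
    record: Findings = {}
    for category, items in plan.items():
        found = record.setdefault(category, [])
        for item in items:
            found.append(collect(item))
            conclusion = conclude(record)
            if conclusion is not None:
                return conclusion, record
            if not need_more(category, found):
                break  # enough in this category; go to the next one
    return None, record


# Hypothetical example: the chemical turns out not to be emitted at all.
plan = {
    "emissions": ["emitted?", "flux"],
    "fate and transport": ["stability in air"],
    "exposure": ["ambient concentrations"],
    "toxicity": ["acute lethality"],
}
answers = {"emitted?": "not emitted"}
result, record = iterative_collection(
    plan,
    collect=lambda item: answers.get(item, "unknown"),
    need_more=lambda cat, found: found[-1] != "no concern",
    conclude=lambda rec: (
        "no further assessment" if "not emitted" in rec.get("emissions", []) else None
    ),
)
print(result)  # no further assessment
print(record)  # {'emissions': ['not emitted']}
```

In this example the loop stops after a single item, mirroring the case in which a chemical that is not emitted needs no further data collection.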
Section 112 of the Clean Air Act mandates that EPA consider the hazards and possible regulation of 189 specified chemicals. Considering both the effort required to carry out complete risk assessments and the resources of the agency, it is unlikely that this can be accomplished within the time constraints of the act. Consequently, in the spirit of the act and in the interest of the public welfare, it is critical that EPA assign priorities to the chemicals listed. These priorities should be based first on the chemicals' potential impact on human health and welfare.
Some of the 189 chemicals appear to present major problems because of their variety of sources, large exposures, or high potency. Other chemicals present simpler problems; e.g., some have relatively few sources, some have lower potential for human exposure, and some have very low potency. It is an inefficient use of resources to invest huge amounts of money and time in research and analysis to determine factors already known to be inconsequential for the final risk assessment or to confirm credible estimates on which consensus can easily be obtained. Therefore, EPA should do preliminary analyses (screenings) on all listed compounds to ascertain which chemicals merit detailed risk-assessment efforts and which do not. These preliminary analyses should be reviewed by an independent board to ensure the validity of the resulting priorities for full-scale assessments. Priorities should be continually reevaluated and changed as appropriate in response to new data. The task of setting priorities and keeping them up to date is not trivial and should be specifically included, with adequate resources, in EPA's evolving program plan to implement CAAA-90. The iterative data-collection process can then help to rank the needed studies and to avoid accumulating a surfeit of data, which would waste both funds and time.
Data Needed For Risk Assessment
The following sections discuss priority-setting and data availability for each of the key steps in risk assessment: emissions, environmental fate and transport, exposure, and toxicity. The final section summarizes the data priorities in each of these areas and indicates how these data can be used for overall priority-setting for data collection.
Emissions
Knowledge of emissions of a chemical into the air (specifically, the quantity emitted per unit of time, or flux, from each place where it is made, stored, used, or disposed of, plus its physical and chemical form) is fundamental to characterizing the magnitude of expected exposure to the chemical.
Priorities for Collecting Data
The specific methods for characterizing emissions are described and evaluated in Chapter 7. On the basis of this analysis, an iterative data-collecting process for emission characterization might proceed roughly as follows:
Data quality is critical, because of the wide variety of emission-estimation techniques and the many types of facilities emitting hazardous air pollutants. EPA often uses whatever data are available at the time of decision-making and has not published guidelines or standards for the quality of emission data to be used in its risk assessments.
Because the emission-characterization database is extremely important for priority-setting, EPA should review the emission estimates submitted to ensure that they meet reasonable quality standards and that emission estimates from all sources within a site are submitted.
EPA plans to use emission information that is available in the Toxic Release Inventory (TRI) database as required by Title III of the Superfund Amendments and Reauthorization Act (SARA). The information available in this database is shown in the table provided by EPA to the committee in Appendix A. The TRI database includes information on annual emissions, facility location, and categorization of emissions as fugitive, point source, or both.
These data have two serious limitations for use in risk assessment. First, the database does not include emissions from all operations at a facility; for example, transfer operations are not reported. Second, the database does not include emissions of less than 10 tons/year, nor does it include the locations of emission points or the frequency of emissions. Some information is available in the emission inventories that states must submit to EPA as part of their state implementation plans (SIPs), which indicate how they plan to control emissions under CAAA-90, but that information is not necessarily well characterized. For example, emissions of volatile organic chemicals (VOCs) might be listed as a total instead of as emissions of separate chemicals; but risk assessments should generally be done for separate chemicals rather than for classes of chemicals.
A study by Amoco and EPA (1992) gives an example of the differences between estimated or calculated emissions (such as those listed in the TRI database) and emissions determined via direct measurement. This study found that the "existing estimates of environmental releases were not adequate for making a chemical-specific, multi-media, facility wide assessment." The report identified several specific problems in using the TRI database to conduct an in-depth evaluation of a facility:
EPA should develop a mechanism to gather the information just listed in a consistent fashion. This mechanism could include changes in Title III of SARA, which establishes the TRI reporting requirement, or the development of information under Title I or V of CAAA-90. Although developing emission-characterization databases for all 189 chemicals might initially seem a major task, CAAA-90 requires states to develop more detailed emission inventories by November 1992 and to update them. Most facilities are then required to estimate their emissions on a point basis to satisfy state requirements for emission inventories. Much of this information is also required for permit purposes.
Even simple changes, such as modifying the SARA Title III requirements to include all 189 hazardous air pollutants on the list, would help. Sixteen of the 189 compounds listed in Title III of CAAA-90 are not on the TRI list (see Table 8-1). In addition, the TRI database includes only sources that have 10 or more full-time employees and that manufacture, process, or use specified chemicals above a certain production rate. That restriction excludes smaller sources within the manufacturing sector for which risk assessments must be conducted under the Title III requirements. An emission threshold consistent with the Title III requirements (e.g., 10 tons per year (tpy) for a single compound; 25 tpy for multiple compounds) might be more appropriate for gathering information for risk-assessment purposes.
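The difference between an employee-count criterion and an emission threshold can be illustrated with a small screen. This is a sketch under stated assumptions: the 10-tpy and 25-tpy values are the examples given above, treating the 25-tpy figure as a facility-wide total over all listed compounds is an assumption, only the employee part of the TRI criteria is modeled, and the facility and function names are hypothetical.

```python
# Example thresholds from the text, in tons per year (tpy). Treating the
# 25-tpy figure as a facility-wide total over all listed compounds is an
# assumption made for this sketch.
SINGLE_COMPOUND_TPY = 10.0
COMBINED_TPY = 25.0


def meets_emission_threshold(emissions_tpy: dict[str, float]) -> bool:
    """True if any one listed compound, or all listed compounds together,
    reach the example thresholds."""
    if not emissions_tpy:
        return False
    return (max(emissions_tpy.values()) >= SINGLE_COMPOUND_TPY
            or sum(emissions_tpy.values()) >= COMBINED_TPY)


def meets_employee_criterion(full_time_employees: int) -> bool:
    """Simplified TRI-style coverage test: only the 10-employee criterion is
    modeled; the chemical-activity thresholds are ignored."""
    return full_time_employees >= 10


# Hypothetical small facility: few employees, but a sizable single emission.
employees, emissions = 6, {"benzene": 12.0}
print(meets_employee_criterion(employees))   # False: outside TRI-style coverage
print(meets_emission_threshold(emissions))   # True: captured by the emission threshold
```

A facility like this one would fall outside an employee-based rule yet be captured by an emission-based one, which is the gap the text points to.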
TABLE 8-1 List of Section 112 Pollutants Not in Toxic Release Inventory Data Base
Fine mineral fibers
Polycyclic organic matter
Sulfur dioxide, anhydrous
For evaluation of VOCs, many of which are on the list of 189 compounds under Title III, emission estimates developed for other regulatory purposes (such as the ozone provisions of CAAA-90) can be used. However, these data are frequently not speciated in terms of the chemical composition of the VOCs. In addition, the reporting of VOC emission information is required only in nonattainment areas, so this information may not always be available.
Environmental Fate and Transport
Emitted pollutants can move within and between environmental media and be converted to different forms. A thorough understanding of what happens to a chemical in the environment forms part of the basis for estimating human exposure and hence determining risk.
Priorities for Collecting Data
In the proposed iterative data-collection process described at the beginning of this chapter, data on environmental fate and transport would be acquired in roughly the following order:
Once that information is available, a model calculation of expected concentrations in nearby air is relatively straightforward. If the information is not available, it must be obtained or assumed.
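As one illustration of such a model calculation, the steady-state Gaussian plume equation with ground reflection is a widely used screening approach; the chapter does not name a specific model, so its use here is an assumption. The dispersion coefficients `sigma_y` and `sigma_z` are supplied directly (in practice they depend on downwind distance and atmospheric stability), and all input values below are hypothetical.

```python
import math


def plume_concentration(
    q: float, u: float, y: float, z: float, h: float, sigma_y: float, sigma_z: float
) -> float:
    """Steady-state Gaussian plume concentration with ground reflection.

    q: emission rate (g/s); u: wind speed at release height (m/s);
    y: crosswind distance from the plume centerline (m); z: receptor height (m);
    h: effective release height (m); sigma_y, sigma_z: dispersion coefficients (m)
    at the receptor's downwind distance, supplied by the caller.
    Returns concentration in g/m^3.
    """
    if u <= 0 or sigma_y <= 0 or sigma_z <= 0:
        raise ValueError("wind speed and dispersion coefficients must be positive")
    crosswind = math.exp(-(y ** 2) / (2 * sigma_y ** 2))
    vertical = (
        math.exp(-((z - h) ** 2) / (2 * sigma_z ** 2))
        + math.exp(-((z + h) ** 2) / (2 * sigma_z ** 2))  # image source: ground reflection
    )
    return q / (2 * math.pi * u * sigma_y * sigma_z) * crosswind * vertical


# Hypothetical inputs: 1 g/s from a 20-m stack, 4 m/s wind, ground-level
# receptor on the centerline where sigma_y = 50 m and sigma_z = 25 m.
c = plume_concentration(q=1.0, u=4.0, y=0.0, z=0.0, h=20.0, sigma_y=50.0, sigma_z=25.0)
print(f"{c * 1e6:.1f} ug/m^3")  # prints 46.2 ug/m^3
```

The calculation needs exactly the inputs listed above (emission rate, release height, meteorology); when any of them is missing, a value has to be measured or assumed.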
Data on emissions and physical properties are generally available or can be estimated (Lyman et al., 1982). Data on chemical properties and reactivity are available for some environmental reactions, but not all. Data describing the receiving environment itself are generally available for most locations in the United States. Information on the rates of potential removal processes is more difficult and costly to obtain.
Careful evaluation of data is necessary. For example, published vapor pressures of organic chemicals of moderate to low volatility determined under laboratory conditions can be seriously inaccurate and misleading. For all chemicals, vapor-phase reaction rate constants, when extrapolated from the laboratory to outdoor ambient air, can be seriously in error. The literature is not always reliable for purposes of risk assessment.
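Why errors in rate constants matter can be shown with the pseudo-first-order lifetime of a chemical removed by reaction with the hydroxyl radical, tau = 1/(k_OH [OH]), and the fraction remaining after a given transport time, exp(-t/tau). This sketch assumes OH reaction is the only removal process; the rate constants are hypothetical, and the OH concentration of 1e6 molecules per cubic centimeter is an assumed order-of-magnitude value.

```python
import math


def oh_lifetime_s(k_oh: float, oh_conc: float) -> float:
    """Pseudo-first-order lifetime (s) against OH attack: tau = 1 / (k_OH [OH]).

    k_oh in cm^3 molecule^-1 s^-1; oh_conc in molecules cm^-3.
    """
    return 1.0 / (k_oh * oh_conc)


def fraction_remaining(transport_time_s: float, k_oh: float, oh_conc: float) -> float:
    """Fraction of the emitted chemical left after the given transport time,
    assuming OH reaction is the only removal process."""
    return math.exp(-transport_time_s / oh_lifetime_s(k_oh, oh_conc))


# Hypothetical rate constants and an assumed OH concentration of 1e6 cm^-3.
OH = 1.0e6
ONE_DAY = 86_400.0
for k in (1.0e-11, 3.0e-11):  # second value: same chemical, rate constant 3x higher
    print(f"k = {k:.0e}: {fraction_remaining(ONE_DAY, k, OH):.3f} remains after 1 day")
```

With these inputs about 0.42 of the chemical remains after one day at the lower rate constant but only about 0.075 at the higher one, so a threefold error in the extrapolated constant shifts the estimated downwind exposure more than fivefold.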
Exposure
Accurate exposure data are crucial to valid risk assessment. For example, exposure data must match up temporally with the health end points of concern. Key issues in the evaluation of exposure are
Rarely are all those issues resolved by the exposure data available for a risk assessment. Efforts to collect the data should focus on the minimum needed to meet the goals of the assessment in its risk-management context.
Priorities for Collecting Data
In the proposed iterative data-collection process, the order of data collection might be as follows:
Some of the 189 chemicals on the Clean Air Act Amendments list have relatively abundant data on concentrations; some have virtually none. When concentration data are available, they are more likely to come from ambient-air monitoring or, at best, targeted fixed-point monitoring. Sufficient exposure data for a preliminary evaluation of relative priority for more detailed risk assessment are available for only some of the compounds (see Appendix A). That is a major problem that can be solved only by a much more extensive state or federal monitoring program. Some states, such as California, are moving rapidly to develop hazardous air-pollutant monitoring programs. Coordination among states and with federal agencies is necessary to keep scarce resources from being wasted on duplicative efforts.
Collection of new exposure data on humans is limited by current methods of monitoring individual exposures (which are often expensive, often of low accuracy or precision, and often nonquantitative or lacking in the ability to determine the source of exposure) and by methods of obtaining information on human behavior that might affect uptake or exposures. In addition, no reference database is available for comparing new data, that is, for determining whether new data represent exposure outside the general norm or are within the realm of acceptability defined by prior studies. Furthermore, when exposure data are gathered, they should be probability-based to allow inferences to the population and estimation of the tails of the distribution of exposures.
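The value of probability-based exposure data can be illustrated with a weighted quantile: each sampled person carries a sampling weight (the inverse of his or her probability of selection), and weighted cumulative shares estimate points on the population distribution, including its upper tail. This is a minimal sketch with hypothetical survey data and a hypothetical `weighted_quantile` helper; it uses the simple "first value reaching the cumulative share" estimator and omits variance estimation.

```python
def weighted_quantile(values: list[float], weights: list[float], q: float) -> float:
    """Smallest sampled value at which the cumulative weight share reaches q.

    weights are sampling weights (inverse inclusion probabilities) from a
    probability-based design; 0 < q <= 1.
    """
    if len(values) != len(weights) or not values:
        raise ValueError("need equal-length, non-empty values and weights")
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= q * total:
            return value
    return pairs[-1][0]  # guard against floating-point shortfall


# Hypothetical survey: residents near a facility were oversampled (weight 10),
# the general population sampled sparsely (weight 1000). Exposures in ug/m^3.
exposure = [2.0, 3.0, 4.0, 30.0, 45.0]
weight = [1000.0, 1000.0, 1000.0, 10.0, 10.0]
print(weighted_quantile(exposure, weight, 0.5))     # 3.0
print(weighted_quantile(exposure, [1.0] * 5, 0.5))  # 4.0 (ignoring the design)
print(weighted_quantile(exposure, weight, 0.999))   # 45.0
```

Ignoring the weights overstates the population median here because the heavily exposed group was deliberately oversampled; the same oversampling is what places observations in the upper tail of the distribution.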
Toxicity
A full assessment of the inherent toxicity of an agent requires some combination of structure-activity analyses, in vitro or whole-animal short-term tests, chronic or long-term animal bioassays, human biomonitoring, clinical studies, and epidemiological investigations (NRC, 1984, 1991c,d). A complete hazard identification might entail review of information in all those categories before a determination that a quantitative risk assessment of the agent is warranted (Bailar et al., 1993).
Estimation of dose-effect relationships requires data on the effects of a wide range of doses, on factors that influence the dose delivered to critical target cells by given magnitudes and patterns of exposure (e.g., uptake, anatomic distribution, metabolism, and excretion) (NRC, 1987), on the shapes and slopes of pertinent dose-effect curves, on the relevant mechanisms of effects (NRC, 1991c), and on the extent to which the response to an agent can vary with species, sex, age, previous exposure, health status, exposure to extraneous agents, and other variables (NRC, 1988a).
Priorities for Collecting Data
Strategies to fill data gaps in toxicity assessment are best developed case by case, but a suggested priority order for the major types of toxicological data that may be used is listed below. In the suggested iterative data-collection process, the toxicity data in the first three categories below (i.e., genetic and acute toxicity, acute mammalian lethality) should be collected on every chemical as a starting point, and other, more expensive data should be collected only on chemicals that give cause for concern on the basis of the data in those categories.
This prioritization is based on the cost and complexity of gathering such data (NRC, 1984). It is generally not possible to plan the collection of clinical and epidemiological data. Toxicological studies conducted clinically in humans are usually planned and implemented under experimental control, but very few are done because of the attendant hazards. Epidemiological studies are relatively expensive and often produce data that are difficult to interpret as to the effects of specific toxic agents. If one were to set data-collection priorities without concern for cost, ethical, or other considerations, the sequence of collection might be
Availability of requisite data varies widely among the 189 chemicals. On the one hand, some preliminary toxicity data are available on some of the chemicals, or at least can be estimated from structure-activity correlations. On the other hand, the toxicity data are incomplete on almost all 189 chemicals.
The amount of data available is highly variable and depends largely on chance. Generally, better data sets exist on individual chemicals that have been used over long periods (vinyl chloride, some solvents, etc.) and on chemicals of wide use (such as pesticides) than on chemicals rarely used or chemicals that are byproducts of other chemicals (e.g., chemicals in automobile exhaust and cigarette smoke). Additional information and analysis on the Integrated Risk Information System (IRIS) used by EPA are provided in Chapter 12. Some of the partial data needed to test models are discussed in Chapter 6.
Overall Priority Setting
The data needed for each step of risk assessment are summarized in rough order of increasing complexity in Table 8-2. In an iterative data-collection process, if information in the top one or two items of each of the four columns in Table 8-2 does not indicate increased risk potential, the priority for full risk assessment should be low. Various combinations of negative information in the first few items of any two of the first three lists (i.e., emissions, environmental fate and transport, exposure) with positive information in the third list might lead to a medium priority. Positive information in the early items of two, or perhaps three, of the lists would argue for a high priority. Data for the more complex items of each list would be developed when evidence of potential hazard exceeded an agreed-on "bright line" of concern, i.e., a decision point set either by regulation or by programmatic procedures.
Although a full priority scheme probably should be on a continuous scale, several important reference points for developing a more detailed scheme might be as follows:
Screening risk assessment
Emissions: Items 1 and 2
Environmental fate and transport: Items 1-3
Full risk assessment
Environmental fate and transport: Items 1-5
Reliable positive human evidence will always result in a high priority and the full risk evaluation. Any positive clinical, toxicologic, or epidemiological human data would override a priority based on exposure and animal toxicity data alone and move a given chemical to the stage of full risk assessment.
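One simplified reading of the low/medium/high rule described above can be written as a small function: no positive early results in any list gives low priority, a single positive list gives medium, positive results in two lists (or three, if the threshold is raised) give high, and reliable positive human evidence overrides everything. The chapter's wording allows other combinations, so the counting rule, the `high_if_at_least` parameter, and the function name are assumptions of this sketch.

```python
LISTS = ("emissions", "fate and transport", "exposure", "toxicity")


def overall_priority(
    early_positive: dict[str, bool],
    positive_human_evidence: bool = False,
    high_if_at_least: int = 2,
) -> str:
    """Map early-item results in each list to a low/medium/high priority.

    early_positive[name] is True when the first one or two items of that list
    indicate increased risk potential.
    """
    if positive_human_evidence:
        return "high"  # reliable positive human data override the other lists
    missing = [name for name in LISTS if name not in early_positive]
    if missing:
        raise ValueError(f"no early-item result for: {missing}")
    positives = sum(early_positive[name] for name in LISTS)
    if positives == 0:
        return "low"
    if positives >= high_if_at_least:
        return "high"
    return "medium"


print(overall_priority({"emissions": False, "fate and transport": False,
                        "exposure": False, "toxicity": True}))  # medium
```

Raising `high_if_at_least` to 3 corresponds to the more conservative "or perhaps three" reading of the high-priority condition.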
The detailed nature of the process used to set priorities for full risk assessment needs to be addressed in a coordinated way by federal and state agencies, to ensure the best use of limited resources for this programmatic step. There might be, for example, a numerical weighting or scoring approach based on data in the four categories of emissions, environmental fate and transport, exposure, and toxicological data. EPA should consider convening a panel of experts to develop a priority-setting process and the requisite accompanying iterative approach to data collection.
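A numerical weighting or scoring approach of the kind mentioned above might look like the following weighted sum over the four data categories, used to rank chemicals. The weights, the 0-3 scoring scale, and the chemical names are hypothetical placeholders; choosing them is exactly the task the text suggests giving to an expert panel.

```python
# Hypothetical weights for the four data categories (they sum to 1).
WEIGHTS = {
    "emissions": 0.2,
    "fate and transport": 0.1,
    "exposure": 0.3,
    "toxicity": 0.4,
}


def priority_score(scores: dict[str, float], weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of category scores (here on a hypothetical 0-3 scale)."""
    return sum(weight * scores[category] for category, weight in weights.items())


def rank_chemicals(chemicals: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Return (name, score) pairs from highest to lowest score."""
    scored = [(name, priority_score(scores)) for name, scores in chemicals.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Placeholder chemicals and scores, for illustration only.
candidates = {
    "chemical A": {"emissions": 3, "fate and transport": 2, "exposure": 3, "toxicity": 3},
    "chemical B": {"emissions": 1, "fate and transport": 1, "exposure": 0, "toxicity": 2},
    "chemical C": {"emissions": 2, "fate and transport": 3, "exposure": 1, "toxicity": 0},
}
for name, score in rank_chemicals(candidates):
    print(f"{name}: {score:.2f}")
```

One design point for such a panel: a purely additive score ranks chemical B above chemical C even though B has an exposure score of zero and C a toxicity score of zero. Because risk depends on both exposure and toxicity, a scheme that multiplies those two terms would assign both chemicals a low priority.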
Data Management
More attention needs to be paid to data management to ensure that vital data gaps are filled, that data used in risk assessments are of the best possible quality, and that relevant information (such as negative epidemiological information) is not overlooked. The lack of a consistent data-collection scheme makes data analysis, and thus effective risk assessment, inconsistent and unreliable for risk-management purposes.
For example, risk assessment often requires that the assessor decide whether to set aside information from old studies when newer, supposedly better information is available. The ultimate desire is for credibility; therefore, it is important to use information that is widely acknowledged as the best representation of reality. If the results of a new study contradict information from an old study and there is only a small difference in the "bottom-line" estimate of human health risk, then both should be used, and the error bounds of the current risk assessment should be revised. However, if the studies lead to quite different conclusions, use of both might not be feasible. For example, some animal evidence might show a major health hazard while other animal studies are weak, negative, or equivocal. Such conflicting data should be carefully reviewed in the risk-assessment document, with detailed study of possible reasons for the discrepancy. When no reconciliation of results seems feasible, the committee recommends that the voice of prudence be heard and that the risk assessment be either based on the higher ultimate risk estimate or delayed (as was done in part on formaldehyde) until additional studies can be completed.
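The final decision step described above, applied after careful review has failed to reconcile the studies, can be sketched as a small rule. The function cannot replace that review. The factor-of-2 cutoff for a "small difference" is a hypothetical value chosen only for illustration, as is the function name.

```python
def reconcile_estimates(
    old_risk: float, new_risk: float, close_factor: float = 2.0
) -> tuple[str, tuple[float, float]]:
    """Decide how to carry two conflicting risk estimates forward.

    close_factor is a hypothetical cutoff for a "small difference": if the
    larger estimate is within that factor of the smaller, both are used and
    the range between them becomes the revised error bounds. Otherwise the
    prudent default is the higher estimate (or deferral pending new studies).
    """
    if old_risk <= 0 or new_risk <= 0:
        raise ValueError("risk estimates must be positive")
    low, high = sorted((old_risk, new_risk))
    if high / low <= close_factor:
        return "use both; widen error bounds", (low, high)
    return "use higher estimate or defer", (high, high)


print(reconcile_estimates(1.0e-5, 1.5e-5))  # small difference: both used
print(reconcile_estimates(1.0e-6, 1.0e-4))  # large, unreconciled difference
```

Working on the ratio rather than the absolute difference keeps the rule meaningful for risk estimates that span several orders of magnitude.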
Findings And Recommendations
The committee's findings and recommendations follow.
Insufficient Data for Risk Assessment
EPA does not have sufficient data to assess fully the health risks of the 189 chemicals in Title III within the time permitted by the Clean Air Act Amendments of 1990.
Need for Data-Gathering Guidelines
EPA has not defined the guidelines or process to be used for determining the types, quantities, and quality of data that are needed for conducting risk assessments for facilities emitting one or more of the 189 chemicals.
Inadequacy of Emission and Exposure Data
EPA has often relied on non-site-specific emission and exposure data. These data are often not sufficient to assess the risk to individuals and the affected population at large.
Inadequacy of TRI Database as a Source of Emission Data for Risk-Assessment Purposes
The SARA 313 Toxic Release Inventory data and other readily available data used by EPA for emission characterization may be adequate for screening purposes but are not adequate for developing detailed risk assessments for specific facilities. Present processes of gathering emission data do not yield information appropriate for all risk-assessment purposes under the Clean Air Act Amendments.
Lack of Adequate Natural Background-Exposure Database
EPA does not have an adequate database on natural background exposures to the 189 air pollutants against which to evaluate total human exposure data from facilities producing or using these substances.
Inadequate Explanation of Analytical Techniques
EPA does not always explain adequately the analytical and measurement methods it uses for estimating ambient outdoor exposures.
Need for System of Data Management for Risk Assessment
EPA needs more adequate mechanisms to compile and maintain databases for use in health-risk screening and assessment.