In the United States, several federal agencies have been given authority to regulate a variety of environmental agents that might harm public health. Accordingly, the agencies implement regulations that establish maximum acceptable concentrations of environmental agents in drinking water, set permissible limits on worker exposure, define labeling requirements, establish tolerances for pesticide residues on food, and set other kinds of limits on the basis of risk assessment. Toxicity testing in laboratory animals provides many of the data needed for risk assessment, such as information on the possible effects of exposure to a substance and the exposure concentrations at which effects might be observed.
New directives and initiatives for toxicity testing in the United States and Europe reflect an increased demand for toxicity information to provide a rational basis for regulating environmental agents. At the same time, new testing technologies and methods have continued to emerge. The U.S. Environmental Protection Agency (EPA) recognized the need for a comprehensive review of established and newly developed toxicity-testing methods and strategies and asked the National Research Council (NRC) to conduct an independent review and to develop a long-range vision and strategy for toxicity testing. In response to EPA’s request, the NRC convened the Committee on Toxicity Testing and Assessment of Environmental Agents.
COMMITTEE’S CHARGE AND APPROACH
The committee was asked to conduct a two-part study to assess and advance current approaches to toxicity testing and assessment undertaken to meet regulatory data needs. For the first part of the study, the committee was asked to review relevant aspects of several reports by EPA and others on the topic of toxicity testing and assessment. For the second part, the committee was asked to develop a long-range vision and strategic plan to advance the practices of toxicity testing and human health risk assessment of environmental contaminants. The present report fulfills the first part of the two-part study. The second report is expected to be completed by fall 2006. The committee was asked to focus on human toxicology and was not charged with reviewing toxicity testing and strategies designed to evaluate ecologic effects of environmental agents.
The committee heard presentations from representatives of several EPA offices, other federal agencies, and a number of organizations at public sessions, and it considered numerous documents and resources. The committee structured its review by first considering current toxicity-testing protocols. Recognizing that human data can be the most relevant for human health risk assessment, the committee considered the various types of human data available and the impediments that often prevent the use of epidemiologic data in regulatory risk assessment. Testing strategies used to rank, screen, or characterize substances were reviewed next. Various guidance documents that discuss the use of toxicity data for human health risk assessment were then considered. Finally, the committee reviewed some near-term improvements in toxicity-testing approaches proposed by others and some emerging technologies that may advance the field of toxicity testing.
Most of the documents reviewed by this committee describe initiatives or proposals that are still under development. Some have few details, and some were available to the committee only as drafts. Therefore, the committee focused on major themes rather than details, and it reviewed the documents primarily to compare various overall testing strategies and to evaluate the potential for the strategies to improve testing of environmental agents. The committee primarily examined toxicity-testing strategies rather than protocols for individual assays. Regarding documents that included an array of issues, the committee focused on the sections that dealt directly with toxicity testing and strategies and did not review sections that discussed risk-assessment approaches and policy issues, which were considered outside the scope of the committee’s task.
The goals of toxicity testing are to identify possible adverse effects of exposure to environmental agents, to develop dose-response relationships that can elucidate the severity of effects associated with known exposures, and ultimately to predict the effects of exposure of human populations. Over the last several decades, scientists have developed consensus testing protocols, which have been designed to minimize variance and bias, to reduce false-positive and false-negative results, and to balance desired information with costs and resources. Some toxicity tests are designed to evaluate general toxicity resulting from exposures of various durations—acute, subchronic, and chronic—and others are designed to evaluate specific health effects, including reproductive and developmental toxicity, neurotoxicity, immunotoxicity, genetic toxicity, and carcinogenicity. Toxicity tests may also be distinguished by their objectives—to evaluate final outcomes of a specified exposure duration; to characterize the possible modes of action of such outcomes, which can depend on exposure route, concentration, and duration; to characterize dose-response relationships; or to identify a potential hazard, such as carcinogenicity from the results of a genotoxicity assay.
Testing strategies vary considerably, although they can often be described by three basic testing approaches: battery, tiered, or tailored. A battery is a specific set of toxicity tests applied to all chemicals in a group. Testing batteries are sometimes intended to provide the minimal dataset necessary for risk-based screening, regulation, or management. In tiered testing, the results of a specific set of toxicity tests and risk-management needs are used to guide decisions about the nature and extent of further testing. A substance is assigned to a category and then moves through a series of tests sequentially with the data from each test informing the next step in the process. In tailored testing, information on exposure, suspected adverse effects, and mechanism of action is used to determine the scope of tests to be conducted on a given chemical or class of chemicals. Characterizing an overall testing strategy as a battery, tiered, or tailored approach is often impossible because testing strategies are typically combinations of these three basic approaches.
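The three basic approaches above can be thought of as decision logic. The following sketch is purely illustrative and is not drawn from the report; the test names, tier contents, and stopping rule are invented placeholders intended only to show how the approaches differ in structure.

```python
# Hypothetical sketch (not from the report): the three basic testing
# approaches expressed as decision logic. Test names, tier contents, and
# the stopping rule are illustrative placeholders only.

BATTERY = ["acute_oral", "genotoxicity", "subchronic_90day"]  # one fixed set

TIERS = [
    ["acute_oral", "genotoxicity"],         # tier 1: screening-level tests
    ["subchronic_90day", "developmental"],  # tier 2: triggered follow-up
    ["chronic_2yr", "carcinogenicity"],     # tier 3: in-depth evaluation
]

def battery_testing(chemical, run_test):
    """Battery: every chemical in the group receives the same fixed tests."""
    return {test: run_test(chemical, test) for test in BATTERY}

def tiered_testing(chemical, run_test, needs_next_tier):
    """Tiered: results of each tier decide whether testing continues."""
    results = {}
    for tier in TIERS:
        for test in tier:
            results[test] = run_test(chemical, test)
        if not needs_next_tier(results):
            break  # data judged sufficient for the risk-management decision
    return results

def tailored_testing(chemical, run_test, select_tests):
    """Tailored: exposure and mechanism information set the test scope."""
    return {test: run_test(chemical, test) for test in select_tests(chemical)}
```

In practice, as the text notes, a real strategy usually combines these: for example, a fixed screening battery may serve as tier 1, with tailored follow-up tests chosen from the tier-1 results.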
The toxicity tests and strategies discussed in this report have evolved primarily as a means of characterizing potential human health hazards and dose-response relationships at least at high doses. The information produced is often judged to be sufficient for decision-making. For example, test results may provide reasonable assurance that a food additive or pesticide can be safely used as proposed. In contrast, if the difference between toxic doses and relevant human exposures is not large, further testing may be needed to refine the dose-response relationship at lower doses and to answer questions concerning the mechanism of action. Alternatively, regulatory action may be used to reduce human exposures.
Different testing strategies generally stem from legislative mandates or from differences in the practices of individual agencies or program offices. Thus, different strategies have developed to evaluate pesticides and food additives, to screen new industrial chemicals, and to investigate specific health effects, such as endocrine disruption. Different approaches can result in inconsistent testing strategies among agencies or categories of chemicals even if the ultimate regulatory goal is the same. The nature and extent of toxicity testing ideally should be guided by the regulatory risk-management decisions to be made and the assessments needed to support them.
Human data usually are not a part of toxicity-testing strategies despite the importance of human responses to potentially toxic agents. Although animal toxicity studies provide relevant information on potential adverse health effects of exposure to an agent, interspecies differences can cause effects relevant to the human population to be missed. A famous example is thalidomide, to which rats are highly resistant but human fetuses are exquisitely sensitive. Studying the human population also provides an opportunity to evaluate the effects of the full variety of agents in the complex contexts of workplaces and daily lives. Clearly, no population data will be available on a chemical newly introduced to the marketplace. Population data will be available only on chemicals that have been in production for some time, perhaps several decades. Thus, differences in data availability on new versus existing chemicals should be considered in developing the role of human data in any toxicity-testing strategy.
Human data come primarily from epidemiologic studies, which investigate the relationship between exposure to a substance and potential health effects in a human population. Such studies have often been criticized because of methodologic limitations that make it difficult to draw clear associations between particular exposures and potential health effects. Components of epidemiologic studies that have posed problems include the assessment of exposure, which often involves only uncertain or indirect estimates of human exposure, and evaluation of exposure-effect relationships, particularly for chemicals for which there is an indeterminate and possibly long period between exposure and manifestation of effect. However, emerging technologies and approaches, such as biomonitoring and molecular and genetic epidemiology, may overcome some of the limitations and will be discussed in greater detail in the committee’s second report.
Use of Data in Human Health Risk Assessment
Data from animal toxicity testing, human studies, and in vitro methods are used in human health risk assessment to identify potential hazards, to characterize effects at different exposure levels, to determine the probability of adverse effects of given human exposure scenarios, and ultimately to establish environmental standards and exposure guidance levels. Regulatory agencies have developed noncancer and cancer risk-assessment guidelines that provide comprehensive guidance on use and interpretation of relevant data to set exposure limits to protect public health. In general, the guidelines for assessing hazard and dose-response relationships have coevolved with scientific developments and laboratory capabilities. In some respects, the data being generated correlate well with guideline requirements. In other respects, there is a disconnect between the data needed for risk assessment and the data generated in the laboratory or field. Three examples are provided below.
Typical cancer guidelines require direct evidence of cancer in animals or humans to classify a chemical as having carcinogenic potential. When such data are not available, the chemical is classified as having, for example, “inadequate information to assess carcinogenic potential”; cancer risk is not estimated; and the chemical is generally treated as posing zero cancer risk. A system for using indirect evidence, such as structure-activity information and mechanistic data, could be developed to guide the assessment of chemicals that lack adequate cancer bioassay or epidemiologic data. Similarly, systems and guidance could be created for identifying a potential for neurotoxicity, developmental toxicity, and other kinds of toxicity on the basis of short-term tests and high-throughput approaches that use end points that are more specific to processes that are conserved across species.
For mutagenic carcinogens or carcinogens of unknown mechanism, estimating risk from animal data assumes that each individual faces the same risk of cancer at a given dose. A generic uncertainty factor is used in noncancer guidelines to adjust for variability among people. Testing strategies do not reflect a systematic approach for developing data to assess the variability of human responses to chemicals quantitatively. Such data would aid in understanding whether the current procedures for estimating cancer risk are conservative overall or may in some cases understate the risk for some segments of the population.
The generation of data for mode-of-action evaluations (with the exception of standard genotoxicity testing) and pharmacokinetic modeling is typically ad hoc. The data may be supplied by interested parties or otherwise available in the literature but are generally not required by the regulatory agencies. Although the guidelines may provide a loose framework for those approaches, they provide little specific guidance on data-generation issues. Optimizing further testing to improve the initial characterization of a particular chemical or class of chemicals can be highly context-dependent; however, a general framework and further guidance on developing a testing strategy to improve specific risk assessments would be useful.
Proposals to Improve Toxicity-Testing Strategies
The committee’s review of current toxicity-testing strategies reveals a system that is reaching a turning point. Agencies typically have responded to scientific advances and emerging challenges by simply altering individual tests or adding tests to the existing regimens. That patchwork approach has not provided a fully satisfactory solution to the fundamental problem, which appears to be a tension among four objectives that are difficult to meet simultaneously:
Depth, providing the most accurate, relevant information possible for hazard identification and dose-response assessment.
Breadth, providing data on the broadest possible universe of chemicals, end points, and life stages.
Animal welfare, causing the least animal suffering possible and using the fewest possible animals.
Conservation, minimizing the expenditure of money and time on testing and regulatory review.
The committee acknowledges that meeting all four objectives poses a substantial challenge.
Several agencies or organizations have evaluated various toxicity-testing strategies with the goal of addressing gaps and inefficiencies in current approaches. The following sections highlight the committee’s findings on proposals by EPA, the Health and Environmental Sciences Institute of the International Life Sciences Institute (ILSI-HESI), the European Union (EU), and the National Toxicology Program (NTP). More detailed discussion is provided in Chapter 6 of the committee’s report.
EPA Proposals
In its 2002 report A Review of the Reference Dose and Reference Concentration Processes, EPA reviewed its procedures for deriving reference values and specifically the adequacy of the toxicity tests to accomplish that purpose. The committee focused its review on Chapter 3 of the EPA report because that chapter directly addressed toxicity-testing approaches. The committee did not critique the other chapters on risk-assessment approaches and application of uncertainty factors, which were considered outside the scope of the committee’s task.
EPA’s report raised five major issues: (1) the presence of data gaps in current toxicity-testing approaches, (2) a possible need to refine acute-toxicity testing protocols to support short-term risk assessments, (3) concerns about methods to incorporate pharmacokinetic and pharmacodynamic data into toxicity-testing approaches, (4) questions regarding incorporation of data on direct dermal toxicity into reference dose (RfD) development, and (5) a need to reconsider current toxicity-testing strategies systematically with an eye to improving efficiency and effectiveness.
First, the committee agrees that there are numerous data gaps in life stages and specific health effects evaluated in current toxicity-testing approaches. Few data are available to determine the degree to which those gaps have practical significance for risk assessment or whether they are primarily of theoretical or academic concern. The committee cautions against adding testing requirements only for the sake of theoretical thoroughness, because such an approach could result in substantial waste of animals and resources with little gain. However, the extent to which the data gaps might have practical consequences for risk assessment should be evaluated, and a reasonable interim approach to address this problem should be generated. Modest changes in existing protocols could enhance the array of health effects and life stages evaluated, and the resulting findings could trigger more in-depth testing of specific outcomes and life stages where it is warranted. The committee notes that epidemiologic studies with reliable exposure assessments could shed some light on the likelihood that current toxicity tests are missing important health effects or are not adequate for evaluating different life stages.
Second, the committee agrees that the existing protocols for acute toxicity testing focus on lethal effects and gross observations and generally do not provide adequate information for acute and short-term RfDs or reference concentrations (RfCs). Conducting acute protocols that address latency, reversibility, and differential susceptibility for all toxicity outcomes currently required in subchronic and chronic protocols would lead to very complex animal studies. Before such complex protocols are conducted, acute lethality studies, repeated-dose toxicity studies, and human data should be evaluated to determine the need for the more complex studies and ultimately to guide the design of these studies.
Third, the committee agrees that generally little information is available on pharmacokinetics, including possible differences across life stages. It is critically important to define the purpose of pharmacokinetic studies to avoid the creation of data that are unlikely to be used and therefore represent a waste of animals, time, and resources. Additional data should not be routinely required, but the need should be evaluated case by case.
Fourth, the committee finds that the relevant exposure route and exposure durations should be considered in developing a testing strategy. When dermal exposure is a primary exposure route, there is a general need for better data on dermal uptake and absorption. However, it is important to consider whether skin is an important route of exposure before beginning the process of setting a dermal RfD. Worker data and clinical reports could be collected more systematically and used preferentially in setting dermal reference doses of existing chemicals.
Finally, the committee agrees that a new strategy is needed to improve efficiency, reduce animal use, increase the number of chemicals screened for toxicity, and address some of the data gaps identified. EPA explored alternative testing protocols for acute and chronic toxicity testing to stimulate new ideas but did not articulate how such protocols might be incorporated into a testing strategy. The committee supports the notion of expanded tests that combine studies to conserve resources and provide more in-depth evaluations of outcomes and life stages. However, considerable development and evaluation may be required to ensure that tests are feasible and reproducible, do not compromise study sensitivity, produce the desired data, and reduce the use of animals. Expanded bioassays may ultimately have a role in selectively testing high-priority chemicals but might not necessarily be amenable to widespread application.
ILSI-HESI Draft Proposals
The committee reviewed a testing strategy proposed by ILSI-HESI and various recommendations contained in its draft reports: Systemic Toxicity White Paper; Life Stages White Paper; and The Acquisition and Application of Absorption, Distribution, Metabolism, and Excretion (ADME) Data in Agricultural Chemical Safety Assessments. ILSI-HESI proposed substantive modifications of toxicity-testing requirements for pesticides and identified some potential omissions and redundancies in current pesticide testing. Recommendations included changing exposure durations of required toxicity tests, eliminating some required guideline studies, modifying some studies to enhance evaluation of specific health effects, and generating chemical-specific pharmacokinetic data to inform study design and data interpretation.
The committee supports the general approach used by ILSI-HESI to tailor testing to meet risk-assessment needs. Specifically, ILSI-HESI proposed using exposure considerations (such as the difference between doses that produce effects in animals and expected human exposure to pesticides) to provide a conceptual framework for guiding the selection and extent of testing. That approach, however, may not be useful for chemicals for which the degree and circumstances of human exposure are difficult to predict.
The committee supports the general ILSI-HESI approach of using existing databases to evaluate the importance of specific toxicity tests or their contribution to the dataset and endorses further broad retrospective reviews. However, the committee has concerns about the recommended elimination of some toxicity tests from first-tier testing. For example, ILSI-HESI proposed removing the rat teratology study and using an extended one-generation study and a rabbit teratology study to evaluate developmental effects. Although the proposed one-generation study substantially improves postnatal evaluation of many nonreproductive outcomes, it is unclear whether it would be as sensitive as a rat teratology study for prenatal developmental-toxicity outcomes or would adequately reveal the potential hazard and trigger a follow-up study. Furthermore, EPA often bases acute reference values on the rat teratology study. In contrast, postnatal effects in a one-generation study are not typically used for deriving acute reference values. The effect of eliminating the rat teratology study on hazard identification and on the setting of acute reference values should be evaluated if the proposal is pursued.
Overall, the changes proposed by ILSI-HESI may affect the probability of finding some effects and change the volume of evidence available to an assessor in judging the presence or importance of an effect. Cumulatively, it is unclear how the different aspects of the proposal would affect the overall fidelity of the testing process. The committee notes that the ILSI-HESI evaluation may have overlooked redundancy of testing as a critical part of the weight-of-evidence approach. More-limited testing and less redundancy could mean less confirmatory evidence and greater potential overall for reduced sensitivity of the testing strategy. Making decision-making more conservative, erring in the direction of false-positive results, or using greater uncertainty factors may address those issues. Corresponding adjustments of risk-assessment guidelines that emphasize positive results of multiple studies for confirmatory evidence also may address those issues.
EU REACH Program
The EU is engaged in a bold effort to restructure its approach to toxicity testing. The primary goal of the new approach, known as REACH (Registration, Evaluation and Authorisation of Chemicals), is outlined in the 2004 EU report The REACH Proposal Process Description. The goal is to collect data on and regulate about 30,000 chemicals produced or imported in excess of 1 metric ton per year on which there are limited toxicity and environmental data. The new approach is based on production or importation volume, which serves as a surrogate of potential human exposure. It specifies a battery of tests or specific effects to be evaluated at each level without being prescriptive about how tests will be done. The committee notes that the approach enhances flexibility but may make comparison of results difficult. Also, although tonnage may be an initial rough surrogate of potential human exposure, other information (such as whether the chemical is an intermediate to which humans are unlikely to be exposed) may also be relevant.
The committee found that the REACH program focuses more on screening large numbers of chemicals than on generating in-depth information that is often needed for quantitative risk assessment. However, the REACH program does allow for greater depth of testing to be triggered on the basis of initial results. The REACH program has the advantage of generating at least some toxicity data on chemicals that are not now subject to testing in the United States.
NTP Roadmap for the Future
In its 2004 report The NTP Vision for the 21st Century, the NTP discussed its goals: to refine traditional toxicity assays; to develop rapid, mechanism-based predictive screens for environmentally induced diseases; and to improve the overall use of NTP toxicity-testing assays for public-health decisions. The NTP also described its current research initiatives:
To review and refine toxicity-testing protocols.
To incorporate new approaches, such as genomic analyses, into toxicity-testing strategies.
To improve the use of pharmacokinetic information in toxicologic evaluation.
To explore the use of nonmammalian models as alternatives for toxicity testing.
To expand the use of imaging technologies for detecting and quantifying molecular and cellular lesions and for improving the speed and precision of pathology reviews.
The committee found that the NTP’s near-term efforts to refine and extend its toxicity tests and to improve the use of pharmacokinetic information promise to increase the depth of toxicity information on chemicals assayed and to provide greater insight in applying the findings to humans. However, as acknowledged by the NTP, the resulting portfolio would still be resource-intensive and incapable of addressing large numbers of chemicals that require some level of toxicity assessment. That problem emphasizes the importance of the NTP’s long-term goal to develop screening strategies that use nonanimal models. Such a focus by an agency like the NTP is needed if those approaches are to become viable alternatives to traditional toxicity testing in animals.
The committee identified several recurring themes and questions in the various reports that it was asked to review. The recurring themes included the following:
The inherent tension among breadth, depth, animal welfare, and cost of toxicity testing and the challenge of addressing any one of these issues without worsening another.
The importance of distinguishing between testing protocols and testing strategies as one considers modifications of current testing practices.
The need to be cautious in adding testing requirements for the sake of theoretical thoroughness.
The possible danger that making tests too efficient, for example by eliminating all overlap, could leave no means of verifying results.
The complementary roles of uniform testing protocols and strategies, which enhance comparability, and of chemical-specific tailored testing, which deepens understanding of a particular chemical’s mode of action.
The importance of recognizing that toxicity testing for regulatory purposes should be conducted primarily to serve the needs of risk management.
The recurring questions that arose during the committee’s review and its initial observations are provided below. The questions and observations will help to frame the discussion for the committee’s second report, which will provide a long-range vision and strategic plan for advancing the practices of toxicity testing and human health risk assessment.
Which environmental agents should be tested? All new and existing environmental agents should be evaluated; however, the intensity and depth of testing should be based on practical needs, including the use of the chemical, the likelihood of human exposure, and the scientific questions that such testing must answer to support a reasonable science-policy decision. Fundamentally, the design and scope of a toxicity-testing approach need to reflect risk-management needs.
How should priorities for testing chemicals be set? Priority-setting should be a key component of any testing strategy that is designed to address a large number of chemicals, and a well-designed scheme is essential for systematic testing of industrial chemicals on which there are few data. It makes sense to consider exposure potential in designing test strategies. Chemicals to which people are more likely to be exposed or to which some populations may receive relatively high exposures—whether they are pesticides or industrial chemicals—should undergo more in-depth testing. This concept is embedded in several existing and proposed strategies. In some strategies, production volume is the primary measure of potential human exposure, but production volume alone may not be the best surrogate of human exposure. Other important factors to consider are use, exposure patterns, and a chemical’s environmental persistence and bioaccumulation, which are important because of the potential for increasing exposure over time and continuing exposure even after use has ceased.
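The multi-factor priority-setting idea can be made concrete with a simple scoring sketch. This is a hypothetical illustration, not part of the committee's proposals; the factor names, weights, and volume cap are invented for demonstration only.

```python
# Hypothetical illustration (not from the committee's report): a crude
# priority score combining the exposure-related factors named in the text.
# Factor names, weights, and the volume cap are invented for demonstration.

def priority_score(chemical):
    """Rank chemicals for testing; a higher score means higher priority."""
    score = min(chemical["tonnes_per_year"] / 1000.0, 5.0)  # volume, capped
    score += 5.0 * chemical["exposure_likelihood"]  # 0..1, from use patterns
    score += 3.0 * chemical["persistence"]          # 0..1, environmental
    score += 3.0 * chemical["bioaccumulation"]      # 0..1
    return score

chemicals = [
    {"name": "high-volume intermediate", "tonnes_per_year": 50000,
     "exposure_likelihood": 0.2, "persistence": 0.1, "bioaccumulation": 0.1},
    {"name": "low-volume, persistent", "tonnes_per_year": 200,
     "exposure_likelihood": 0.9, "persistence": 0.8, "bioaccumulation": 0.7},
]
ranked = sorted(chemicals, key=priority_score, reverse=True)
# Here the persistent, high-exposure chemical outranks the high-volume one,
# illustrating why production volume alone may be a poor exposure surrogate.
```

The specific weights are arbitrary; the point is structural: once exposure likelihood, persistence, and bioaccumulation enter the score, a low-tonnage chemical can legitimately rank above a high-tonnage one.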
What strategies for toxicity testing are the most useful and effective? Current approaches to toxicity testing include testing batteries, tiered testing, tailored testing, and a combination of the three. The committee finds that there are pros and cons of various approaches but leans toward tiered testing with the goal of focusing resources on the evaluation of the more sensitive adverse effects of exposures of greatest concern rather than full characterization of all adverse effects irrespective of relevance for risk-assessment needs. The committee, however, notes that tiered-testing approaches should be designed to expedite regulatory decisions and to discourage toxicity testing that is not used to address regulatory questions.
How can toxicity testing generate data that are more useful for human health risk assessment? Many have criticized existing approaches to toxicity testing on the grounds that the data generated are often not ideal for conducting human health risk assessment. Extrapolations are often made with weak scientific justifications, and uncertainty factors are used to bridge the gaps. The current proposals to improve toxicity-testing strategies, discussed above, are unlikely to solve the fundamental problem. The committee cautions against indiscriminately generating large amounts of data with an eye to creating optimal datasets for characterizing risks posed by single chemicals. Emerging technologies and approaches, such as “-omics” technologies and computational toxicology, may help to address the problem.
How can toxicity testing be applied to a broader universe of chemicals, life stages, and health effects? There are major gaps in current toxicity-testing approaches. The importance of the gaps is a matter of debate and depends on whether effects of public-health importance are being missed by current approaches. However, it is impractical to test every chemical for every possible health effect over all life stages. The emphasis should be on chemicals that have the greatest potential for human exposure. The emerging technologies may help to screen chemicals more rapidly and to indicate a need for further testing.
How can environmental agents be screened with minimal use of animals and efficient expenditure of time and other resources? One strategy that can be applied to reduce animal use is the grouping of chemicals of similar structural class and the in-depth testing of only one or a few representative chemicals; risk assessments of all chemicals in the class would be based on the resulting data. In grouping chemicals, known modes of action should be emphasized. Such strategies should address any data needed to support application of study findings to other chemicals in the group. Newer approaches also have great promise.
How should tests and testing strategies be evaluated? Testing strategies may be evaluated in terms of the value of information they provide in light of the four objectives—increasing depth of knowledge for more accurate risk assessment; increasing coverage of chemicals, life stages, and end points; preserving animal welfare; and minimizing cost. In evaluating new tests and testing strategies, there remains the difficult question of what is to serve as a “gold standard” for performance. Simply comparing the outcomes of new tests with the outcomes of current tests may not be the best approach; whether it is will depend on the reliability and relevance of the current tests. Ideally, regulations and risk-assessment guidelines will evolve with testing capabilities and scientific understanding. That issue will increase in importance with greater use of screening approaches (for example, in vitro tests, gene arrays, and mode-of-action screens) that produce indirect evidence on both cancer and noncancer end points.