EXECUTIVE SUMMARY

Abstract: A "select universe" of 65,725 substances that are of possible concern to the National Toxicology Program (NTP) because of their potential for human exposure was identified. Through a random sampling process, 675 substances covering seven major intended-use categories were selected. From this sample, a subsample of 100 substances was selected by screening for the presence of at least some toxicity information. In-depth examination of this subsample led to the conclusion that enough toxicity and exposure information is available for a complete health-hazard assessment to be conducted on only a small fraction of the subsample. On the great majority of the substances, data considered to be essential for conducting a health-hazard assessment are lacking. By inference, similar conclusions were made for the select universe from which the sample and the subsample were drawn. This report presents criteria for selecting substances and determining toxicity-testing needs, provides estimates of those needs, and describes some useful criteria for assigning priorities for toxicity testing.

The potential public-health impacts of chemicals lead society to seek information for determining the probability and magnitude of such impacts. Such information is based primarily on predictions from results of toxicity studies. The development of a strategy for obtaining appropriate information requires an estimation of the quantity and quality of available toxicity data applicable to the assessment of human health hazard, as well as knowledge of the number of substances on which necessary experimental data are not yet available. A characterization of the magnitude of needed testing would be valuable to those who allocate resources for such testing.
However, because resources for developing sound scientific bases for identifying public-health hazards are limited, it is important to establish priorities among chemicals and to select those known or expected to have the greatest impact on human health.

A major function of the National Toxicology Program (NTP) is the selection and testing of chemicals for toxicity. NTP has under continuing review candidate chemicals for testing, as they are nominated by the federal agencies served by the program, by state and local governments, and by academic, industrial, and labor groups. Such candidates are of interest to NTP because of their potential for human exposure and public-health impact. In September 1980, NTP, through the National Institute of Environmental Health Sciences, contracted with the National Research Council (NRC) and the National Academy of Sciences for a study. This study was undertaken because NTP recognizes that the number of substances, both natural and man-made, in the human environment is very large and is increasing, with no clear indication of the nature and amount of toxicity information that might be needed on these substances
to ascertain their potential for adverse effects on human health. It is useful for NTP to know as precisely as practicable the toxicity-testing needs for substances to which humans are potentially exposed. NTP asked NRC to address these matters in two major objectives of the study:
(1) To characterize the toxicity-testing needs for substances to which there is known or anticipated human exposure, so that federal agencies responsible for the protection of public health will have the appropriate information needed to anticipate the extent of testing needs.
(2) To develop and validate uniformly applicable and wide-ranging criteria by which to set priorities for research on substances with potentially adverse public-health impact.

The study, titled "Identification of Toxic and Potentially Toxic Chemicals for Consideration by the National Toxicology Program," was established in the Board on Toxicology and Environmental Health Hazards of the NRC Commission on Life Sciences. In this report, the Committee on Sampling Strategies and the Committee on Toxicity Data Elements describe in detail the criteria and procedures they used to determine the nature and extent of toxicity testing and their collective judgment on the testing needs for a "select universe" of chemical substances. The underlying strategy for the characterization of toxicity-testing needs involved four major steps:
(1) Definition of the select universe of substances that might be of interest to NTP because of their potential for human exposure.
(2) Drawing of a random sample of representative substances from the select universe.
(3) Statistical analysis of the sample to determine the quantity and quality of available information and detailed description of testing needs for the sample.
(4) Predictions, based on the sample analysis, of the testing needs for the select universe.
The Committee on Priority Mechanisms presents criteria and a decision-making framework that could be used to set priorities for research on substances with a potential for adverse public-health impact.

SELECT UNIVERSE OF SUBSTANCES

According to an estimate based on the Chemical Abstracts Service (CAS) Registry, the universe of known chemicals consists of over 5 million entities. To define toxicity-testing needs for substances in the human environment, it was necessary to select a manageable subset of the universe that would include most of the substances to which humans are likely to be exposed in the United States. The construction of this "select universe" of substances--a core that would be the reference for the study--relied on a search for lists of substances preselected for human exposure potential and computerized for reasonably easy access. A search for such lists revealed several that could be assembled to form the select universe, provided that most duplications of substances on the combined lists could be identified to permit statistical adjustments. The lists used included the Toxic Substances Control Act (TSCA) Inventory of 48,523 chemical substances in commerce; a list of 3,350 pesticides (active and inert ingredients) registered for use by the Environmental Protection Agency (EPA); a list of 1,815 prescription and nonprescription drugs approved by the Food and Drug Administration (FDA) and excipients used in drug formulations; a list of 8,627 food additives, including those approved for use by FDA; and a list of 3,410 cosmetic ingredients from the Cosmetic, Toiletry and Fragrance Association. This select universe did not systematically include environmental decomposition products, manufacturing contaminants, or natural substances (e.g., plant pollens and foods). The sum of the above, 65,725 entries from the lists, was taken as the select universe for the purposes of this study. Statistical adjustment for duplications indicated that the select universe contained about 53,500 distinct entities.
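
As a quick check on the figures above, the following Python sketch adds the five list sizes and derives the number of duplicate entries implied by the approximate count of distinct entities. It is an illustration only and not part of the original report; the dictionary labels, the function name, and the variable names are assumptions introduced here.

```python
# List sizes as quoted in the text; the keys are informal labels (assumed names).
LIST_SIZES = {
    "TSCA Inventory (chemicals in commerce)": 48_523,
    "Pesticides (active and inert ingredients)": 3_350,
    "Drugs and excipients in drug formulations": 1_815,
    "Food additives": 8_627,
    "Cosmetic ingredients": 3_410,
}

# Approximate number of distinct entities after adjustment for duplications,
# as stated in the text ("about 53,500").
DISTINCT_ENTITIES_ESTIMATE = 53_500


def summarize_select_universe(list_sizes, distinct_estimate):
    """Return (total entries, implied duplicate entries, duplicate share)."""
    total_entries = sum(list_sizes.values())
    implied_duplicates = total_entries - distinct_estimate
    return total_entries, implied_duplicates, implied_duplicates / total_entries


total_entries, implied_duplicates, duplicate_share = summarize_select_universe(
    LIST_SIZES, DISTINCT_ENTITIES_ESTIMATE
)
print(f"Total list entries: {total_entries:,}")
print(f"Implied duplicate entries: about {implied_duplicates:,}")
print(f"Share of entries that are duplicates: about {duplicate_share:.1%}")
```

Because the count of distinct entities is itself an approximation, the implied number of duplicates (a little under one-fifth of the entries) should be read as a rough figure rather than an exact one.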
The Committee on Toxicity Data Elements and the Committee on Sampling Strategies regarded the contents of the select universe as closely approximating the expected universe of interest to NTP.

CHARACTERIZING TOXICITY-TESTING NEEDS

During the planning stages of the study, it was recognized that the Committee on Toxicity Data Elements would not be able to examine the information on all 53,500 substances in the select universe, because of limitations of available resources. Therefore, the Committee on Sampling Strategies developed a method for drawing from the select universe a sample that was (1) small enough to be thoroughly examined for completeness, quality, and utility of information within the limitations of available resources and (2) designed to permit an extension of the committees' findings from the sample back to the select universe from which it was drawn. With a stratified random process, 675 substances were selected from the 65,725 listings. A random subsample of 100 substances with at least minimal toxicity information (described in Chapter 2 of Part 1) as prescribed by the Committee on Toxicity Data Elements was then selected from the random sample. The select universe, the sample, and the subsample contained representatives of seven categories of substances
defined by the lists that make up the select universe: (1) pesticides and inert ingredients of pesticide formulations, (2) cosmetic ingredients, (3) drugs and excipients in drug formulations, (4) food additives, and chemicals in commerce, which were divided into (5) those with 1977 production of 1 million pounds or more, (6) those with 1977 production of less than 1 million pounds, and (7) those whose 1977 production was unknown or inaccessible because of manufacturers' claims of confidentiality. The size of each category in the select universe, the sample, and the subsample is presented in Figure 1.

The lists of substances making up the select universe were compiled on the basis of intended use, rather than toxicity. The intended-use theme was preserved throughout sample selection, data analysis, and inference-making, and it is reflected in the conclusions. Some structural classes of chemicals might be "overrepresented" in the subsample (e.g., cottonseed oil, linseed oil, and peanut oil). However, the Committee on Toxicity Data Elements made no attempt to define the select universe by chemical-structure classes. Rather, it relied on the probabilities inherent in small random samples to choose an appropriate number of substances from each major chemical grouping. It is important to recognize that the subsample of 100 is drawn from the seven categories defined above and that these categories often include "inert" substances that are used to formulate "active" substances. Furthermore, these categories are not defined by chemical structure, so structurally similar substances in the subsample of 100 should not be combined for inference to the select universe. Similarly, inferences about a category should not be limited to the "active" substances in the category (e.g., drugs), but rather should be applied to all the substances (e.g., drugs and excipients in drug formulations) in that category.
Nor is the subsample representative of substances involved in specially publicized episodes of toxic effect, such as those associated with thalidomide or dioxins. These kinds of substances are often selected for toxicity testing, because there is a particular interest in them--e.g., because some toxic effect has been observed. Thus, selection of substances that have already had some testing does not necessarily constitute random sampling of all possible substances in the select universe.

The Committee on Toxicity Data Elements developed a well-structured, stepwise approach to the determination of toxicity-testing needs for substances in the select universe. This required agreement on a strategy for judging the adequacy of toxicity data, establishment of guidelines for assessing the quality of individual toxicity studies, and creation of a decision-making system for reviewing and evaluating the total data base on the hazard of a substance--its toxicity, its exposure potential for humans, and its chemical and physical characteristics.
Results of these efforts were applied to the subsample to establish the extent of additional toxicity testing that might be needed. Data on the sample and the subsample were then used by both committees to estimate toxicity-testing needs for the entire select universe.

The Committee on Toxicity Data Elements judged the quality and completeness of the toxicity data base on each substance in the subsample. To ensure quality, available information was checked against established reference guidelines for toxicity-testing protocols (e.g., those of the Organisation for Economic Co-operation and Development) that have been widely reviewed and generally accepted. The committee also relied on the accumulated experience and expertise of its members, whose combined judgment was used to determine the adequacy of an individual study if it did not meet the standards of the reference protocol guidelines. The committee's determination of the adequacy of toxicity testing for conducting a health-hazard assessment was based on information derived from experiments performed according to the reference protocol guidelines and other information that met the committee's own basic criteria for evaluating scientific methods (described in Chapter 4 of Part 1). Using this combination, the committee assessed the adequacy of the toxicity-testing protocols and the need for further toxicity tests in detailed evaluations, tabulations, and analyses for all substances in the subsample. The committees recognize that regulatory agencies' standards and requirements for testing may differ from those used in this study.

As analysis of the data bases proceeded, the Committee on Toxicity Data Elements established a working document with a standardized format, content, and method of reporting for each of the 100 substances in the subsample as the focal point for all document control efforts and all evaluations of testing adequacy.
The working document or dossier became the unit of record for all committee decisions and actions. The approach developed to collect data on each substance included searches of open literature (primary sources) through automated, on-line data retrieval files; secondary-source literature, such as reference manuals and textbooks; government technical reports; files of U.S. regulatory agencies; and files provided by some chemical manufacturers and trade associations. The data obtained from searches of the primary and secondary open literature constituted the bulk of the information in the dossiers. Search strategies were carefully developed to ensure the most efficient screening of the data bases selected.

The findings of the Committee on Sampling Strategies and the Committee on Toxicity Data Elements are based on analyses of the sample of 675 substances randomly chosen from the select universe and the subsample of 100 randomly chosen from the sample that had at least what the latter committee defined as prescribed minimal toxicity information. Some specific analyses are derived solely from the sample or the
subsample. Others are derived from combined information on both the sample and the subsample. Confidence limits are given for the results of analyses in Chapter 5 of Part 1. In some cases, the confidence limits are wide.

The committees recognize that, despite extensive efforts to obtain all information, they might not have had access to results of some toxicity tests. Toxicity-testing information on the subsample of 100 substances was sought from industries and other interested parties via a Federal Register notice and by direct contact with manufacturers and importers of sampled chemicals in commerce, but some industrial information probably remained unavailable to the committees. Similarly, the committees were not able to examine toxicity, physical, and chemical information on cosmetic ingredients, drugs, excipients in drug formulations, and food additives that may be in the files of FDA, except in the case of food additives listed as substances generally recognized as safe (GRAS).

The documentation for decisions about the quality of tests that have been conducted and about toxicity-testing needs lends particular strength to this study. Scientists have varied opinions about protocol guidelines for toxicity tests, about testing needs for specific uses of substances, and about grounds for claims of adequacy or inadequacy of a particular test as it was performed. Where such varied opinions are important, they can become an integral part of the decision-making process to provide new estimates of testing needs. In the context of this study, scientific judgments used by the committees were recorded and subjected to peer review in a flexible study framework that would accommodate changes in estimates brought about by the presentation of new data.

QUANTITY AND NATURE OF TESTING

It was recognized from the beginning that the quantity and nature of testing needs were such that they could never be adequately fulfilled by specific testing regimens alone.
Although tests of substances will always be needed, a better understanding of the "how" and "why" of toxic injury itself at the subcellular, cellular, organ, and whole-animal levels will be necessary in the future to fulfill the needs in the most efficient and economical manner. The Committee on Toxicity Data Elements used a battery of toxicity tests as the basic "measuring stick" for quantitation of testing needs. At the same time, it rejected the concept that every substance in the select universe required the adequate performance of a complete battery of toxicity tests for a human-health hazard assessment, even if that were practical. Thus, other criteria, including data from human exposures, were also used for judgments about testing adequacy. The Committee on Toxicity Data Elements recognizes that meeting the testing needs will require the establishment of priorities for the tests and the substances needing them.
In the seven categories of the sample of 675 substances, testing for acute and subchronic effects was generally present more frequently than testing for chronic, mutagenic, or reproductive and developmental effects (see Table 7 in Chapter 5 of Part 1). On the basis of an analysis of the randomly selected sample of 675 substances, 75% of the drugs and excipients in drug formulations in the select universe have at least some information on acute toxicity and 62% have information on subchronic testing. For pesticides and inert ingredients of pesticide formulations, these values are 59% and 51%, respectively. Testing was absent most frequently for substances on the TSCA list of chemicals in commerce, particularly for chronic, reproductive, and developmental effects.

More specifically, substances in the subsample of 100 were most frequently tested with acute oral rodent studies and acute parenteral studies (see Table 8 in Chapter 5 of Part 1). Except for drugs and excipients in drug formulations, the next most commonly conducted test was for genetic toxicity. Dermal and eye irritation studies had often been done with substances in the three production categories of chemicals in commerce.

QUALITY OF TESTING

The Committee on Toxicity Data Elements tabulated the quality ratings from evaluations of a total of 664 tests of the 100 substances in the subsample, without regard for either intended-use category or type of test conducted (see Appendix H of Part 1). When judged against currently accepted standards for toxicity testing, only 8% of the tests in the subsample met the standards of the reference protocol guidelines and another 19% of tests performed were judged to be adequate by the committee's standards. The percentages are based on the one study of highest quality when two or more studies of the same type were done. The quality of design, execution, and reporting of toxicity studies was not uniform among the various types of experiments.
Some test types (acute oral administration in rodents, acute dermal application, acute eye irritation and corrosivity, guinea pig skin sensitization, and subchronic dermal application for 90 days) were deemed not to require repetition in most cases where they had been conducted. Four acute tests of substances in the select universe were often of acceptable quality: acute oral administration in rodents (83%), acute dermal application (87%), acute dermal irritation and corrosivity (81%), and acute eye irritation and corrosivity (76%). Fewer chronic test types were of acceptable quality; these included multigeneration reproduction in rodents (33%), carcinogenicity in rodents (52%), chronic toxicity (38%), and combined carcinogenicity and chronic toxicity in rodents (50%). Overall, more testing is needed for chronic toxicity than for acute toxicity. These findings should be viewed in perspective: the comparison is of simple acute tests with more complex chronic tests;
far fewer chronic tests were performed than acute tests; and, although the percentages themselves are high, they are derived, on the whole, from small numbers of evaluated tests, particularly in the case of chronic studies.

Evaluation of individual study protocols by the Committee on Toxicity Data Elements was always accompanied by documentation of the reasons for the particular ratings given to the studies. For the most part, these reasons were statements of specific adequacies or inadequacies in a testing protocol. They were collectively tabulated for analysis to assess which deficiencies were most prevalent and what values should be placed on the deficiencies when the overall value of a study was assessed. Some of the more common deviations from reference protocol guidelines that nevertheless resulted in a test's being rated as adequate included the use of too few animals per dosage group, the use of too few or improper doses, and the absence of observations (e.g., in clinical chemistry or histopathology). In most cases, such tests were considered to have been conducted adequately because more information would not be expected to alter the conclusions, because existing data were sufficient to evaluate toxicity or calculate an acceptable LD50, or because doses were high enough to give positive results or exceed the limit test prescribed in the guidelines. The Committee on Toxicity Data Elements notes that reference protocols are developed for general application before it is known which of their results will be important. The committee judged the quality of studies after they had been performed and in the light of the results obtained.

Tests that were rated as inadequately conducted often lacked required observations (e.g., test animal description, diet analysis, chemical analysis, clinical chemistry, and histopathology), used too few doses, or lacked sufficiently detailed end points, such as data tabulation and statistical analysis of data.
Occasionally, the committee recommended that these studies not be repeated, either because toxicity was sufficiently well established or because more information would be of slight value.

TOXICITY-TESTING NEEDS

For pesticides and inert ingredients of pesticide formulations, the Committee on Toxicity Data Elements considered 18 test types to be necessary according to the standards it adopted. In this category, all studies of acute oral administration in rodents were judged not to require repetition. Some of the 17 remaining test types needed repetition or were not done at all from 20% to 73% of the time. For cosmetic ingredients, this ranged from 67% to 100%; for drugs and excipients in drug formulations, from 25% to 60%; for food additives, from 33% to 80%; and for chemicals in commerce, from 45% to 100%. This information indicates that, in every category of intended use, substantial testing or retesting remains to be performed if information gaps are to be
filled. The major gaps in testing result from failure to do tests required by the committee according to the standards it adopted, rather than from conducting tests improperly (see Tables 12-18 in Chapter 5 of Part 1). If the unknown amount of information that was not available to the committee had been available, the "untested" category would be somewhat smaller than reported here.

In general, chronic studies, inhalation studies, and more complex studies with specific end points (e.g., neurotoxicity, genetic toxicity, and effects on the conceptus) are most frequently needed. These were among the test types considered by the Committee on Toxicity Data Elements to be necessary for conducting a health-hazard assessment according to the standards it adopted. There are some differences in gaps in toxicity information from one category of substances to another. To some extent, these may reflect the spectrum of individual tests that the committee prescribed as necessary to meet its criteria for adequacy of information in each category. The three greatest testing needs for health-hazard assessment of pesticides and inert ingredients of pesticide formulations were in teratology, neurobehavioral toxicity, and genetic toxicity (see Table 19 in Chapter 5 of Part 1). For cosmetic ingredients, testing was found to be needed most for subchronic eye toxicity and subchronic neurotoxicity. A large variety of test types were found to be needed for drugs and excipients in drug formulations, for food additives, and for the three production categories of chemicals in commerce.

HEALTH-HAZARD ASSESSMENT

The Committee on Toxicity Data Elements and the Committee on Sampling Strategies made judgments to describe their ability to make health-hazard assessments of substances in each of the seven categories of the select universe as complete, partial, or none.
A complete health-hazard assessment was defined as one that provided a full estimate of hazard associated with the safe use of a substance. A partial health-hazard assessment was defined as one that provided a limited characterization of the hazard associated with the safe use of a substance. Therefore, a partial health-hazard assessment had a broad range extending from very limited (e.g., acute-toxicity evaluation by one route of administration) to almost complete (e.g., full acute- and chronic-toxicity evaluation, except for inadequate neurobehavioral-toxicity determinations).

The estimates of percentages for health-hazard assessment combine data obtained from the sample of 675 substances used to measure the existence of minimal toxicity information and the 100 with minimal toxicity information that were examined for the quality of test protocols. Results of this analysis indicate not only the percentage of substances in each of the seven categories in which sufficient data of adequate quality are available for a health-hazard assessment when judged against the current standards for protocols, but also the percentage that would require additional testing if an assessment were to be performed.
The overall status of toxicity information and of the ability to conduct a health-hazard assessment for each use category is presented in Figure 2. In general, proportionately more testing has been undertaken on pesticides and inert ingredients of pesticide formulations and on drugs and excipients in drug formulations than on other substances. In these two categories, 36% and 39% of substances met the requirements for minimal toxicity information, respectively. The Committee on Toxicity Data Elements judged it possible to make at least a partial health-hazard assessment for 94% and 92% of the substances with minimal toxicity information in each of these categories, respectively.

Cosmetic ingredients and food additives have been somewhat less well tested. Minimal toxicity information requirements were met by 26% and 20% of substances in these categories, respectively, and at least a partial health-hazard assessment was judged possible for 62% and 95% of the substances with minimal toxicity information in these categories, respectively.

In contrast, only about 20% of the substances in each of the three categories of chemicals in commerce have minimal toxicity information. In all three categories, at least a partial health-hazard assessment was possible for about half the substances having minimal toxicity information. Virtually all the substances in the three subsample categories of chemicals in commerce with minimal toxicity information would require additional toxicity testing if a complete health-hazard assessment were needed. The frequency and quality of testing of chemicals in commerce were not related to production volume. Chemicals in commerce produced in quantities of 1 million pounds or more in 1977 have not been tested more often or more adequately than those produced in smaller quantities.
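
The two kinds of percentages quoted in this section can be multiplied to give the share of all substances in a category for which at least a partial health-hazard assessment was judged possible. The Python sketch below is illustrative only: the function and variable names are assumptions introduced here, the inputs are the rounded point estimates quoted in the text, and the simple product ignores the confidence limits reported in Chapter 5 of Part 1. It is not presented as the committees' own estimation procedure.

```python
# For each category: (share of substances with minimal toxicity information,
# share of those for which at least a partial assessment was judged possible).
# Values are the rounded percentages quoted in the text.
REPORTED_SHARES = {
    "Pesticides and inert ingredients": (0.36, 0.94),
    "Drugs and excipients": (0.39, 0.92),
    "Cosmetic ingredients": (0.26, 0.62),
    "Food additives": (0.20, 0.95),
}


def share_assessable(p_minimal, p_partial_given_minimal):
    """Share of the whole category with at least a partial assessment possible.

    Computed as the product of the two proportions; a simplifying assumption.
    """
    return p_minimal * p_partial_given_minimal


assessable = {
    category: share_assessable(p_minimal, p_partial)
    for category, (p_minimal, p_partial) in REPORTED_SHARES.items()
}

for category, share in assessable.items():
    print(f"{category}: about {share:.0%} of the category")
```

On this reading, roughly one-third of the pesticide and drug categories, and under one-fifth of the cosmetic and food-additive categories, would support at least a partial assessment.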

INTERPRETATION OF PHYSICOCHEMICAL AND EXPOSURE DATA

The committees attempted to relate the quantity and quality of toxicity testing of each substance in the subsample to breadth of known exposure, expected trends in exposure, physicochemical properties and chemical fate of the substance, strength of evidence of toxicity in humans, and severity of reported chronic human toxicity. In addition, the committees sought information on occupational and environmental exposure and attempted to relate it to the extent of testing needs.

It became evident as the dossiers for the 100 substances in the subsample were examined that characterization of the substances with respect to each of these factors, when possible, was based on scanty information. Most of the available information was on the physicochemical properties of the substances; the least was on exposure. However, no comprehensive method of gathering the needed information could be identified, and, in the end, the principal basis for characterizing exposure was the knowledge and expertise of the committee members. The immediate use of substances in the synthesis of new substances may not result in a reduction of exposure intensity, but will reduce the number of persons exposed.

FIGURE 2 Ability to conduct health-hazard assessment of substances in seven categories of select universe. Values are estimated mean percentages of each category; a dash marks a class for which no value appears in the figure.

Category                                        Size of     Complete    Partial     Minimal      Some toxicity   No toxicity
                                                category    assessment  assessment  toxicity     information     information
                                                in select   possible    possible    information  (below minimal) available
                                                universe                            available
Pesticides and inert ingredients
  of pesticide formulations                     3,350       10          24          2            26              38
Cosmetic ingredients                            3,410       2           14          10           18              56
Drugs and excipients used in
  drug formulations                             1,815       18          18          3            36              25
Food additives                                  8,627       5           14          1            34              46
Chemicals in commerce:
  at least 1 million pounds/year                12,860      -           11          11           -               78
Chemicals in commerce:
  less than 1 million pounds/year               13,911      -           12          12           -               76
Chemicals in commerce:
  production unknown or inaccessible            21,752      -           10          8            -               82

It should be emphasized that the study was designed to characterize the status of toxicity-testing needs for substances to which there is known or anticipated exposure, without regard in the selection process to the extent of that exposure. The 100 selected substances contained, in each category, few of the substances known to be produced in the greatest volumes. Hence, this study may not provide an accurate estimate of the status of toxicity information on the principal substances to which humans are exposed.

The following observations emerge from the committees' analysis:
· Of the 100 substances in the subsample, 42 (those with at least minimal toxicity information) were considered to involve widespread exposure. An additional 14 were considered to have limited exposure potential, which would be intensive for specific groups.
· Physicochemical data on 20 of the 100 substances led to a high concern about potential adverse human health effects. For 32 additional substances, the concern was moderate.
· There was no relation between the amount of testing that had been performed and the degree of concern about a substance based on physicochemical information.

Among the seven categories, it is the chemicals in commerce that have the smallest amount of information relevant to human exposure in the workplace and in the general environment. This is of particular concern, inasmuch as the primary motivation for testing chemicals in commerce is their potential for environmental and occupational exposure. The committees suggest that a coordinated effort be made to collect information needed to assess potential human exposure in the workplace and in the general environment. Development of analytic methods; systems for monitoring ambient air, water, soil, and foods; personal monitoring systems; and highly sensitive and selective instrumentation for the evaluation of human exposure should be integral parts of this effort.
APPROACHES TO PRIORITY-SETTING
Part 1 of this report shows that, of tens of thousands of commercially important chemicals, only a few have been subjected to extensive toxicity testing and most have scarcely been tested at all. Many other constituents of the human environment, including natural chemicals and various contaminants, are also potential
candidates for testing. Although it can be convincingly argued that many chemicals do not need to be tested, because of their low potential for human exposure or for toxic activity, it is clear that thousands or even tens of thousands of chemicals are legitimate candidates for toxicity testing related to a variety of health effects. Many government and private institutions have responsibilities for toxicity testing, and the National Toxicology Program has a special mission to develop testing methods and to fill in the gaps left by other institutions. However, the resources available to NTP for testing--whether expressed in terms of budget, staff, or facilities--are limited. Hence it must decide which chemicals to test and which tests to perform. The need for priority-setting is especially acute for lifetime bioassays, which may cost up to a million dollars for a single chemical.
Priority should presumably be assigned to chemicals and tests that are in some sense the most important. The Committee on Priority Mechanisms interpreted its charge to include defining "most important" and indicating how to identify the chemicals and tests that satisfy the definition. That is, the committee concluded that definition of the goal of the testing program was essential to designing a priority-setting system and that the goal would largely drive the logic of the design.
Testing priorities have traditionally been assigned on the basis of expert judgment, which is now supplemented with a variety of analytic, data-based techniques, such as scoring systems. The committee believes that this basic pattern should continue, with further improvement in techniques to allow expert judgment to be most effective.
The committee recognized that no priority system, scheme, or procedure can be perfect, because the knowledge needed for unerring selection of the most important chemicals and tests is the same as the knowledge resulting from a complete and accurate testing program for all chemicals, which would of course make priority-setting unnecessary. The priority-setting system and the testing program form a continuum whose overall objective is to yield information of maximal value about the overall hazards of chemicals.
In examining traditional approaches, including expert judgment and mechanical priority-setting systems, the committee found some common themes that can be considered conventional wisdom and with which it agrees:
· Long lists of candidate chemicals need to be reduced to short lists through screening, which yields increasing amounts of information on decreasing numbers of chemicals and possible tests.
· The two key elements for screening are estimated human exposure and suspicion of toxic activity. (This priority-setting effort is oriented to human health and not to effects in other species, except insofar as they point toward human effects.)
· Chemicals that have already been tested adequately for a given effect are of low priority for further testing for that effect.
Although documentation on the goals of most systems is somewhat vague, all systems seem to use the goal of reducing the uncertainty about the hazards of the population of chemicals in the human environment as rapidly as possible within the limits of available resources. The key elements embodied in this goal are hazard (determined by both exposure and toxicity) and reduction of uncertainty. Testing is needed most where uncertainty is greatest; there is no need to continue testing for hazards that are already well known. The above objective not only seems to be the common denominator of current procedures for priority-setting, but is obviously a worthy goal, because it allows society the best chance to make decisions about chemicals that will reduce their hazards or at least to accept them with full knowledge of the magnitude of their hazards.
Of course, there are other legitimate goals of the testing program and therefore of priority-setting. On the one hand, general improvement in the understanding of chemical toxicity is worthy in itself; on the other, testing directed at chemicals of great public concern to confirm or deny the concern, and thus reduce anxiety, is worthy. These goals are best addressed by subjective exercise of expert judgment and were not addressed further by the Committee on Priority Mechanisms.
Given a goal for the priority-setting system, the committee needed to decide whether improvements over current procedures for selecting chemicals for testing were possible. It concluded that improvements were possible--at least at the margin--by injecting additional systematic information-gathering and -processing procedures. Many current procedures skirt the issue of the goal of the testing program and therefore are somewhat inconsistent in approaching the goal.
In particular, the concept of the value of information is an important contribution to systematic priority-setting. In brief, this concept asserts that the value of any information-gathering activity, such as toxicity testing or searching for information on human exposure to chemicals, lies in the value of the resulting information in guiding decisions. The contribution of this concept is in making explicit that the goal of the testing program should be embodied in the priority-setting system. In the realm of chemical hazards, the "cost" of not knowing the degree of toxicity of a chemical (or the degree of human exposure to it) lies in misclassifying its hazard--e.g., believing that it is innocuous, when it is actually toxic. Maximizing the value of information about chemical hazards--or, equivalently, minimizing the cost of misclassification of them--is therefore essentially the same as the goal emphasized earlier. Thus, incorporating the value-of-information concept explicitly in the priority-setting system provides an advantage, even though current procedures often use it implicitly, even if erratically.
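The value-of-information concept can be made concrete with a small numerical sketch. It is illustrative only: the prior probability of toxicity, the sensitivity and specificity of the test, and the two misclassification costs are hypothetical inputs, not estimates from this report, and the function names are invented for the illustration. The sketch computes the expected cost of misclassifying a chemical when it is classified without a test, then when it is classified after a test of stated accuracy, and treats the difference as the value of the test information.

```python
def expected_cost_without_test(p_toxic, cost_false_neg, cost_false_pos):
    """Lowest expected misclassification cost using the prior belief alone."""
    # Calling the chemical innocuous costs something only if it is toxic.
    cost_if_called_innocuous = p_toxic * cost_false_neg
    # Calling the chemical toxic costs something only if it is innocuous.
    cost_if_called_toxic = (1.0 - p_toxic) * cost_false_pos
    return min(cost_if_called_innocuous, cost_if_called_toxic)


def expected_cost_with_test(p_toxic, sensitivity, specificity,
                            cost_false_neg, cost_false_pos):
    """Expected misclassification cost when the best call is made per test outcome."""
    total = 0.0
    for outcome_positive in (True, False):
        if outcome_positive:
            joint_toxic = p_toxic * sensitivity
            joint_innocuous = (1.0 - p_toxic) * (1.0 - specificity)
        else:
            joint_toxic = p_toxic * (1.0 - sensitivity)
            joint_innocuous = (1.0 - p_toxic) * specificity
        # For this outcome, pick the classification with the lower expected cost.
        total += min(joint_toxic * cost_false_neg,
                     joint_innocuous * cost_false_pos)
    return total


def value_of_information(p_toxic, sensitivity, specificity,
                         cost_false_neg, cost_false_pos):
    """Reduction in expected misclassification cost attributable to the test."""
    before = expected_cost_without_test(p_toxic, cost_false_neg, cost_false_pos)
    after = expected_cost_with_test(p_toxic, sensitivity, specificity,
                                    cost_false_neg, cost_false_pos)
    return before - after


if __name__ == "__main__":
    # Hypothetical inputs: 10% prior, imperfect test, unequal error costs.
    print(value_of_information(0.1, 0.8, 0.9, 100.0, 10.0))
```

Under these assumptions a test that cannot discriminate (sensitivity and specificity both 0.5) has no value, and a test is worth performing only if its value exceeds its cost. A full treatment would also account for the degree of human exposure, which this sketch omits.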
The committee identified a second category of potentially important improvements. It includes provisions for validating some key estimates produced by the priority-setting system and thereby allowing for a self-improvement cycle to modify the system accordingly. Current systems are difficult to validate, because they rarely yield estimates that are directly verifiable, but simply indicate which chemicals to test for which effects. The committee proposes to redefine the elements of the priority-setting system to allow them to be checked. For example, an ideal system would estimate the percentage of chemicals that would yield positive results if tested; as experience accumulated, it would be possible to modify the system on the basis of the errors in the estimates. We note again that a priority-setting system cannot be free of errors in selecting chemicals for testing or in classifying them by degree of hazard when the test results are in, but any good system should reduce the frequency of errors as information accumulates and improvements become possible.
With these broad kinds of improvement in mind, the Committee on Priority Mechanisms decided to outline an illustrative system that would incorporate the stated goal and general features, building on the experience of previous priority-setting procedures, but trying to make them more systematic, defensible, and robust. The committee recognized that its own resources were inadequate to develop a fully operational priority-setting system with all the desirable features, but it hoped to provide NTP with sufficient guidance and examples to enable it to improve its current selection system while adhering to its institutional operating principles.
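The self-improvement cycle described above, in which a predicted percentage of positive test results is checked against experience, might be sketched as follows. The pseudo-count weighting used here is one simple updating rule chosen for illustration; it is an assumption of this sketch, not a method specified by the committee, and the function names and numbers are hypothetical.

```python
def estimation_error(predicted_rate, n_tested, n_positive):
    """Observed fraction of positive results minus the predicted fraction."""
    if n_tested <= 0:
        raise ValueError("n_tested must be positive")
    return n_positive / n_tested - predicted_rate


def update_positive_rate(prior_rate, prior_weight, n_tested, n_positive):
    """Blend the prior estimate with test results.

    prior_weight is the number of hypothetical 'pseudo-tests' the prior
    estimate is treated as being worth; larger values make the estimate
    change more slowly as real results accumulate.
    """
    return (prior_rate * prior_weight + n_positive) / (prior_weight + n_tested)


if __name__ == "__main__":
    # Hypothetical group of chemicals: the system predicted 20% positives,
    # but 6 of the 10 chemicals actually tested were positive.
    print(estimation_error(0.2, 10, 6))
    print(update_positive_rate(0.2, 10, 10, 6))
```

A persistent error of one sign for a group of chemicals would suggest that the indicators used to select that group are being weighted incorrectly.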
Several design principles became evident and were used by the committee in developing an illustrative priority-setting system:
· Any rational system can be conceptually divided into stages, with more information on fewer chemicals and fewer potential tests in each succeeding stage. The committee describes how a four-stage system might be designed. In this system, Stage 1 acts as a coarse screen and depends almost totally on automated information sources; Stage 2 begins the use of expert judgment; Stage 3 relies heavily on traditional expert judgment with only minor changes; and Stage 4 is the testing program itself.
· Both exposure and suspected toxicity considerations are useful in every stage of priority-setting. Information on either would necessarily be relatively crude (that is, there would be relatively little information about degree of hazard) at early stages, but would be correspondingly less expensive to acquire.
· Many indicators of exposure and toxicity are available--e.g., for exposure: production volume, use patterns, and persistence; for toxicity: chemical structure and the results of an acute test.
Whether or not to use a specific indicator at a particular stage of priority-setting depends on its cost, its individual value in characterizing exposure or toxicity, and its combined value with other indicators in characterizing degree of chemical hazards.
· The performance of a system should be evaluated according to its ability to characterize the hazards of groups of chemicals, not only its ability to indicate test-no test decisions.
· To accomplish test objectives, a system must take into account the frequency of occurrence of various properties (e.g., carcinogenicity) and of various indicator values (e.g., a positive result of an acute test) in groups of chemicals.
· An ideal system would be capable not only of dealing with a relatively small number of chemicals nominated to NTP by agencies--as in current practice--but also of dealing with a much larger number of chemicals in the total select universe of concern (53,500, as stated in Part 1 of this report). This capability does not necessarily imply that NTP should be the entity that operates the long-list part of the system.
· Expert judgment is essential for operation of the system beyond the earliest stage, where judgment enters into the design but not into the operation. Simply put, not enough is known about chemical hazards to specify a purely mechanical system, and humans need to integrate diverse data into judgments about the degrees of exposure and suspected toxicity. However, these judgments should be made at the lowest level of aggregation needed, because humans have difficulty in integrating information and concepts that are far outside their normal range of experience.
Beyond these broad principles, possible designs for a priority-setting system are multiple, and the specific choices for design--let alone operation--depend on expert judgments.
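The staged design in the principles above, in which each stage applies more costly information to fewer chemicals, can be sketched as a simple funnel. Everything specific in the sketch is hypothetical: the record fields, the scoring weights, and the numbers kept at each stage are invented for illustration and are not the committee's system. In particular, the later stages of the committee's design rely on expert judgment, which a scoring function can only stand in for.

```python
def stage1_score(chem):
    """Coarse automated screen: production volume, raised by a structural alert."""
    return chem["production_volume"] * (2.0 if chem["structural_alert"] else 1.0)


def stage2_score(chem):
    """Refined screen: adds a costlier indicator, the result of an acute test."""
    return stage1_score(chem) * (3.0 if chem["acute_test_positive"] else 1.0)


def run_stages(chemicals, stages):
    """Apply (score_function, number_to_keep) stages in order.

    Chemicals already tested adequately are dropped first, since they are of
    low priority for further testing for the same effect.
    """
    remaining = [c for c in chemicals if not c["adequately_tested"]]
    for score, keep in stages:
        remaining = sorted(remaining, key=score, reverse=True)[:keep]
    return remaining


if __name__ == "__main__":
    candidates = [
        {"name": "A", "production_volume": 1000, "structural_alert": True,
         "acute_test_positive": True, "adequately_tested": False},
        {"name": "B", "production_volume": 5000, "structural_alert": False,
         "acute_test_positive": False, "adequately_tested": False},
        {"name": "C", "production_volume": 200, "structural_alert": True,
         "acute_test_positive": True, "adequately_tested": False},
        {"name": "D", "production_volume": 9000, "structural_alert": True,
         "acute_test_positive": True, "adequately_tested": True},
    ]
    shortlist = run_stages(candidates, [(stage1_score, 2), (stage2_score, 1)])
    print([c["name"] for c in shortlist])
```

Passing the stages as a list keeps the choice of indicators at each stage separate from the funnel itself, so that an indicator can be added or dropped according to its cost and its value in characterizing hazard.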
The Committee on Priority Mechanisms offers in this report a possible design (admittedly sketchy and incapable of immediate implementation) that illustrates the key departures from current practice that seem warranted. The design may look unfamiliar, because of its description in mathematical terms, but it attempts to capture how a rational person or group would set priorities for testing if able to gather, assimilate, and integrate all the relevant pieces of information in a completely informed and objective manner. The reasoning behind the committee's design is presented in Chapter 2 and Appendix B of Part 2. This reasoning depends in part on a "model" of the system that allowed its operation to be simulated--in highly simplified form--under a variety of assumptions. The reasoning and system simulation are complex and may well challenge the reader; however, they allowed the committee to explore the overall performance
of various possible system designs with the goal of reducing uncertainties about chemical hazards--an evaluation that has not been typical in the past. The result of exercising the model is the simplified system description given in Chapter 3 of Part 2, in which the operation of the illustrative system is described from the viewpoint of an outside observer--what happens, but not why. Thus, Chapter 3 presents the illustrative system as a "black box," whereas Chapter 4 and Appendix B of Part 2 describe the "wiring diagram" for the interior of the box.
The committee believes that a fully developed version of the outlined system not only is a plausible extension of current practice, but also would provide some improvements over existing priority-setting procedures toward the goal defined earlier. Obviously, it might not provide improvements toward other goals, but it should not impede them. Even at the margin, the improvements would probably easily justify the costs of developing, implementing, and operating the system. However, the implementation of these concepts in the illustrative system or one of similar scope would require adjustments in the established patterns of thinking about testing priorities. Specifically, full application of the proposed analytic techniques will require that each information-gathering procedure be described quantitatively with respect to its ability to identify and to characterize potentially toxic chemicals. This requirement is not readily fulfilled in our present state of knowledge. Hence, efforts toward further quantification of the performance characteristics of toxicologic methods would be essential to full implementation of the priority-setting approach proposed herein. For this reason the approach can be pursued initially on a pilot scale, with further implementation depending on the development and availability of the necessary data.
The committee believes that it should be possible to institute changes in current procedures gradually without irreversibly committing resources to the novel features of its suggestions.