Read "Toxicity Testing: Strategies to Determine Needs and Priorities" at NAP.edu



Suggested Citation:"4. DATA EVALUATION." National Research Council. 1984. Toxicity Testing: Strategies to Determine Needs and Priorities. Washington, DC: The National Academies Press. doi: 10.17226/317.



4 DATA EVALUATION

THE DOSSIER CONCEPT

The available information on each of the 100 substances was organized in a working document, or dossier, with a standardized format, content, and method of reporting. These dossiers were the focal point for the committees' operating policies, all document-control efforts, and all evaluations of data adequacy. Each dossier was the unit of record for all committee decisions and actions.

Each dossier contained a synopsis of the substance's physicochemical properties, manufacturing processes, production and consumption volumes, chemical fate, intended and other uses, and exposure potential; a summary of the toxicity data base; and a statement of adequacy of the complete data base. As the evaluation of the toxicity information progressed, additional documentation was added to each dossier until, in addition to the synopsis, it contained:

· A summary of adequacy ratings for tests that were required for a substance's intended uses according to the standards adopted by the Committee on Toxicity Data Elements, as well as the tests required for occupational and environmental exposure, indicating which tests had been performed, the documents in which they were reported, and judgments of adequacy of the test protocols.
· A summary of the amount and quality of all information in the dossier for assessment of the substance's potential hazard to human health.
· Each complete document (or identifying first pages or English summaries thereof) dealing with the substance's human toxicity and exposure, evaluations of the toxicologic information contained in confidential files, and, for each toxicity study:
  -- An annotated comparison of the study's protocol with the appropriate reference protocol guidelines (described later in this chapter).
  -- A cover sheet preceding each study and its protocol comparison, identifying the type of testing reported in the paper, the committee's judgments of adequacy, and the reasons for those judgments.
· A data sheet detailing chemical and physical properties, chemical reactivity in nonbiologic systems, bioavailability, analytic methods available for detection, and known uses and exposure.
· A list of synonyms for the compound found in the Chemline, CIS, Chemname, RTECS, and TDB automated data bases and/or the Merck Index, Hawley's Condensed Chemical Dictionary, and the Cosmetic Ingredient Dictionary.

· The names of manufacturers listed in the TSCA Inventory.

The major components of a dossier and their contents are presented in Appendix L.

GENERAL PRINCIPLES FOR EVALUATION OF TOXICITY-TESTING PROTOCOLS

An ideal data base on the toxicity of a chemical would contain enough information to permit the assessment of hazards and safety associated with anticipated use and other exposure. Toxicity information obtained from the experience of exposed humans usually is not available, and it is common practice to use information obtained from tests on laboratory animals. Deficiencies in a toxicity data base do not always invalidate the use of the information to predict at least some human health effects, but may reduce the certainty of a health-hazard estimate for that substance.

The Committee on Toxicity Data Elements used three steps to develop a suitable approach to the determination of toxicity-testing needs. First, the committee reached an agreement on a strategy for judging the adequacy of toxicity data. Second, it established guidelines for assessing the quality of individual toxicity studies. Third, it created a decision-making system to review and evaluate the total data base on the toxicity of a substance. These three steps were used to determine the extent of needed additional toxicity testing for the subsample of 100 substances. Results were used to estimate testing needs for the select universe.

Answers to three fundamental questions describe the adequacy of the toxicity data base on a substance:

· What toxicity tests are needed for the substance?
· What tests have been performed and how well have they been done?
· Does the quality of the information permit assessment of the human health hazard?
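As a rough illustration of the dossier as a unit of record, the following sketch (assuming Python 3.9 or later) stores the synopsis, the required tests for each exposure setting, and the studies on file, and answers the first two of the three questions above by listing required tests for which no study has been filed. The class names, field names, and test-type labels are hypothetical; the actual dossier components are those given in Appendix L.

```python
from dataclasses import dataclass, field


@dataclass
class Study:
    """One toxicity study filed in a dossier (hypothetical fields)."""
    test_type: str       # illustrative label, e.g. "subchronic oral"
    document: str        # report in which the study appeared
    rating: str = ""     # committee's adequacy judgment, added during review


@dataclass
class Dossier:
    """Unit of record for one substance (hypothetical structure)."""
    substance: str
    synopsis: dict[str, str] = field(default_factory=dict)            # properties, uses, fate, exposure
    required_tests: dict[str, set[str]] = field(default_factory=dict)  # setting -> required test types
    studies: list[Study] = field(default_factory=list)

    def performed_tests(self) -> set[str]:
        """Test types with at least one study on file."""
        return {s.test_type for s in self.studies}

    def missing_tests(self, setting: str) -> set[str]:
        """Required test types for one exposure setting with no study on file."""
        return self.required_tests.get(setting, set()) - self.performed_tests()


if __name__ == "__main__":
    d = Dossier("substance X",
                required_tests={"intended use": {"acute oral", "subchronic oral"}})
    d.studies.append(Study("acute oral", "Report 1"))
    print(d.missing_tests("intended use"))  # {'subchronic oral'}
    print(d.missing_tests("occupational"))  # set()
```

Finding a study on file says nothing yet about its quality; that half of the second question is answered by comparison with the reference protocols and the basic criteria discussed in this chapter.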
Although these three questions are fundamental to the overall procedure for evaluating the adequacy of a data base, several additional, more detailed or specific questions may be asked as each substance is examined:

· Is there at least a minimal amount of toxicity information on the substance?
· Is there exposure information on the substance?
· Have all the tests identified as necessary been conducted?

· Has each required toxicity test been conducted in a manner conforming to reference protocols or, if not, did its quality satisfy basic criteria of scientific methods?
· If so, are the nature and quality of the test data adequate for the assessment of health hazard?
· What documentation supports the conclusion that available data are of sufficient quality for a health-hazard assessment or that more tests are required?

The committee developed a procedure for determining the adequacy of available toxicity information on a substance (see Figure 3). First, a substance was chosen from the select universe on the basis of the availability of minimal toxicity data, as described earlier. The next step was a search for pertinent information, as listed in Table 5, followed by a determination of intended uses. Next, on the basis of the category to which the substance belonged and the exposure settings, specific tests required to define the toxicity of the substance for each exposure setting were identified (see Appendixes B through G). After establishing which tests were required, the committee examined the available information to determine both the availability and the quality of the required tests. To estimate quality, the report of each test was compared with a set of reference protocol guidelines. Finally, the information was judged to be either sufficient to assess the health hazard, in which case further testing would not be needed, or insufficient, in which case further testing would be needed.

The committee not only used data from laboratory studies for hazard assessment, but also examined any epidemiologic studies and information on the extent of exposure to a substance. The committee felt not only that the results of animal experiments may provide guidance for planning epidemiologic investigations, but, more importantly, that animal data can be most valuable when epidemiologic evidence is weak, nonspecific, or relatively insensitive.
Conversely, good epidemiologic data minimize the need for animal data.

FIGURE 3 Outline of procedure for decision-making in evaluating adequacy of toxicity information on specific substance. [Flowchart boxes: Select a substance for evaluation; Committee determines whether all the tests identified as necessary for the following use or exposure situations have been done (exposure by intended use, occupational exposure, environmental exposure); Did the tests that were done follow reference protocols?; Are there factors that preclude a health-hazard assessment?; Is the information sufficient to allow a health-hazard assessment? Outcomes: No further testing needed (document and evaluate adequacy of information for specific use or exposure situations and types of tests); Further testing needed (document and evaluate specific inadequacies of information for specific use or exposure situations and types of tests).]

TABLE 5 Information Sought in Exhaustive Literature Search for Each Substance in Subsample of 100

Information Category: Information
Chemistry: Synonyms, trade names, structural formula, molecular formula, CAS Registry number, purity, identification and quantity of contaminants, melting and boiling points, specific gravity, vapor pressure, particle size, water solubility, solubility in organic solvent, complexity of the chemical species, partition coefficient, pH, dissociation constant, shelf-life, stability, potential for undergoing oxidation and reduction, potential for undergoing hydrolysis under various pH conditions, photolytic reactivity, absorptivity, desorptivity
Process: Synthetic pathways (chemical origin, starting materials, stage of appearance in pathways, final product in pathways)
Production: Companies that produce substance, sites of production, quantity (volume per site, total), percent imported, volume trend
Use: Percent produced for commercial use and for consumer uses, percent degraded, number and kinds of uses, unintentional release (during storage, transport, disposal, packaging, manufacture, industrial use)
Chemical fate: Demographic and geographic distribution, environmental pathway, environmental stability, turnover (half-life), degradation, persistence, partition (in soil, water, air), bioaccumulation, environmental transport, environmental bioavailability
Human exposure: Routes, form, mode (occupational, consumer, etc.), number exposed, frequency of exposure, extent of contact (each episode, total), dose and duration of dose (each episode, total), rate of absorption
Toxicity: Summary of all available toxicity information (see Appendixes B through G)

CONSIDERATION OF EXPOSURE

Three exposure situations largely determine the type of potential hazard and hence the spectrum of data appropriate to evaluate a hazard: exposure via intended use, occupational exposure, and ambient environmental exposure. For example, food additives are meant to be ingested, cosmetics are applied to the skin, and drugs are administered in several forms by several appropriate routes. Humans can also be exposed to food additives, cosmetics, and drugs unintentionally during their manufacture and purification; during packaging, transportation, and storage before their intended use; and during disposal of residues and wastes. There are few intentional exposures of people to most pesticides and many other chemicals in commerce, but exposures do occur during production, distribution, use, and disposal. The term "environmental exposure" is used to include all potential human exposures other than those related to the workplace or inherent in the intended use.

The tests that the committee selected to support health-hazard assessments for substances in various classes of use are listed in Appendixes B through G. Batteries of required tests from among the 33 test types listed are identified for direct and indirect food additives (including colors), drugs and excipients in drug formulations (oral, parenteral, dermal, inhalation, ophthalmic, vaginal-rectal, over-the-counter, and veterinary), pesticides and inert ingredients of pesticide formulations, cosmetic ingredients, and other chemicals in commerce. To the extent feasible, the committee selected tests with routes of exposure similar to those of humans under various circumstances.

The Committee on Toxicity Data Elements recognizes that duration of exposure, as well as route, is intrinsically important in the manifestation and intensity of toxicity in test species and in the prediction of hazards to humans. It therefore incorporated duration of exposure (acute, subchronic, and chronic) into its selection of toxicity tests for predicting hazard. For example, if a substance is believed to be present consistently in common foods and lifetime exposure of humans is highly likely, data from chronic-feeding studies are appropriate for the substance. Similarly, if a substance is likely to be in the environment of women of child-bearing age, laboratory studies that investigate possible reproductive/developmental injury are appropriate for assessing hazards to humans.

During the construction of the dossiers, the quality of a given toxicity-testing protocol was evaluated without regard for the different potential uses and different exposure settings of the substance.
These two factors were taken into consideration during later judgments as to toxicity-testing needs for each substance. In the test summaries, different measures of quality might have been used for different exposure settings (intended-use, occupational, and environmental) because the adequacy of a protocol might vary with the setting (e.g., a protocol might be considered adequate for low-level environmental exposure, but inadequate for high-level occupational exposure).

PURITY OF SELECTED SUBSTANCES

Chemical purity is a nonquantifiable variable that must be considered in each evaluation, and some impurities might have toxicity very different from that of the selected substance. There are three reasons for such variability: (1) the names of some substances were not clearly stated in the lists or by investigators studying them; (2) impurities might vary in composition or concentration with different methods of production or from lot to lot; and (3) some of the substances selected (e.g., vegetable oils) may contain other compounds (e.g., pesticide residues). Although this variability impedes attempts to attain consistency in judgments of adequacy, it would affect any other judgments of toxicity equally and might be useful to the extent that it reflects exposure of humans to similarly contaminated or undefined substances.

The committee also recognized that exposure is often to mixtures of substances, rather than to single chemical entities. Mixtures have the potential for interactions that potentiate or antagonize the toxic effects of individual components. Whether special studies of toxic interactions are necessary for adequate evaluation of health hazards to humans is a matter of scientific judgment.

In this report, the terms "chemical" and "substance" refer to any item that appears on any of the lists that constitute the select universe, although many of these items are not single chemical compounds. Undefined substances drawn from the select universe presented problems early in the review of the subsample of 100. Some of them were chemically so undefined (e.g., "solvent dewaxed, light paraffinic petroleum distillates") or were so variable (e.g., "zeolites containing calcium, iron, magnesium, or vanadium") that they could not be evaluated according to the established procedure. The statistical analyses and estimates reflect this procedure, and inferences from the subsample apply strictly only to better-defined substances. This limitation does not apply to inferences from the sample.
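The decision procedure outlined in Figure 3 can be sketched as a function applied separately to each use or exposure setting. This is a simplified reading, not the committee's own formalization: the field names are hypothetical, judgments such as protocol adequacy and whether gaps matter are reduced to yes/no inputs, and because the branch connections in the scanned figure are not fully legible, treating "factors that preclude a health-hazard assessment" as forcing further testing is an assumption.

```python
from dataclasses import dataclass


@dataclass
class SettingFindings:
    """Committee findings for one use or exposure setting (hypothetical fields)."""
    all_required_tests_done: bool
    tests_adequate: bool               # followed reference protocols or met the basic criteria
    factors_preclude_assessment: bool  # assumption: such factors force further testing
    gaps_judged_immaterial: bool       # e.g. other data or human experience cover the gaps


def further_testing_needed(f: SettingFindings) -> bool:
    """True if the information is judged insufficient for a health-hazard assessment."""
    if f.factors_preclude_assessment:
        return True
    if f.all_required_tests_done and f.tests_adequate:
        return False
    # Missing or inadequate tests leave the information sufficient only if
    # the committee judges that the gaps do not matter for the assessment.
    return not f.gaps_judged_immaterial


def evaluate_substance(findings: dict[str, SettingFindings]) -> dict[str, bool]:
    """Apply the decision separately to each setting."""
    return {setting: further_testing_needed(f) for setting, f in findings.items()}


if __name__ == "__main__":
    result = evaluate_substance({
        "intended use": SettingFindings(True, True, False, False),
        "occupational": SettingFindings(False, True, False, False),
        "environmental": SettingFindings(False, False, False, True),
    })
    print(result)
    # {'intended use': False, 'occupational': True, 'environmental': False}
```

Keeping the result per setting mirrors the figure's instruction to document adequacy, or specific inadequacies, for each use or exposure situation rather than for the substance as a whole.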
GUIDELINES FOR ASSESSING THE QUALITY OF INDIVIDUAL STUDIES

BASIC CRITERIA FOR SCIENTIFIC METHODS

The Committee on Toxicity Data Elements believes that it is not appropriate to judge the adequacy of past and future studies solely by matching them against protocols that are considered acceptable today. The committee suggests that a study be considered adequate for use in a health-hazard assessment if it meets the following basic criteria:

· All elements of exposure are clearly described, including characteristics of the substance's purity and stability, and dose, route, and duration of administration.
· Results in test subjects are predictive of human responses, and test subjects are sensitive to the effects of the substance. In toxicity tests of a substance involving several species, data obtained with the most sensitive species are often used for making health-hazard estimates. This is often a conservative approach. When metabolic activation is necessary to produce toxicity and there is evidence that the metabolic pathway in the most sensitive species is different from that in man or the target species, results in a species with metabolic pathways similar to those of man should be given particular consideration.
· Controls are comparable with test subjects in all respects except the treatment variable. Depending on the study, appropriate controls may be positive, negative, or historical. Historical controls, however, rarely meet this criterion.
· End points answer the specific question addressed in the study, and observed effects are sufficient in number or degree to establish a dose-response relationship that can be used in estimating the hazard to the target species.
· Due consideration must be given, in both the design and the interpretation of studies, to appropriate statistical analysis of the data.

Although these criteria do not capture all potentially critical aspects of scientific judgment, the available data on a given substance may be considered of adequate quality if tests have been performed and reported according to these basic scientific principles. Several additional factors, although not often critical in deciding whether a given test is adequate, are highly desirable and should be taken into account:

· Subjective elements in scoring should be minimized. Quantitative grading of an effect should be used whenever possible. Sometimes this is not feasible, as when pathologists attempt to judge the nature and extent of a malignant neoplasm. Such evaluations depend on the experience and training of the pathologists.
· Peer review of scientific papers and of reports is desirable and increases confidence in the adequacy of the work.
· Reported results have increased credibility if they are supported by findings in other investigations.
· Similarity of results to those of tests conducted on structurally related compounds increases credibility.
· Evidence of adherence to good laboratory practices improves confidence in the results.

SELECTION OF REFERENCE PROTOCOLS

The quality of individual toxicity tests may be assessed by answering the question: Does the quality of the information permit a health-hazard assessment that is acceptable? In recognition of the need for accepted and reproducible standards, the committee chose as its first step in the qualitative evaluation of toxicity data on a given substance a comparison of the study with a reference protocol. Because a requirement for inclusion of the substances in the subsample was the existence of minimal toxicity information, there were no selected substances without some information for the assessment of the quality of testing protocols. However, for each substance in the subsample, some toxicity information was missing or some data were derived from studies that did not meet the reference protocol guidelines. A comparison of available tests with reference protocols, combined with the judgment of the committee relative to the basic criteria of scientific methods, enabled the categorization of substances with respect to the quality of toxicity-testing protocols.

In selecting reference protocols for judging the quality of individual studies, the committee used various resource documents on short-term and long-term toxicity testing, with emphasis on those constructed through national and international collaborative efforts. The committee identified the reference protocols of the Organisation for Economic Co-operation and Development (1979, 1981), the Interagency Regulatory Liaison Group (1981a, 1981b, 1981c, 1981d, 1981e), and the National Research Council (1975, 1977a, 1977b, 1980) as the most appropriate in this regard (see Appendix H). It should be understood that it was not the committee's intent to endorse any particular test protocol. Rather, on a pragmatic basis, particular tests were selected as appropriate for judging the adequacy of testing of chemicals.
Although over-rigid protocols are impractical, the reference protocols provide descriptions of standard test methods with sufficient detail to establish a basis for sound study design while permitting flexibility where scientific judgment is advantageous. The committee used the most current documents, sometimes with changes or additions based on its own judgment, as presented in Appendixes I through K. The committee believes that these modifications and additions will be useful for future development of a data base for health-hazard assessment. A published document describing each modified test system is cited in Appendix H. Not every toxicologist might agree on every detail in the guidelines, but only reference protocols widely reviewed and generally accepted were used in this study. The list is not intended to reflect the attitudes or practices of regulatory agencies. Because some toxicity reports did not use terminology directly compatible with the specifications of the reference protocol guidelines, it was often necessary to judge whether a study adequately followed the guidelines. In general, these judgments were relatively easy to make and engendered little or no controversy within the committee.
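The first step of the comparison with a reference protocol, listing where a study report departs from the guideline, can be sketched as a field-by-field check. The minimum values and item names below are hypothetical placeholders, not figures taken from the OECD, IRLG, or NRC documents cited above, and a real comparison would also have to handle reports whose terminology does not map directly onto the guideline.

```python
# Illustrative minimum requirements for one hypothetical test type.
# Placeholder values, not copied from any OECD, IRLG, or NRC guideline.
REFERENCE_MINIMUMS = {
    "animals_per_sex_per_group": 10,
    "dose_groups": 3,
    "duration_days": 90,
}
# Hypothetical yes/no items the reference protocol is assumed to require.
REFERENCE_REQUIRED_ITEMS = (
    "concurrent_control",
    "test_substance_characterized",
    "histopathology_reported",
)


def protocol_deviations(study: dict) -> list[str]:
    """List the ways a study report falls short of the reference protocol."""
    deviations = []
    for item, minimum in REFERENCE_MINIMUMS.items():
        value = study.get(item)
        if value is None:
            deviations.append(f"{item}: not reported")
        elif value < minimum:
            deviations.append(f"{item}: {value} < {minimum}")
    for item in REFERENCE_REQUIRED_ITEMS:
        if not study.get(item, False):
            deviations.append(f"{item}: missing")
    return deviations


if __name__ == "__main__":
    report = {
        "animals_per_sex_per_group": 8,
        "dose_groups": 3,
        "duration_days": 90,
        "concurrent_control": True,
        "histopathology_reported": True,
    }
    print(protocol_deviations(report))
    # ['animals_per_sex_per_group: 8 < 10', 'test_substance_characterized: missing']
```

An empty list corresponds to a study that followed the reference protocol; a nonempty list is only the input to the committee's judgment, since a study with too few animals can still be adequate when its results are definitive.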

Reference Protocol Guidelines for Neurobehavioral-Toxicity Tests

The committee recognized that the neurotoxicity-testing protocols developed by the OECD (Organisation for Economic Co-operation and Development, 1981) are appropriate only for evaluating the neurotoxicity of organophosphorus compounds. These protocols cannot be used to evaluate mammalian neurotoxicity for other substances, nor are they appropriate for studying functional behavioral changes produced by substances other than organophosphates for which no specific neural lesion has been identified. The OECD expert group on neurotoxicity also recognized this matter and, at its meeting in April 1982, took two actions: it changed the titles and scopes of the neurotoxicity tests proposed in the OECD guidelines to reflect their applicability only to organophosphorus compounds, and it recommended the development of guidelines for more general neurotoxicity testing. There was a consensus in the OECD group that neurotoxicity testing should include behavioral assessment outside the laboratory holding facility and neuropathologic examination of various neural tissues after in situ perfusion. The Committee on Toxicity Data Elements agrees.

For delayed-neurotoxicity tests of organophosphorus compounds, the committee used previously established reference protocols. For other classes of compounds, a detailed protocol for neurobehavioral-toxicity testing has not been completed and approved by OECD. Therefore, the committee adopted for its own interim use an alternative set of protocol guidelines that have attained some degree of general acceptance in the scientific community (Appendix I).

Reference Protocol Guidelines for Genetic-Toxicity Tests

After the start of this study, the OECD drafted guidelines for 10 genetic-toxicity tests. These were later adopted by the committee.
The 10 tests were the Ames Salmonella/liver microsome reverse-mutation assay, Escherichia coli reverse-mutation assay, rodent micronucleus assay, in vitro chromosomal-aberration assay in mammalian cells, sex-linked recessive-lethal assay in Drosophila melanogaster, forward gene-mutation assay in mouse lymphoma L5178Y (TK+/-) cells, forward gene-mutation assay in Chinese hamster ovary (HGPRT) cells, forward gene-mutation assay in Chinese hamster V79 (HGPRT) cells, in vivo chromosomal-aberration analysis in rodent bone marrow, and rodent dominant-lethal assay. The committee also adopted a policy of judging the testing protocol of each genetic-toxicity study for its adequacy and then judging the overall adequacy of all genetic-toxicity protocols for a given substance according to the requirements described in Appendix J.
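The two-stage policy for genetic-toxicity data (judge each study, then judge the set of studies for the substance) can be sketched as follows. Grouping the 10 OECD tests into gene-mutation and chromosomal end points is a common classification, but the overall rule used here (at least one adequately rated study in each group) is a hypothetical placeholder for the Appendix J requirements, which are not reproduced in this chapter. The per-study ratings G and A are those defined under Adequacy Ratings.

```python
GENE_MUTATION = "gene mutation"
CHROMOSOMAL = "chromosomal effects"

# The 10 OECD genetic-toxicity tests named in the text, grouped by broad end point.
GENETIC_TESTS = {
    "Ames Salmonella/liver microsome reverse mutation": GENE_MUTATION,
    "Escherichia coli reverse mutation": GENE_MUTATION,
    "Drosophila sex-linked recessive lethal": GENE_MUTATION,
    "mouse lymphoma L5178Y (TK+/-) forward mutation": GENE_MUTATION,
    "Chinese hamster ovary (HGPRT) forward mutation": GENE_MUTATION,
    "Chinese hamster V79 (HGPRT) forward mutation": GENE_MUTATION,
    "rodent micronucleus": CHROMOSOMAL,
    "in vitro mammalian chromosomal aberration": CHROMOSOMAL,
    "in vivo rodent bone-marrow chromosomal aberration": CHROMOSOMAL,
    "rodent dominant lethal": CHROMOSOMAL,
}

ADEQUATE = {"G", "A"}  # stage 1: per-study ratings that count as adequate


def genetic_battery_adequate(studies: list[tuple[str, str]]) -> bool:
    """Stage 2: overall judgment from (test name, per-study rating) pairs.

    Placeholder rule, NOT the Appendix J requirements: at least one
    adequately rated study in each end-point group.
    """
    covered = {
        GENETIC_TESTS[test]
        for test, rating in studies
        if test in GENETIC_TESTS and rating in ADEQUATE
    }
    return covered == set(GENETIC_TESTS.values())


if __name__ == "__main__":
    ames = "Ames Salmonella/liver microsome reverse mutation"
    print(genetic_battery_adequate([(ames, "G"), ("rodent micronucleus", "A")]))   # True
    print(genetic_battery_adequate([(ames, "G"), ("rodent micronucleus", "IR")]))  # False
```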

PROCEDURES FOR EVALUATION OF THE DATA BASE

INITIAL CONSIDERATIONS

Existing information was evaluated against two sets of criteria to judge its quality and completeness. The first set was a series of reference protocol guidelines that have received widespread review and general acceptance. This array of protocols was selected not as the most reliable and efficient group of tests, but rather, by convention, as the best available for chemical-safety assessments. The second set of criteria was based on the accumulated experience and expertise of committee members, whose combined judgment was used to determine the adequacy of an individual study if it did not meet the reference protocol guidelines. The second set of criteria was established by the committee in the expectation that the data bases of only a few substances would meet all the requirements of the reference protocol guidelines, partly because much toxicity information was generated before the guidelines were developed. The committee expected that sufficient data might often be available for evaluation, even though some toxicity information would be missing and some data would be derived from experimental designs other than those prescribed in the reference protocols. Therefore, the committee intended that its determination of the adequacy of toxicity-testing data for conducting a health-hazard assessment would be based sometimes on information derived from experiments that followed the reference protocols and sometimes on other information that met the committee's own subjective criteria for evaluating scientific methods. Using this combination, the committee assessed the adequacy of the toxicity-testing protocols for all chemicals in the sample. The committee felt that the evaluation of toxicity data bases to predict hazard to human health must be approached with caution and flexibility.
In general, data from properly conducted animal studies are often predictive of the degree of hazard to humans; however, for individual substances, such laboratory investigations may be misleading with regard to target organ, potency, or type of effect. Thus, expert judgment to ensure the proper use of all available data is an essential part of each analysis. For example, the metabolism of a toxicant may differ between test species and humans in ways that produce false-negative or false-positive results with regard to possible human hazard. The appropriate test battery may be incompletely performed, but there may be other data, such as extensive information on the mechanisms of action in several species, to obviate a need for additional tests. And data from human studies, both epidemiologic and clinical, may be essential in deciding whether to conduct a test on a substance merely for the purpose of completing the recommended battery of tests for that substance. For example, there may already have been human studies and exposure of sufficient breadth and sensitivity to reduce the need for toxicity studies in laboratory animals, or clinical studies may have detected skin sensitization or toxicity so that similar investigations in laboratory models would be unnecessary. To the extent feasible, therefore, the committee analyzed data available from human experience (including case studies and retrospective and prospective epidemiologic studies) to delineate the need for further testing.

The nature of the substance examined might also affect the type and amount of toxicity tests required to assess human health hazard. If natural products were examined in a rote, rigid fashion, they might appear to be inadequately tested; however, a long history of widespread use without reported toxicity might suggest that no additional testing is needed, even though most recommended tests had not been conducted or had not been conducted according to the reference protocol guidelines. Conversely, it is not always appropriate to assume that toxicity data are adequate and of satisfactory quality just because a substance is a natural product or has a long history of apparently safe use. Furthermore, adequate toxicity testing of a substance in the intended-use setting is not always a sufficient basis for concluding that there are adequate data on occupational exposure (such as industrial exposure during its manufacture) or environmental exposure derived from its liberation during use, disposal, or destruction. An example of such a substance might be a drug intended for one-time or very limited use, but on which additional information might be needed to evaluate its potential toxicity for the workers who are chronically exposed while they produce or package it. The committee's judgments on the quality of a substance's toxicity-testing protocols involved complex decisions. Substances were considered case by case on their own merits.
Adequacy Ratings

Evaluation of the quality of the toxicity-testing protocols that have been used for each of the 100 substances in the subsample required that the information be obtained, assembled in a documentable form, reviewed, and judged for adequacy of the data base. The Committee on Toxicity Data Elements always used the studies of highest quality, even when other studies of the same test type had been done. However, the committee recognized that very few studies would have been performed according to current guidelines and developed a ranking system that assessed the quality of the toxicity-testing protocols of each study:

· G, for a study that was performed according to current reference protocol guidelines.
· A, for a study that was not performed according to reference protocol guidelines, but was nevertheless adequate for conducting a health-hazard assessment.
· IN, for a study protocol that was inadequate for conducting a health-hazard assessment, but was judged not to need repetition. This rating was assigned in either of two situations:

(1) Where the available information was deemed to be sufficient to allow an assessment of health hazard of the substance, tests not done or done inadequately were considered by the committee to be no longer required or not to require repetition.
(2) When the observations of one test type not done or done inadequately were encompassed in the observations of another test type done adequately, the former test type was considered by the committee to be no longer required or not to require repetition.
· IR, for a study protocol that was inadequate for conducting a health-hazard assessment and judged to need repetition.
· C, for a test of indeterminate quality whose adequacy could not be assessed. This rating was most frequently given to abstracts, review articles, or other reports in which protocols were not fully described.

Measures of Adequacy Guidelines

As discussed above, toxicity tests that did not precisely follow reference protocol guidelines may still have been judged adequate if the committee determined that the deviations were not important. For example, if the number of animals used in a study was short of that specified by the appropriate reference protocol guidelines, but the results were so definitive that the addition of more test animals would almost certainly not have affected the conclusion, the study was considered to be adequate for conducting a health-hazard assessment. This example illustrates that adequacy is necessarily judged in the context of results; the fact that many tests were judged to be adequate despite deviations from reference protocols does not mean that the reference protocols are unnecessarily rigid. Sound protocols are most important precisely when results are less than clear-cut--a matter that cannot be known at the time the protocol is selected. Other judgments involved evaluations of chemical and physical properties, study design, study execution, selection of dose or exposure, statistical analysis, and reporting completeness.
Studies acceptable in those respects, but deficient in one or a few guideline specifications, were likely to be judged adequate. Studies that deviated from several or many guidelines were judged to be inadequate. Precise reasons for these expert judgments could usually be given only case by case, and they varied with test type (e.g., subchronic toxicity, reproductive/developmental toxicity, skin and eye irritation, and carcinogenicity). However, some of the more common deficiencies for all types of toxicity testing were in numbers of animals used, extent of histopathologic examination, extent of clinical-chemistry laboratory measurements, mathematical treatment of data, scoring systems for irritancy, and survival of animals in long-term studies.

Evaluation of the toxicokinetic properties of chemicals presented special problems. A completely adequate toxicokinetic study would include not only data on absorption, distribution, and excretion, but also identification of metabolites of the substance and their distribution and excretion. Seldom, if ever, were all these data presented in a single publication; such information was more typically published in a series of communications (by the same or different authors). In most cases, the one or two toxicokinetic studies reviewed by the committee may have been adequately performed, but they did not in themselves constitute an adequate--i.e., complete--toxicokinetic study. Thus, an assessment of adequacy for data on this item referred to the whole of available information, not to any single report.

Assessment of studies on reproductive/developmental toxicity requires the interpretation of results and effects on the conceptus. These are described in Appendix K. Frequently, only one study related to developmental effects of a substance was found. Such a study was considered to be adequately conducted if it was a "limit" test--showing a lack of developmental toxicity at a dose of at least 1,000 mg/kg of body weight--in which case further testing of developmental effects might not have been necessary. In the absence of a "limit" test, however, the study was considered to be inadequate for determining potential hazard to the conceptus if the highest dose reported did not produce an adverse effect on the dam or if the lowest dose could not reasonably be interpreted as a no-observed-effect level (NOEL) for the conceptus. Once the minimal dose or concentration of a test substance needed to produce overt toxicity in the adult, and the NOEL for the conceptus, had been established, later studies might be judged adequate for identifying the most sensitive species to be used for interspecies extrapolation and health-hazard estimation.
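The single-study rule for developmental toxicity can be expressed as a small decision function. This is a minimal sketch, not the committee's procedure: the field names (`highest_dose_mg_per_kg` and so on) are hypothetical summaries of a study report, and treating the two non-limit-test conditions as jointly sufficient is an assumption, because the text states only when such a study was judged inadequate.

```python
from dataclasses import dataclass

# Threshold for a "limit" test, from the text: no developmental toxicity
# at a dose of at least 1,000 mg/kg of body weight.
LIMIT_DOSE_MG_PER_KG = 1000.0


@dataclass
class DevelopmentalStudy:
    # Hypothetical summary fields; not the committee's coding scheme.
    highest_dose_mg_per_kg: float
    developmental_toxicity_at_highest_dose: bool
    maternal_toxicity_at_highest_dose: bool
    lowest_dose_is_noel_for_conceptus: bool


def is_limit_test(study: DevelopmentalStudy) -> bool:
    """True if the study shows no developmental toxicity at or above the limit dose."""
    return (study.highest_dose_mg_per_kg >= LIMIT_DOSE_MG_PER_KG
            and not study.developmental_toxicity_at_highest_dose)


def single_study_adequate(study: DevelopmentalStudy) -> bool:
    """Sketch of the single-study rule for hazard to the conceptus."""
    if is_limit_test(study):
        return True
    # Without a limit test, the high dose must affect the dam AND the low
    # dose must be interpretable as a NOEL for the conceptus. Treating these
    # as sufficient is an assumption; the text gives them only as necessary.
    return (study.maternal_toxicity_at_highest_dose
            and study.lowest_dose_is_noel_for_conceptus)


# Example: a clean result at 1,000 mg/kg counts as a limit test.
print(single_study_adequate(DevelopmentalStudy(1000.0, False, False, False)))  # prints True
```

Encoding the rule this way makes the asymmetry visible: a negative limit test settles the question alone, whereas any other design must be informative at both ends of its dose range.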
Superseding all such individual variations was the overall assessment by the experts on the committee, taking into account not only the types of variation and their magnitudes, but also the potential for interactions among the variables.

DOSSIER REVIEW PROCESS

During the review process, each document in each dossier was evaluated according to the criteria developed above. The evaluations were recorded in the form shown in Table 6, together with the reasons for assigning a specific study to a particular rating category. This permitted the acceptance of convincing experimental protocols that did not meet every detail of the guidelines and provided an opportunity for identifying serious flaws that diminished the credibility of the results.

Dossier analysis required that a very large amount of information be collected, documented, reviewed, and judged for adequacy within a short time. Once the data were obtained, the committee had to determine the method and depth of their evaluation and how consistency of judgment would be ensured. The judgments made by the committee were coded and recorded in preparation for subsample analysis and extrapolation to the select universe.

The committee recognized that the quality of its comprehensive literature search and of its detailed evaluations of the data bases would be the most important determinants in estimating testing needs in the select universe. Recognizing that this was a large and important task, the committee established five working panels, each with a designated leader and two or three other committee members. Assignments were not considered effectively random, because several substances were selected by panel leaders who were familiar with the substances' toxicity data bases. The remaining substances in the subsample were assigned in rotation among the five panels (20 each), including substances from each of the seven categories. Each panel was then responsible for reviewing the data bases of its assigned substances. At a series of planning meetings, the panel leaders collectively established standard procedures for data review and evaluation and developed practices to ensure consistency in decision-making.

The review process was time-consuming. The committee recognized early in the second year that it could not carry out the entire review itself on a volunteer basis. Therefore, to expedite the process, the initial phases of the review were carried out by NRC staff and consultants. It is estimated that these initial phases required about 1.5 scientific person-years of effort. The procedure for the review of the data base on a substance consisted of the following steps:

· Each document was individually reviewed and compared with the appropriate reference protocol.
· A summary sheet was prepared for each document, outlining the pertinent details of the protocol, assigning a preliminary ranking for the quality of the study and reasons for this judgment, stating the nature of the document (abstract, review, etc.), and stating which of the prescribed 33 test types was (were) reported in the document.
· In many cases, the quality of individual study protocols was determined by individual panel members who had applicable expertise. All reviewers were required to document their findings and to provide reasons for them based on the criteria established by the committee. Such documentation was especially important when a reviewer had intimate knowledge of a substance.

The dossier prepared by NRC staff and consultants, including these judgments on individual reports, was reviewed by the appropriate panel leader, who then presented it to the panel members for review and any modifications they deemed necessary. Twelve panel-approved dossiers were discussed by the entire committee to ensure that there was concurrence in the approaches used. The other 88, after review by a panel's leader and members, were reviewed by a subcommittee of at least five designated committee members. Important issues concerning any dossier were placed before the entire committee; otherwise, the subcommittee's judgments on dossiers from the panels were regarded as final.

The above process ensured that each decision on the quality of every study was reviewed at three levels: by a panel leader, by the panel's other members, and by either the entire committee or its designated review subcommittee. The relevant data base was present or easily accessible at each step of this multistage review process, so the panel leader, the members of each panel, the subcommittee, or the committee could conduct an independent review of the original material whenever anyone deemed it necessary.

The committee recognized the need to maintain uniformity and to ensure quality in the review of documents. Standardized procedures for documenting decisions regarding data adequacy provided quality control for decision-making. Inconsistencies among decisions were reduced, first, by judging each study's adequacy against the uniformly applied set of reference protocol guidelines. These standards were used for studies of the same test type across all substances and by all persons making the judgments; in effect, all reviewers were making measurements with the same yardstick. Deviations from the guidelines were then noted according to the scheme presented in Table 6, so that one person's reasons for a judgment on one chemical could be examined by others making similar judgments on other chemicals. To ensure consistency, the five panel leaders often compared their reasons for judgments on the quality of protocols.
Because committee members often had to exercise scientific judgment when information was inconclusive, it was necessary to provide mechanisms to document their judgments, to keep those judgments consistent, and to ensure that testing protocols and other information were always judged as uniformly as possible. The system of multilevel review described above was designed to reduce the errors and differences involved in the committee's use of scientific judgment. The process led to decisions on whether further testing of a given substance was needed. Before such decisions were reached, the committee considered the types of exposures to substances likely to be encountered, their chemical and physical characteristics, their manufacturing processes, their production volumes, their uses, their chemical fates, their toxicity in animals, and their potential or known toxicity in humans. The committee's detailed evaluation of the data base for each substance in the subsample included determination of the adequacy of each required toxicity test specified in Appendixes B through G. The completed dossiers collectively were used as the committee's record to characterize the subsample.

The committee analyzed the decisions about the quality, quantity, and extent of the subsample's toxicity data base to assess the toxicity-testing needs related to the larger select universe. This extrapolation was a joint effort of the Committee on Toxicity Data Elements and the Committee on Sampling Strategies. The tabulations and interpretations of the evaluated data bases were used as a bridge for applying statistical inferences derived from the subsample to the select universe from which it was drawn.

LIMITATIONS OF THE DATA-GATHERING PROCESS

The approach developed to collect data on each substance included searches of the open literature through automated, on-line data files; such secondary sources as reference manuals and textbooks; government technical reports; files of the regulatory agencies (where available); and files provided by some chemical manufacturers and trade associations. The data obtained from the searches of the primary and secondary open literature accounted for the bulk of the information in the dossiers. Search strategies were carefully developed to ensure the most efficient screening of the selected data bases. However, some of the data bases failed to include the most recent research.

The degree of accessibility of government agency files to the committee varied. Some information was obtained from the regulatory agencies, and several research reports from military sources yielded useful information, especially on exposure of humans. At times, confidential data were made available to selected NRC staff members or to specific committee members. However, nonconfidential health and safety data embedded in commercial confidential files possessed by the FDA Bureau of Drugs were unavailable (see Appendix M for further detail). Responses from manufacturers and trade associations were also mixed.
A few manufacturers were extremely cooperative in providing information that supplemented the open literature; however, the total amount of information from this source was relatively small. The committee believes that some relevant but unreviewed toxicity information, especially of a confidential nature, exists in the files of manufacturers. Evaluations of the 100 substances in the subsample therefore were based largely, but not exclusively, on published data or other publicly available information, which may fall somewhat short of the amount and diversity of data contained in the confidential files. The absence of specific information from the dossiers reflected both the inaccessibility of some data bases and the lack of relevant testing. Data not available for the committee's confidential review are presumably not available for legitimate review by other interested parties; hence, in an operational sense, they do not exist.

Most of the exposure estimates were based on intended uses and knowledge about products that contain the materials of interest. Some of the occupational- and environmental-exposure estimates were based on production volumes, environmental fate, and disposal data, but few data of these types were available. Other kinds of information that were rarely encountered concerned production trends, production processes, and percentage of total production allocated to each intended end use. More information of this type would have contributed greatly to estimates of exposure. Again, it was assumed that much of this information exists, but access to it was limited or restricted. The committee found little or no epidemiologic information or information on environmental fate (e.g., biodegradation and bioaccumulation) for most compounds in the sample. The data base was limited by the paucity of information on toxic effects in humans. Because observational studies on humans are expensive and involve special difficulties, they cannot be undertaken routinely. Even if extensive resources became available, it would be impossible to acquire conclusive data on many possible outcomes under all different conditions of exposure. Epidemiologic studies involve factors that are different from animal toxicologic protocols. Investigators must know not only the chemical and physical properties of substances and the quantitative and qualitative toxicity data from studies in animals, but also the extent of human exposure, its intensity, and other qualities of exposure that are needed to conduct an adequate study. It is necessary to define pathologic end points or effects, define a control population, conduct followup or retrospective studies, ensure that there is a suitable exposed population with enough exposed subjects to provide reasonable statistical power, and develop mechanisms to minimize or quantify sources of confounding or bias. 
Sometimes, epidemiologic studies in different settings have each contributed information that increases the credibility of a cause-and-effect relationship, but each has been flawed because exposure to the substances under study and exposure to some other possible cause of the same end point occurred simultaneously. For these reasons, the committee did not assess the adequacy of most observational studies of humans, but it did consider information from case reports citing adverse effects in humans and, for some classes of substances, data from human sensitivity tests and available epidemiologic information. Extensive data might exist on human exposure to some substances (e.g., drugs) with intended uses limited to a few exposures in a lifetime. Limited toxicity testing might be adequate for such end uses, but insufficient for developing safety standards to protect industrial workers who produce or use the materials, or medical personnel who might handle them frequently in the course of their professional activities. Very few data on potential occupational or environmental exposure were accessible.

For a substance to be reviewed, it had to be well defined, readily characterizable, and identifiable. Thus, some large classes of substances (such as plant products, minerals or ores, and unidentified mixtures) were excluded from consideration in the subsample. They were, however, included in the sample. Three deficiencies of the TSCA Inventory as a source of chemicals in commerce were most apparent during sample selection and evaluation:

· Poorly defined chemical mixtures in the sample were not sufficiently uniform in composition, or could not be sufficiently characterized, to determine the extent of toxicity testing performed, much less its adequacy.
· Some substances, according to manufacturers, were no longer in production in 1977 and therefore were no longer "chemicals in commerce."
· Many companies listed in the Inventory as manufacturers of chemicals in 1977 claimed never to have made those chemicals, although in some cases they had made related substances.

Therefore, there was little information on many chemicals in commerce, and it was often impossible to determine whether a substance had minimal toxicity information. Substances are often selected for toxicity tests because there is a particular interest in them (e.g., because some toxic effect has been observed). Thus, the substances that have already had some testing must not be considered a random sample of all substances in the select universe.

INTERPRETATION OF DATA ON TESTING QUALITY

Although the presence of toxicity information on each of the 100 substances in the subsample and the reviews of that information may have some intrinsic value, the reviews were not intended to provide specific information concerning the need for additional testing of these specific compounds. They served only as a basis for inference about the select universe and the seven categories of substances in it.
The substances in the select universe are themselves a nonrandom sample of all substances in the entire universe of chemicals known or used at a specific time (mid-1981). Because of the manner in which toxicity testing can be conducted and has been conducted in the past, some groups of substances were not included as specific classes among the substances in the select universe. Examples are some natural products of largely undefined nature or chemical structure, some mixtures of chemicals, and some industrially used chemicals of variable composition.

Furthermore, the review reported herein was not undertaken as an "audit" of the adequacy of past regulatory policies or procedures, and the data developed in this study are not likely to be immediately useful for regulatory purposes. For example, some regulatory decisions might have been based on information developed from long-standing exposure data not available to the committee during its evaluation of the toxicity data base. It would be inappropriate to judge past decisions against current standards of toxicity evaluation. The committee also recognizes that regulatory standards must be set in accordance with law and federal agency policies. These standards are based on more than toxicity data, and regulations, once set, cannot be lightly or easily changed on the basis of a modest increase in information about effects. In some cases, regulatory decisions are based on proprietary information that is available only to the concerned agencies.

CHARACTERIZING THE SAMPLE AND OPTIONS FOR DRAWING INFERENCES TO THE SELECT UNIVERSE

In the second year of operation, the Committee on Toxicity Data Elements worked with the Committee on Sampling Strategies to identify encoded data in the dossiers that were critical to analysis of the sample and to extrapolation of the analysis to the select universe. Some of the basic critical data in the dossiers described the toxicity tests conducted and their quality. Descriptions of this nature--combined with information on intended use, physical and chemical properties, and potential exposures--were tabulated by the two committees in the third year of the study. The tables serve as springboards to more detailed analyses of the categories in the sample. Algorithms for these analyses were developed by the Committee on Sampling Strategies. The select universe was sampled in two phases. First, independent systematic random samples were drawn from each of the seven categories.
The components of each sample were arranged in random order and then examined one by one for the existence of minimal toxicity information (as defined for each of the five intended-use classes, described earlier) until a specified number of substances with at least minimal toxicity information had been identified. Substances were not reviewed further if the literature search uncovered less than the prescribed minimal toxicity information or if the substances were so ill-defined as to preclude evaluation. The latter group included substances in the select universe whose names referred to sets of substances (e.g., "alkyl derivatives of dimethylbenzylammonium chloride") with possibly different toxic properties; for such sets it would be impossible to characterize the quality of toxicity information. The substances set aside remained in the sample and are an important part of the data base for evaluating the extent to which substances meet the minimal-toxicity-data criteria defined by the Committee on Toxicity Data Elements.
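The two-phase selection can be sketched in Python. This is a hedged illustration under several assumptions: the integer form of systematic sampling shown here (one random start, then a fixed skip interval of N/n) is one standard variant and not necessarily the one used; `has_minimal_info` is a hypothetical stand-in for the committee's category-specific screen; and the text does not say how sample members that were never examined (once the target count was reached) were recorded, so the sketch simply returns them separately.

```python
import random


def systematic_sample(population, n, rng):
    """Systematic random sample of size n: one random start, then a fixed
    skip interval of N/n. Integer arithmetic keeps every index in range and
    gives each item selection probability n/N."""
    N = len(population)
    if not 0 < n <= N:
        raise ValueError("need 0 < n <= len(population)")
    start = rng.randrange(N)
    return [population[(start + i * N) // n] for i in range(n)]


def screen_sample(sample, has_minimal_info, target, rng):
    """Examine the sample in random order until `target` substances with at
    least minimal toxicity information are found.

    Returns (accepted, set_aside, unexamined). Set-aside substances stay in
    the sample: they count toward how often the minimal-data criteria fail."""
    order = list(sample)
    rng.shuffle(order)                  # "arranged in random order"
    accepted, set_aside = [], []
    for pos, substance in enumerate(order):
        if len(accepted) == target:     # stop once the quota is met
            return accepted, set_aside, order[pos:]
        if has_minimal_info(substance):
            accepted.append(substance)
        else:
            set_aside.append(substance)
    return accepted, set_aside, []


def has_info(substance):
    # Hypothetical screen for illustration: every third ID "passes".
    return int(substance.split("-")[1]) % 3 == 0


# Illustrative run on a made-up category list of 1,000 substances.
rng = random.Random(1984)
category = [f"substance-{k}" for k in range(1000)]
sample = systematic_sample(category, 50, rng)
accepted, set_aside, unexamined = screen_sample(sample, has_info, 10, rng)
print(len(accepted), len(set_aside) + len(unexamined))
```

Keeping `set_aside` as an explicit output mirrors the point made above: substances that fail the screen are not discarded, because they are the evidence for how often the minimal-data criteria are not met.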

CONSTRUCTION OF TABLES FOR ANALYSIS

By examining data in both the sample and the subsample, the Committee on Sampling Strategies could obtain an estimate of the amount and quality of information on toxicity testing of substances in the select universe. The process of examining these substances generated other information of interest in the examination of toxicity testing. Some of these data, such as the frequency with which a given toxicity-test type could be found for each of the 100 substances in the subsample, were available from machine-readable files. Other information, such as the quality of reporting of toxicity tests or the ways in which the Committee on Toxicity Data Elements determined the adequacy of a given toxicity test, was not suitable for statistical analysis, but is presented in a qualitative form in the conclusions and recommendations.

STATISTICAL ANALYSIS OF DATA

The dossiers on substances in the subsample provided sets of tabulations, measures, and various descriptive items of information that the Committee on Sampling Strategies used to estimate the testing adequacy for substances in each of the seven intended-use categories of the select universe or in other well-defined sets of substances in that universe.

ESTIMATES BASED ON THE SAMPLE ALONE

To estimate the properties of substances in a well-defined category of the select universe and the variances of those estimates, the Committee on Sampling Strategies adopted procedures that allowed for the use of all available information on substances in any well-defined subset of the select universe. Each substance in the select universe belongs to a particular combination of the original seven categories. For any substance, j, it is straightforward to determine the probability, $p_j$, of selection into the sample.
Because samples were drawn independently from each of the seven categories, i, the probability, $1 - p_j$, of not being selected is precisely the product of the probabilities of not being selected from any category, so that if $p_{ij}$ is the probability that substance j is selected from category i, then

$$p_j = 1 - \prod_{i=1}^{7} \left(1 - p_{ij}\right). \qquad (1)$$
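Equation 1 translates directly into code. A minimal sketch; the function name and the example selection rates are illustrative assumptions, not values from the study.

```python
from math import prod


def selection_probability(p_by_category):
    """Equation 1: probability that substance j enters the combined sample,
    given its selection probability p_ij in each category sample.

    Categories that do not list the substance contribute p_ij = 0, which
    multiplies the product by 1 and so leaves the result unchanged."""
    if not all(0.0 <= p <= 1.0 for p in p_by_category):
        raise ValueError("probabilities must lie in [0, 1]")
    # Independent draws: P(never selected) is the product over categories.
    return 1.0 - prod(1.0 - p for p in p_by_category)


# Illustrative rates: on two lists (5% and 10% sampling), absent from five.
print(selection_probability([0.05, 0.10, 0, 0, 0, 0, 0]))
```

Here the substance escapes both draws with probability 0.95 × 0.90 = 0.855, so its selection probability is about 0.145.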

Note that, if substance j is not a member of category i, then $p_{ij} = 0$, so the value of $p_j$ is unaffected by that category. The subcategory to which any substance belongs is defined by the set of categories of which it is a member. There are 64 possible combinations of categories (2 × 2 × 2 × 2 × 4), i.e., 63 possible subcategories after exclusion of the one combination that describes chemicals outside the select universe because they do not fall into any of the categories. Some of the 63 possible subcategories may, of course, contain no substances from the select universe, and other subcategories may include substances from the select universe but none from the sample.

For any specific analysis of the sample, the subcategories of interest are first determined. For example, an analysis of all substances on the list of drugs and excipients in drug formulations could include as many as 32 subcategories, defined by being on that list but on or off the lists of pesticides and inert ingredients of pesticide formulations, food additives, cosmetic ingredients, or chemicals in commerce. Similarly, an analysis of the entire select universe from which the sample was drawn would include up to 63 subcategories.

In this discussion, h denotes a subcategory ($h = 1, 2, \ldots, 63$) and $N_h$ the number of substances in subcategory h. Let H be the set of subcategories h that are of interest for a particular analysis, and let $\bar{x}_h$ be the mean (here, the proportion) of some property of the sample substances in subcategory h. Then an unbiased estimate of that proportion over the whole set of subcategories, H, is

$$\bar{x} = \sum_{h \in H} \frac{N_h}{N}\,\bar{x}_h, \quad \text{where } N = \sum_{h \in H} N_h, \qquad (2)$$

and the variance of this estimate is

$$\sigma^2_{\bar{x}} = \sum_{h \in H} \left(\frac{N_h}{N}\right)^2 \sigma^2_{\bar{x}_h}. \qquad (3)$$

Unfortunately, this ideal formula is not usable in practice, because limited resources precluded exhaustive searches for duplication of substances among categories, so the values of $N_h$ are not precisely known.
Information on category-to-category duplication is, however, available for all items in the sample, and for all compounds, j, that were actually found in subcategory h, it can be used to estimate $N_h$ as

$$\hat{N}_h = \sum_{j \in h} \frac{1}{p_j}. \qquad (4)$$
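Equation 4 weights each sampled substance by the inverse of its selection probability, in the manner of a Horvitz-Thompson estimate of a population count: a substance drawn with probability $p_j$ stands for $1/p_j$ substances in its subcategory. A minimal sketch, assuming each sampled substance is represented by the set of category lists it appears on (which identifies its subcategory) together with its $p_j$ from Equation 1; the category labels and probabilities in the example are hypothetical.

```python
from collections import defaultdict


def estimate_subcategory_sizes(sample):
    """Equation 4: N_hat_h = sum of 1/p_j over sampled substances j in h.

    `sample` is an iterable of (categories, p_j) pairs, where `categories`
    is the set of category lists the substance appears on. That membership
    set identifies the subcategory h."""
    n_hat = defaultdict(float)
    for categories, p_j in sample:
        if not 0.0 < p_j <= 1.0:
            raise ValueError("each sampled substance needs 0 < p_j <= 1")
        # A substance drawn with probability p_j represents 1/p_j substances.
        n_hat[frozenset(categories)] += 1.0 / p_j
    return dict(n_hat)


# Hypothetical sample: two drug-only substances and one on two lists.
sizes = estimate_subcategory_sizes([
    ({"drugs"}, 0.05),
    ({"drugs"}, 0.05),
    ({"drugs", "cosmetic ingredients"}, 0.145),
])
for subcategory, n_hat in sizes.items():
    print(sorted(subcategory), round(n_hat, 1))
```

Using a `frozenset` of category names as the key means two substances fall in the same subcategory exactly when they appear on the same set of lists, which is the definition given above.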

Replacement of $N_h/N$ by the estimate $\hat{N}_h / \sum_{h \in H} \hat{N}_h$ introduces an additional source of variation in $\bar{x}$, in that, where $E$ denotes the mathematical expectation,

$$\operatorname{Var}(\bar{x}) = E\left[\operatorname{Var}\left(\bar{x} \mid \{\hat{N}_h\}\right)\right] + \operatorname{Var}\left[E\left(\bar{x} \mid \{\hat{N}_h\}\right)\right]. \qquad (5)$$

However, the committee believes that the second term in Equation 5 is likely to be small. In all cases presented in the tables of this report, $\bar{x}_h$ (and hence $\bar{x}$) is the estimated proportion of substances with a specific property. If $n_h$ is the sample size in subcategory h and $\hat{p}_h = \bar{x}_h$ is the proportion observed in subcategory h, then

$$\hat{\sigma}^2_{\bar{x}_h} = \frac{\hat{p}_h\left(1 - \hat{p}_h\right)}{n_h}. \qquad (6)$$

Equation 6 assumes that the selection was essentially equivalent to a simple random sample, as discussed previously. Thus, the computing formulas that the Committee on Sampling Strategies used for estimates based on the sample are

$$\bar{x} = \sum_{h \in H} \frac{\hat{N}_h}{\hat{N}}\,\hat{p}_h, \quad \text{where } \hat{N} = \sum_{h \in H} \hat{N}_h, \qquad (7)$$

and

$$\hat{\sigma}^2_{\bar{x}} = \sum_{h \in H} \left(\frac{\hat{N}_h}{\hat{N}}\right)^2 \frac{\hat{p}_h\left(1 - \hat{p}_h\right)}{n_h}. \qquad (8)$$

The sample sizes for any category are sufficiently large, and the estimated proportions sufficiently far from 0 and 1, that the sampling distribution of the estimate for a category is approximately normal. Therefore, 90% confidence intervals are presented--that is, intervals with at least 90% probability of including the true value of the proportion, estimated as $\bar{x} - 1.65\,\hat{\sigma}_{\bar{x}}$ and $\bar{x} + 1.65\,\hat{\sigma}_{\bar{x}}$.

ESTIMATES BASED ON THE SUBSAMPLE ALONE

Most of the tables presented in this report give estimates of the proportions of substances in a category that have specified characteristics. Such estimates are based on the sample of substances selected from the corresponding segment of the select universe. Some substances belonging to a category do occur in the subsample selected from other lists and satisfy the minimal-toxicity-information screening criterion for the specified category. It would have been possible to use these additional sample substances in making the estimates, and that would probably have resulted in somewhat smaller sampling errors. However, because the screening procedure was not identical for each intended-use category, doing so would have introduced biases into the results. The Committee on Sampling Strategies therefore decided to base estimates for each category solely on the sample selected for that category. This limitation of analysis further implies that it is not valid to use the results given here to derive estimates across categories in the final sample; such combined results can be calculated, but would require the preparation of special tabulations.

The probability of selection within a category is constant, so an unbiased estimate of the proportion of screened substances in the ith category that could be evaluated and that have a given characteristic is $\hat{p}_i = x_i/n_i$, where $n_i$ is the sample size and $x_i$ denotes the number in the sample that have the characteristic. If $P_i$ denotes the true proportion, the statistic $x_i$ has the binomial distribution with parameters $P_i$ and $n_i$. It is then possible to calculate two numbers, $L_i$ and $U_i$, as functions of $\hat{p}_i$ and $n_i$, such that the probability that $L_i \le P_i \le U_i$ is at least 90%. These are the confidence limits shown in the tables.

ESTIMATES BASED ON BOTH THE SAMPLE AND THE SUBSAMPLE

Some estimates in this report make use of information provided by both the sample and the subsample. Such estimates are for the proportions of some category in the select universe that both satisfied the screening criteria and have one or more additional specified characteristics. The point estimate of such a proportion is the product of two factors that are statistically independent as a consequence of the sampling procedure. The first factor is the estimated proportion that satisfies the screening criteria for the category and is based on the sample alone. As stated above, its distribution is approximately normal, and its variance has been estimated from the sample results.
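The first factor just described, like any estimate based on the sample alone, is computed from Equations 6 through 8 with the 1.65 multiplier given above. A minimal sketch, assuming the inputs for each subcategory are $\hat{N}_h$ (from Equation 4), $n_h$, and $\hat{p}_h$; the function name and the numbers in the example are invented for illustration.

```python
from math import sqrt

Z_90 = 1.65  # normal multiplier the text uses for 90% intervals


def sample_estimate(strata):
    """Equations 6-8 for one analysis over the subcategories in H.

    `strata` maps each subcategory h to (N_hat_h, n_h, p_hat_h): the
    Equation 4 size estimate, the sample size, and the observed proportion.
    Returns (estimate, variance, (lower, upper))."""
    if not strata:
        raise ValueError("need at least one subcategory")
    n_hat_total = sum(n_hat for n_hat, _, _ in strata.values())
    estimate = variance = 0.0
    for n_hat, n_h, p_h in strata.values():
        weight = n_hat / n_hat_total                    # N_hat_h / N_hat
        estimate += weight * p_h                         # Equation 7
        variance += weight**2 * p_h * (1 - p_h) / n_h    # Equations 6 and 8
    half_width = Z_90 * sqrt(variance)
    return estimate, variance, (estimate - half_width, estimate + half_width)


# Invented inputs for two subcategories.
est, var, (lo, hi) = sample_estimate({"h1": (400.0, 40, 0.25),
                                      "h2": (600.0, 60, 0.50)})
print(round(est, 3), round(var, 5), round(lo, 3), round(hi, 3))
```

With weights 0.4 and 0.6 the point estimate is 0.4 × 0.25 + 0.6 × 0.50 = 0.40, and the variance is 0.00075 + 0.0015 = 0.00225, giving a symmetric interval of about ±0.078.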
The second factor is the estimated proportion, among substances that satisfy the screen, that also have the specified characteristic; it is based on a binomial count, and lower and upper confidence limits have been calculated for it.

Confidence limits for the product of the two factors were approximated as follows. Let x denote the first of the two factors referred to in the preceding paragraph, and y the second. Where X and Y denote the mathematical expectations of x and y, respectively, note that the variance of the product of independent variates x and y is given by

$$\sigma^2_{xy} = \sigma^2_x \sigma^2_y + X^2 \sigma^2_y + Y^2 \sigma^2_x. \qquad (9)$$

Because the distribution of y is asymmetric in general, the lower and upper confidence limits are computed separately. To calculate the lower confidence limit, $\sigma_{xy}$ in Equation 9 is calculated by replacing $\sigma_y$ with $(y - L_y)/1.65$, where $L_y$ denotes the lower limit for Y, and by replacing X and Y with their estimates x and y. The lower limit for the product is then calculated as $xy - 1.65\,\hat{\sigma}_{xy}$. For the upper limit, $\sigma_y$ in Equation 9 is replaced with $(U_y - y)/1.65$, where $U_y$ denotes the upper limit for Y. The upper limit for the product is then calculated, with this new estimate of $\sigma_{xy}$, as $xy + 1.65\,\hat{\sigma}_{xy}$.

The very substantial costs of amassing and analyzing data in a short time permitted a subsample of only 100 substances for the seven categories of the select universe. As a result, confidence intervals are large. Any analysis of the data should be based on an awareness of the limited statistical precision of results from the subsample.

MACHINE-READABLE FILES

To facilitate data analysis, information on the substances in the sample and subsample was assembled in machine-readable files. The presence or absence of the five types of toxicity tests in the minimal-toxicity-information screen (acute, subchronic, chronic, reproductive/developmental, and mutagenicity) was tallied. In addition, the entire list of substances in the sample was scanned to determine which of the seven intended-use categories contained each substance. These 12 items and a numeric identifier for each substance in the sample were entered into the computer for analysis.

Dossiers compiled on substances in the subsample contained substantially more items of information than were available on the sample. These items were tallied to provide measures of the amount and adequacy of data available on substances in the subsample. The seven intended-use categories were expanded into partially overlapping subcategories, as listed in Appendixes B through G. A more complete roster of test types was available, and the test protocols for each test type deemed necessary for a substance's designated subcategories of intended use were evaluated for adequacy. Chemical and physical properties of each substance were sought, as well as its manufacturing process or processes, production volume, potential for exposure, and environmental fate.
Overall judgments, such as the ability to assess the potential hazard to human health, were determined. Although much of this information was already in tabular form, some items in the dossier were descriptive. This type of information, primarily nontoxicologic, was intended to assist in assessment of potential for hazard. Although the presence or absence of this information was recorded for later numeric analysis, no judgment of the quality of the nontoxicologic data was made.
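The second-factor limits and the product limits described under ESTIMATES BASED ON BOTH THE SAMPLE AND THE SUBSAMPLE can be sketched together. The text does not name the method used for the binomial limits $L_y$ and $U_y$; this sketch assumes exact (Clopper-Pearson) limits, found by bisection on the binomial distribution function, because they guarantee the "at least 90%" coverage the text requires. The product limits then follow Equation 9 as described, with $\sigma_y$ backed out separately from each one-sided limit. All function names and example inputs are hypothetical.

```python
from math import comb, sqrt

Z_90 = 1.65  # multiplier used in the text


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))


def _bisect_increasing(f, lo=0.0, hi=1.0, iters=100):
    """Root of an increasing function f on [lo, hi] with f(lo) < 0 <= f(hi)."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.10):
    """Assumed method: exact two-sided limits with coverage >= 1 - alpha."""
    tail = alpha / 2
    # Lower limit solves P(X >= x | p) = tail; upper solves P(X <= x | p) = tail.
    lower = 0.0 if x == 0 else _bisect_increasing(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - tail)
    upper = 1.0 if x == n else _bisect_increasing(
        lambda p: tail - binom_cdf(x, n, p))
    return lower, upper


def product_limits(x, var_x, y, lower_y, upper_y, z=Z_90):
    """Asymmetric limits for x*y, with x and y independent (Equation 9)."""
    def sd_xy(sd_y):
        # Equation 9 with X and Y replaced by their estimates x and y.
        return sqrt(var_x * sd_y**2 + x**2 * sd_y**2 + y**2 * var_x)
    point = x * y
    lower = point - z * sd_xy((y - lower_y) / z)   # sigma_y from the lower limit
    upper = point + z * sd_xy((upper_y - y) / z)   # sigma_y from the upper limit
    return point, (lower, upper)


# Hypothetical inputs: first factor 0.40 (variance 0.00225) from the
# sample; 12 of 30 screened substances have the characteristic.
y_hat = 12 / 30
L_y, U_y = clopper_pearson(12, 30)
point, (low, high) = product_limits(0.40, 0.00225, y_hat, L_y, U_y)
print(round(point, 3), round(low, 3), round(high, 3))
```

Computing $\sigma_y$ twice, once from each side, is what lets the product interval inherit the asymmetry of the binomial limits rather than forcing a symmetric normal interval onto the second factor.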

