Read "Toxicity Testing: Strategies to Determine Needs and Priorities" at NAP.edu

Suggested Citation:"5. FUTURE DEVELOPMENT, IMPLEMENTATION, AND REFINEMENT OF THE SYSTEM." National Research Council. 1984. Toxicity Testing: Strategies to Determine Needs and Priorities. Washington, DC: The National Academies Press. doi: 10.17226/317.

Below is the uncorrected machine-read text of this chapter, intended to provide our own search engines and external engines with highly rich, chapter-representative searchable text of each book. Because it is UNCORRECTED material, please consider the following text as a useful but insufficient proxy for the authoritative book pages.

5 FUTURE DEVELOPMENT, IMPLEMENTATION, AND REFINEMENT OF THE SYSTEM

The priority-setting approach presented in this report would require further elaboration and further development of methods and data bases before it could be implemented fully. This chapter describes the developments needed to make the system operational and then discusses strategies for implementation and refinement.

DEVELOPMENT

The system as illustrated is potentially capable of screening tens of thousands of substances and determining--after specified information-gathering steps--for which of them it would be warranted to apply a battery of short-term tests or a long-term carcinogenicity bioassay. To a lesser extent, it could screen the same universe of chemicals to select those that should be tested for other health effects. Before this system could be implemented, however, the following steps would need to be taken:

· Refinement of the estimated frequencies of outcomes of various information-gathering activities. For example, the numbers of chemicals from the universe that would fall into the chemical classes described in Table 5 should be better determined.

· Refinement of estimates of the accuracy of the data elements in all stages of the system. For example, we need to consider further the power of an RTECS entry as a data element to reflect carcinogenic hazard. Perhaps more important, what should be the impact of a subjective determination of "high" exposure by an NTP staff assessor? Outside verification is needed that the data elements and their operating characteristics are reasonable.

· Refinement of the choice of penalties for incorrect assessments of public-health concern. More attention needs to be given to the relative importance of false-positive and false-negative findings and their implications for decisions of whether to consider chemicals further. There is a need to integrate the intuitive decisions about selecting chemicals with the decisions derived from the mathematical model of the system--and it is probably necessary to depart from both to find a satisfactory synthesis.

· Expansion of the model to incorporate more data elements and outcomes. For example, both ratings of degree of concern (exposure and toxicity) and confidence in the ratings need to be included in the model as outcomes of Stages 2 and 3. The confidence information has not yet been included.

· Testing of the model with a wider range of data and, by exploring the sensitivity of the model to those data, establishment of a set of design rules that either are intuitively satisfying or can be explained as stemming logically from the data, instead of from limitations in the model itself.

· Expansion of the guidance to system operators in designing Stage 2 minidossiers and Stage 3 dossiers and strategies to search for information.

· Development of further guidance for the evaluation of Stage 2 minidossiers with respect to exposure and toxicity ratings and their corresponding degrees of confidence.

· Determination of which health effects are worthy of treatment by the techniques presented here for carcinogenicity. This will require both a value judgment as to the severity of an effect and a scientific judgment as to the availability of tests, the number of toxic substances in the universe, the accuracy of the tests, and the importance of classifying chemicals correctly with respect to such effects.

· Development of data elements, estimation of their accuracy, estimation of the number of chemicals that cause a health effect, determination of misclassification costs, and development of corresponding decision rules for the additional health effects selected.

· Expansion of the system to include all toxicity tests for which chemicals are being given priorities.

IMPLEMENTATION

A priority-setting system based on value-of-information analysis may be implemented by several possible approaches, and these vary considerably in the time and resources required.

At the most modest level, the approach to implementation is qualitative. A greater emphasis on estimated probabilities (Appendix D) may help in the evaluation of the uncertainties that play such a fundamental role in priority-setting, design of tier testing, and characterization of risk of chemicals. The discussion of possible data elements (Chapter 4) may suggest indicators to help select chemicals in the initial stages of a priority-setting system, where information is especially fragmentary. The walkthrough with the example chemical bisphenol A (Chapter 3) may suggest ways for chemical managers to pick from among the many possible sources of information, especially for Stages 2 and 3. Without describing the priority-setting system quantitatively, much of the value-of-information analysis can be implemented with little extra expenditure of time and resources. In its most modest form, the computer-assisted Stage 1 could be forgone in favor of a smaller select universe of chemicals.
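The value-of-information idea referred to above can be illustrated with a minimal sketch. It is not the report's optimization model: it assumes a single chemical, a single yes/no data element with a known sensitivity and specificity, and two actions (dismiss the chemical or carry it forward). The function names, the cost units, and the numbers in the usage lines are hypothetical and chosen only for illustration.

```python
def posterior(prior, sensitivity, specificity, positive):
    """Bayes' rule: probability that a chemical is hazardous after one test result."""
    if positive:
        num = sensitivity * prior
        den = num + (1.0 - specificity) * (1.0 - prior)
    else:
        num = (1.0 - sensitivity) * prior
        den = num + specificity * (1.0 - prior)
    return num / den


def expected_loss(p_hazard, cost_fn, cost_fp):
    """Expected misclassification cost of the better of two actions.

    Dismissing the chemical risks a false negative (cost_fn);
    carrying it forward risks a false positive (cost_fp).
    """
    return min(p_hazard * cost_fn, (1.0 - p_hazard) * cost_fp)


def value_of_information(prior, sensitivity, specificity, cost_fn, cost_fp, test_cost):
    """Expected reduction in misclassification cost from running the test, net of its cost."""
    p_pos = sensitivity * prior + (1.0 - specificity) * (1.0 - prior)
    loss_without = expected_loss(prior, cost_fn, cost_fp)
    loss_with = (
        p_pos * expected_loss(posterior(prior, sensitivity, specificity, True), cost_fn, cost_fp)
        + (1.0 - p_pos) * expected_loss(posterior(prior, sensitivity, specificity, False), cost_fn, cost_fp)
    )
    return loss_without - loss_with - test_cost


if __name__ == "__main__":
    # Hypothetical inputs: 10% prior, a fairly accurate test, false negatives 10x as costly.
    print(value_of_information(0.10, 0.90, 0.90, cost_fn=10.0, cost_fp=1.0, test_cost=0.05))
```

Under these assumptions a test is worth applying only when the returned value is positive, which is one way to read the claim that inexpensive, imperfect data elements can still earn a place in the early stages.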

At the most elaborate level, a complete implementation could be attempted from the start. The program for the model could be rewritten for a mainframe computer to accommodate several effects, additional Stage 4 candidate tests, and more Stage 2 and Stage 3 data elements and possible outcomes. The resulting large number of factors could be estimated and the software developed for reading the files for Stage 1. This effort might require a couple of years and considerable resources.

With a gradual approach, implementation would start at a modest level and increase. Files and software for Stage 1 would be developed in steps, perhaps without addressing the entire universe of 70,000 chemicals immediately. It would probably be useful to rewrite the computer program for the optimization model, adapt it to available computers, and familiarize the technical staff with the algorithm. Carcinogenesis and perhaps another health effect could be included initially. A small number of data elements could be evaluated and their accuracy estimated during the first year. In later years, these estimates could be refined on the basis of new data (some from the tests), and other data elements and toxic effects could be added. With this more gradual approach, the system could be implemented in the first year at a low cost.

In the process of setting priorities, information is collected on toxicity and exposure; some of it is recorded in the dossiers (including minidossiers and microdossiers), and some is embodied in the estimates of accuracy (e.g., the estimated false-positive and false-negative rates). Additional information is developed by the testing program itself. Putting this information into a usable format is an important potential contribution of NTP. For example, it is very helpful to use false-negative rates for estimating rates of detection, especially for clinical tests with negative outcomes.

The following tasks would be required to implement the demonstration system:

· Training of staff in probabilistic thinking and value-of-information analysis.

· Training of staff in the special skills needed for dossier design and evaluation.

· Definition of a universe of substances and preparation or augmentation of data bases with environmental chemicals, food constituents, pyrolysis products, etc.

· Development of software for building computerized files for Stage 1 operation, including provisions for maintaining status lists ("dormant," "on test," "Stage 2 minidossiers complete," etc.).

· Design of forms and procedures for moving substances through Stage 3. For example, how often does the expert evaluation committee need to meet, if the required frequency is different from the current NTP schedule?

· Design of a feedback and control system for continuous updating of the system.

· Provision of an effective interface between the established agency nomination process and the proposed "long-list" process.

· Expansion of the role of the expert committee to include estimation of degrees of exposure and toxicity, in addition to test recommendations.

FURTHER REFINEMENT

Once the system becomes operational, a number of refinements can be contemplated, as experience with its use accumulates. The whole design is built on a series of estimates, such as the estimates of the number of carcinogens among the chemicals considered and the accuracy of the selection stages. Although by the time of implementation these estimates will have been subjected to a great deal of examination, they still will be quite uncertain. Thus, the system must be tested and refined as new information becomes available. These are some of the questions to be addressed ("how many" here means "what fraction of the universe"):

· How many chemicals are unstructurable?

· How many have no RTECS listing?

· How many have no listed production?

· How many reach a dormant list in Stage 1?

· How much does it cost to construct a minidossier?

· How often does an assessor score a chemical as having "high" exposure or "low" confidence in the toxicity rating?

· Which is the most useful information for the Stage 3 committees? How much does it cost to retrieve? For how many chemicals does one find useful information?

· How many chemicals are retained for short- and long-term tests?

· How many of those prove positive?

· What is the distribution of test recommendations among various types of tests?

After the system has operated for a while, the answers to these and other questions can be assessed for relevance to estimates used to design the system. Changes can be made as deemed necessary, and the model used to calculate an improved set of rules to decide whether to consider chemicals further.

Another activity to be considered is a continuing sensitivity analysis of the system. What would happen if a decision rule related to an RTECS code were changed? Would more chemicals reach Stage 2? Could one spend less per minidossier and still get a reasonably effective assessment from the minidossiers? These kinds of questions, which could be addressed without committing any of the testing budget, might suggest new ways of allocating resources in the next cycle of operating the priority-setting system.

One can anticipate continuing development of possible data elements. For example, various empirical structure-activity systems--such as those described by Craig and Enslein (1980), Hodes (1981), Tinker (1981), and Klopman (1983)--might be applied in Stage 2, or perhaps in Stage 1. The accuracy and costs of such data elements need to be explored. The new data elements and corresponding decisions could be added to the priority-setting system, if it were worthwhile.

EVALUATION

A distinctive feature of the proposed model is that each of the estimates used in the model may be verified by results of the priority-setting system. For example, the estimate of toxic chemicals among the chemicals considered by the system predicts the percentage of toxic chemicals that would be discovered if absolutely definitive tests were conducted on all chemicals in a particular class. As tests are conducted, by NTP and others, it will be possible to validate these predictions by comparing them with test outcomes. Doing so allows refinement of the estimates used in the model, hence improving its efficiency. It also helps to identify better sources of information about these parameters. If estimates of accuracy prove to be poor predictors of performance, it may be because the users of the tests are overestimating or underestimating what the tests can provide.

Evaluation is essential to good science and good priority-setting. Not only will an effort to measure performance of the system generate unique information, but the credibility of any scientific or policy-making process is increased if it invites responsible evaluation. Designing an evaluation scheme will not be easy. One must, for example, consider what constitutes "truth" for the purposes of evaluation when there is uncertainty about the interpretation of the results of tests in terms of health effects in humans.
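One simple way to picture the refinement of a prevalence estimate from accumulating test outcomes is a Beta-binomial update, sketched below. This is an illustrative technique swapped in for the purpose, not the procedure used in the report. It assumes that each test outcome is definitive (no false positives or false negatives), which the passage above notes is itself uncertain, and the prior parameters and counts are hypothetical.

```python
def update_prevalence(prior_alpha, prior_beta, positives, tested):
    """Beta-binomial update of the estimated fraction of toxic chemicals in a class.

    prior_alpha and prior_beta encode the design-stage estimate as a Beta
    distribution; positives out of tested are observed outcomes, treated
    here as definitive (an assumption of this sketch).
    """
    if not 0 <= positives <= tested:
        raise ValueError("positives must lie between 0 and tested")
    return prior_alpha + positives, prior_beta + (tested - positives)


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


if __name__ == "__main__":
    # Hypothetical: design estimate of about 10% toxic, then 6 positives among 20 tested.
    alpha, beta = update_prevalence(2.0, 18.0, positives=6, tested=20)
    print(beta_mean(2.0, 18.0), beta_mean(alpha, beta))
```

A large gap between the prior and updated means would be a signal, in the spirit of the text, that either the original prevalence estimate or the assumed accuracy of the tests needs to be revisited.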

Potential users of the recommended system should take care to forestall misinterpretations of its output. For example, it is clear that priority-setting for testing is quite different from priority-setting for other kinds of action. Failure to consider a chemical for further testing can have several meanings, including its having been exonerated and its being indistinguishable from many other chemicals about which little is known. Although exposure and toxicity are grounds for further testing, both are needed for health effects; concern about one is not in itself a cause for regulatory action.

POSSIBLE NEW DATA ELEMENTS

A central feature of the approach to priority-setting recommended here is its flexibility. It can, in principle, accommodate any set of values, any substances, and any kinds of tests. The last-named capability is illustrated by the inclusion of a wide variety of data elements in the demonstration scheme. Once the components of the priority-setting system have been described quantitatively, evaluation of a proposed data element requires only an estimate of its costs and accuracy to determine its feasibility. To emphasize that the approach is not limited to the data elements of the demonstration scheme, this section describes some additional sources of information as possibilities, rather than proposals. More detailed investigation may show that some are not worthy of incorporation, whereas others may require only time to become widely accepted.

THE USE OF SURVEYS AND EPIDEMIOLOGIC INFORMATION

There are a variety of methods for obtaining toxicologic information other than laboratory tests. These include analysis of existing health records and questioning of people in particular jobs or neighborhoods. The costs and value of information collected by these techniques vary greatly; however, their utility may be examined in terms of accuracy and cost, just as more traditional types of information have been examined elsewhere in this report.

Populations that might be surveyed include members of the general public, persons at risk because of their occupation or lifestyle, and medical or health personnel. Each group could be asked questions to ascertain its perceptions on exposure or health effects. Using techniques from epidemiology (Buffler and Sanderson, 1981), psychology (Barker, 1963), safety research (Rentos and Kamin, 1975), and organization analysis (Mintzberg, 1975), investigators could study the daily routine of respondents to identify potential exposures. Such techniques could serve as a basis for analyzing the accuracy of surveys in which responses to survey questions are not supplemented by direct confirming observations. Considerable care would be needed to design questions that were scientifically meaningful yet comprehensible to respondents (National Research Council, 1981; Buffler, 1982).

The benefits of exposure surveys might include the identification of aspects not covered by traditional approaches, where existing data are often proprietary, dated, or mute about chemicals that are created in the home or are encountered only as intermediates in an industrial process. Positive reports might influence priority-setting by increasing estimates of potential exposure. They might also prompt further studies to clarify degrees of exposure or might increase the perceived need for toxicity testing.

Because toxicity testing is intended to serve the public good, it could be guided by what the public considers to be important. Two kinds of judgments are particularly significant for the priority-setting process. One concerns perceptions of the relative harmfulness of various health effects; in general, the amount of resources devoted to the study of a health effect varies as a function of the degree of public concern about it. The second issue concerns the relative costs of errors in misclassifying chemicals. Although Appendix E presents arguments from economic theory, the process of choosing chemicals to test necessarily involves value judgments as to the relative costs of misclassifying a chemical as safe (false-negative) and misclassifying it as harmful (false-positive). [...] the views of a representative sample of the public on such matters and comparing them with the views of technical experts. It need not be assumed that all disagreements between the public and experts about the nature of risk reflect poorly on either group.
Research has shown that what appear to be disagreements about the magnitude of risk can often be traced to other explanations, such as differences in the definition of key terms (e.g., risk) or political values (Fischhoff et al., in press).

DEVELOPING VERY-LOW-COST (VLC) TESTS

Of the universe of about 70,000 chemicals considered by the illustrative system, it appears that there is no characterizing information on tens of thousands except chemical name and (usually) structure. The Chemical Abstracts Service list contains about 6 million chemicals, on perhaps 1 million of which there is little or no information. If 1% of the noncharacterized chemicals are highly likely to have important health effects, then it may be assumed that some 6,000-10,000 of them are of sufficiently high concern to be candidates for further testing.

These large numbers preclude intensive testing of any substantial fraction of the noncharacterized chemicals. We believe that it would be useful to consider development of very-low-cost (VLC) data elements (or tests) that can provide at least rudimentary screening of large numbers of chemicals. Because data (sometimes even structural information) are lacking on many chemicals, this screening would involve testing the chemicals (albeit briefly). Taking a "quick look" at chemicals would yield some new knowledge, in contrast with the use of computerized data scoring, which capitalizes on old knowledge. Indeed, comparing the results of VLC tests with the results of computerized screening would be a way of validating both methods and uncovering potentially interesting discrepancies. The results of VLC tests, like the results of low-cost screening, would permit a limited characterization of each chemical.

The precision of VLC tests would necessarily be low. They would be used solely for screening, rather than for refining one's degree of concern. The ideal test in this regard is one that is most sensitive to the most toxic chemicals. Because chemicals about which this test generated no concern would not be considered further, the cost of false-negatives would be high, but lower than if no "testing" of this population had been done at all. Because the next stage for a chemical that generated concern would be consideration for low-cost testing, the cost of false-positives would be relatively low. These costs of misclassification make it possible to choose between candidate VLC procedures once one has an idea of their accuracy.

One direction that seems promising, but that requires considerable additional analysis before it can be evaluated for development, is to use simplified versions of existing tests and thus save money by avoiding the methodologic refinements needed for definitive reliable results.
As unsatisfying as such tests might be--in contrast with traditional, more sophisticated forms--they can serve a purpose in priority-setting. One requirement is that their results be able legitimately to alter expert opinion regarding degree of concern about a chemical. Whether the potential for such alteration justifies even their low costs can be addressed systematically by value-of-information analysis in the framework described in this report. Indeed, one of the singular features of this framework is that it allows a defensible way to address such questions.

Most oncogenes appear to exhibit base-pair substitution; hence, a test like the Salmonella/microsome assay (Salmonella with S-9, but with duplicate plating at only a single dose of chemical) would be a VLC test. With a similar TA-98 strain, both base-pair and frameshift mutagens might be identified in a VLC test. This example is only illustrative, not definitive; the idea is to use a simplified, and hence less expensive, test carefully conceived to screen and identify likely toxic chemicals of high potency. Examples of other possible screening tests are unreplicated skin-exposure tests and unreplicated acute-toxicity tests.

VLC tests might be used in any decision stage to improve the decision-making process. Such a direct approach may be the only means of ensuring that materials of unknown structure and mixtures are given some attention. Once the chemicals of high concern are identified and carried forward, the low-cost tests may be less effective, because the remaining chemicals presumably will have lower toxicity or exposure potential and hence be more difficult to detect, in view of test variability.

RANDOM SAMPLING

The most disturbing circumstances facing those who must set priorities are the magnitude of the problem and the deficiencies in information. When ignorance about all members of a set of chemicals is equal (and total), then one has no choice but to treat them equally from a priority-setting perspective. All should be advanced to the next stage, or all should be left behind. The decision will depend much more heavily on the resources available for testing than on the concern engendered by having a large set of poorly understood chemicals.

One response to this situation is to delegate responsibility to the scientific community as a whole, on the assumption that the diversity of interests in that community will ensure that troublesome chemicals quickly come to attention. Once some disturbing information gives cause for concern about a chemical, it can be treated more specifically. It might be argued that this leaves too much to chance, in view of the rate at which unanticipated health hazards have emerged in the past.

An alternative approach is to admit to ignorance and to sample at random from the universe of chemicals. The expected value of such sampling can be calculated roughly by assuming that evidence of toxicity will be found at the estimated rate of prevalence of toxicity in the population.
The sampling would, of course, provide a basis for reviewing and refining that estimate. Ideally, the resources so invested would justify increasing concern about some chemicals that would not otherwise be considered and justify decreasing concern about many more. In addition, they would aid in refining the decision rules of the priority-setting system and thereby improve its functioning. There is a small probability that sampling would produce major scientific surprises.

A careful analysis could reveal the proportion of the overall testing budget that could be usefully devoted to this kind of development activity. The analysis would also indicate which sampling strategy is most efficient (e.g., what tests in what order and whether to use stratified sampling). Testing a randomly chosen sample might also reveal biases in the use of a universe restricted to chemicals on which there is already some information.
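The rough calculation described for random sampling can be sketched as follows. This is a minimal illustration, assuming that evidence of toxicity turns up at the estimated prevalence and that strata (for example, chemical classes) are sampled in proportion to their size. The function names, the stratum labels, and every number used are hypothetical.

```python
import random


def expected_positives(sample_size, prevalence):
    """Rough expected number of sampled chemicals showing evidence of toxicity."""
    return sample_size * prevalence


def draw_sample(chemical_ids, sample_size, seed=0):
    """Simple random sample without replacement; the fixed seed makes it repeatable."""
    rng = random.Random(seed)
    return rng.sample(chemical_ids, sample_size)


def proportional_allocation(stratum_sizes, sample_size):
    """Split a sample across strata in proportion to stratum size.

    Shares are rounded down and any remainder goes to the largest stratum,
    so the allocations always sum to sample_size.
    """
    total = sum(stratum_sizes.values())
    allocation = {name: sample_size * size // total for name, size in stratum_sizes.items()}
    remainder = sample_size - sum(allocation.values())
    largest = max(stratum_sizes, key=stratum_sizes.get)
    allocation[largest] += remainder
    return allocation


if __name__ == "__main__":
    universe = list(range(70_000))  # stand-in identifiers for the universe of chemicals
    sample = draw_sample(universe, 200)
    print(len(sample), expected_positives(len(sample), 0.01))
    print(proportional_allocation({"class_a": 600, "class_b": 300, "class_c": 100}, 50))
```

Comparing the observed number of positives in such a sample with the expected number is one way the sampling could feed back into the prevalence estimate, as the text suggests.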
