| Copyright © 2009. National Academy of Sciences. All rights reserved. Terms of Use and Privacy Statement |
5
FUTURE DEVELOPMENT, IMPLEMENTATION,
AND REFINEMENT OF THE SYSTEM
The priority-setting approach presented in this report would require
further elaboration and further development of methods and data bases
before it could be implemented fully. This chapter describes the
developments needed to make the system operational and then discusses
strategies for implementation and refinement.
DEVELOPMENT
The system as illustrated is potentially capable of screening tens of
thousands of substances and determining--after specified information-
gathering steps--for which of them it would be warranted to apply a
battery of short-term tests or a long-term carcinogenicity bioassay. To
a lesser extent, it could screen the same universe of chemicals to select
those which should be tested for other health effects. Before this
system could be implemented, however, the following steps would need to
be taken:
· Refinement of the estimated frequencies of outcomes of various
information-gathering activities. For example, the numbers of chemicals
from the universe that would fall into the chemical classes described in
Table 5 should be better determined.
· Refinement of estimates of the accuracy of the data elements in
all stages of the system. For example, we need to consider further the
power of an RTECS entry as a data element to reflect carcinogenic
hazard. Perhaps more important, what should be the impact of a
subjective determination of "high" exposure by an NTP staff assessor?
Outside verification that the data elements and their operating
characteristics are reasonable is needed.
· Refinement of the choice of penalties for incorrect assessments of
public-health concern. More attention needs to be given to the relative
importance of false-positive and false-negative findings and their
implications for decisions of whether to consider chemicals further.
There is a need to integrate the intuitive decisions about selecting
chemicals with the decisions derived from the mathematical model of the
system--and it is probably necessary to depart from both to find a
satisfactory synthesis.
· Expansion of the model to incorporate more data elements and
outcomes. For example, both ratings of degree of concern (exposure and
toxicity) and confidence in the ratings need to be included in the model
as outcomes of Stages 2 and 3. The confidence information has not yet
been included.
· Testing of the model with a wider range of data and, by exploring
the sensitivity of the model to those data, establishment of a set of
design rules that either are intuitively satisfying or can be explained
as stemming logically from the data, instead of from limitations in the
model itself.
· Expansion of the guidance to system operators in designing Stage 2
minidossiers and Stage 3 dossiers and strategies to search for
information.
· Development of further guidance for the evaluation of Stage 2
minidossiers with respect to exposure and toxicity ratings and their
corresponding degrees of confidence.
· Determination of which health effects are worthy of treatment by
the techniques presented here for carcinogenicity. This will require
both a value judgment as to the severity of an effect and a scientific
judgment as to the availability of tests, the number of toxic substances
in the universe, accuracy of the tests, and the importance of classifying
chemicals correctly with respect to such effects.
· Development of data elements, estimation of their accuracy,
estimation of the number of chemicals that cause a health effect,
determination of misclassification costs, and development of
corresponding decision rules for the additional health effects selected.
· Expansion of the system to include all toxicity tests for which
chemicals are being given priorities.
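The role of the misclassification penalties in the decision to consider a chemical further can be illustrated with a minimal sketch. It assumes that each chemical carries a single estimated probability of being hazardous and that two penalties apply, one for a false-positive and one for a false-negative. The function names and the two-penalty form are hypothetical simplifications introduced here for illustration; they are not the report's optimization model.

```python
def advance_threshold(cost_false_positive, cost_false_negative):
    """Probability of hazard above which advancing has the lower expected penalty."""
    return cost_false_positive / (cost_false_positive + cost_false_negative)


def should_advance(p_hazard, cost_false_positive, cost_false_negative):
    # Expected penalty of dropping the chemical: it is hazardous with probability p.
    expected_if_dropped = p_hazard * cost_false_negative
    # Expected penalty of advancing it: it is harmless with probability 1 - p.
    expected_if_advanced = (1.0 - p_hazard) * cost_false_positive
    return expected_if_advanced < expected_if_dropped
```

Under these assumptions, raising the false-negative penalty relative to the false-positive penalty lowers the threshold, so more chemicals are carried forward.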
IMPLEMENTATION
A priority-setting system based on value-of-information analysis may
be implemented by several possible approaches, and these vary
considerably in the time and resources required. At the most modest
level, the approach to implementation is qualitative. A greater emphasis
on estimated probabilities (Appendix D) may help in the evaluation of the
uncertainties that play such a fundamental role in priority-setting,
design of tier testing, and characterization of risk of chemicals. The
discussion of possible data elements (Chapter 4) may suggest indicators
to help select chemicals in the initial stages of a priority-setting
system, where information is especially fragmentary. The walkthrough
with the example chemical bisphenol A (Chapter 3) may suggest ways for
chemical managers to pick from among the many possible sources of
information, especially for Stages 2 and 3. Without describing the
priority-setting system quantitatively, much of the value-of-information
analysis can be implemented with little extra expenditure of time and
resources. In its most modest form, the computer-assisted Stage 1 could
be foregone in favor of a smaller select universe of chemicals.
At the most elaborate level, a complete implementation could be
attempted from the start. The program for the model could be rewritten
for a mainframe computer to accommodate several effects, additional Stage
4 candidate tests, and more Stage 2 and Stage 3 data elements and
possible outcomes. The resulting large number of factors could be
estimated and the software developed for reading the files for Stage 1.
This effort might require a couple of years and considerable resources.
With a gradual approach, implementation would start at a modest level
and increase. Files and software for Stage 1 would be developed in
steps, perhaps without addressing the entire universe of 70,000 chemicals
immediately. It would probably be useful to rewrite the computer program
for the optimization model, adapt it to available computers, and
familiarize the technical staff with the algorithm. Carcinogenesis and
perhaps another health effect could be included initially. A small
number of data elements could be evaluated and their accuracy estimated
during the first year. In later years, these estimates could be refined
on the basis of new data (some from the tests), and other data elements
and toxic effects could be added. With this more gradual approach, the
system could be implemented in the first year at a low cost.
In the process of setting priorities, information is collected on
toxicity and exposure; some of it is recorded in the dossiers (including
minidossiers and microdossiers), and some is embodied in the estimates of
accuracy (e.g., the estimated false-positive and false-negative rates).
Additional information is developed by the testing program itself.
Putting this information into a usable format is an important potential
contribution of NTP. For example, it is very helpful to use
false-negative rates for estimating rates of detection, especially for
clinical tests with negative outcomes.
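One way in which estimated false-positive and false-negative rates might be put to use is sketched below with Bayes' rule: given a prior probability that a chemical is hazardous, the function returns the probability that remains after a negative test result. The function and parameter names are hypothetical, and the sketch assumes a single test with fixed error rates.

```python
def posterior_after_negative(prior, false_negative_rate, false_positive_rate):
    """Probability that a chemical is hazardous given a negative test result."""
    # P(negative | hazardous) is the false-negative rate.
    p_neg_given_hazard = false_negative_rate
    # P(negative | harmless) is one minus the false-positive rate.
    p_neg_given_safe = 1.0 - false_positive_rate
    # Total probability of observing a negative result.
    p_negative = prior * p_neg_given_hazard + (1.0 - prior) * p_neg_given_safe
    return prior * p_neg_given_hazard / p_negative
```

The higher the false-negative rate, the less a negative outcome reduces concern about the chemical.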
The following tasks would be required to implement the demonstration
system:
· Training of staff in probabilistic thinking and value-of-
information analysis.
· Training of staff in the special skills needed for dossier design
and evaluation.
· Definition of a universe of substances and preparation or
augmentation of data bases with environmental chemicals, food
constituents, pyrolysis products, etc.
· Development of software for building computerized files for Stage
1 operation, including provisions for maintaining status lists
("dormant," "on test," "Stage 2 minidossiers complete," etc.).
· Design of forms and procedures for moving substances through Stage
3. For example, how often does the expert evaluation committee need to
meet, if the required frequency is different from the current NTP
schedule?
· Design of a feedback and control system for continuous updating of
the system.
· Provision of an effective interface between the established agency
nomination process and the proposed "long-list" process.
· Expansion of the role of the expert committee to include
estimation of degrees of exposure and toxicity, in addition to test
recommendations.
FURTHER REFINEMENT
Once the system becomes operational, a number of refinements can be
contemplated, as experience with its use accumulates. The whole design
is built on a series of estimates, such as the estimates of the number of
carcinogens among the chemicals considered and the accuracy of the
selection stages. Although by the time of implementation these estimates
will have been subjected to a great deal of examination, they still will
be quite uncertain. Thus, the system must be tested and refined as new
information becomes available. These are some of the questions to be
addressed:
· How many* chemicals are unstructurable?
· How many have no RTECS listing?
· How many have no listed production?
· How many reach a dormant list in Stage 1?
· How much does it cost to construct a minidossier?
· How often does an assessor score a chemical as having "high"
exposure or "low" confidence in the toxicity rating?
· Which is the most useful information for the Stage 3 committees?
How much does it cost to retrieve? For how many chemicals does one find
useful information?
· How many chemicals are retained for short- and long-term tests?
· How many of those prove positive?
· What is the distribution of test recommendations among various
types of tests?
*That is, what fraction of the universe?
After the system has operated for a while, the answers to these and
other questions can be assessed for relevance to estimates used to design
the system. Changes can be made as deemed necessary, and the model used
to calculate an improved set of rules to decide whether to consider
chemicals further.
Another activity to be considered is a continuing sensitivity
analysis of the system. What would happen if a decision rule related to
an RTECS code were changed? Would more chemicals reach Stage 2? Could
one spend less per minidossier and still get a reasonably effective
assessment from the minidossiers? These kinds of questions, which could
be addressed without committing any of the testing budget, might suggest
new ways of allocating resources in the next cycle of operating the
priority-setting system.
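A sensitivity analysis of this kind could be sketched as follows. The sketch assumes that each chemical receives a numerical Stage 1 score and that the decision rule is a simple cutoff on that score; both the scoring and the cost figure are hypothetical and serve only to show how the number of chemicals reaching Stage 2, and the resulting minidossier budget, respond to a change in the rule.

```python
def stage2_load(scores, threshold, cost_per_minidossier):
    """Return how many chemicals meet the cutoff and the total minidossier cost."""
    advanced = sum(1 for s in scores if s >= threshold)
    return advanced, advanced * cost_per_minidossier


def sweep_thresholds(scores, thresholds, cost_per_minidossier):
    # One row per candidate rule: (threshold, chemicals advanced, total cost).
    return [(t, *stage2_load(scores, t, cost_per_minidossier)) for t in thresholds]
```

Calling `sweep_thresholds` with several cutoffs gives a small table that can be compared with the available budget before any testing resources are committed.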
One can anticipate continuing development of possible data elements.
For example, various empirical structure-activity systems--such as those
described by Craig and Enslein (1980), Hodes (1981), Tinker (1981), and
Klopman (1983)--might be applied in Stage 2, or perhaps in Stage 1. The
accuracy and costs of such data elements need to be explored. The new
data elements and corresponding decisions could be added to the
priority-setting system, if it were worth while.
EVALUATION
A distinctive feature of the proposed model is that each of the
estimates used in the model may be verified by results of the priority-
setting system. For example, the estimate of toxic chemicals among the
chemicals considered by the system predicts the percentage of toxic
chemicals that would be discovered if absolutely definitive tests were
conducted on all chemicals in a particular class. As tests are
conducted, by NTP and others, it will be possible to validate these
predictions by comparing them with test outcomes. Doing so allows
refinement of the estimates used in the model, hence improving its
efficiency. It also helps to identify better sources of information
about these parameters. If estimates of accuracy prove to be poor
predictors of performance, it may be because the users of the tests are
overestimating or underestimating what the tests can provide.
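The comparison of a predicted prevalence with accumulating test outcomes can be illustrated with a conjugate Beta-binomial update. This is a sketch under stated assumptions: the prior is expressed as a Beta distribution, the tests are treated as definitive (no misclassification), and the function name and parameters are hypothetical rather than part of the report's model.

```python
def update_prevalence(prior_alpha, prior_beta, positives, negatives):
    """Update a Beta prior on prevalence with counts of definitive test outcomes.

    Returns the posterior parameters and the posterior mean prevalence.
    """
    alpha = prior_alpha + positives
    beta = prior_beta + negatives
    return alpha, beta, alpha / (alpha + beta)
```

For instance, a prior equivalent to 1 toxic chemical in 10, followed by 4 positive and 6 negative results, moves the estimated prevalence from 0.10 to 0.25.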
Evaluation is essential to good science and good priority-setting.
Not only will an effort to measure performance of the system generate
unique information, but the credibility of any scientific or
policy-making process is increased if it invites responsible evaluation.
Designing an evaluation scheme will not be easy. One must, for example,
consider what constitutes "truth" for the purposes of evaluation when
there is uncertainty about the interpretation of the results of tests in
terms of health effects in humans.
Potential users of the recommended system should take care to
forestall misinterpretations of its output. For example, it is clear
that priority-setting for testing is quite different from
priority-setting for other kinds of action. Failure to consider a
chemical for further testing can have several meanings, including its
having been exonerated and its being indistinguishable from many other
chemicals about which little is known. Although exposure and toxicity
are grounds for further testing, both are needed for health effects; and
concern about one is not in itself a cause for regulatory action.
POSSIBLE NEW DATA ELEMENTS
A central feature of the approach to priority-setting recommended
here is its flexibility. It can, in principle, accommodate any set of
values, any substances, and any kinds of tests. The last-named
capability is illustrated by the inclusion of a wide variety of data
elements in the demonstration scheme. Once the components of the
priority-setting system have been described quantitatively, evaluation of
a proposed data element requires only an estimate of its costs and
accuracy to determine its feasibility. To emphasize that the approach is
not limited to the data elements of the demonstration scheme, this section
describes some additional sources of information as possibilities, rather
than proposals. More detailed investigation may show that some are not
worthy of incorporation, whereas others may require only time to become
widely accepted.
THE USE OF SURVEYS AND EPIDEMIOLOGIC INFORMATION
There are a variety of methods for obtaining toxicologic information
other than laboratory tests. These include analysis of existing health
records and questioning of people in particular jobs or neighborhoods.
The costs and value of information collected by these techniques vary
greatly; however, their utility may be examined in terms of accuracy and
cost, just as more traditional types of information have been examined
elsewhere in this report.
Populations that might be surveyed include members of the general
public, persons at risk because of their occupation or lifestyle, and
medical or health personnel. Each group could be asked questions to
ascertain its perceptions on exposure or health effects.
Using techniques from epidemiology (Buffler and Sanderson, 1981),
psychology (Barker, 1963), safety research (Rentos and Kamin, 1975), and
organization analysis (Mintzberg, 1975), investigators could study the
daily routine of respondents to identify potential exposures. Such
techniques could serve as a basis for analyzing the accuracy of surveys
in which responses to survey questions are not supplemented by direct
confirming observations. Considerable care would be needed to design
questions that were scientifically meaningful yet comprehensible to
respondents (National Research Council, 1981; Buffler, 1982).
The benefits of exposure surveys might include the identification of
aspects not covered by traditional approaches, where existing data are
often proprietary, dated, or mute about chemicals that are created in the
home or are encountered only as intermediates in an industrial process.
Positive reports might influence priority-setting by increasing estimates
of potential exposure. They might also prompt further studies to clarify
degrees of exposure or might increase the perceived need for toxicity
testing.
Because toxicity testing is intended to serve the public good, it
could be guided by what the public considers to be important. Two kinds
of judgments are particularly significant for the priority-setting
process. One concerns perceptions of the relative harmfulness of various
health effects; in general, the amount of resources devoted to the study
of a health effect varies as a function of the degree of public concern
about it. The second issue concerns the relative costs of errors in
misclassifying chemicals. Although Appendix E presents arguments from
economic theory, the process of choosing chemicals to test necessarily
involves value judgments as to the relative costs of misclassifying a
chemical as safe (false-negative) and misclassifying it as harmful
(false-positive). Surveys are one way of eliciting the views of a
representative sample of the public on such matters and comparing them
with the views of technical experts. It need not be assumed that all
disagreements between the public and experts about the nature of risk
reflect poorly on either group. Research has shown that what appear to
be disagreements about the magnitude of risk can often be traced to other
explanations, such as differences in the definition of key terms (e.g.,
risk) or political values (Fischhoff et al., in press).
DEVELOPING VERY-LOW-COST (VLC) TESTS
Of the universe of about 70,000 chemicals considered by the
illustrative system, it appears that there is no characterizing
information on tens of thousands except chemical name and (usually)
structure. The Chemical Abstracts Service list contains about 6 million
chemicals, on perhaps 1 million of which there is little or no
information.
If 1% of the noncharacterized chemicals are highly likely
to have important health effects, then it may be assumed that some
6,000-10,000 of them are of sufficiently high concern to be candidates
for further testing.
These large numbers preclude intensive testing of any substantial
fraction of the noncharacterized chemicals. We believe that it would be
useful to consider development of very-low-cost (VLC) data elements (or
tests) that can provide at least rudimentary screening of large numbers
of chemicals. Because data (sometimes even structural information) are
lacking on many chemicals, this screening would involve testing the
chemicals (albeit briefly). Taking a "quick look" at chemicals would
yield some new knowledge, in contrast with the use of computerized data
scoring, which capitalizes on old knowledge. Indeed, comparing the
results of VLC tests with the results of computerized screening would be
a way of validating both methods and uncovering potentially interesting
discrepancies.
The results of VLC tests, like the results of low-cost screening,
would permit a limited characterization of each chemical. The precision
of VLC tests would necessarily be low. They would be used solely for
screening, rather than for refining one's degree of concern. The ideal
test in this regard is one that is most sensitive to the most toxic
chemicals. Because chemicals about which this test generated no concern
would not be considered further, the cost of false-negatives would be
high, but lower than if no "testing" of this population had been done at
all. Because the next stage for a chemical that generated concern would
be consideration for low-cost testing, the cost of false-positives would
be relatively low. These costs of misclassification make it possible to
choose between candidate VLC procedures once one has an idea of their
accuracy.
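The choice between candidate VLC procedures can be sketched as a comparison of expected costs per chemical screened. The sketch assumes that each candidate is described by a sensitivity, a specificity, and a cost per chemical, that a negative result drops the chemical, and that a positive result sends it on to low-cost testing. All names and numerical inputs are hypothetical.

```python
def expected_misclassification_cost(prevalence, sensitivity, specificity,
                                    cost_false_negative, cost_false_positive):
    # A hazardous chemical that tests negative is dropped (false-negative).
    fn = prevalence * (1.0 - sensitivity) * cost_false_negative
    # A harmless chemical that tests positive moves on needlessly (false-positive).
    fp = (1.0 - prevalence) * (1.0 - specificity) * cost_false_positive
    return fn + fp


def best_procedure(candidates, prevalence, cost_false_negative, cost_false_positive):
    """Pick the candidate with the lowest test cost plus expected penalty.

    candidates maps a name to (sensitivity, specificity, cost per chemical).
    """
    def total(name):
        sens, spec, test_cost = candidates[name]
        return test_cost + expected_misclassification_cost(
            prevalence, sens, spec, cost_false_negative, cost_false_positive)
    return min(candidates, key=total)
```

With a high false-negative penalty, the comparison tends to favor the more sensitive procedure even if it produces more false-positives, which is consistent with the cost structure described above.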
One direction that seems promising, but that requires considerable
additional analysis before it can be evaluated for development, is to use
simplified versions of existing tests and thus save money by avoiding the
methodologic refinements needed for definitive reliable results. As
unsatisfying as such tests might be--in contrast with traditional, more
sophisticated forms--they can serve a purpose in priority-setting. One
requirement is that their results be able legitimately to alter expert
opinion regarding degree of concern about a chemical. Whether the
potential for such alteration justifies even their low costs can be
addressed systematically by value-of-information analysis in the
framework described in this report. Indeed, one of the singular features
of this framework is that it allows a defensible way to address such
questions.
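Whether a test's potential to alter the degree of concern justifies its cost can be illustrated with a simplified value-of-information calculation. The sketch assumes only two actions (drop the chemical or advance it), a single test with known sensitivity and specificity, and two misclassification penalties. It is a hypothetical reduction of the framework for illustration, not the report's full model.

```python
def expected_cost_of_best_action(p_hazard, cost_false_negative, cost_false_positive):
    # Drop the chemical or advance it, whichever has the lower expected penalty.
    return min(p_hazard * cost_false_negative,
               (1.0 - p_hazard) * cost_false_positive)


def net_value_of_test(prior, sensitivity, specificity, test_cost,
                      cost_false_negative, cost_false_positive):
    """Expected reduction in penalty from testing first, minus the test cost."""
    p_pos = prior * sensitivity + (1.0 - prior) * (1.0 - specificity)
    p_neg = 1.0 - p_pos
    # Posterior probabilities of hazard after each outcome (Bayes' rule).
    post_pos = prior * sensitivity / p_pos if p_pos > 0 else prior
    post_neg = prior * (1.0 - sensitivity) / p_neg if p_neg > 0 else prior
    without_test = expected_cost_of_best_action(
        prior, cost_false_negative, cost_false_positive)
    with_test = (
        p_pos * expected_cost_of_best_action(
            post_pos, cost_false_negative, cost_false_positive)
        + p_neg * expected_cost_of_best_action(
            post_neg, cost_false_negative, cost_false_positive))
    return without_test - with_test - test_cost
```

A test whose results cannot change the preferred action has a net value equal to minus its cost, so under these assumptions it would not be worth performing however inexpensive it is.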
Most oncogenes appear to exhibit base-pair substitution; hence, a
test like the Salmonella/microsome assay (Salmonella with S-9, but with
duplicate plating at only a single dose of chemical) would be a VLC
test. With a similar TA-98 strain, both base-pair and frameshift
mutagens might be identified in a VLC test. This example is only
illustrative, not definitive; the idea is to use a simplified, and hence
less expensive, test carefully conceived to screen and identify likely
toxic chemicals of high potency. Examples of other possible
screening tests are unreplicated skin-exposure tests and unreplicated
acute-toxicity tests. VLC tests might be used in any decision stage to
improve the decision-making process.
Such a direct approach may be the only means of ensuring that
materials of unknown structure and mixtures are given some attention.
Once the chemicals of high concern are identified and carried forward,
the low-cost tests may be less effective, because the remaining chemicals
presumably will have lower toxicity or exposure potential and hence be
more difficult to detect, in view of test variability.
RANDOM SAMPLING
The most disturbing circumstances facing those who must set
priorities are the magnitude of the problem and the deficiencies in
information. When ignorance about all members of a set of chemicals is
equal (and total), then one has no choice but to treat them equally from
a priority-setting perspective. All should be advanced to the next
stage, or all should be left behind. The decision will depend much more
heavily on the resources available for testing than on the concern
engendered by having a large set of poorly understood chemicals.
One response to this situation is to delegate responsibility to the
scientific community as a whole, on the assumption that the diversity of
interests in that community will ensure that troublesome chemicals
quickly come to attention. Once some disturbing information gives cause
for concern about a chemical, it can be treated more specifically. It
might be argued that this leaves too much to chance, in view of the rate
at which unanticipated health hazards have emerged in the past.
An alternative approach is to admit to ignorance and to sample at
random from the universe of chemicals. The expected value of such
sampling can be calculated roughly by assuming that evidence of toxicity
will be found at the estimated rate of prevalence of toxicity in the
population. The sampling would, of course, provide a basis for reviewing
and refining that estimate. Ideally, the resources so invested would
justify increasing concern about some chemicals that would not otherwise
be considered and justify decreasing concern about many more. In
addition, they would aid in refining the decision rules of the
priority-setting system and thereby improve its functioning. There is a
small probability that sampling would produce major scientific
surprises. A careful analysis could reveal the proportion of the overall
testing budget that could be usefully devoted to this kind of development
activity. The analysis would also indicate which sampling strategy is
most efficient (e.g., what tests in what order and whether to use
stratified sampling). Testing a randomly chosen sample might also reveal
biases in the use of a universe restricted to chemicals on which there is
already some information.
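The rough calculation of the expected yield of random sampling, and one simple form of stratified sampling, can be sketched as follows. The sketch assumes a definitive test, so that toxic chemicals are found at the estimated prevalence, and it assumes that the universe has already been divided into strata; the stratum names, the equal sampling fraction, and the function names are hypothetical.

```python
import random


def expected_toxic_found(sample_size, prevalence):
    # With a definitive test, expected discoveries equal sample size times prevalence.
    return sample_size * prevalence


def stratified_sample(strata, fraction, seed=0):
    """Draw the same fraction at random from each stratum (a dict of lists).

    At least one chemical is drawn from every non-empty stratum.
    """
    rng = random.Random(seed)
    sample = {}
    for name, chemicals in strata.items():
        k = max(1, round(fraction * len(chemicals))) if chemicals else 0
        sample[name] = rng.sample(chemicals, k)
    return sample
```

Comparing the observed yield in each stratum with `expected_toxic_found` is one way the sampling could be used to review and refine the prevalence estimate.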