The National Academies Press

Currently Skimming:

Pages 107-142


From page 107... ... 5 PCCL to CCL: Classification Algorithm. INTRODUCTION. The intrinsic difficulty of identifying potentially harmful agents for resource-intensive scrutiny, such as the selection of a drinking water contaminant from a preliminary Drinking Water Contaminant Candidate List (PCCL) for inclusion on a CCL, raises the question of what kind of process or method is best suited to this judgment.
From page 108... ... OVERVIEW OF CLASSIFICATION SCHEMES. Expert Judgments. Many decisions (and their associated classifications) are made on the basis of the collective experience of experts.
From page 109... ... Delphi procedure. The Delphi technique was introduced more than 30 years ago to limit interaction among participants and thereby optimize the quality of decisions (Dalkey, 1969; Linstone and Turoff, 1975)
From page 110... ... of the Delphi process, there are indications that it may not result in decisions much improved over those obtained through less structured processes (IOM, 1992)
From page 111... ... judgments were intrinsically embodied in what appeared to be objective ranking schemes. For example, various kinds of information (such as chemical persistence, solubility, and toxicity)
From page 112... ... enough or too much. Others have worked for months before they realized that they were trying to achieve different visions.
From page 113... ... Application of a prototype scheme for constructing the CCL would consist of a training set of chemicals, microorganisms, and other types of (potential) drinking water contaminants that would clearly belong on the CCL, such as currently regulated chemicals (if they were not already regulated)
From page 114... ... a lesser extent on rule-based prioritization schemes to identify and rank drinking water contaminants for regulatory and research activities. It is clear that contaminant-by-contaminant consideration by panels of experts as to whether something should be placed on a CCL is not possible if the entire universe of potential drinking water contaminants is to be considered, as recommended in the committee's second report (NRC, 1999b)
From page 115... ... but how the weights are arrived at and the consequences of the modes of combination usually are not. As for the neural network approach, its outputs must also be justified for regulatory purposes.
From page 116... ... more attributes or fewer. They can be different attributes entirely.
From page 117... ... value of Y greater than the threshold would indicate that the contaminant belongs in the T=1 category, and a predicted value of Y less than the threshold would indicate that the contaminant belongs in the T=0 category. The following sections of this chapter present the details of such an analysis.
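The threshold rule described in the page 117 excerpt can be sketched in a few lines of Python. This is only an illustration, not the committee's implementation (the chapter states that Matlab was used). The function name `classify` is hypothetical, the default threshold of 0.55 is borrowed from the value quoted on later pages of this listing, and assigning a value exactly equal to the threshold to T=0 is an assumption, since the text does not cover that case.

```python
def classify(y_pred: float, threshold: float = 0.55) -> int:
    """Return the predicted category T for a predicted value Y.

    Y greater than the threshold -> T=1; otherwise T=0.
    Treating Y == threshold as T=0 is an assumption of this sketch.
    """
    return 1 if y_pred > threshold else 0


if __name__ == "__main__":
    # Illustrative Y values only.
    for y in (0.28, 0.54, 0.57, 0.72):
        print(f"Y={y:.2f} -> T={classify(y)}")
```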
From page 119... ... Attribute Scoring. For each contaminant in the training data set and for each of the validation test cases, values between 1 and 10 were assigned to each of five health effects and occurrence attributes. The committee used the contaminant attributes and associated scoring metrics and guidance outlined in Chapter 4.
From page 120... ... BOX 5–1 ATTRIBUTE SCORING FOR VALIDATION TEST CASES¹. Arsenic. Arsenic is an element that occurs naturally in rocks, soil, water, air, plants, and animals. It is a metalloid that exhibits both metallic and nonmetallic chemical and physical properties and has several valence states.
From page 121... ... The data used for obtaining arsenic's prevalence score were taken from the Endocrine Disrupter Priority-Setting Database (EDPSD) currently being developed for use by EPA (ERG-EPA, 2000)
From page 122... ... The RfD for nitrate given in EPA's IRIS database is 1.6 mg/kg (EPA, 2000d)
From page 123... ... the right atrium)
From page 125... ... There are only two pathogens other than G. lamblia for which regulations are set (or proposed)
From page 126... ...

| Organism | N50 |
| --- | --- |
| Adenovirus 4 | 1.66 |
| Rotavirus | 6.3 |
| Giardia lamblia | 34.8 |
| Cryptosporidium parvum | 165 |
| Vibrio cholerae | 243 |
| Campylobacter jejuni | 896 |
| Salmonella (5 nontyphoid strains)ᵃ | 23,600 |
| Salmonella typhosa | 3.60×10^6 |
| Escherichia coli (6 nonenterohemorrhagic strains) | ... |
From page 127... ... FIGURE 5–1 Correlation plots of the values of the five attributes for contaminants in the training data set. Crosses represent T=1 contaminants and circles represent T=0 contaminants.
From page 128... ... $Y = w_0 + w_1 X_1 + w_2 X_2 + w_3 X_3 + w_4 X_4 + w_5 X_5$, (5–3) where $w_i$ is the weight for attribute $i$.
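Equation 5–3 is a weighted sum of the five attribute scores plus an intercept. A minimal Python sketch of that calculation follows. The function name `linear_score` is hypothetical, and the weights and attribute scores in the example are placeholders chosen for illustration, not the committee's fitted values, which do not appear in this excerpt.

```python
from typing import Sequence


def linear_score(weights: Sequence[float], attributes: Sequence[float]) -> float:
    """Compute Y = w0 + w1*X1 + ... + wn*Xn.

    weights[0] is the intercept w0; weights[1:] pair with the attributes.
    """
    if len(weights) != len(attributes) + 1:
        raise ValueError("expected one weight per attribute plus an intercept")
    intercept, slopes = weights[0], weights[1:]
    return intercept + sum(w * x for w, x in zip(slopes, attributes))


if __name__ == "__main__":
    # Placeholder weights (NOT from the report) and five scores on the 1-10 scale.
    example_weights = [0.0, 0.02, 0.02, 0.02, 0.02, 0.02]
    example_scores = [10, 5, 5, 5, 5]
    print(linear_score(example_weights, example_scores))
```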
From page 129... ... FIGURE 5–2 Single-neuron model with a vector input and single output. ... at each information node.
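A single-neuron model of the kind named in the Figure 5–2 caption takes a vector of inputs, forms a weighted sum plus a bias, and passes the result through a transfer function to produce one output. The sketch below assumes a logistic (sigmoid) transfer function. That choice, the function name `neuron_output`, and the example values are assumptions made for illustration, because the excerpt does not state which transfer function the committee used.

```python
import math
from typing import Sequence


def neuron_output(weights: Sequence[float], bias: float, inputs: Sequence[float]) -> float:
    """Single neuron: logistic transfer function applied to (w . x + b).

    The logistic function is an assumed choice for this sketch.
    """
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have the same length")
    net = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1.0 / (1.0 + math.exp(-net))


if __name__ == "__main__":
    # Placeholder weights and bias; five attribute scores as the input vector.
    print(neuron_output([0.1, 0.1, 0.1, 0.1, 0.1], -2.5, [10, 5, 5, 5, 5]))
```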
From page 130... ... classification algorithms were developed using Matlab and the Matlab Neural Network Toolbox (Mathworks Inc., Natick, Massachusetts)
From page 131... ... FIGURE 5–3 Histogram of Yi values for the training data set using the linear classifier. ... trend that contaminants in the T=0 category tend to have smaller predicted values of Yi.
From page 132... ... FIGURE 5–4 Classification error as a function of threshold value, in which classification error is defined as the number of misclassified contaminants (linear classifier)
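The quantity plotted in Figure 5–4, the number of misclassified contaminants at each candidate threshold, can be computed as in the following Python sketch. The function names and the toy data are hypothetical, and the report's training set is not reproduced here. Treating a predicted value equal to the threshold as T=0 is an assumption.

```python
from typing import List, Sequence, Tuple


def classification_error(y_pred: Sequence[float], targets: Sequence[int], threshold: float) -> int:
    """Count contaminants whose thresholded prediction differs from the target T."""
    errors = 0
    for y, t in zip(y_pred, targets):
        predicted = 1 if y > threshold else 0
        if predicted != t:
            errors += 1
    return errors


def error_by_threshold(
    y_pred: Sequence[float], targets: Sequence[int], thresholds: Sequence[float]
) -> List[Tuple[float, int]]:
    """Return (threshold, misclassification count) pairs for each candidate threshold."""
    return [(th, classification_error(y_pred, targets, th)) for th in thresholds]


if __name__ == "__main__":
    # Toy data for illustration only.
    toy_y = [0.2, 0.4, 0.6, 0.8]
    toy_t = [0, 0, 1, 1]
    for th, err in error_by_threshold(toy_y, toy_t, [0.1, 0.3, 0.5, 0.7, 0.9]):
        print(f"threshold={th:.1f} misclassified={err}")
```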
From page 133... ... The ability of the classification scheme to separate the training data set is one way to estimate the classification error that is expected when used for prediction. With a threshold value of 0.55, the error in misclassifying T=1 contaminants is 8 percent (5 out of 63)
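The percentage quoted in the page 133 excerpt follows from the two counts given there. As a quick check of the arithmetic:

```latex
\[
\frac{5}{63} \approx 0.079 \approx 8\%
\]
```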
From page 134... ... performance is measured according to the minimum of mse (Equation 5–2)
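Equation 5–2 is not reproduced in this excerpt, so the sketch below assumes the conventional definition of mean squared error: the average of the squared differences between each target value T and the corresponding predicted value Y. The function name `mean_squared_error` and the sample values are illustrative only.

```python
from typing import Sequence


def mean_squared_error(targets: Sequence[float], predictions: Sequence[float]) -> float:
    """Average of (T_i - Y_i)^2 over all samples (conventional definition, assumed here)."""
    if len(targets) != len(predictions) or not targets:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - y) ** 2 for t, y in zip(targets, predictions)) / len(targets)


if __name__ == "__main__":
    # Illustrative targets (T) and predicted values (Y).
    print(mean_squared_error([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.1]))
```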
From page 135... ... FIGURE 5–7 Histogram of Yi values for the training data set using the neural network classifier. For the linear classifier it was possible to examine the values of the weights and their statistical significance to determine the relative importance of the different attributes in determining the classification outcome.
From page 136... ... DEMONSTRATED USE OF THE TRAINED CLASSIFIER. Examination of Misclassified Contaminants. Misclassification of contaminants in the training data set can be interpreted in three ways: Either (1) the training data (i.e., the attribute scores)
From page 137... ... TABLE 5–3 Contaminants in the Training Data Set That Were Misclassified Using the Linear Classifier

| Misclassified T=1 Contaminants | Yi | Misclassified T=0 Contaminants | Yi |
| --- | --- | --- | --- |
| o-Dichlorobenzene | 0.47 | Ethanol | 0.57 |
| trans-1,2-Dichloroethylene | 0.54 | Folic acid | 0.72 |
| Toluene | 0.43 | Olestra | 0.63 |
| HPC | 0.28 | Saccharin | 0.70 |
| Total coliforms | 0.42 | | |

... concentrations. Toluene is a chemical that is generally known to be rather prevalent, but the occurrence data available were insufficient to represent this fact.
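The Yi values in Table 5–3 can be checked against the 0.55 threshold quoted elsewhere in this listing for the linear classifier (pages 133 and 139). The Python sketch below applies the threshold rule to the tabulated values and confirms that each listed contaminant lands in the category opposite to its target. The variable and function names are hypothetical, and treating a value equal to the threshold as T=0 is an assumption.

```python
THRESHOLD = 0.55  # linear-classifier threshold quoted on pages 133 and 139

# Yi values copied from Table 5-3.
misclassified_t1 = {
    "o-Dichlorobenzene": 0.47,
    "trans-1,2-Dichloroethylene": 0.54,
    "Toluene": 0.43,
    "HPC": 0.28,
    "Total coliforms": 0.42,
}
misclassified_t0 = {
    "Ethanol": 0.57,
    "Folic acid": 0.72,
    "Olestra": 0.63,
    "Saccharin": 0.70,
}


def predicted_category(y: float, threshold: float = THRESHOLD) -> int:
    """Predicted T: 1 if Y exceeds the threshold, otherwise 0."""
    return 1 if y > threshold else 0


if __name__ == "__main__":
    for name, y in misclassified_t1.items():
        print(f"{name}: target T=1, predicted T={predicted_category(y)}")
    for name, y in misclassified_t0.items():
        print(f"{name}: target T=0, predicted T={predicted_category(y)}")
```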
From page 138... ... TABLE 5–4 Contaminants in the Training Data Set That Were Misclassified Using the Neural Network Classifier

| Misclassified T=1 Contaminants | Yi | Misclassified T=0 Contaminants |
| --- | --- | --- |
| Ethylbenzene | 0.50 | -- |
| HPC | 0.08 | -- |

Validation Test Cases. The contaminants in the training data set in the T=1 category did not include all those that have MCLs. Five such chemical and microbial contaminants were withheld as validation test cases to examine the predictive accuracy of the classification algorithm as required in the second phase of study (see Preface to this report)
From page 139... ... TABLE 5–5 Classification Prediction Accuracy for Validation Test Cases. Column headings: Validation Test Cases; Predicted Yi Using Linear Classifier (Threshold = 0.55); Predicted Yi Using Neural Network (Threshold = 0.55)
From page 140... ... a prototype classification approach that must first be trained (calibrated) using a training data set containing prototype contaminants and can then be used in conjunction with expert judgment to predict whether a new (PCCL)
From page 141... ... a transparent and defensible process, the importance of which is discussed in Chapter 2. As recommended in Chapter 3, the creation of a consolidated database that would provide a consistent mechanism for recording and retrieving information on the contaminants under consideration would be of benefit.
From page 142... ... is ultimately adopted and used to help create future CCLs. • Finally, EPA should realize that the committee is recommending a prototype classification scheme to be used in conjunction with expert judgment for the future selection of PCCL contaminants for inclusion on a CCL.
