D
Statistical Approaches to Reducing the Probability of False Alarms While Improving the Probability of Detection

This section suggests two statistical approaches to improve the detection probability and reduce the probability of a false positive. The first is based on the basic statistical concept of testing a simple versus simple¹ hypothesis using the Neyman-Pearson (NP) lemma. The second is an evidential approach.

The underlying idea is quite simple. Currently a bag is passed through the CT scanner once. After this single pass, the bag is either cleared or sent to a human screener. The decision is made by the automatic feature recognition program. If the bag is sent to a human screener, this person looks at the CT scanner image and either clears the bag or sends it for inspection by hand.

An alternative is to send the bag through the CT scanner more than once and, depending on the number of times the bag is flagged as a threat, either clear it or send it to a human screener. The main assumption is that each pass through the CT scanner provides a different scan. This is reasonable because the bag will almost certainly be positioned somewhat differently at each pass because of the bumps on the conveyor belt. The CT scanner does not know that it is scanning the same bag, so the scans are taken to be independent of one another.

In the following, the false positives are the "non-threat" bags that are sent to the human screener. Detection probability is the probability of detecting a threat when it exists.

NOTATION

The following notation is used:

• P (A bag is declared a "threat" by the CT scanner | the bag is a true threat) = δ. This is the probability of detecting a threat by the CT scanner in one scan.
• P (A bag is declared a "threat" by the CT scanner | the bag is not a true threat) = α. This is the probability of falsely detecting a threat by the CT scanner in one scan.
These probabilities are estimable from the experiments that TSA currently conducts.

¹ When the true state of nature is dichotomous (i.e., threat or no threat), in the terminology of statistical hypothesis testing, testing for which state the data support the most is called a simple versus simple hypothesis testing problem. See also G. Casella and R.L. Berger, Statistical Inference, 2nd edition, Duxbury Press, Pacific Grove, Calif., 2002, Chapter 8.
PROBLEM FORMULATION

The problem can be formulated as a statistical test of hypotheses.

Hypothesis 1: The bag is not a threat.
Hypothesis 2: The bag is a threat.

Given: Y, the number of times out of N that the bag is declared a threat by the CT scanner.

Statistical Model

Because the N scans are assumed to be independent, it follows that
(a) Y ~ Binomial (N, α) under Hypothesis 1.
(b) Y ~ Binomial (N, δ) under Hypothesis 2.

Decision Process

Pass the bag through the scanner N times. If q (or more) out of N tests are positive, send the bag to the human screeners and human inspectors.

Relevant Probabilities

1. Probability of declaring a bag to be a threat when the threat exists (correct detection):

$$P(Y \ge q \mid \text{Hypothesis 2}) = \sum_{y=q}^{N} \binom{N}{y}\, \delta^{y} (1-\delta)^{N-y}$$

2. Probability of declaring a bag to be a threat when it is not a threat (false alarm):

$$P(Y \ge q \mid \text{Hypothesis 1}) = \sum_{y=q}^{N} \binom{N}{y}\, \alpha^{y} (1-\alpha)^{N-y}$$

For the following calculations, assume that α = 0.2 and δ = 0.9. Any other appropriate values may be substituted in the above formulas to obtain the relevant probabilities (Table D-1).

A typical entry in the table is read as follows: Suppose the decision rule is such that a bag is declared a threat if it tests positive at least three times out of a total of five scans. Such a decision rule will detect the threat correctly 99.14 percent of the time and will give a false positive alarm 5.79 percent of the time.

Policy makers can decide the appropriate values for N and q based on the values of δ and α and the desired probabilities of detection and false alarm. Given the often cited approximate annual savings of $25 million per percentage point drop in the false alarm rate, this simple scheme represents a potential saving of $375 million per year.
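The correct-detection and false-alarm probabilities of the "q or more out of N" rule are upper tails of a binomial distribution, so they can be checked numerically. The following is a minimal sketch, assuming independent scans and the illustrative values α = 0.2 and δ = 0.9 used in this appendix; the function names are hypothetical and chosen only for this example.

```python
from math import comb

# Illustrative single-scan probabilities from this appendix (not measured values).
ALPHA = 0.2  # P(flagged | bag is not a threat)
DELTA = 0.9  # P(flagged | bag is a threat)


def tail_probability(n: int, q: int, p: float) -> float:
    """Return P(Y >= q) for Y ~ Binomial(n, p)."""
    return sum(comb(n, y) * p**y * (1 - p) ** (n - y) for y in range(q, n + 1))


def rule_performance(n: int, q: int, alpha: float = ALPHA, delta: float = DELTA):
    """Return (correct detection, false alarm) for the 'q or more of n' rule."""
    return tail_probability(n, q, delta), tail_probability(n, q, alpha)


# Rule discussed in the text: at least three positives out of five scans.
detection, false_alarm = rule_performance(5, 3)
print(f"detection={detection:.5f}  false alarm={false_alarm:.5f}")
```

Looping `rule_performance` over N = 2, ..., 5 and q = 1, ..., N gives one way to regenerate a table of this kind for other choices of α and δ.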
TABLE D-1 Probabilities for Some Combinations of N and q

N    q    Correct Detection    False Alarm
2    1    0.99                 0.36
2    2    0.81                 0.04
3    1    0.999                0.488
3    2    0.972                0.104
3    3    0.729                0.008
4    1    0.9999               0.5904
4    2    0.9963               0.1808
4    3    0.9477               0.0272
4    4    0.6561               0.0016
5    1    0.99999              0.67232
5    2    0.99954              0.26272
5    3    0.99144              0.05792
5    4    0.91854              0.00672
5    5    0.59049              0.00032

Evidential Approach

In the NP approach, we answered the question, "Given these data, what do I do?" A somewhat different question may also be asked: "Given these data, what strength of evidence do we have for the hypothesis 'This bag is not a threat' vis-à-vis 'This bag is a threat'?" Using the law of likelihood,² this is given by the likelihood ratio

$$\text{Strength of evidence for ``no threat''} = \frac{P(q \text{ flags out of } N \text{ trials} \mid \text{No threat})}{P(q \text{ flags out of } N \text{ trials} \mid \text{Threat})} = \frac{\alpha^{q} (1-\alpha)^{N-q}}{\delta^{q} (1-\delta)^{N-q}}.$$

The policy makers have to decide at which level of strength of evidence for "no threat" the bag may be cleared. If the strength of evidence for no threat is below that level, the bag will be sent for hand inspection. For the sake of illustration, suppose that if the strength of evidence for no threat is larger than 4, the bag will be cleared; otherwise it will be sent for hand inspection. Under the "no threat" hypothesis, we can compute how often we would send the bag for hand inspection (probability of false positives), and under the "threat" hypothesis, how often we would clear the bag (probability of misleading evidence). Table D-2 was computed assuming a cut-off level of 4, where the number 4 means that the observed data are four times more likely if the bag is not a threat than if it is a threat.

² Ian Hacking, The Logic of Statistical Inference, Cambridge University Press, Cambridge, U.K., 1965; Richard M. Royall, Statistical Evidence: A Likelihood Paradigm, Chapman and Hall, New York, 1997; Mark L. Taper and Subhash R. Lele, eds., The Nature of Scientific Evidence: Statistical, Philosophical, and Empirical Considerations, University of Chicago Press, Chicago, Ill., 2004.
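The evidential rule can be sketched in a few lines: compute the likelihood ratio for each possible alarm count and clear the bag only when the ratio exceeds the cut-off. This is a minimal sketch, assuming the illustrative values α = 0.2, δ = 0.9, and a cut-off of 4 from the text; the function names are hypothetical. The binomial coefficient is omitted from the ratio because it appears in both numerator and denominator and cancels.

```python
from math import comb

# Illustrative values from this appendix (assumptions, not measured rates).
ALPHA = 0.2   # P(flagged | not a threat)
DELTA = 0.9   # P(flagged | threat)
CUTOFF = 4.0  # minimum strength of evidence for "no threat" needed to clear


def evidence_for_no_threat(n, q, alpha=ALPHA, delta=DELTA):
    """Likelihood ratio P(q flags | no threat) / P(q flags | threat)."""
    no_threat = alpha**q * (1 - alpha) ** (n - q)
    threat = delta**q * (1 - delta) ** (n - q)
    return no_threat / threat


def clearing_threshold(n, cutoff=CUTOFF, alpha=ALPHA, delta=DELTA):
    """Largest alarm count for which the bag is still cleared (-1 if none)."""
    cleared = [q for q in range(n + 1)
               if evidence_for_no_threat(n, q, alpha, delta) > cutoff]
    return max(cleared) if cleared else -1


def false_positive_probability(n, cutoff=CUTOFF, alpha=ALPHA, delta=DELTA):
    """P(bag is not cleared | not a threat) under the evidential rule."""
    t = clearing_threshold(n, cutoff, alpha, delta)
    return sum(comb(n, y) * alpha**y * (1 - alpha) ** (n - y)
               for y in range(t + 1, n + 1))


for n in range(1, 6):
    print(n, clearing_threshold(n), round(false_positive_probability(n), 4))
```

Because α < δ, the ratio decreases as the alarm count grows, so taking the largest cleared count is enough to describe the rule as a single threshold.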
TABLE D-2 Threat Cut-off at 4

Columns:
(1) Number of trials
(2) If the number of alarms is ≤ this number, clear the bag (a)
(3) Probability of clearing a bag when it is not a threat
(4) Probability of not clearing a bag when it is not a threat (false positive)
(5) Probability of clearing a bag when it is a threat (false negative)
(6) Probability of detecting a threat when the threat exists

(1)    (2)    (3)      (4)      (5)       (6)
1      0      0.8      0.2      0.1       0.9
2      0      0.64     0.36     0.01      0.99
3      1      0.896    0.104    0.028     0.972
4      1      0.819    0.181    0.0037    0.9963
5      2      0.942    0.058    0.009     0.991

NOTE: A typical entry in this table can be read as "If we conduct five trials and fewer than three of them are positive, then we clear the bag." If we follow this rule, the probability of a false positive is 0.058 and the probability of detecting the threat is 0.991.
(a) This number is a function of (α, δ, K). The decision rule will depend on the choice of K.

The difference between the evidential approach and the NP approach is this: In the evidential paradigm the cut-off point is determined a priori and the error probabilities are calculated afterwards, whereas in the NP approach the error probabilities are fixed a priori and the cut-off points are determined afterwards. In this particular situation, the author does not see any difference between following the NP approach and following the evidential approach.

Incorporating Perception of Threat

The methodology described above does not incorporate the perception of threat. It was based simply on the data observed for a particular checked bag. If the perception of threat can be quantified, we can address the question, "Given these data, how do I change my beliefs?"

The prior belief or perception of threat level can be incorporated into the above setup in the following fashion: Let P(Threat) = π denote the perceived probability of threat. This represents our prior belief that the bag is a "threat" without having observed any data on a particular bag.
This is equivalent to the epidemiological concept of "prevalence of a disease in the population." Now, having observed that the bag was flagged as a threat q times out of N passes, we want to know, in the light of these data, how we change our perception about the threat that this particular bag poses. This can be calculated quite easily using standard conditional probability calculations³ as follows:

$$P(\text{Threat} \mid q \text{ out of } N \text{ tests are positive}) = \frac{\delta^{q} (1-\delta)^{N-q}\, \pi}{\delta^{q} (1-\delta)^{N-q}\, \pi + \alpha^{q} (1-\alpha)^{N-q}\, (1-\pi)}.$$

Notice that this probability depends strongly on the value of π. Computing the probabilities of false negatives and correct detection depends on the specification of π and the cut-off point above which we declare a bag to be a threat. Hence it is not possible to present a comparison that captures perception of threat in relation to either the NP or the evidential approach.

³ G. Casella and R.L. Berger, Statistical Inference, 2nd edition, Duxbury Press, Pacific Grove, Calif., 2002.
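The dependence of the posterior probability on the prior π can be explored numerically. The following is a minimal sketch, assuming the illustrative values α = 0.2 and δ = 0.9 from this appendix; the prior values in the loop are purely hypothetical and are not estimates of any real threat prevalence, and the function name is chosen only for this example.

```python
# Illustrative single-scan probabilities from this appendix (assumptions).
ALPHA = 0.2  # P(flagged | not a threat)
DELTA = 0.9  # P(flagged | threat)


def posterior_threat(n, q, prior, alpha=ALPHA, delta=DELTA):
    """P(Threat | q of n scans positive) via Bayes' rule with prior P(Threat)."""
    like_threat = delta**q * (1 - delta) ** (n - q)
    like_no_threat = alpha**q * (1 - alpha) ** (n - q)
    numerator = like_threat * prior
    return numerator / (numerator + like_no_threat * (1 - prior))


# Hypothetical priors, chosen only to show how strongly the answer moves.
for prior in (1e-6, 1e-3, 0.5):
    print(prior, posterior_threat(n=5, q=3, prior=prior))
```

The binomial coefficient is left out because it multiplies both terms of the denominator as well as the numerator and therefore cancels.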
Comments

Implementation of this scheme will need to take into account the costs involved in scanning the bags repeatedly and tracking the number of times they are declared "positive" by the CT scanner. The tracking may be facilitated by the RFID tag on each bag. The author does not feel qualified to comment on the feasibility of this aspect.

The conveyor belts that handle the bags will need to be redesigned to allow the bags to be scanned repeatedly and in such a manner that at each pass the position of the bag is perturbed to some extent. This would seem to be a manageable mechanical engineering problem. However, again the author declares that he is not qualified to comment on the feasibility and the cost of such changes.

This method can be easily modified if K different tests (machines) are used. It can also be carried out in a sequential fashion, where N is random and at each scan a decision is made whether to pass the bag, to send it to the human inspectors, or to run another test. In the author's opinion, a sequential design is logistically more complicated than the fixed-N design described above.

The information from multiple scans, if made available to the human screener, might further reduce the number of bags that are sent for hand inspection.

DISCUSSION

Current technologies for scanning checked baggage do a very respectable job. However, there are limits to how much CT scanning technology and the feature detection algorithms can increase the probability of detection. Provided that repeated scans of the same bag are independent, the repeated scanning approach takes the current technology and significantly increases the probability of detection and decreases the probability of false alarm without requiring significant technological breakthroughs.