
Appendix B: Performance Metrics for ASPs and PVTs

"Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise."
– John W. Tukey (1962), "The Future of Data Analysis," Annals of Mathematical Statistics 33(1): 1–67. (The quotation appears on p. 12.)

When evaluating the performance of instruments to identify the system best suited to a given task, one needs to consider the correct metric for making the comparison. In the case of systems such as the advanced spectroscopic portals (ASPs), conventional measures such as sensitivity and specificity provide useful information but do not directly assess test performance in actual field operation. The metrics of interest concern the probabilities of making incorrect calls: the probability that the cargo actually contained dangerous material when the test system allowed it to pass (a false negative call), and the probability that the cargo actually contained benign material when the test system alarmed (a false positive call). This appendix describes the calculations leading to estimates of these probabilities, the uncertainties in these values, and how these estimated probabilities can be used to compare two systems under consideration.

NOMENCLATURE

Test system performance usually is characterized in terms of detection probabilities. The notation for these probabilities comes from the literature on comparing medical diagnostic tests, and we use the same notation here for radiation detection systems, so we begin with some terminology. In formal notation, the absolute probability of event A is written P{A}. The probability that event A happens given condition or event B is written P{A|B}. The event after the vertical bar "|" is the event on which the probability is conditioned, i.e., the event that preexists. For the rest of this appendix, we define the following.

A = cargo contains SNM
B = test system alarms
Ac = the complement of A: cargo contains no SNM (benign)47
Bc = the complement of B: test system does not alarm

Sensitivity (S) = probability that the test system alarms, given that the underlying cargo truly contains special nuclear material (SNM). S = P{B|A}

Specificity (T) = probability that the test system does not alarm, given that the underlying cargo truly contains benign material (non-SNM). T = P{Bc|Ac}

47. Some non-SNM radioactive material is not benign, but for simplicity, this appendix refers to non-SNM as benign material.

Prevalence (p) = probability that cargo contains SNM. p = P{A}

Positive predictive value (PPV) = probability that the underlying cargo truly contains SNM, given that the test system alarms. PPV = P{A|B}

Negative predictive value (NPV) = probability that the underlying cargo truly contains non-SNM, given that the test system did not alarm. NPV = P{Ac|Bc}

WHAT WE WANT TO KNOW AND WHAT WE CAN KNOW

Although we might want to know the sensitivity and specificity of the detection systems, because their definitions rely on true knowledge of the cargo contents, we can estimate a system's sensitivity and specificity only from a designed experiment. The experimenters insert into the cargo either SNM or benign material and then run the cargo through the test systems; the proportion of SNM runs that properly set off the test system alarm is an estimate of the test's sensitivity, and the proportion of benign runs that properly pass the test system is an estimate of the test's specificity. Such designed studies are artificial scenarios intended to represent a range of possible real-world events.

In real life we do not know the cargo contents. We see only the result of the test system: either the test system alarmed, or it did not alarm, and the probability of getting an alarm given that SNM is present is not necessarily the same as the probability that SNM is present given that the system alarmed (P{B|A} ≠ P{A|B}). Operationally, if the system alarms, SNM is suspected; if the system does not alarm, the cargo is allowed to pass. We are concerned especially with this question: Given that the test system did not alarm, what is the probability that the cargo contained SNM? That is, what risk do we take by allowing a "no-alarm" container to pass? From the standpoint of practical operational effectiveness, this false negative call probability (FNCP = P{A|Bc} = 1 - NPV)48 has grave consequences. As shown below by Bayes' Theorem, it is a function of sensitivity (S) and specificity (T), as well as of prevalence (p = P{A}), but a comparison between two test systems on the same scenario (i.e., the same threat) involves the same prevalence, so prevalence does not enter into the comparison of effectiveness for the two test systems.

Accurate estimation of sensitivity and specificity is important in that it allows us to compare accurately the performance of two test systems using the relevant, practically meaningful metric. As noted above, we can estimate S and T from designed studies, such as those conducted at the Nevada Test Site. We also can derive confidence limits on S and T from such designed experiments, and hence we can estimate (1 - NPV) and associated confidence intervals. More importantly, we can compare the two systems via a ratio, say the FNCP ratio (1 - NPV1)/(1 - NPV2). An FNCP ratio whose lower confidence limit exceeds 1 indicates a preference for test system 2, while a ratio whose upper confidence limit falls below 1 indicates a preference for test system 1. Note that these ratios may differ for different scenarios; a table of these ratios may suggest strategies for associating the ratios with the threat levels presented by different scenarios.
Notice also that the probability of making a false positive call (FPCP) is likewise of interest for purposes of evaluating costs and benefits: too many false positive calls can also be costly by slowing down commerce, diverting CBP personnel from potential threats as they spend time investigating benign cargo, reducing confidence in the value of the system, and increasing the likelihood that operators might not take results seriously.

48. Some analyses refer to the "false discovery rate" and "false non-discovery rate," which are related to (1 - PPV) and (1 - NPV), respectively, but their definitions are slightly different (see Box B.1).

Box B.1: A comment on notation

We denoted by FNCP the probability of making a false negative call and by FPCP the probability of making a false positive call; i.e.,

FNCP = P{ true + | test calls "–" }
FPCP = P{ true – | test calls "+" }.

We related these probabilities to the following generic two-way table of test outcomes (notation from Benjamini and Hochberg, 1995, p. 291, referred to as BH95, is in parentheses):

                     Test calls        Test calls
  Truth              "Positive"        "Negative"        Total tests
  True POSITIVE      N++ (V)           N+- (U)           N+ = m - m0
  True NEGATIVE      N-+ (S)           N-- (T)           N- = m0
  Total calls        R                 m - R             m

We estimated the false negative call probability via the proportion of negative-call tests (m - R) that were in fact positive (N+-), or U/(m - R) in BH95 notation. Similarly, we estimated the false positive call probability via the proportion of positive-call tests (R) that were in fact negative (N-+), or V/R in BH95 notation.

BH95 addresses the situation known as "multiple testing," where one is conducting many hypothesis tests (e.g., hundreds or thousands of tests, as occurs in gene expression experiments) and wants to control the frequency with which one declares as "significant" (e.g., "positive") tests that in fact are negative. Hence Benjamini and Hochberg (1995) define the expected proportion of false positive calls, E(V/R), as the "false discovery rate," or FDR. They provide a procedure based on the m p-values from the m tests so that one has assurance that, on average, the proportion of "declared significant" tests that in fact are not significant remains below a pre-set threshold (e.g., 0.05). If we estimate the FPCP as V/R, we can think of this estimated FPCP as an estimate of Benjamini and Hochberg's FDR. In analogy with E(V/R) = FDR, some have termed E(U/(m - R)) the "false non-discovery rate."

Our situation differs from the multiple testing situation in two ways. First, our two-way table arises from a designed experiment where the values of m0 and m are set by design. Second, our bigger concern lies not with false positive calls but rather with false negative calls, i.e., with the probability that a cargo declared "safe" (negative) actually is dangerous (true positive). The table suggests that we can estimate FNCP as U/(m - R). Some authors have called the expected value of this ratio, E(U/(m - R)), the "false non-discovery rate" (see Genovese and Wasserman, 2004; Sarkar, 2006). But with both FNCP and FPCP, one needs further information about the frequency of true "positives" and true "negatives" (in the form of p = probability that cargo contains SNM or other threatening material) beyond the m tests given in the design. In fact, as further tests are conducted, better estimates of FNCP and FPCP can be obtained by incorporating better estimates of sensitivity and specificity, as well as p, into the formulas for FNCP and FPCP. For that reason, we have chosen to derive the relevant probabilities using Bayes' formula, rather than using the terms "false discovery rate" and "false non-discovery rate," which often are estimated from only the table of outcomes from multiple tests. For further information, see the references listed at the end of Appendix B.

Two detection systems that have exactly the same probability of a false negative call for a given scenario, but substantially different values of the probability of making a false positive call, may indicate a preference for one system over the other. The probability of making a false positive call equals 1 - PPV.

We illustrate these calculations with hypothetical data below. Suppose we have 24 trucks; we place SNM into 12 of them and leave only benign material in the remaining 12. We run all 24 trucks through two test systems and observe the following results:

                        Test System 1                       Test System 2
                  Alarm   No Alarm   Total Runs       Alarm   No Alarm   Total Runs
SNM in cargo        10        2          12             11        1          12
Non-SNM in cargo     4        8          12              2       10          12
Total               14       10          24             13       11          24

Sensitivity is the probability that the system alarmed, given the presence of SNM in the cargo: among the 12 trucks that contained SNM, 10 alarmed for test system 1 (estimated sensitivity S1 = 10/12) and 11 alarmed for test system 2 (estimated S2 = 11/12). Similarly, we estimate the specificity of the two test systems as 8/12 and 10/12, respectively (the fraction of "no alarm" results out of the 12 non-SNM trucks). Because we specified the number of runs in each condition (n1 = 12 for SNM runs and n2 = 12 for non-SNM runs), we can estimate the uncertainties in these probabilities using the conventional binomial distribution. In this case, the estimates and 95% confidence intervals determined from the binomial distribution, based on n1 = n2 = 12, are:

                                        Test System 1       Test System 2
Estimated Sensitivity                   0.833 (10/12)       0.917 (11/12)
  95% confidence interval               (0.516, 0.979)      (0.615, 0.998)
Estimated Specificity                   0.667 (8/12)        0.833 (10/12)
  95% confidence interval               (0.349, 0.901)      (0.516, 0.979)

(The wide intervals result from the small sample sizes.) More importantly, the negative predictive value (NPV, the probability that the truck truly did not contain SNM, given that the alarm did not sound) is 8/10 for test system 1 and 10/11 for test system 2, and hence we estimate the probability of making a false negative call for the two systems as

• proportion of cases where test system 1 did not alarm (10 cases) but the cargo actually contained SNM (2 cases) = 2/10 = 0.20
• proportion of cases where test system 2 did not alarm (11 cases) but the cargo actually contained SNM (1 case) = 1/11 = 0.09
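These point estimates, their confidence intervals, and the plug-in false negative call proportions are straightforward to reproduce. The short Python sketch below is ours, for illustration only; it assumes SciPy is available and uses exact (Clopper-Pearson) binomial intervals, which match the intervals tabulated above.

```python
from scipy.stats import beta

def clopper_pearson(k, n, alpha=0.05):
    """Exact (Clopper-Pearson) binomial confidence interval for a proportion k/n."""
    lo = beta.ppf(alpha / 2, k, n - k + 1) if k > 0 else 0.0
    hi = beta.ppf(1 - alpha / 2, k + 1, n - k) if k < n else 1.0
    return lo, hi

# Hypothetical designed study: 12 SNM runs and 12 benign runs per system.
# For each system: (alarms among the SNM runs, no-alarms among the benign runs).
systems = {"Test System 1": (10, 8), "Test System 2": (11, 10)}
n_snm = n_benign = 12

for name, (snm_alarms, benign_quiet) in systems.items():
    sens, spec = snm_alarms / n_snm, benign_quiet / n_benign
    s_lo, s_hi = clopper_pearson(snm_alarms, n_snm)
    t_lo, t_hi = clopper_pearson(benign_quiet, n_benign)
    missed = n_snm - snm_alarms                        # SNM runs that did not alarm
    fncp_plugin = missed / (missed + benign_quiet)     # 1 - NPV from the table
    print(name)
    print("  sensitivity %.3f  95%% CI (%.3f, %.3f)" % (sens, s_lo, s_hi))
    print("  specificity %.3f  95%% CI (%.3f, %.3f)" % (spec, t_lo, t_hi))
    print("  plug-in FNCP (1 - NPV) = %.2f" % fncp_plugin)
```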

Clearly, test system 1 appears to be less reliable than test system 2. The calculation of lower bounds on these estimated probabilities is not as straightforward as using the binomial distribution, as was done for sensitivity and specificity, because the denominator (10 in the outcome of the performance tests of system 1 and 11 in the outcome of the performance tests of system 2) arose from the test results, not from the number of trials set by the study design. That is, the denominator "10" for test system 1 (and "11" for test system 2) is the sum of two numbers that might differ if the test were re-run. Confidence bounds can be obtained as a function of sensitivity (S) and specificity (T) (see Box B.2).

Bayes' rule (Navidi, 2006) states:

FNCP = P{A|Bc} = P{Bc|A} P{A} / [ P{Bc|A} P{A} + P{Bc|Ac} P{Ac} ]    (1)

where

P{A|Bc} = probability that event A occurs, given confirmation that event Bc has occurred (here, P{cargo contains SNM | test system does not alarm} = 1 - NPV)
P{Bc|A} = probability that event Bc occurs, given confirmation that event A has occurred (here, P{test system does not alarm | cargo contains SNM} = 1 - S)
P{Bc|Ac} = probability that event Bc occurs, given confirmation that event Ac has occurred (here, P{test system does not alarm | cargo contains no SNM} = T).

Box B.2: Uncertainty in the ratio FNCP1/FNCP2

The uncertainty in the ratio FNCP1/FNCP2 ≈ [(1 - S1)/(1 - S2)][T2/T1] can be approximated using propagation-of-error formulas. Let ratio = N/D denote a generic ratio (N = numerator, D = denominator). Then

SE(ratio) = SE(N/D) ≈ ratio × sqrt[ Var(N)/N^2 + Var(D)/D^2 ].

When T and S have binomial distributions, Var(T1) = T1(1 - T1)/n1 and Var(S1) = S1(1 - S1)/n1, and likewise for Var(T2) and Var(S2), where n1 [n2] is the number of trials on which S1 and T1 [S2 and T2] are estimated (in the experimental runs at the Nevada Test Site, n1 = n2 = 12 or 24). Hence, the standard error (square root of the variance) of (1 - S1)/T1 is approximately

[(1 - S1)/T1] × sqrt[ S1/(n1(1 - S1)) + (1 - T1)/(n1 T1) ],

so the standard error of the ratio of false negative call probabilities (when p is tiny) is approximately

SE(FNCP1/FNCP2) ≈ (FNCP1/FNCP2) × sqrt[ Var(FNCP1)/FNCP1^2 + Var(FNCP2)/FNCP2^2 ].

So,

SE(FNCP1/FNCP2) ≈ [T2(1 - S1)] / [T1(1 - S2)] × sqrt[ (1/n1)(S1/(1 - S1) + (1 - T1)/T1) + (1/n2)(S2/(1 - S2) + (1 - T2)/T2) ].
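As a numerical check on Box B.2, the sketch below (ours; the function name is not from any DNDO tool) implements the propagation-of-error approximation for the small-p ratio FNCP1/FNCP2 ≈ [(1 - S1)/T1] / [(1 - S2)/T2], plugging in the estimates from the hypothetical 24-truck example above.

```python
import math

def fncp_ratio_and_se(s1, t1, n1, s2, t2, n2):
    """Small-p FNCP ratio and its propagation-of-error standard error (Box B.2).

    s*, t* are estimated sensitivities/specificities; n* are the numbers of
    trials per condition from which each pair was estimated.
    """
    ratio = ((1 - s1) / t1) / ((1 - s2) / t2)
    # Relative variances of (1 - S)/T for each system add for a ratio of ratios.
    rel_var = (s1 / (1 - s1) + (1 - t1) / t1) / n1 + (s2 / (1 - s2) + (1 - t2) / t2) / n2
    return ratio, ratio * math.sqrt(rel_var)

ratio, se = fncp_ratio_and_se(10/12, 8/12, 12, 11/12, 10/12, 12)
print("FNCP1/FNCP2 approx %.2f with SE approx %.2f" % (ratio, se))
```

With only 12 runs per condition, the standard error is comparable to the ratio itself, which is consistent with the wide confidence intervals noted above.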

The sensitivity (S = P{B|A}) and specificity (T = P{Bc|Ac}) can be estimated from the experimental test runs and, factoring in the prevalence p, we can calculate the FNCP:

FNCP = P{A|Bc} = (1 - S)p / [ (1 - S)p + T(1 - p) ] = 1/(1 + y),    (2)

where y = [T/(1 - S)][(1 - p)/p]. Systems with lower values of P{A|Bc}, i.e., with higher values of y, are preferred. Denoting by S1, T1, S2, T2 the sensitivities and specificities of systems 1 and 2, respectively, system 1 is preferred over system 2 if FNCP1 < FNCP2, i.e., if y1 > y2, i.e., if

[T1/(1 - S1)] [(1 - p)/p] > [T2/(1 - S2)] [(1 - p)/p],

which is the same as either

T1/(1 - S1) > T2/(1 - S2)    (3)

or

T1/T2 > (1 - S1)/(1 - S2).    (4)

That is, a comparison of FNCP for test system 1 (FNCP1) with that for test system 2 (FNCP2) reduces to a comparison of [specificity/(1 - sensitivity)] for the two systems. We can estimate uncertainties on our estimates of sensitivity and specificity (based on the binomial distribution; see the discussion above). Hence, we can approximate the uncertainty in [(1 - S)/T], and ultimately the uncertainty in the ratio of false negative call probabilities (see Box B.2), which does not involve assumptions about p (the likelihood of the threat).

Notice that test system 1 is always preferred if T1 ≥ T2 and S1 ≥ S2, because T1 ≥ T2 implies that the left-hand side of Equation (4) exceeds or equals 1, and S1 ≥ S2 implies that the right-hand side of Equation (4) is less than or equal to 1; hence Equation (4) is satisfied. (If T1 = T2 and S1 = S2, then the test systems are equivalent in terms of sensitivity, specificity, and false negative call probability, so either can be selected.) In real situations, however, one test system may have a higher sensitivity but a lower specificity. For example, if T1 = 0.70 and T2 = 0.80 (test system 2 is more likely to remain silent on truly benign cargo than test system 1), but S1 = 0.950 and S2 = 0.930 (test system 1 is slightly more likely to alarm if the cargo truly contains SNM), then Equation (4) says that test system 1 is preferred, because T1/T2 = 0.875 and (1 - S1)/(1 - S2) = 0.05/0.07 = 0.714.

The FNCPs for the two systems are

FNCP1 = 1 / [ 1 + (T1/(1 - S1)) ((1 - p)/p) ] = 1 / [ 1 + 14.00 (1 - p)/p ]
FNCP2 = 1 / [ 1 + (T2/(1 - S2)) ((1 - p)/p) ] = 1 / [ 1 + 11.43 (1 - p)/p ],

so clearly FNCP1 < FNCP2. Calculations for this example (S1 = 0.95, S2 = 0.93, T1 = 0.70, T2 = 0.80), for different threat levels p, are:

• p = 0.10: FNCP1 = 0.007874 and FNCP2 = 0.009629 (ratio = 0.81777);
• p = 0.05: FNCP1 = 0.003745 and FNCP2 = 0.004584 (ratio = 0.81701);
• p = 0.01: FNCP1 = 0.000721 and FNCP2 = 0.000883 (ratio = 0.81646);
• p = 0.001: FNCP1 = 0.7150 × 10^-4 and FNCP2 = 0.8758 × 10^-4 (ratio = 0.81634);
• p = 0.0001: FNCP1 = 0.7142 × 10^-5 and FNCP2 = 0.8751 × 10^-5 (ratio = 0.81633).

The prevalence p has little effect on the ratio of FNCPs, but its effect on the absolute magnitude of the FNCP is noticeable. Regardless of its value, however, the probability of a false negative call will be very small whenever the probability of a threat is small (e.g., less than 0.1).

When the differences in sensitivities are much larger, the FNCPs also are quite different. Consider the case when S1 = 0.90, S2 = 0.30, T1 = 0.70, T2 = 0.90, for the same threat levels:

• p = 0.10: FNCP1 = 0.015625 and FNCP2 = 0.079545 (ratio = 0.19643);
• p = 0.05: FNCP1 = 0.007463 and FNCP2 = 0.039326 (ratio = 0.18977);
• p = 0.01: FNCP1 = 0.001441 and FNCP2 = 0.007795 (ratio = 0.18485);
• p = 0.001: FNCP1 = 0.000143 and FNCP2 = 0.000778 (ratio = 0.18379);
• p = 0.0001: FNCP1 = 0.1429 × 10^-4 and FNCP2 = 0.7778 × 10^-4 (ratio = 0.18369).

Here, even though test 2 has the higher specificity, the increase in sensitivity from 0.3 (test 2) to 0.9 (test 1) results in a roughly five-fold decrease in the FNCP. With either test, the FNCP is small, even when the threat level is 0.01 (1 in 100 trucks carry threatening cargo).
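The bulleted values follow directly from Equation (2); the minimal Python sketch below (ours, for illustration) essentially reproduces them, with small differences in the last printed digit reflecting rounding.

```python
def fncp(sens, spec, p):
    """False negative call probability from Eq. (2): P{cargo contains SNM | no alarm}."""
    return (1 - sens) * p / ((1 - sens) * p + spec * (1 - p))

examples = {
    "S1=0.95, T1=0.70 vs S2=0.93, T2=0.80": ((0.95, 0.70), (0.93, 0.80)),
    "S1=0.90, T1=0.70 vs S2=0.30, T2=0.90": ((0.90, 0.70), (0.30, 0.90)),
}
for label, ((s1, t1), (s2, t2)) in examples.items():
    print(label)
    for p in (0.10, 0.05, 0.01, 0.001, 0.0001):
        f1, f2 = fncp(s1, t1, p), fncp(s2, t2, p)
        print("  p=%.4f  FNCP1=%.6f  FNCP2=%.6f  ratio=%.5f" % (p, f1, f2, f1 / f2))
```

As the printed ratios show, p moves the absolute FNCPs by orders of magnitude but barely changes their ratio, which is why the preference criterion in Equations (3) and (4) does not require knowing p.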

Calculations for the probability of a false positive call (FPCP = 1 - PPV) are similar. Again from Bayes' Theorem,

FPCP = P{Ac|B} = P{B|Ac} P{Ac} / [ P{B|Ac} P{Ac} + P{B|A} P{A} ]    (5)

where

Ac = complement of A = event that cargo does not contain SNM
Bc = complement of B = event that test system does not alarm
P{Ac|B} = probability that event Ac occurs even though B occurred (here, P{cargo contains no SNM | test system alarms} = 1 - PPV)
P{B|Ac} = probability that event B occurs, given confirmation that event Ac has occurred (here, P{test system alarms | cargo contains no SNM} = 1 - T)
P{B|A} = probability that event B occurs, given confirmation that event A has occurred (here, P{test system alarms | cargo contains SNM} = S).

Hence

FPCP = (1 - T)(1 - p) / [ (1 - T)(1 - p) + Sp ] = 1/(1 + z), where z = Sp / [(1 - T)(1 - p)].

So test system 1 would be preferred, in these terms, over system 2 if

S1 p / [(1 - T1)(1 - p)] > S2 p / [(1 - T2)(1 - p)],

i.e., if

(1 - T1)/S1 < (1 - T2)/S2.

To calculate the magnitude of the FPCP (not just the ratio of the probabilities for the two systems), consider that p is likely small and that S1 (or S2) may not be orders of magnitude larger than (1 - T1) (or (1 - T2)). In this case, the "1 +" in the denominator does matter for the absolute magnitude of the FPCP. For the example above, where S1 = 0.95, S2 = 0.93, T1 = 0.70, T2 = 0.80, the corresponding FPCPs for p = 0.10, 0.05, 0.01, 0.001, and 0.0001 are:

• p = 0.10: FPCP1 = 1/[1 + 0.31579(1/9)] = 0.96610, FPCP2 = 0.97666 (ratio = 0.9892);
• p = 0.05: FPCP1 = 0.98365, FPCP2 = 0.98881 (ratio = 0.99478);
• p = 0.01: FPCP1 = 0.99682, FPCP2 = 0.99783 (ratio = 0.99899);
• p = 0.001: FPCP1 = 0.99968, FPCP2 = 0.99978 (ratio = 0.99990);
• p = 0.0001: FPCP1 = 0.99997, FPCP2 = 0.99998 (ratio = 0.99999).

For these examples, the chance that a sounded alarm, once re-inspected, turns out to involve only benign material is virtually identical for the two systems (and very close to 1 for both). The same is true when S1 = 0.90, S2 = 0.30, T1 = 0.60, T2 = 0.80:

• p = 0.10: FPCP1 = 0.95294, FPCP2 = 0.93103 (ratio = 1.02353);
• p = 0.05: FPCP1 = 0.97714, FPCP2 = 0.96610 (ratio = 1.01143);
• p = 0.01: FPCP1 = 0.99553, FPCP2 = 0.99331 (ratio = 1.00223);
• p = 0.001: FPCP1 = 0.99956, FPCP2 = 0.99933 (ratio = 1.00022);
• p = 0.0001: FPCP1 = 0.99996, FPCP2 = 0.99993 (ratio = 1.00002).
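The closed form above can be transcribed in the same way; the sketch below is ours and simply restates the displayed formula and the resulting preference criterion (it is not DNDO software), and, as with the FNCP, the two-system comparison does not involve p.

```python
def fpcp(sens, spec, p):
    """False positive call probability per Eq. (5): P{cargo is benign | alarm} = 1/(1 + z)."""
    z = sens * p / ((1 - spec) * (1 - p))
    return 1 / (1 + z)

def prefer_system_1_on_fpcp(s1, t1, s2, t2):
    """System 1 has the smaller FPCP for every p exactly when (1 - T1)/S1 < (1 - T2)/S2."""
    return (1 - t1) / s1 < (1 - t2) / s2
```

Because z shrinks in proportion to p, both systems' FPCPs approach 1 as the threat becomes rare, so the comparison rests on the ratio criterion rather than on the absolute magnitudes.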

The DNDO criteria for "significant improvement in operational effectiveness" involve comparisons of sensitivity and specificity. As noted above, a test system that has higher sensitivity and higher specificity will have a lower false negative call probability. But the above calculations also demonstrate that "nearly equal" sensitivities and specificities result in nearly equivalent systems, and hence offer rather limited benefit for the cost. For completeness, we re-write the DNDO criteria for "significant improvement in operational testing" (see Box 2, pp. 40–41) using the S, T notation for sensitivity and specificity.

Let SA(1){SNM, no NORM} denote the sensitivity of the ASP system in primary (1) screening when the cargo truly contains SNM and no NORM; i.e., SA(1){SNM, no NORM} = P{ASP alarms | cargo contains SNM, no NORM}. Likewise, let SP(1){SNM, no NORM} denote the sensitivity of the current (PVT+RIID) system in primary (1) screening when the cargo truly contains SNM and no NORM; i.e., SP(1){SNM, no NORM} = P{PVT alarms in primary screening | cargo contains SNM, no NORM}. Using T to denote specificity, let TP(2){SNM, no NORM} = P{PVT/RIID does not alarm in secondary screening | cargo contains no SNM, but possibly NORM}. More generally, denote by SA(1) and SP(1) the sensitivities of the ASP and the PVT+RIID combination, respectively, in primary screening, and by TA(1) and TP(1) the corresponding specificities; superscript (2) indicates secondary screening. DNDO has specified its criteria for "operational effectiveness" as follows (see Sidebar 3.1 on page 29):

1. SA(1){SNM, no NORM} ≥ SP(1){SNM, no NORM}
2. SA(1){SNM + NORM} ≥ SP(1){SNM + NORM} (the same comparison as criterion 1, with NORM present)
3. TA(1)(MI-Iso) ≥ TP(1)(MI-Iso), where "MI-Iso" indicates licensable medical or industrial isotopes
4. 1 - TA(1)(NORM) ≤ 0.20[1 - TP(1)(NORM)], i.e., TA(1)(NORM) ≥ 0.8 + 0.2 TP(1)(NORM)
5. 1 - SA(2)(SNM) ≤ 0.5[1 - SP(2)(SNM)], i.e., SA(2)(SNM) ≥ 0.5[1 + SP(2)(SNM)]
6. Time in secondary screening for the ASP ≤ time in secondary screening for the RIID (no connection to sensitivity or specificity).

Since criterion 4 is more stringent than criterion 3 and criterion 5 is more stringent than criterion 1, we concentrate on values of sensitivity and specificity that satisfy criteria 4 and 5. When these two conditions are satisfied (i.e., TA ≥ 0.8 + 0.2 TP and SA ≥ 0.5 + 0.5 SP), the ratio of false negative call probabilities (ASP to PVT+RIID) can be as small as 1:900, i.e., almost 1,000 times smaller. For such improvements, the ratio of both the sensitivities and the specificities must be on the order of 0.99/0.10 or 0.95/0.10; in such cases, the false negative call probabilities are on the order of 10^-8 to 10^-5.

Tables of values of the probabilities of both false negative calls and false positive calls were calculated with TA, SA, TP, and SP each set equal to 0.1, 0.2, ..., 0.8, 0.9, 0.95, 0.99; of the 11^4 = 14,641 combinations, only 858 satisfied criteria 4 and 5. These 858 combinations were evaluated at 5 different values of p: 0.01 (threatening cargo is present in 1 of 100 trucks), 0.001, 0.0001, 0.00001, and 0.000001 (1 in 1,000,000 trucks).
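The grid search just described is easy to reproduce. The sketch below is our reconstruction of the calculation, not DNDO's code; working in integer percent units keeps the threshold comparisons in criteria 4 and 5 exact.

```python
from itertools import product

# Grid of candidate sensitivities/specificities, in percent.
GRID = [10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99]

def meets_criteria(ta, sa, tp, sp):
    # Criterion 4: 1 - TA <= 0.20(1 - TP)  <=>  5*TA >= 400 + TP   (percent units)
    # Criterion 5: 1 - SA <= 0.50(1 - SP)  <=>  2*SA >= 100 + SP
    return 5 * ta >= 400 + tp and 2 * sa >= 100 + sp

kept = [combo for combo in product(GRID, repeat=4) if meets_criteria(*combo)]
print("combinations satisfying criteria 4 and 5: %d of %d" % (len(kept), len(GRID) ** 4))

# Small-p FNCP ratio (ASP to PVT+RIID): [(1 - SA)/TA] / [(1 - SP)/TP]; percent factors cancel.
ratios = [((100 - sa) / ta) / ((100 - sp) / tp) for ta, sa, tp, sp in kept]
print("smallest ratio %.5f, largest ratio %.4f" % (min(ratios), max(ratios)))
```

Run as-is, this yields the 858 qualifying combinations cited above, and the smallest small-p ratio is about 0.0011, consistent with the 1:900 improvement quoted in the text.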

A plot of the smaller false negative call probability (denoted FNCP2 in the figure) versus the larger one (denoted FNCP1) is shown in Figure B.1; the red dashed line corresponds to the line where the two false negative call probabilities are equal. The upper left corner shows the cases where the FNCPs are most different (0.00112 ≤ FNCP1/FNCP2 ≤ 0.00311), which occurred in 26 of the 858 cases (26 × 5 points are shown, corresponding to the 5 values of p). More frequently, the ratio is less dramatic: 0.00317 ≤ FNCP1/FNCP2 ≤ 0.03161 for 257 of the 858 cases; 0.0316 ≤ FNCP1/FNCP2 ≤ 0.3162 for 535 of the 858 cases; and 0.3165 ≤ FNCP1/FNCP2 ≤ 0.4819 for 40 of the 858 cases. In each case, the absolute magnitudes of the false negative call probabilities are quite small, and the ratios of the false positive call probabilities are almost 1.

Figure B.1: Plot of FNCP2 versus FNCP1 for cases satisfying the criteria TA ≥ 0.8 + 0.2 TP and SA ≥ 0.5 + 0.5 SP, for different levels of p (1 × 10^-2, 1 × 10^-3, 1 × 10^-4, 1 × 10^-5, 1 × 10^-6). The red dashed line corresponds to FNCP1 = FNCP2. The results are stratified by magnitude of the ratio FNCP1/FNCP2 (specifically, rounded values of the common logarithm of the ratio: -3, -2, -1, and 0, respectively, for the four plots).

APPENDIX B REFERENCES

Benjamini, Y. and Y. Hochberg. 1995. Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B 57: 289–300.

Genovese, C.R. and L. Wasserman. 2004a. Controlling the false discovery rate: Understanding and extending the Benjamini-Hochberg method. http://www.stat.cmu.edu/genovese/talks/pitt-11-01.pdf.

Genovese, C.R. and L. Wasserman. 2004b. A stochastic process approach to false discovery control. Annals of Statistics 32(3): 1035–1061.

Ku, H.H. 1962. Notes on the Use of Propagation of Errors Formulae. Journal of Research of the National Bureau of Standards 70C(4): 269.

Navidi, W. 2006. Statistics for Engineers and Scientists. McGraw-Hill.

Pawitan, Y., S. Michels, S. Koscielny, A. Gusnanto, and A. Ploner. 2005. False discovery rate, sensitivity, and sample size in microarray studies. Bioinformatics 21(13): 3017–3024.

Sarkar, S.K. 2006. False discovery and false non-discovery rates in single-step multiple testing procedures. Annals of Statistics 34(1): 394–415.

Vardeman, S.B. 1994. Statistics for Engineering Problem Solving. PWS Publishing, Boston, Massachusetts: p. 257.
