Appendix B
Performance Metrics for ASPs and PVTs
“Far better an approximate answer to the right question, which is often vague, than
an exact answer to the wrong question, which can always be made precise.”
– John W. Tukey (1962), “The Future of Data Analysis,” Annals of Mathematical
Statistics 33(1), p.1–67. (The citation appears on p.12.)
When evaluating the performance of instruments to identify the system best suited
to a given task, one needs to choose the correct metric for making the comparison. In the case
of systems such as the advanced spectroscopic portals (ASPs), conventional measures such as
sensitivity and specificity provide useful information, but do not directly assess test performance
in actual field operation. The metrics of interest concern the probabilities of making incorrect
calls, i.e., the probability that the cargo actually contained dangerous material when the test
system allowed it to pass (a false negative call), and the probability that the cargo actually
contained benign material when the test system alarmed (a false positive call). This appendix
describes the calculations leading to estimates of these probabilities, the uncertainties in these
values, and how these estimated probabilities can be used to compare two systems under
consideration.
NOMENCLATURE
Test system performance usually is characterized in terms of detection probabilities. The
notation for these probabilities comes from the literature for comparing medical diagnostic tests,
and we use the same notation here for radiation detection systems, so we begin with some
terminology.
In formal notation, the absolute probability of event A is written P{A}. The probability
that event A happens given condition or event B is written as P{A|B}. The event after the vertical
bar “|” is the event on which the probability is conditioned; i.e., the event that preexists. For the
rest of this appendix, we define the following.
A = cargo contains SNM
B = Test system alarms
Ac = The complement of A, cargo contains no SNM (benign) 47
Bc = The complement of B, test system does not alarm
Sensitivity (S) = probability that the test system alarms, given that the underlying
cargo truly contains special nuclear material (SNM). S = P{B|A}
Specificity (T) = probability that the test system does not alarm, given that the
underlying cargo truly contained benign material (non-SNM). T = P{Bc|Ac}
47 Some non-SNM radioactive material is not benign, but for simplicity, this appendix refers to non-SNM as benign
material.
Prevalence (p) = probability that cargo contains SNM. p = P{A}
Positive predictive value (PPV) = probability that the underlying cargo truly
contains SNM, given that the test system alarms. PPV = P{A|B}
Negative predictive value (NPV) = probability that the underlying cargo truly
contains non-SNM, given that the test system did not alarm. NPV = P{Ac|Bc}
WHAT WE WANT TO KNOW AND WHAT WE CAN KNOW
Although we might want to know the sensitivity and specificity of the detection systems,
because their definitions rely on true knowledge of the cargo contents, we can estimate a
system’s sensitivity and specificity only from a designed experiment. The experimenters insert
into the cargo either SNM or benign material, and then run the cargo through the test systems;
the proportion of SNM runs that properly set off the test system alarm is an estimate of the test’s
sensitivity, and the proportion of benign runs that properly pass the test system is an estimate of
the test’s specificity. Such design studies are artificial scenarios intended to represent a range of
possible real-world events.
In real life we do not know the cargo contents. We see only the result of the test system:
either the test system alarmed, or it did not alarm, and the probability of getting an alarm given
that SNM is present is not necessarily the same as the probability that SNM is present given that
the system alarmed (P{B|A} ≠ P{A|B}). Operationally, if the system alarms, SNM is suspected;
if the system does not alarm, the cargo is allowed to pass. We are concerned especially with this
question: Given that the test system did not alarm, what is the probability that the cargo
contained SNM? That is, what risk do we take by allowing a “no-alarm” container to pass? From
the standpoint of practical operational effectiveness, this false negative call probability (FNCP =
P{A|Bc} = 1-NPV) 48 has grave consequences. As shown below by Bayes’ Theorem, it is a
function of sensitivity (S) and specificity (T), as well as of prevalence (p = P{A}), but a
comparison between two test systems on the same scenario (i.e., the same threat) involves the
same prevalence, so prevalence does not enter into the comparison of effectiveness for the two
test systems. Accurate estimation of sensitivity and specificity is important, in that it allows us to
compare accurately the performance of two test systems using the relevant, practically
meaningful metric.
As noted above, from designed studies, such as those conducted at the Nevada Test Site, we can
estimate S and T. We also can derive confidence limits on S and T from such designed
experiments, and hence we can estimate (1–NPV) and associated confidence intervals. More
importantly, we can compare the two systems via a ratio, say the FNCP ratio (1-NPV1)/(1-NPV2).
A FNCP ratio whose lower confidence limit exceeds 1 indicates preference for test system 2,
while a ratio whose upper confidence limit falls below 1 indicates preference for test system 1.
Note that these ratios may differ for different scenarios; a table of these ratios may suggest
strategies for associating the ratios with the threat levels presented by different scenarios.
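The decision rule for the FNCP ratio described above can be written out as a small function. This is a minimal sketch in Python; the function name `preferred_system` and its return strings are illustrative assumptions, not part of the report.

```python
def preferred_system(ratio_lower, ratio_upper):
    """Apply the FNCP-ratio rule to a confidence interval for
    (1 - NPV1) / (1 - NPV2).  Name and return values are hypothetical."""
    if ratio_lower > ratio_upper:
        raise ValueError("lower limit must not exceed upper limit")
    if ratio_lower > 1:
        # System 1 makes false negative calls more often than system 2.
        return "system 2"
    if ratio_upper < 1:
        # System 1 makes false negative calls less often than system 2.
        return "system 1"
    # The interval contains 1: the data do not separate the two systems.
    return "no clear preference"
```

An interval that straddles 1 leaves the comparison undecided for that scenario, which is why the function has a third outcome.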
48 Some analyses refer to “false discovery rate” and “false non-discovery rate,” which are related to (1–PPV) and
(1–NPV), respectively, but their definitions are slightly different (see Box B.1).
EVALUATING TESTING, COSTS, & BENEFITS OF ASPs: INTERIM REPORT
Box B.1: A comment on notation
We denoted by FNCP the probability of making a false negative call and by FPCP the probability of
making a false positive call; i.e.,
FNCP = P{ true + | test calls “–” }
FPCP = P{ true – | test calls “+” } .
We related these probabilities to the following generic two-way table of test outcomes (notation from
Benjamini and Hochberg, 1995, p.291, referred to as BH95, is in parentheses):
                  Test calls     Test calls     Total
Truth             “Positive”     “Negative”     Tests
True POSITIVE     N++ (S)        N+– (T)        N+ = m – m0
True NEGATIVE     N–+ (V)        N–– (U)        N– = m0
Total calls       R              m – R          m
(In the doubly subscripted counts, the first sign denotes the truth and the second the test call. The BH95
counts S and T are unrelated to the sensitivity S and specificity T used elsewhere in this appendix.)
We estimated the false negative call probability via the proportion of negative-call tests (m – R) that
were in fact positive (N+–), or T/(m – R) in BH95 notation. Similarly, we estimated the false positive
call probability via the proportion of positive-call tests (R) that were in fact negative (N–+), or V/R in
BH95 notation. BH95 address the situation known as “multiple testing,” where one is conducting many
hypothesis tests (e.g., hundreds or thousands of tests as occurs in gene expression experiments), and
wants to control the frequency with which one declares as “significant” (e.g., “positive”) tests which in
fact are negative. Hence Benjamini and Hochberg (1995) define the expected proportion of false positive
calls, E(V/R), as the “false discovery rate,” or FDR. They provide a procedure based on the m p-values
from the m tests so that one has assurance that, on average, the proportion of “declared significant” tests
that in fact are not significant remains below a pre-set threshold (e.g., 0.05). If we estimate the FPCP as
V/R, we can think of this estimated FPCP as an estimate of Benjamini and Hochberg’s FDR. In analogy
with E(V/R) = FDR, some have termed E(T/(m – R)) the “false non-discovery rate.”
Our situation differs from the multiple testing situation in two ways. First, our two-way table arises
from a designed experiment where values of m0 and m are set by design. Second, our bigger concern
lies not with false positive calls but rather with false negative calls; i.e., with the probability that a cargo
declared “safe” (negative) actually is dangerous (true positive). The table suggests that we can estimate
FNCP as T/(m – R). Some authors have called the expected value of this ratio, E(T/(m – R)), the “false
non-discovery rate” (see Genovese and Wasserman 2004; Sarkar 2006). But with both FNCP and FPCP,
one needs further information about the frequency of true “positives” and true “negatives” (in the form
of p = probability that cargo contains SNM or other threatening material) beyond the m tests given in the
design. In fact, as further tests are conducted, better estimates of FNCP and FPCP can be obtained by
incorporating better estimates of sensitivity and specificity, as well as p, into the formulas for FNCP and
FPCP. For that reason, we have chosen to derive the relevant probabilities using Bayes’ formula, rather
than using the terms “false discovery rate” and “false non-discovery rate,” which often are estimated
from only the table of outcomes from multiple tests. For further information, see the references listed at
the end of Appendix B.
Notice also that the probability of making a false positive call (FPCP) is likewise of
interest for purposes of evaluating costs and benefits: Too many false positive calls can also be
costly by slowing down commerce, diverting CBP personnel from potential threats as they spend
time investigating benign cargo, reducing confidence in the value of the system, and increasing
the likelihood that operators might not take results seriously. Two detection systems that have
exactly the same probability of a false negative call for a given scenario, but substantially
different values of the probability of making a false positive call, may indicate a preference for
one system over the other. The probability of making a false positive call equals 1–PPV.
We illustrate these calculations from hypothetical data below. Suppose we have 24
trucks, into 12 of which we place SNM and leave only benign material in the remaining 12
trucks. We run all 24 trucks through two test systems, and observe the following results:
                     Test System 1                   Test System 2
                     Alarm   No Alarm   Total Runs   Alarm   No Alarm   Total Runs
SNM in cargo          10        2          12         11        1          12
Non-SNM in cargo       4        8          12          2       10          12
Total                 14       10          24         13       11          24
Sensitivity is the probability that the system alarmed, given the presence of SNM in the
cargo: among the 12 trucks that contained SNM, 10 alarmed for test system 1 (estimated
sensitivity S1 = 10/12) and 11 alarmed for test system 2 (estimated S2 = 11/12). Similarly, we
estimate specificity for the two test systems as 8/12 and 10/12, respectively (the fraction of “no
alarm” results out of the 12 non-SNM trucks). Because we specified the number of runs in each
condition (n1 = 12 for SNM runs and n2 = 12 for non-SNM runs), we can estimate the
uncertainties in these probabilities using the conventional binomial distribution. In this case, the
95% confidence intervals determined from the binomial distribution based on n1 = n2 = 12
are:
                           Test System 1     Test System 2
Estimated Sensitivity      0.833 (10/12)     0.917 (11/12)
95% confidence interval    (0.516, 0.979)    (0.615, 0.998)
Estimated Specificity      0.667 (8/12)      0.833 (10/12)
95% confidence interval    (0.349, 0.901)    (0.516, 0.979)
(The wide intervals result from the small sample sizes.)
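The tabulated intervals are consistent with exact (Clopper-Pearson) binomial limits. The report does not name the method, so treating them as Clopper-Pearson limits is an assumption. A minimal standard-library sketch in Python that finds the limits by bisection on the binomial tail probabilities (all function names are illustrative):

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def solve_decreasing(f, iters=200):
    """Find the root in [0, 1] of a function that decreases in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid   # root lies to the right of mid
        else:
            hi = mid   # root lies at or to the left of mid
    return (lo + hi) / 2

def exact_binomial_ci(k, n, conf=0.95):
    """Two-sided exact (Clopper-Pearson) interval for k successes in n trials."""
    alpha = 1 - conf
    # Lower limit: the p at which P(X >= k) equals alpha/2 (0 when k == 0).
    lower = 0.0 if k == 0 else solve_decreasing(
        lambda p: alpha / 2 - (1 - binom_cdf(k - 1, n, p)))
    # Upper limit: the p at which P(X <= k) equals alpha/2 (1 when k == n).
    upper = 1.0 if k == n else solve_decreasing(
        lambda p: binom_cdf(k, n, p) - alpha / 2)
    return lower, upper

for label, k in [("S1", 10), ("S2", 11), ("T1", 8), ("T2", 10)]:
    lo, hi = exact_binomial_ci(k, 12)
    print(f"{label}: {k}/12 = {k / 12:.3f}, 95% CI ({lo:.3f}, {hi:.3f})")
```

Bisection is used here only to keep the sketch free of external libraries; a statistics package with a beta quantile function would give the same limits more directly.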
More importantly, the negative predictive value (NPV, the probability that the truck truly
did not contain SNM, given that the alarm did not sound) is 8/10 for test system 1 and 10/11 for
test system 2, and hence we estimate the probability of making a false negative call for the two
systems as
• proportion of cases where test system 1 did not alarm (10 cases) but actually
contained SNM cargo (2 cases) = 2/10 = 0.20
• proportion of cases where test system 2 did not alarm (11 cases) but actually
contained SNM cargo (1 case) = 1/11 = 0.09
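The sample proportions above follow directly from the four cell counts of each two-way table. A minimal sketch in Python; the function name `table_metrics` and the dictionary keys are illustrative choices. The NPV, PPV, FNCP, and FPCP computed this way reflect the 50% share of SNM trucks fixed by the experimental design, not a field prevalence.

```python
def table_metrics(snm_alarm, snm_pass, benign_alarm, benign_pass):
    """Sample proportions from a 2x2 table of designed test runs."""
    sensitivity = snm_alarm / (snm_alarm + snm_pass)          # P{B|A}
    specificity = benign_pass / (benign_alarm + benign_pass)  # P{Bc|Ac}
    npv = benign_pass / (snm_pass + benign_pass)              # P{Ac|Bc}
    ppv = snm_alarm / (snm_alarm + benign_alarm)              # P{A|B}
    return {
        "sensitivity": sensitivity, "specificity": specificity,
        "NPV": npv, "FNCP": 1 - npv, "PPV": ppv, "FPCP": 1 - ppv,
    }

# Counts taken from the hypothetical 24-truck example.
system1 = table_metrics(snm_alarm=10, snm_pass=2, benign_alarm=4, benign_pass=8)
system2 = table_metrics(snm_alarm=11, snm_pass=1, benign_alarm=2, benign_pass=10)

for name, m in (("Test system 1", system1), ("Test system 2", system2)):
    print(name, {key: round(value, 3) for key, value in m.items()})
```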
Clearly, test system 1 appears to be less reliable than test system 2. The calculation of the
lower bounds on these estimated probabilities is not as straightforward as using the binomial
distribution, as was done for sensitivity and specificity, because the denominator (10 in the
outcome of the performance tests of system 1 and 11 in the outcome of the performance tests on
system 2) arose from the test results, not from the number of trials set by the study design. That
is, the denominator “10” for test system 1 (and “11” for test system 2) is the sum of two numbers
that might differ if the test were re-run. Confidence bounds can be obtained as a function of
sensitivity (S) and specificity (T) (see Box B.2).
Bayes’ rule (Navidi, 2006) states:
\[
\mathrm{FNCP} = P\{A \mid B^c\} = \frac{P\{B^c \mid A\}\,P\{A\}}{P\{B^c \mid A\}\,P\{A\} + P\{B^c \mid A^c\}\,P\{A^c\}} \tag{1}
\]
where
P{A|Bc} = probability that event A occurs, given confirmation that event Bc has occurred
(here, P{cargo contains SNM | test system does not alarm} = 1 – NPV)
P{Bc|A} = probability that event Bc occurs, given confirmation that event A has occurred
(here, P{test system does not alarm | cargo contains SNM} = 1 – S)
P{Bc|Ac} = probability that event Bc occurs, given confirmation that event Ac has occurred
(here, P{test system does not alarm | cargo contains no SNM} = T).
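Equation (1) translates into a few lines of code once P{Bc|A} = 1 – S, P{Bc|Ac} = T, and P{A} = p are substituted. A minimal sketch in Python; the function name `fncp` is illustrative, and the small prevalence used at the end is an assumed value chosen only to show the effect of p.

```python
def fncp(S, T, p):
    """False negative call probability P{A|Bc} from Bayes' rule (Equation 1)."""
    miss_and_snm = (1 - S) * p       # P{Bc|A} P{A}
    pass_and_benign = T * (1 - p)    # P{Bc|Ac} P{Ac}
    return miss_and_snm / (miss_and_snm + pass_and_benign)

# With half of the trucks carrying SNM (p = 0.5), Bayes' rule reproduces the
# sample proportion 2/10 found for test system 1 in the hypothetical data.
check = fncp(S=10 / 12, T=8 / 12, p=0.5)

# A much smaller prevalence (assumed, for illustration only).
rare = fncp(S=10 / 12, T=8 / 12, p=1e-4)

print(f"p = 0.5:  FNCP = {check:.3f}")
print(f"p = 1e-4: FNCP = {rare:.2e}")
```

The same sensitivity and specificity give very different FNCP values at the two prevalences, which is why the sample proportion from a designed experiment cannot be read as a field FNCP.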
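Box B.2: Uncertainty in the ratio FNCP1/FNCP2
The uncertainty in the ratio FNCP1/FNCP2 ≈ [(1-S1)/(1-S2)][T2/T1] can be approximated using
propagation of error formulas. Let ratio = N/D denote a generic ratio (N = Numerator, D =
Denominator).
\[
\mathrm{SE}(\text{ratio}) = \mathrm{SE}(N/D) \approx \text{ratio} \cdot \sqrt{\frac{\mathrm{Var}(N)}{N^2} + \frac{\mathrm{Var}(D)}{D^2}}
\]
When T and S have binomial distributions, Var(T1)=T1(1-T1)/n1, Var(S1)=S1(1-S1)/n1 and likewise for
Var(T2) and Var(S2), where n1 [n2] is the number of trials on which S1 and T1 [S2 and T2] are estimated
(in experimental runs at Nevada Test Site, n1 ≈ n2 ≈ 12 or 24). Hence, the standard error (square root of
the variance) of (1-S1)/T1 is approximately
\[
\frac{1 - S_1}{T_1} \sqrt{\frac{S_1}{n_1 (1 - S_1)} + \frac{1 - T_1}{n_1 T_1}}
\]
so the standard error of the ratio of false negative call probabilities (when p is tiny) is approximately
\[
\mathrm{SE}\!\left(\frac{\mathrm{FNCP}_1}{\mathrm{FNCP}_2}\right) \approx \frac{\mathrm{FNCP}_1}{\mathrm{FNCP}_2} \sqrt{\frac{\mathrm{Var}(\mathrm{FNCP}_1)}{\mathrm{FNCP}_1^2} + \frac{\mathrm{Var}(\mathrm{FNCP}_2)}{\mathrm{FNCP}_2^2}} .
\]
So,
\[
\mathrm{SE}\!\left(\frac{\mathrm{FNCP}_1}{\mathrm{FNCP}_2}\right) \approx \frac{T_2 (1 - S_1)}{T_1 (1 - S_2)} \sqrt{\frac{1}{n_1}\left(\frac{S_1}{1 - S_1} + \frac{1 - T_1}{T_1}\right) + \frac{1}{n_2}\left(\frac{S_2}{1 - S_2} + \frac{1 - T_2}{T_2}\right)} .
\]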
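As an illustration of the formulas in Box B.2, the sketch below applies them to the hypothetical 24-truck data (S1 = 10/12, T1 = 8/12, S2 = 11/12, T2 = 10/12, with 12 runs behind each estimate). Applying the box to those data is an assumption made here for illustration; the report does not present this calculation, and the function name is hypothetical.

```python
from math import sqrt

def fncp_ratio_and_se(S1, T1, n1, S2, T2, n2):
    """Approximate FNCP1/FNCP2 and its standard error (small-p formulas of Box B.2)."""
    ratio = ((1 - S1) / (1 - S2)) * (T2 / T1)
    # Relative variance of (1 - S)/T for each system, from propagation of error.
    rel_var_1 = (S1 / (1 - S1) + (1 - T1) / T1) / n1
    rel_var_2 = (S2 / (1 - S2) + (1 - T2) / T2) / n2
    return ratio, ratio * sqrt(rel_var_1 + rel_var_2)

ratio, se = fncp_ratio_and_se(S1=10 / 12, T1=8 / 12, n1=12,
                              S2=11 / 12, T2=10 / 12, n2=12)
print(f"ratio = {ratio:.2f}, SE = {se:.2f}")
```

With only 12 runs per condition the standard error comes out larger than the ratio itself, so data of this size cannot establish a preference between the two systems.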
The sensitivity (S = P{B|A}) and specificity (T = P{Bc|Ac}) can be estimated from the
experimental test runs and, factoring the prevalence, p, we can calculate FNCP:
\[
\mathrm{FNCP} = P\{A \mid B^c\} = \frac{(1 - S)\,p}{(1 - S)\,p + T\,(1 - p)} = \frac{1}{1 + y}, \tag{2}
\]
where
\[
y = \frac{T}{1 - S} \cdot \frac{1 - p}{p} .
\]
Systems with lower values of P{A|Bc}, i.e., with higher values of y, are preferred.
Denoting by S1, T1, S2, T2 the sensitivities and specificities of systems 1 and 2,
respectively, system 1 is preferred over system 2 if FNCP1 < FNCP2; i.e., if
y1 > y2
i.e., if
\[
\frac{T_1}{1 - S_1} \cdot \frac{1 - p}{p} > \frac{T_2}{1 - S_2} \cdot \frac{1 - p}{p}
\]
which is the same as either
\[
\frac{T_1}{1 - S_1} > \frac{T_2}{1 - S_2} \tag{3}
\]
or
\[
\frac{T_1}{T_2} > \frac{1 - S_1}{1 - S_2} . \tag{4}
\]
That is, a comparison of FNCP for test system 1 (FNCP1) with that for test system 2
(FNCP2) reduces to a comparison of [(specificity)/(1-sensitivity)] for the two systems. We can
estimate uncertainties on our estimates of sensitivity and specificity (based on the binomial
distribution; see above discussion). Hence, we can approximate the uncertainty in [(1 – S)/(T)],
and ultimately the uncertainty in the ratio of false negative call probabilities (see Box B.2) —
which does not involve assumptions on p (likelihood of the threat). Notice that test system 1 is
always preferred if T1 ≥ T2 and S1 ≥ S2, because T1 ≥ T2 implies that the left-hand side of
Equation (4) exceeds or equals 1, and S1 ≥ S2 implies that the right-hand side of Equation (4) is
less than or equal to 1; hence Equation (4) is satisfied. (If T1 = T2 and S1 = S2, then the test
systems are equivalent, in terms of sensitivity, specificity, and false negative call probability, so
either can be selected.) In real situations, however, one test system may have a higher test
sensitivity but a lower specificity. For example, if T1 = 0.70 and T2 = 0.80 (test system 2 is more
likely to remain silent on truly benign cargo than test system 1), but S1 = 0.950 and S2 = 0.930
(test system 1 is slightly more likely to alarm if the cargo truly contains SNM), then (4) says that
test system 1 is preferred, because T1/T2 = 0.875 and (1-S1)/(1-S2) = 0.05/0.07 = 0.714. The
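FNCP for the two systems are
\[
\mathrm{FNCP}_1 = \frac{1}{1 + \dfrac{T_1}{1 - S_1} \cdot \dfrac{1 - p}{p}} = \frac{1}{1 + 14.00\,\dfrac{1 - p}{p}}
\]
\[
\mathrm{FNCP}_2 = \frac{1}{1 + \dfrac{T_2}{1 - S_2} \cdot \dfrac{1 - p}{p}} = \frac{1}{1 + 11.43\,\dfrac{1 - p}{p}}
\]
so clearly FNCP1 < FNCP2.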
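The factors 14.00 and 11.43 in this example, and the ordering of the two FNCPs, can be checked numerically. A minimal sketch in Python; the function names are illustrative, and the prevalence values in the loop are assumed for illustration only.

```python
def odds_factor(S, T):
    """T / (1 - S), the system-specific part of y in Equation (2)."""
    return T / (1 - S)

def fncp_from_y(S, T, p):
    """FNCP = 1 / (1 + y), with y = [T / (1 - S)] [(1 - p) / p]."""
    y = odds_factor(S, T) * (1 - p) / p
    return 1 / (1 + y)

S1, T1 = 0.95, 0.70
S2, T2 = 0.93, 0.80
print(f"T1/(1-S1) = {odds_factor(S1, T1):.2f}, T2/(1-S2) = {odds_factor(S2, T2):.2f}")

# Prevalence values below are assumed, for illustration only.
for p in (1e-2, 1e-4, 1e-6):
    f1, f2 = fncp_from_y(S1, T1, p), fncp_from_y(S2, T2, p)
    print(f"p = {p:g}: FNCP1 = {f1:.3e}, FNCP2 = {f2:.3e}, ratio = {f1 / f2:.3f}")
```

Because the factor (1 - p)/p is common to both systems, the ordering of the two FNCPs is the same at every prevalence; only their absolute size changes.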
Ac = complement of A = event that cargo does not contain SNM
Bc = complement of B = event that test system does not alarm
P{Ac|B} = probability that event Ac occurs even though B occurred (here, P{cargo
contains no SNM | test system alarms} = 1 – PPV)
P{Bc|Ac} = probability that event Bc occurs, given confirmation that event Ac has occurred
(here, P{test system does not alarm | cargo contains no SNM} = T).
P{Bc|A} = probability that event Bc occurs, given confirmation that event A has occurred
(here, P{test system does not alarm | cargo contains SNM} = 1 – S)
FPCP =(1-T)(1-p)/[(1-T)(1-p)+Sp] = 1/(1+z) where z = (Sp)/[(1-T)(1-p)].
So test system 1 would be preferred, in these terms, over system 2, if
\[
\frac{S_1\,p}{(1 - T_1)(1 - p)} > \frac{S_2\,p}{(1 - T_2)(1 - p)}
\]
i.e., if
\[
\frac{1 - T_1}{S_1} < \frac{1 - T_2}{S_2} .
\]
To calculate the magnitude of FPCP (not just the ratio of the probabilities for the two
systems), consider that p is likely small and that S1 (or S2) may not be orders of magnitude larger
than (1-T1) (or (1-T2)). In this case, the “1 +” in the denominator does matter for the absolute
magnitude of this FPCP. For the example above, where S1 = 0.95, S2 = 0.93, T1 = 0.70, T2 = 0.80,
the corresponding FPCP for p = 0.10, p = 0.05, p = 0.01, p = 0.001, p = 0.0001 are:
• p = 0.10: FPCP1 = 1/[1 + 3.16667(1/9)] = 0.73973, FPCP2 = 0.65934 (ratio = 1.12192)
• p = 0.05: FPCP1 = 0.85714, FPCP2 = 0.80338 (ratio = 1.06692)
• p = 0.01: FPCP1 = 0.96900, FPCP2 = 0.95514 (ratio = 1.01452)
• p = 0.001: FPCP1 = 0.99684, FPCP2 = 0.99537 (ratio = 1.00148)
• p = 0.0001: FPCP1 = 0.99968, FPCP2 = 0.99954 (ratio = 1.00015).
For the smaller values of p (p ≤ 0.001), the chance of having to re-inspect a sounded alarm, only to find
benign material, is virtually identical in both systems (and very close to 1 for both). The same is
true when S1 = 0.90, S2 = 0.30, T1 = 0.60, T2 = 0.80:
• p = 0.10: FPCP1 = 0.80000, FPCP2 = 0.85714 (ratio = 0.93333)
• p = 0.05: FPCP1 = 0.89412, FPCP2 = 0.92683 (ratio = 0.96471)
• p = 0.01: FPCP1 = 0.97778, FPCP2 = 0.98507 (ratio = 0.99259)
• p = 0.001: FPCP1 = 0.99775, FPCP2 = 0.99850 (ratio = 0.99925)
• p = 0.0001: FPCP1 = 0.99978, FPCP2 = 0.99985 (ratio = 0.99993).
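The FPCP expression given above, FPCP = (1-T)(1-p)/[(1-T)(1-p)+Sp] = 1/(1+z), can be evaluated for any S, T, and p. A minimal sketch in Python; the function names are illustrative, and both forms are coded so that their agreement can be checked.

```python
def fpcp(S, T, p):
    """False positive call probability P{Ac|B} = 1 - PPV."""
    false_alarms = (1 - T) * (1 - p)   # P{B|Ac} P{Ac}
    true_alarms = S * p                # P{B|A} P{A}
    return false_alarms / (false_alarms + true_alarms)

def fpcp_from_z(S, T, p):
    """Same quantity written as 1 / (1 + z), z = (S p) / [(1 - T)(1 - p)]."""
    z = (S * p) / ((1 - T) * (1 - p))
    return 1 / (1 + z)

for p in (0.10, 0.05, 0.01, 0.001, 0.0001):
    f1 = fpcp(S=0.95, T=0.70, p=p)
    f2 = fpcp(S=0.93, T=0.80, p=p)
    print(f"p = {p:g}: FPCP1 = {f1:.5f}, FPCP2 = {f2:.5f}, ratio = {f1 / f2:.5f}")
```

As p shrinks, z shrinks with it, so both FPCP values approach 1 whatever the sensitivities and specificities are.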
The DNDO criteria for “significant improvement in operational effectiveness” involve
comparisons of sensitivity and specificity. As noted above, a test system that has higher
sensitivity and higher specificity will have a lower false negative rate. But the above calculations
also demonstrate that “nearly equal” sensitivities and specificities result in nearly equivalent
systems, and hence offer rather limited benefit for the cost. For completeness, we re-write the
DNDO criteria for “significant improvement in operational testing” (see Box 2, pp 40–41) using
the S, T notation (for sensitivity and specificity).
Let \(S_A^{(1)}(\text{SNM, no NORM})\) denote the sensitivity of the ASP system in primary (1)
screening when the cargo truly contains SNM and no NORM; i.e., \(S_A^{(1)}(\text{SNM, no NORM})\) =
P{ASP alarms | cargo contains SNM, no NORM}. Likewise, let \(S_P^{(1)}(\text{SNM, no NORM})\) denote
the sensitivity of the current (PVT+RIID) system in primary (1) screening when the cargo truly
contains SNM and no NORM; i.e., \(S_P^{(1)}(\text{SNM, no NORM})\) = P{PVT alarms in primary screening |
cargo contains SNM, no NORM}. Using T to denote specificity, let \(T_P^{(2)}(\text{no SNM, NORM})\) =
P{PVT/RIID does not alarm in secondary screening | cargo contains no SNM, but possibly
NORM} (specificity).
Denote by \(S_A^{(1)}\) and \(S_P^{(1)}\) the sensitivities of ASP and PVT+RIID combination, respectively,
in primary screening, and \(T_A^{(1)}\) and \(T_P^{(1)}\) the specificities of ASP and PVT+RIID, respectively;
superscript (2) indicates secondary screening. DNDO has specified its criteria for “operational
effectiveness” as follows (see Sidebar 3.1 on page 29):
1. \(S_A^{(1)}(\text{SNM, no NORM}) \ge S_P^{(1)}(\text{SNM, no NORM})\)
2. \(S_A^{(1)}(\text{SNM + NORM}) \ge S_P^{(1)}(\text{SNM + NORM})\) (different version of criterion 1
above)
3. \(T_A^{(1)}(\text{MI-Iso}) \ge T_P^{(1)}(\text{MI-Iso})\) (where “MI-Iso” indicates “licensable medical or
industrial isotopes”).
4. \(1 - T_A^{(1)}(\text{NORM}) \le 0.20\,[1 - T_P^{(1)}(\text{NORM})]\), i.e.,
\(T_A^{(1)}(\text{NORM}) \ge 0.8 + 0.2\,T_P^{(1)}(\text{NORM})\).
5. \(1 - S_A^{(2)}(\text{SNM}) \le 0.5\,(1 - S_P^{(2)}(\text{SNM}))\), i.e., \(S_A^{(2)}(\text{SNM}) \ge 0.5\,(1 + S_P^{(2)}(\text{SNM}))\).
6. Time in secondary for ASP ≤ time in secondary for RIID (no connection to
sensitivity/specificity).
Since criterion 4 is more stringent than criterion 3 and criterion 5 is more stringent than
criterion 1, we concentrate on values of sensitivity and specificity that satisfy criteria 4 and 5.
When these two conditions are satisfied (i.e., TA ≥ 0.8 + 0.2TP and SA ≥ 0.5 + 0.5SP), the ratio of
false negative call probabilities (ASP to PVT) can be as small as 1:900, almost 1000 times smaller.
For such improvements, the ratio of both the sensitivities and the specificities must be on the
order of 0.99/0.10 or 0.95/0.10; in such cases, the false negative call probabilities are on the
order of 10^-8 to 10^-5. Tables of values of the probabilities of both false negative calls and false
positive calls were calculated when TA, SA, TP, and SP were set equal to 0.1, 0.2, ..., 0.8, 0.9,
0.95, 0.99; of the 11^4 = 14,641 combinations, only 858 satisfied criteria 4 and 5. These 858
combinations were set along with 5 different values of p = 0.01 (SNM is present in 1 of 100
trucks), 0.001, 0.0001, 0.00001, 0.000001 (1 in 1,000,000 trucks). A plot of the larger false
negative call probability (denoted FNCP2 in the figure) versus the smaller one (denoted FNCP1) is
shown in Figure B.1 (the red dashed line corresponds to the line where the two false negative
call probabilities are equal). The upper left corner shows the cases where the FNCPs are most
different (0.00112 ≤ FNCP1/FNCP2 ≤ 0.00311), which occurred in 26 of the 858 cases (26 × 5
points are shown, corresponding to 5 values of p). More frequently, the ratio is less dramatic
(0.00317 ≤ FNCP1/FNCP2 ≤ 0.03161 for 257 of 858 cases; 0.0316 ≤ FNCP1/FNCP2 ≤ 0.3162
for 535 of the 858 cases; 0.3165 ≤ FNCP1/FNCP2 ≤ 0.4819 for 40 of the 858 cases). In each
case, the absolute magnitudes of the false negative call probabilities are quite small, and the
ratios of the false positive call probabilities are almost 1.
Figure B.1: Plot of FNCP2 versus FNCP1 for cases satisfying the criteria TA ≥ 0.8 + 0.2TP and
SA ≥ 0.5 + 0.5SP, for different levels of p (1 x 10^-2, 1 x 10^-3, 1 x 10^-4, 1 x 10^-5, 1 x 10^-6). The red
dashed line corresponds to FNCP1 = FNCP2. The results are stratified by magnitude of the ratio
FNCP1/FNCP2 (specifically, rounded values of the common logarithm of the ratio: –3, –2, –1,
0, respectively, for the four plots).
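The count of grid combinations satisfying criteria 4 and 5 can be reproduced by direct enumeration. A minimal sketch in Python; exact fractions are used so that boundary cases such as TA = 0.9 with TP = 0.5 are not lost to floating-point rounding. The function names are illustrative, and the ratio computed at the end is the small-p approximation [(1-SA)/(1-SP)][TP/TA] rather than the exact FNCP ratio at a given p.

```python
from fractions import Fraction
from itertools import product

# Grid used in the text: 0.1, 0.2, ..., 0.9, 0.95, 0.99.
grid = [Fraction(k, 100) for k in (10, 20, 30, 40, 50, 60, 70, 80, 90, 95, 99)]

def meets_criteria_4_and_5(TA, SA, TP, SP):
    crit4 = TA >= Fraction(8, 10) + Fraction(2, 10) * TP   # TA >= 0.8 + 0.2 TP
    crit5 = SA >= Fraction(1, 2) + Fraction(1, 2) * SP     # SA >= 0.5 + 0.5 SP
    return crit4 and crit5

combos = list(product(grid, repeat=4))                     # (TA, SA, TP, SP)
passing = [c for c in combos if meets_criteria_4_and_5(*c)]
print(len(combos), len(passing))

def approx_ratio(TA, SA, TP, SP):
    """Small-p approximation to FNCP(ASP) / FNCP(PVT)."""
    return ((1 - SA) / (1 - SP)) * (TP / TA)

smallest = min(approx_ratio(*c) for c in passing)
print(f"smallest approximate ratio = {float(smallest):.5f}")
```

The enumeration gives 14,641 combinations in total, of which 858 pass, and the smallest approximate ratio is 1/891, in line with the "1:900" figure quoted in the text.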
APPENDIX B REFERENCES
Benjamini, Y. and Y. Hochberg. 1995. Controlling the false discovery rate: A practical and
powerful approach to multiple testing, Journal of the Royal Statistical Society, Series B 57:
289–300.
Genovese, C.R. and L. Wasserman. 2004a. Controlling the false discovery rate:
Understanding and extending the Benjamini-Hochberg Method,
http://www.stat.cmu.edu/genovese/talks/pitt-11-01.pdf.
Genovese, C.R. and L. Wasserman. 2004. A stochastic process approach to false discovery
control, Annals of Statistics 32(3), 1035–1061.
Ku, H.H. 1962. Notes on the Use of Propagation of Errors Formulae. Journal of Research of
the National Bureau of Standards 70C(4), p.269.
Navidi, W. 2006. Statistics for Engineers and Scientists, McGraw-Hill.
Pawitan, Y., S. Michels, S. Koscielny, A. Gusnanto, and A. Ploner. 2005. False discovery
rate, sensitivity, and sample size in microarray studies, Bioinformatics 21(13), 3017–3024.
Sarkar, S.K. 2006. False discovery and false non-discovery rates in single-step multiple testing
procedures, Annals of Statistics 34(1), 394–415.
Vardeman, S.B. 1994. Statistics for Engineering Problem Solving, PWS Publishing, Boston,
Massachusetts: p.257.