Watch-List Operational Performance and List Size: A First-Cut Analysis
Let p be the probability that someone presenting to a watch-list system has been previously enrolled, and F(·) be a prior distribution on this probability. F(·) may be discrete and even a point prior with all mass at one possible value π of p, a continuous distribution on the interval [0,1] such as a Beta distribution, or any other probability distribution function on a probability space on [0,1]. Two types of results may be distinguished here: matching an enrolled presenter to the correct prior enrollment sample or, less restrictively, recognizing that the presenter has previously enrolled, although perhaps by matching to the wrong enrollee. The latter is pertinent to watch-list performance because such a result would serve the intended function of denying privileges, even if for the wrong reason. We distinguish here between these two possibilities by referring to the first as identification and to the second as watch-list recognition.
Addressing the identification problem first, one is trying to match a person specifically with his or her enrollment record and is in error if the correct match is missed. The confidence we should place in a claimed match—that is, its “predictive value”—is the probability that a claimed match is correct:
Consider the effect on this predictive value of enrolling one additional person in a watch list of length n, assuming the pattern of presentations to the list is fixed at proportion p of previous enrollees. In addition to comparisons with the slightly shorter previous list, the presenter is now compared to the new enrollee. This cannot increase and may decrease P(true match|enrolled), because each comparison offers an additional opportunity for an enrolled presenter to be erroneously matched with the wrong enrollee by matching more closely with someone else’s stored data than with his or her own. Similarly, both denominator terms cannot decrease and may increase, because the new comparison offers any presenter an oppportunity of falsely matching with an extra enrollee.
Hence the ratio, PPV(p), cannot increase and may decrease with watch-list length. Using the subscript to indicate watch-list length, PPVn+1(p) ≤ PPVn(p) for any specific p. Thus, the posterior means for the two list sizes over the distribution F(p) must hold the same relationship:
These expectations are the marginal probabilities that a claimed match is correct for the different list sizes, so increasing list length by one enrollee cannot increase and may be expected to decrease the confidence warranted by a watch-list identification. Iterating this point shows that lengthening the list by any amount must have the same implications. However, this argument depends on decoupling the presentation distribution F(p) from enrollee characteristics. In a finite population setting, where increasing enrollment increases p, a much more complicated argument might be required, with the outcome dependent on the specifics of functional relationships. A general argument that would work in such a setting is not obvious.
As noted above, increasing watch-list size by one new enrollment without changing p offers an additional opportunity for each unenrolled presenter to falsely match. Thus, P(claimed nonmatch|unenrolled) can-
not increase and may decrease. The new enrollee can affect results only for those enrolled presenters failing to match their enrollment samples and gives such presenters an additional chance to match the watch list, although incorrectly, thus decreasing P(claimed nonmatch|enrolled). Assuming that list size does not affect the presentation distribution F(p), the net impact depends on the ratio of the two probabilities. In the simplest conceivable model, when comparisons between pairs of individuals are independent and true and false-match probabilities are uniformly μ1 and μ0, these are respectively (1 – μ0)n and (1 – μ1)(1 – μ0)n–1 when n subjects are enrolled, and both are multiplied by (1 – μ0) with each new enrollment, leaving their ratio and NPV(p) unchanged. But if μ0 depends on enrollment status, as might occur when attempts are made to compromise the identification process, then NPV(p) can decrease or increase when μ0 is higher for comparisons of unenrolled to enrolled presenters, or of one enrolled to other enrolled presenters, respectively. The expectation would change accordingly, in either direction.
Considering the watch-list recognition problem from the same perspective, one is now satisfied with a claim that the presenter matches someone on the list, without concern for whether the match is to the presenter’s own enrollment sample. The definition and above discussion of NPV remain unaltered because a false match of a presenting enrollee, which is the event adjudicated differently by identification and watch-list recognition, does not contribute to probabilities conditioned on the absence of a match. Moreover,
With a new enrollment to the list, an enrolled presenter who fails to match the correct enrollment sample has an added chance of matching the new enrollee and being correctly flagged as previously enrolled. This increases the numerator probability rather than decreasing it, as was the case for individual identification: numerator and denominator thus both increase. In the simple case described above, PPV can be shown to decline with list size, as was the case for identification. However, other scenarios and results are conceivable; if match probabilities differ for enrolled and
unenrolled presenters, the prior distribution F(−) depends on list size, and match comparisons may be dependent.
For an example of how linkage of F(p) to list size can change these results, consider a closed set identification system scaled up by enrolling many more users, each of whom interacts with the system daily to obtain workplace access, perhaps in a rapidly expanding corporation. Unless the number of attempted intrusions increases greatly, F(p) is shifted to the right and p stochastically increases. In the resulting change, the increasing dominance of the PPV fraction by its numerator term outweighs the increasing chance of false recognition for any single impostor challenge, because impostor challenges occur with declining relative frequency. Confidence in a match would thereby increase rather than decrease.