correlation coefficient; this is almost certainly an unrealizable goal. Absent that, what would be helpful is any kind of benchmark or context that can be attributed to system-reported scores.
Recommendation 6.8: Normalized comparison scores (such as statistical correlation scores, which scale to fall between 0 and 1) are vital to assign meaning to candidate matches and to make comparisons across searches. Though current IBIS scoring methods may not lend themselves directly to mathematically normalized scores, research on score distributions in a wide variety of search situations should be used to provide some context and normalization to output correlation scores. Possible approaches could include comparing computed pairwise scores with assessments of similarity by trained firearms examiners or empirical evaluation of the scores obtained in previous IBIS searches and confirmed evidence “hits.”
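One simple way to give a raw score empirical context, offered here only as an illustrative sketch and not as a description of how IBIS computes or reports scores, is to express it as a percentile rank against a reference collection of scores from previous searches. The function name, the reference values, and the choice of percentile rank as the normalization are all assumptions made for this example; the reference scores shown are invented placeholders, not IBIS data.

```python
from bisect import bisect_right


def empirical_percentile(raw_score, reference_scores):
    """Return the fraction of reference scores less than or equal to raw_score.

    The result lies between 0 and 1 and can be compared across searches,
    provided the reference collection is appropriate for each search.
    This is a hypothetical helper, not part of any IBIS interface.
    """
    if not reference_scores:
        raise ValueError("reference_scores must not be empty")
    ordered = sorted(reference_scores)
    # bisect_right counts how many reference scores are <= raw_score.
    return bisect_right(ordered, raw_score) / len(ordered)


# Invented placeholder scores standing in for results of previous searches.
reference = [12, 18, 25, 31, 40, 47, 55, 63, 78, 90]

print(empirical_percentile(55, reference))   # 7 of the 10 reference scores are <= 55
print(empirical_percentile(5, reference))    # below every reference score
print(empirical_percentile(100, reference))  # above every reference score
```

A percentile of this kind says only how unusual a score is relative to the chosen reference collection; it does not by itself indicate the probability of a true match. Building separate reference collections from confirmed hits and from known non-matches, as the recommendation suggests, would be needed to support that stronger interpretation.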
As discussed in Chapter 5, it is impossible to make a full evaluation of the NIBIN program and its effectiveness because the data that are systematically collected on system performance are far too limited. The monthly operational reports that are reviewed by the NIBIN program consist of basic counts of evidence (entered that month and cumulative) and of completed hits. Even within this extremely limited set of variables, the information collected is not rich enough to answer important questions, such as whether hits are more often realized when connecting two pieces of crime scene evidence or in linking a crime scene exhibit to one test fired from a recovered weapon. Completely absent from the standard operational statistics are any indicators of the searches performed by the system (save for the fact that the entry of every piece of evidence should incur a local search by default).
Certainly, some of the data that one would like to have to evaluate the system’s effectiveness are not items that can or should be maintained within the IBIS platform; these items include any indications of the quality of the investigative leads generated by completed hits, whether an arrest was made in a particular case (or cases), and whether convictions were achieved. But we believe that IBIS at present is too “black box” in nature and that it is not amenable to analysis or evaluation; the system should be capable of generating a fuller audit trail and operational database than the inadequate