9 Feasibility of a National Reference Ballistic Image Database

In the formative era of modern firearms examination, Hatcher (1935:291–292) noted a development that he interpreted as illustrating the adage that "a little knowledge is a dangerous thing": "Certain very well-intentioned individuals recently came very near having a federal law enacted to require every maker of a pistol or revolver to fire and recover a bullet from each gun made, and to mark that bullet with the number of the gun, and keep it for reference by the legal authorities in case a crime should later be committed with a gun of that caliber." Hatcher argued against this forerunner of a national ballistic toolmark database (if not a national reference ballistic image database), citing the complexity of the task and the workload burden it would create:

In the first place, it is by no means certain that a bullet fired through the same gun several years later would match the one kept for record, for the barrel may have rusted or otherwise changed during the interval. In the second place, the matter of the classification of bullets so as to lighten the labor of looking for the right one of the thousands of record bullets has not, and probably never can be, solved, for the fine scratches, parallel to the rifling marks, on which this identification depends, have nothing by which they can be sub-classified. [Although fingerprints can be classified by general shape patterns, bullets can] be roughly classified by caliber, number of grooves, direction of rifling, etc.; but there is no method of sub-classification. Suppose, for example, that the maker produces only 1000 .38 Special caliber guns in the same year. There will be five or six grooves on each bullet, say 5000 grooves to be compared in trying to match the murder bullet to only one year's production of guns of only one maker. It
may take from fifteen minutes to one hour to compare each groove, and looking searchingly into the comparison microscope is impossible for more than about three hours a day, otherwise the operator is likely to suffer severely from eye-strain, fatigue, and headache. At this rate, it would take one operator something like four or five years to search one manufacturer's record bullets for one year's production of one caliber of gun.

More than 70 years later, ballistic imaging technology has demonstrated its capacity to address some of these concerns, providing an initial analysis and sorting of massive volumes of evidence that, now as then, are impossible for a human examiner to process. The question is whether the technology has advanced to the point that a massive, national database of exhibits and images from new and imported firearms is any more tractable than the collection Hatcher described as well intentioned but dangerous.

In this chapter, we draw together the arguments of the preceding chapters to answer the primary, titular question of our study: Is a national reference ballistic image database (RBID) a feasible, accurate, and technically capable proposition? In Section 9–A, we discuss the basic question of how many guns would be included in a national RBID, followed in Section 9–B by an outline of other general assumptions on the shape and content of a national RBID. Subject to those assumptions, we consider in Section 9–C the technical aspects of establishing such a database from the information management and manufacturing perspectives, the statistical feasibility of such a database, and other perspectives on the issue. Section 9–D presents our general conclusions. We then discuss the implications of our conclusions for subnational, state-level RBIDs that currently exist or that may be created (Section 9–E).
This is important because conclusions for or against a national RBID bear not only on state RBIDs but also, depending on the weight placed on supporting arguments, on the long-term viability of a crime-evidence database like the National Integrated Ballistic Information Network (NIBIN). Some detailed probabilistic calculations related to the statistical feasibility of an RBID are laid out more fully in the appendix to this chapter, in Section 9–F.

9–A A NATIONAL REFERENCE DATABASE: HOW MANY GUNS?

An important consideration in evaluating the feasibility of a national RBID is the magnitude by which ballistic imaging workload would increase: How many guns would have to be entered into such a database?

Yearly firearm production figures compiled by the Bureau of Alcohol, Tobacco, Firearms, and Explosives (ATF) reveal that domestic firearms manufacturers produce between 3 and 3.5 million firearms per year (see Table 9-1). Approximately one-third of these, on the order of 1 million,
are handguns; rifles are the modal category, constituting roughly 43–45 percent of annual domestic firearms production. Relatively few of these firearms (only about 150,000 a year) are exported from the United States. By comparison, tabulations from the U.S. Census Bureau's Foreign Trade Division (see Thurman, 2006) indicate that 844,866 handguns were imported to the United States in 2004, most from Austria (29 percent), Brazil (24 percent), and Germany (17 percent). Roughly 1.7 times as many handguns as rifles (489,740) were imported to the United States; an additional 71,625 shotguns and combination guns were imported in 2004 (Thurman, 2006).

TABLE 9-1 Firearms Manufactured in and Exported from the United States, 2002–2004

Firearms               2002         2003         2004*
Manufactured
  Handguns          1,088,584    1,121,024    1,022,610
    Pistols           741,514      811,660      728,511
    Revolvers         347,070      309,364      294,099
  Rifles            1,515,286    1,430,324    1,325,138
  Shotguns            741,325      726,078      731,769
  Miscellaneous        21,700       30,978       19,508
  Total             3,366,895    3,308,404    3,099,025
Exported
  Handguns             56,742       42,864       39,081
    Pistols            22,555       16,340       14,959
    Revolvers          34,187       26,524       24,122
  Rifles               60,644       62,522       62,403
  Shotguns             31,897       29,537       31,025
  Miscellaneous         1,473        6,989        7,411
  Total               150,756      141,912      139,920

*The cover sheet for the 2004 report indicates that 26 percent of manufacturers did not file reports for 2004. No such response or compliance rates are indicated in the 2002 and 2003 reports.
SOURCE: Data from Bureau of Alcohol, Tobacco, Firearms, and Explosives Annual Firearms Manufacturing and Export Reports, 2002–2004.

However, the enabling action for entry in a national RBID is not the production of a firearm or its arrival in the United States; rather, it is the sale of a firearm.
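The shares quoted above can be rechecked against Table 9-1, and the same figures give a first cut at the annual handgun workload discussed in this section. The sketch below is illustrative only: the dictionary layout and function names are hypothetical, and it treats production net of exports (optionally plus imports) as a stand-in for sales, an approximation that the following paragraph cautions against.

```python
# Hypothetical transcription of Table 9-1 plus the 2004 handgun import count
# cited in the text (Thurman, 2006); names and layout are illustrative only.
MANUFACTURED = {
    2002: {"handguns": 1_088_584, "rifles": 1_515_286, "shotguns": 741_325, "misc": 21_700},
    2003: {"handguns": 1_121_024, "rifles": 1_430_324, "shotguns": 726_078, "misc": 30_978},
    2004: {"handguns": 1_022_610, "rifles": 1_325_138, "shotguns": 731_769, "misc": 19_508},
}
EXPORTED_HANDGUNS = {2002: 56_742, 2003: 42_864, 2004: 39_081}
IMPORTED_HANDGUNS_2004 = 844_866


def total_manufactured(year: int) -> int:
    """Total domestic firearm production in `year` (all categories)."""
    return sum(MANUFACTURED[year].values())


def share(year: int, category: str) -> float:
    """Fraction of domestic production in `year` that falls in `category`."""
    return MANUFACTURED[year][category] / total_manufactured(year)


def handgun_workload_2004(include_imports: bool) -> int:
    """Rough annual handgun entries: 2004 production net of exports, optionally
    plus imports. Production is only a proxy for sales (see text)."""
    domestic = MANUFACTURED[2004]["handguns"] - EXPORTED_HANDGUNS[2004]
    return domestic + (IMPORTED_HANDGUNS_2004 if include_imports else 0)


if __name__ == "__main__":
    for year in sorted(MANUFACTURED):
        print(year, f"total={total_manufactured(year):,}",
              f"handguns={share(year, 'handguns'):.1%}",
              f"rifles={share(year, 'rifles'):.1%}")
    print(f"2004 handguns, domestic only: {handgun_workload_2004(False):,}")
    print(f"2004 handguns, with imports:  {handgun_workload_2004(True):,}")
```

For 2004 this yields about 0.98 million handguns from domestic production and about 1.83 million once imports are added, bracketing the 1–2 million range.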
The previously cited firearms manufacture statistics do not directly correspond to annual sales to individual customers; they include production for military and law enforcement purposes, and they include guns that may sit in inventory rather than be quickly sold. The ATF estimates that about 4.5 million "new firearms, including approximately 2 million handguns, are sold in the United States" each year (U.S. Bureau
of Alcohol, Tobacco, and Firearms, 2000:1). It is important to remember that these figures, and the coverage of a national RBID, include only the primary gun market, which covers sales from licensed dealers to consumers. Cook and Ludwig (1996) estimate that about 2 million secondhand guns are sold each year in the United States, from a mixture of primary and secondary sources (where the secondary gun market includes transactions by unlicensed dealers).

The answer to the question of how many guns would have to be entered into a newly established national RBID each year depends crucially on the exact specification of the content of the database: whether the database is restricted to handguns and whether imported firearms from foreign countries are required to be included. As we discuss further in the next section, we generally assume that a national RBID would, at least initially, focus on handguns, implying an annual entry workload of 1–2 million firearms, depending on whether imports are included.

9–B Assumptions

In Box 1-3, we describe some basic assumptions about the nature of a national RBID, with particular regard to the wording used in past legislation and in the enabling language of the currently operational state RBIDs. It is useful to begin the assessment of the feasibility of a national RBID by revisiting those assumptions. Fundamentally, we assume that a national RBID would, at least initially, be tantamount to a scaled-up version of the current state RBIDs.

First, we assume that the "ballistic sample" required for entry in the database would consist of expended cartridge cases and not bullets. Though the enabling legislation in Maryland and New York was vague on this point, the only operationally feasible approach was to restrict attention to casings.
It takes more operator time (and money) to enter bullet evidence than casing evidence into a system such as the Integrated Ballistics Identification System (IBIS), and requiring recovery of a bullet specimen at the end of the manufacturing process would be unduly burdensome. That would require firing into a water tank or other nondestructive trap; as in test firings conducted by the police, firings into a tank must be done one at a time, with the bullet retrieved from the tank after each firing, in order to prevent damage to the specimens and to ensure that recovered bullets are identified as coming from the proper gun. Collecting cartridge casings also involves additional time (the protocol must allow for a casing to be attributed to the correct gun source), but an ejected casing is more amenable to rapid recovery than a spent bullet that must be separately fished from a tank.

Second, we assume that the focus of a national RBID would be on handguns, as the major gun class used in crime. Expanding state RBIDs
to include long guns has been contemplated by legislation in Maryland but not enacted. These first two assumptions (cartridge cases only and a restriction to handguns) combine to limit the ability of the national RBID to generate "cold hits" for one group of firearms: revolvers, which do not automatically expel cartridge casings and, hence, would leave casings at a crime scene only if the gun user manually emptied them there (e.g., to reload). However, we believe that these assumptions are realistic ones that make the program tractable at the outset.

Third, we assume that the actual process of generating samples and acquiring images from them would follow very closely the New York Combined Ballistic Identification System (CoBIS) model: that is, that most of the burden of generating the sample of cartridge casings would fall on firearms manufacturers, who would include the sample in the firearm's packaging. The burden of actually acquiring images and entering them in the database would fall on another entity, and the envelope containing the sample would be sent for imaging (along with related information) at the point and time of sale. In principle, images could be acquired by manufacturers, but the approach poses major problems both operationally and conceptually. In terms of operations, it would require the placement of at least one IBIS-type installation at every manufacturer's location, along with trained operators, a very costly proposition. Technology for mass batch capture of images from cartridge cases could be developed, and Forensic Technology WAI, Inc. (FTI), continues to develop a prototype that it dubs the Virtual Serial Number System; however, the technology is not yet mature, and working with large batches of samples simultaneously exacerbates the problem of ensuring that the sample packaged from a gun was actually fired from that gun (see Section 9–C.2).
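Under the CoBIS-style flow assumed here, the entity doing the imaging receives each casing envelope together with information completed at the point of sale. If, as in the New York model described later in this section, purchaser details are kept out of the image database, the intake step reduces to splitting each submission in two. The sketch below assumes hypothetical field names and does not reflect the actual CoBIS or NIBIN record layout.

```python
from typing import Any, Dict, Tuple

# Hypothetical field names for what accompanies the casing envelope at sale.
RBID_FIELDS = ("serial_number", "make", "model", "caliber", "dealer_license", "sale_date")
PURCHASER_FIELDS = ("purchaser_name", "purchaser_address", "purchaser_dob")


def split_submission(submission: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split a point-of-sale submission into (record kept with the casing images,
    purchaser details forwarded to a separate agency)."""
    missing = [f for f in RBID_FIELDS if f not in submission]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    # Reject anything unrecognized so purchaser data cannot slip in under a new name.
    unknown = set(submission) - set(RBID_FIELDS) - set(PURCHASER_FIELDS)
    if unknown:
        raise ValueError(f"unrecognized fields: {sorted(unknown)}")
    rbid_record = {f: submission[f] for f in RBID_FIELDS}
    purchaser_record = {f: submission[f] for f in PURCHASER_FIELDS if f in submission}
    return rbid_record, purchaser_record


if __name__ == "__main__":
    sale = {"serial_number": "ABC123", "make": "Acme", "model": "P1", "caliber": "9mm",
            "dealer_license": "X-1", "sale_date": "2005-01-01", "purchaser_name": "J. Doe"}
    record, purchaser = split_submission(sale)
    print(record)
    print(purchaser)
```

Rejecting unknown fields, rather than silently dropping them, is one way to make any accidental attempt to store purchaser data in the image database fail loudly.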
Conceptually, imaging by the manufacturer is problematic because it is a step removed from the objective of an RBID, which is to connect ballistic evidence with a point of sale and not the point of manufacture. Achieving the link to point of sale would require a further database of sales, presumably to be merged periodically with the image database using the firearm serial number and other data.

Imported firearms are particularly tricky in this regard because they raise potential problems of differential compliance. U.S. legislation to establish a national RBID could compel domestic manufacturers to include test-fired exemplars with newly shipped firearms, for entry into the database, but foreign manufacturers might not be so bound. Hence, imported firearms may involve the additional workload of test firing before sale, in addition to acquiring images.

A critical assumption that underlies much of the political debate over a national RBID deals with the information entered into the database along with exhibit images: Should information on the firearm's purchaser be logged
in the database, rather than just information on the firearm? The extent to which personal information is recorded raises the question of whether implementation of a national RBID is tantamount to establishing a national gun registry. Again, we assume that the New York CoBIS model would hold. In New York, licensing information completed at the time of sale is sent along with manufacturer-supplied casing samples to the state police headquarters for processing. However, that personal (purchaser) information is immediately separated from the ballistic image processing and forwarded to another agency, and it is not entered into the CoBIS database. We interpret the goal of a national RBID as suggesting an investigative lead to the point of sale. This is obviously not as direct a lead as could be the case, and it requires that investigators follow up with seller records to progress further (akin to the standard gun tracing process described in Box 9-1), but it could still provide some spark to criminal investigations that may otherwise grow cold. The assumption that purchaser information would not be recorded in an RBID is consistent with the federal law that prohibits the establishment of "any system of registration of firearms, firearms owners, or firearms transactions or dispositions" by federal or state agencies (18 U.S.C. 926(a)).

BOX 9-1 Tracing Guns

The Gun Control Act of 1968 (18 U.S.C. 922(a)) established the legal framework for regulating firearms transactions in the United States, requiring that any individual engaged in the business of selling guns in the United States must be a federal firearms licensee (FFL). Significantly, the act also established a set of requirements (a paper trail) designed to allow the tracing of the chain of commerce for any given firearm, from its manufacture or import through its first sale by a retail dealer. Each new firearm, whether manufactured or imported, must be stamped with a unique serial number (27 CFR 178.92; ATF Ruling 76-28). Manufacturers, importers, distributors, and FFLs are required to maintain records of all firearms transactions, including sales and shipments received; FFLs must also report multiple handgun sales and stolen firearms to ATF and provide transaction records to ATF in response to firearms trace requests. When FFLs go out of business they are required to transfer their transaction records to ATF, which then stores them for use in tracing.

Local law enforcement agencies may initiate a trace request by submitting a confiscated gun and associated information to the ATF's National Tracing Center (NTC); in addition to descriptors of the gun itself, this associated information may include the location of the recovery of the gun, the criminal offense associated with the recovery, and the name and date of birth (if known) of the firearm's possessor. The NTC searches this information against its in-house databases: the records of out-of-business FFLs and the records of multiple handgun sales. If no matching information is found from these queries, NTC agents contact the manufacturer or importer and begin following the chain of subsequent transfers until they identify the first retail seller and (through that FFL's records) the first buyer of the gun.

The table below summarizes gun trace results in 1999, omitting on the order of 11,000 trace requests from foreign agencies (summary counts and percentages are recomputed from the cell entries in the original table).

Trace Result                           Count     Percent
Completed traces (by method)          82,669      52.9
  Out-of-business FFL records         13,167       8.4
  Multiple sale reports                3,627       2.3
  FFL record                          60,526      38.7
  Other                                5,349       3.4
Incomplete/not traced (by reason)     73,690      47.1
  Too old                             16,192      10.4
  Serial number problem               16,920      10.8
  Error on trace request              17,588      11.2
  Dealer record problem               15,123       9.7
  Other                                7,867       5.0
Total                                156,359     100.0

SOURCE: Cook and Braga (2001:Table 1).

Of the guns submitted for tracing in 1999, slightly more than half were successfully traced to the point of origin. Trace failures may be caused by the age of the gun (e.g., manufactured before 1968 and hence exempt from serial numbering and recordkeeping), or by problems with the serial number, the submission form, or the information on file with the FFL where the gun was first sold. "End to end" or investigative traces, which completely document the chain of possession from manufacture or import through the most recent owner, are considerably more expensive and are not routine. However, under the Youth Crime Gun Interdiction Initiative, ATF does perform "end to end" tracing for all firearms recovered from people under 21 years old.

We also assume that the user interface to a national RBID would mirror, and likely build on top of, the current interface of the NIBIN program. Specifically, we assume that queries on the database would be initiated by state and local law enforcement agencies, who would acquire images from evidence they wished to compare and send them over a network for comparison. (Doing this on NIBIN-supplied IBIS equipment, and effectively using the existing NIBIN terminals as the interface to the RBID, would obviously require changes in legislation, which currently limits NIBIN to crime-scene evidence, and in the memoranda of understanding with partner sites.)

A partial explanation for the scarcity of hits from the current state RBIDs in Maryland and New York is a relative scarcity of searches performed on the system, and a key reason for that lack of queries is that questioned evidence must be transported to a specific site for entry on RBID-specific equipment. To promote usage of the system, we assume that ways would be found to allow local law enforcement to query the database directly without turning over physical evidence to other agencies, a transfer that raises concerns about the chain of custody of that evidence. In articulating this model, we further assume that possible high-probability matches on the national RBID would be returned to those localities for their review and, if desired, for them to subsequently request pieces of physical evidence to confirm a hit.

A technical assumption, and a difference between a national RBID and the existing NIBIN system, concerns the performance of automatic comparison requests. In the current NIBIN framework, any new piece of evidence entered into the system incurs an automatic comparison against all evidence entries within that NIBIN site's partition, and the results of that comparison are returned to the local site after processing at one of the three ATF national laboratories. (Manual comparison requests can also be initiated.) This default behavior is sensible for a database like NIBIN, which is assumed to consist exclusively of case-related evidence and for which the interrelationship between entries is of interest. In a national RBID, however, the interrelationships between entries in the database are not of direct interest (since there is no reason to expect a match between two newly manufactured or imported guns), and performing comparison requests as each new entry is added only serves to increase the computational demands on the system infrastructure.
What is of interest in the RBID setting are the comparison results obtained when a piece of crime-scene evidence is entered and compared against the RBID. Hence, we assume that comparison requests in a national RBID would be generated manually, or automatically only when it is known that a new image being acquired comes from crime-scene evidence.

This is not to say that interrelationships between RBID entries (and what comparison scores say about them) are uninteresting; indeed, an RBID provides ideal opportunities for studies of system performance in a large database of known nonmatches. Hence, comparison requests of RBID entries against the balance of the database are of great potential research interest, but they are logically unnecessary as part of the data entry process.
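The entry-time policy assumed above can be sketched as a toy model. Everything here is hypothetical: the `Exhibit` fields, the `ToyRBID` class, and the pluggable `score` function are stand-ins (IBIS's actual correlation algorithm is not modeled), and the caliber and firing-pin filter and the cap on returned results anticipate the demographic subsetting and result-limiting issues taken up under technical feasibility.

```python
import heapq
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Exhibit:
    """Hypothetical exhibit record; fields are illustrative, not an IBIS schema."""
    exhibit_id: str
    caliber: str     # demographic field used to filter candidates
    firing_pin: str  # gross firing-pin shape class, also used as a filter
    source: str      # "reference" (new-gun sample) or "crime_scene"


# Stand-in for the system's image-comparison score (higher = more similar).
ScoreFn = Callable[[Exhibit, Exhibit], float]


@dataclass
class ToyRBID:
    reference: List[Exhibit] = field(default_factory=list)

    def enter(self, exhibit: Exhibit, score: ScoreFn,
              max_results: int = 10) -> List[Tuple[str, float]]:
        """Reference samples are stored without any comparison; only
        crime-scene evidence triggers a search against the database."""
        if exhibit.source == "reference":
            self.reference.append(exhibit)
            return []
        if exhibit.source == "crime_scene":
            return self.search(exhibit, score, max_results)
        raise ValueError(f"unknown source: {exhibit.source!r}")

    def search(self, query: Exhibit, score: ScoreFn,
               max_results: int) -> List[Tuple[str, float]]:
        # Demographic filtering shrinks the comparison set before scoring.
        candidates = [r for r in self.reference
                      if r.caliber == query.caliber and r.firing_pin == query.firing_pin]
        # Return only the top-scoring candidates to limit what is sent back.
        best = heapq.nlargest(max_results, candidates, key=lambda r: score(query, r))
        return [(r.exhibit_id, score(query, r)) for r in best]


if __name__ == "__main__":
    def toy_score(query: Exhibit, ref: Exhibit) -> float:
        return 1.0 if ref.exhibit_id == "REF-2" else 0.1  # arbitrary toy scores

    db = ToyRBID()
    for i, pin in enumerate(["round", "round", "rectangular"], start=1):
        db.enter(Exhibit(f"REF-{i}", "9mm", pin, "reference"), toy_score)
    print(db.enter(Exhibit("CS-1", "9mm", "round", "crime_scene"), toy_score, max_results=1))
    # prints [('REF-2', 1.0)]
```

Adding a reference sample costs only an append; all comparison work is deferred until crime-scene evidence arrives, which is the point of the assumption.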
9–C Technical Feasibility

9–C.1 Information Management Perspective

At one basic level, a national RBID is technically feasible: Current and projected computer capabilities can handle the information flows associated with such a database. In our assessment, a national RBID would be a sizable but not insurmountable computational challenge and would be within the capacity of existing technology. The human workload necessary to process exhibits and acquire images would be formidable, but possible. In this section we support this conclusion using basic calculations that, although "back of the envelope" in nature, are meant to be "worst case" projections.

We include computational, networking, staffing, and physical requirements, and impose a number of stricter assumptions (beyond the general nature of the database) in making this analysis. These additional assumptions include:

1. The work of collecting test-fired exhibits and acquiring images from them will be distributed across a small number of geographic sites. In this, we diverge from the New York CoBIS and Maryland MD-IBIS models, where routing all database entries through a single site is tractable, and move toward the existing NIBIN model, where computational infrastructure is divided across three sites (and entry dispersed over more than 200 localities). Economies of scale are maximized if the workers and machines are clustered into a dozen or fewer geographic centers. We will assume that there are 10 such data acquisition centers.

2. Assume a data entry rate of samples from 1 million guns per year, and that image acquisition itself takes approximately 5 minutes. The 5-minute mark follows from our high-level assumption that cartridge cases, and not bullets, are to be imaged into the system, and is a plausible assumption with the current two-dimensional imaging standard.
However, it may be an overly optimistic assumption for three-dimensional surface measurement, as it has developed to date (see Chapter 7), if that emerges as the imaging standard for the database. That said, the time needed to acquire three-dimensional measurement data has decreased significantly since the earliest efforts at imaging three-dimensional contours of bullets; with further refinement and automation, a 5-minute acquisition time is not unreasonable in the long run.

3. Allow 5 minutes per entry for associated tasks, such as barcode reading, preparing and mounting the exhibits, and transporting exhibits between physical storage areas.

4. Data collection for this national system would run 24 hours a day,
7 days a week. Timeliness of searches on the database requires round-the-clock operation.

Under these assumptions, each entry takes 10 minutes of operator time, so six guns or exhibits can be processed by a human operator each hour. Multiplied by 2,000 working hours per year, this implies 12,000 guns processed per operator per year, and hence a human staff of at least 84 operators (1,000,000 / 12,000 ≈ 83.3, rounded up). A three-shift staff of 84 requires 28 data entry terminals; to allow headroom for maintenance (or equipment failures), this could be expanded to 40–42 data entry terminals.

The rate at which queries are made of the national RBID (that is, the rate at which exhibits are entered by state and local law enforcement agencies for comparison against the database) will depend on local law enforcement acceptance and staff limitations. As described in previous chapters, large differences between jurisdictions in the effective use of the existing NIBIN system stem from differences in acceptance of the technology; hence the set of recommendations in Chapter 6 to enhance NIBIN by making it a more vital part of the investigative system. The actual use of New York's CoBIS database, in terms of queries made, has fallen far short of expectations. Still, we have to assume that the presence of a national RBID would lead to the desire to conduct searches against it, as the technology is accepted and such searches become routine. Hence, for the purposes of this section, we assume that 1,000 query exhibits are entered (nationwide) each day.

It is expected that these searches will be done on an ad hoc basis, rather than in large batches. A query image will be sent in parallel to a collection of geographically dispersed servers, over conventional networking, for comparison against stored images. The system's ability to handle this throughput depends on the speed of the comparison process and the size of the database against which the query image is compared. As we reiterate later in this chapter, a common logical flaw in considering a national RBID is to look at the large number of new guns produced annually (that would have to be entered in the database) and assume that the system will automatically be swamped by the computational demands of performing one-against-millions comparisons. However, one would never do a straight comparison of one image against the entire database; as in the current IBIS and NIBIN setup, some demographic filtering will inevitably be done to reduce the size of the comparison set. In addition to demographic filtering, similar subsetting may be done on the shape of the firing pin, gun entry and crime occurrence dates, gross features of the casing, and (perhaps) geographic region and proximity. Exactly how much of a reduction can be expected is an open question and would affect the computational requirements. If it can be assumed that query images can be compared against stored images at a rate of 30 per second (on a PC-class machine), and that demographic subsetting can whittle down the comparison set of images to
1/20 of the full database size, then, in aggregate, comparing each query image against 1 year's worth of RBID data would mean performing 50 million pairwise comparisons per day (1,000 queries × 1 million entries / 20). At 30 comparisons per second, one machine can perform about 2.6 million comparisons per day, so this load would require 20 PC-class machines as comparison servers. If one plans for a factor of three in "headroom," then 60 machines are required. Because the database grows by a year's worth of entries for each year that the system is in operation, 60 additional machines must be purchased each year (or the original 60 replaced by ones that are twice as fast).

Storage space, both electronic and physical, is a significant "wild card" in implementing the technical infrastructure for a national RBID. In terms of electronic storage, the per-casing disk storage for two-dimensional greyscale images as currently done by the IBIS platform is on the order of 1 megabyte. At 1 million casings per year, the aggregate system must be capable of storing 1 terabyte of information during the first year, and then of adding 1 terabyte per year thereafter. Given modern computing environments, this is certainly feasible. However, these demands would have to be scaled upward with a change in imaging standard, either to finer-resolution two-dimensional photography or to three-dimensional imaging. The per-casing storage would also increase if practices such as those we recommend for the NIBIN program (entering more than one exemplar per gun, particularly one of a different ammunition type) are used as standard protocols for a new national RBID.

Physical storage of the casing exhibits is also an important consideration. We expect that human firearms examiners would still be needed to confirm "hits" on the national RBID through direct comparison; hence, the physical casings must be retained and must remain accessible. They must be filed in such a way that they can be retrieved with ease, that they are not damaged, and that there is minimal risk of their being exchanged or confused with exhibits from a different firearm.
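The back-of-the-envelope figures in this section (operators, terminals, comparison servers, and electronic storage) follow from a handful of inputs, and a short script makes it easy to vary them. The sketch below simply encodes the assumptions stated in this section; the constant and function names are hypothetical, and the results are only as good as those assumptions (notably the 30-comparisons-per-second rate and the 1/20 filtering fraction).

```python
import math

# Assumptions stated in Section 9-C.1 (not measurements); names are illustrative.
GUNS_PER_YEAR = 1_000_000
MINUTES_PER_ENTRY = 5 + 5          # image acquisition + associated handling
OPERATOR_HOURS_PER_YEAR = 2_000
SHIFTS = 3
QUERIES_PER_DAY = 1_000
FILTER_DIVISOR = 20                # filtering keeps 1/20 of the database
COMPARISONS_PER_SECOND = 30        # per PC-class machine
HEADROOM = 3
MB_PER_CASING = 1                  # 2D greyscale image, order of magnitude


def operators_needed() -> int:
    entries_per_operator = (60 // MINUTES_PER_ENTRY) * OPERATOR_HOURS_PER_YEAR
    return math.ceil(GUNS_PER_YEAR / entries_per_operator)


def terminals_needed() -> int:
    # Each terminal is shared by one operator per shift.
    return math.ceil(operators_needed() / SHIFTS)


def comparisons_per_day(years_of_data: int) -> int:
    return QUERIES_PER_DAY * GUNS_PER_YEAR * years_of_data // FILTER_DIVISOR


def servers_needed(years_of_data: int) -> int:
    per_machine_per_day = COMPARISONS_PER_SECOND * 24 * 3600
    return HEADROOM * math.ceil(comparisons_per_day(years_of_data) / per_machine_per_day)


def storage_tb(years_of_data: int) -> float:
    return GUNS_PER_YEAR * years_of_data * MB_PER_CASING / 1_000_000  # MB -> TB (decimal)


if __name__ == "__main__":
    print("operators:", operators_needed())            # 84
    print("terminals:", terminals_needed())            # 28
    print("comparisons/day:", comparisons_per_day(1))  # 50000000
    print("servers:", servers_needed(1))               # 60
    print("storage (TB):", storage_tb(1))              # 1.0
```

Because the comparison load scales with the number of years of data held, calling `servers_needed` with a larger `years_of_data` shows the server count growing roughly in proportion as the database accumulates.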
Hence, simply packing envelopes of exhibits in large boxes and warehousing them is not a viable option, and the physical structure would have to be designed accordingly.

The computing and network assumptions sketched above suggest that the informational throughput in one direction (submitting an inquiry to the database for processing) is manageable. However, care would have to be taken in specifying the reciprocal flow of comparison results back to requesting sites. Though we critique the IBIS 20 percent threshold elsewhere in the report and recommend that it be revisited (Recommendation 6.15), the threshold does serve the purpose of limiting the amount of image and score data that must be pushed back from regional correlation servers to NIBIN partner agencies for every comparison request. Some limit on the number of results routinely returned on comparison requests would likely have to be established to keep transmission times in check.

The preceding is a somewhat simplified list of concerns from the information management perspective; in practice, the implementation of a national RBID would raise related, and complex, concerns. Of these,
access control (how and from which locations an RBID search can be initiated and who is enabled to edit records) is a particularly significant one. Computer security and database encryption are also not built directly into the preceding assumptions, but they would involve cost and computational burden, as well as the need to maintain compliance with relevant regulations at the federal level and at access points (e.g., state law enforcement agencies). Policies on "sunsetting" of exhibits and procedures for removing entries (e.g., if a gun is known to have been recovered by police and destroyed) would have to be considered in assessing the growth of the database.

9–C.2 Manufacturing Perspective

Just as we conclude that a national RBID is, strictly speaking, technically feasible from the information management perspective, we conclude that it is generally feasible from the manufacturing perspective. As with the information management question, though, this assessment is very much conditional on some details in the implementation of the RBID. Specifically, the specification of the database content and the question of how images are to be acquired and entered into the system are critical in judging how disruptive (and costly) RBID implementation would be to firearms manufacturers.

At the most basic level, the collection of exhibit casings from newly manufactured firearms should be relatively tractable because, conceptually, all it would require is a systematic, cross-manufacturer standardization of current practices of test firing for quality control. Manufacturers routinely test (or proof) fire new firearms to assess product safety issues; the needed change in procedures would be to recover the casing(s), label them, and keep them associated with the correct firearm through the remaining parts of the manufacturing process (e.g., packaging and shipping).
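The procedural change described above (recover, label, and keep casings with the right gun through packaging) can be sketched as simple bookkeeping. This is a hypothetical illustration, not any manufacturer's actual system: `CasingTracker`, its methods, and the two-casing default are assumptions. Note what it cannot do: a label check catches paperwork mix-ups, but it cannot confirm that the casings in an envelope were actually fired from the gun whose serial number is on it, which is precisely the failure described in the Glock cases below.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class CasingEnvelope:
    serial: str   # serial number written on the envelope
    casings: int  # number of fired cases placed in it


class CasingTracker:
    """Hypothetical line-side record of test-fire envelopes awaiting packaging."""

    def __init__(self, casings_required: int = 2) -> None:
        self.casings_required = casings_required
        self.pending: Dict[str, CasingEnvelope] = {}

    def record_test_fire(self, serial: str, casings_collected: int) -> None:
        # One envelope per serial number; a second one signals a labeling error.
        if serial in self.pending:
            raise ValueError(f"duplicate envelope for serial {serial}")
        self.pending[serial] = CasingEnvelope(serial, casings_collected)

    def pack(self, gun_serial: str, envelope_serial: str) -> CasingEnvelope:
        # Refuse to box a gun with an envelope labeled for a different gun.
        if gun_serial != envelope_serial:
            raise ValueError(f"envelope {envelope_serial} does not match gun {gun_serial}")
        envelope = self.pending.get(gun_serial)
        if envelope is None:
            raise KeyError(f"no test-fire envelope recorded for {gun_serial}")
        if envelope.casings < self.casings_required:
            raise ValueError(f"{gun_serial}: only {envelope.casings} casing(s) collected")
        del self.pending[gun_serial]  # envelope leaves the line with its gun
        return envelope


if __name__ == "__main__":
    tracker = CasingTracker()
    tracker.record_test_fire("ABC123", 2)
    print(tracker.pack("ABC123", "ABC123"))
```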
There is cost associated with reconfiguring the late stages of production to accommodate this process and with providing adequate personnel to keep the process moving, and there is cost associated with the slowing of production, however slight, needed to ensure that the casing collection is done accurately.

The accurate connection between a newly manufactured firearm and the exhibit casings packaged with it has emerged as an issue with the existing state RBIDs. Tew (2003) noted a problem with the sets of two fired cartridge cases included with new Glock firearms, part of a large batch purchase by the Scottsdale, Arizona, Police Department. Two casings each were retained for 15 of the new pistols during the department’s qualification shooting, and these casings were compared against each other and with the casings in the envelope provided by the manufacturer. Examiners determined that only 2 of the 15 guns had manufacturer-provided casings that could be matched to the new post-purchase test firings; for 2 other
FEASIBILITY OF A NATIONAL REFERENCE BALLISTIC IMAGE DATABASE 235
guns, a match was possible with one of the provided casings but not the other. The remaining 11 appeared to have manufacturer-provided samples that were not from the actual gun that was sold; worse, in 6 cases, the two packaged casings were determined to be from two different firearms, neither of which was the sold gun. The Maryland State Police Forensic Sciences Division (2003:8) noted a similar problem, also with a Glock firearm. Evidence from a gun known to have been sold in Maryland was compared against the Maryland RBID, but no good matches were found. When the crime gun in question was later recovered, it was found that the casing entered in MD-IBIS did not appear to have been fired from that gun. “Glock has since taken measures to correct this problem on their end,” the report observed.

During one of the committee’s site visits to manufacturers, personnel from Beretta USA estimated that their per-gun charge to perform test firings (and thus to comply with the requirements of Maryland’s MD-IBIS database) is about $7. But this figure reflects a relatively limited number of guns; if manufacturers were required to perform shell casing capture for compliance with a national database, some efficiencies would doubtless be realized. Still, it must be recognized that implementation of a national RBID would, at least in the short term, detrimentally affect manufacturers’ production schedules and thus result in a commensurate increase in product costs. There would also likely be a significant detrimental effect on the profitability of the companies, because the delivery schedule for products plays such an important role in recovering overhead and fixed costs.

Collecting test-fired exhibits from newly manufactured firearms raises one set of logistical issues; collecting such exhibits from newly imported firearms poses similar problems.
Foreign manufacturers could not be directly bound to supply test-fired exemplars with their weapons, so the process of unpackaging, test firing, cleaning, and repackaging imported firearms would likely be shifted to domestic distributors.

For both newly manufactured and newly imported firearms, a critical question that would have to be addressed is the exact specification of the conditions under which test fires are to be performed and the number of firings that must be completed before designating one or two casings as the ballistic sample. As described in Section 3–D.3, a “settle-in” effect would be a greater concern if bullets were used as the sample rather than casings; in that event, the prevailing view among firearms examiners would hold that the gun must be fired 8–10 times before its unique markings stabilize. However, as also noted in Section 3–D.3, structural features like paint on the breech face can lead to early shot-to-shot variability in cartridge case markings. New York CoBIS personnel have partnered with manufacturers to consider a related problem: the effect (if any) of the cleanliness of a new firearm on the first cartridges fired through it. Specifically, it remains an open question whether the presence of heavy grease or oil when weapons are pulled from the assembly line for test firing diminishes the breech face or firing pin impressions on recovered cartridge cases. One firearms manufacturer the committee visited indicated that it fires up to three rounds; if there is reason to doubt the clarity of marks from the very first firing(s), and more rounds must be fired through each weapon, the cost of RBID compliance (in both time and money) would rise accordingly.

Footnote: Beretta USA is headquartered in Accokeek, Maryland, hence the immediate need to comply with Maryland statute.

9–C.3 Statistical Perspective

Following the logic of the preceding sections, a national RBID is technically implementable; we now turn to the fundamental question of the overall feasibility and accuracy of such a database in providing investigative leads. A useful framework is to consider the basic problem in working with ballistic image databases probabilistically. Define a true match to be the case when a firearms examiner confirms a suggested possible match from an image database query.
One can decompose the probability of a true match into a number of conditional probabilities that capture the various stages involved in getting a true match:

$$
\begin{aligned}
\Pr(\text{true match}) ={}& \Pr(\text{true match} \mid \text{potential match with item based on images}) \\
&\times \Pr(\text{potential match with item based on images} \mid \text{item in top } K) \\
&\times \Pr(\text{item in top } K \mid \text{item in database}) \\
&\times \Pr(\text{item entered in database} \mid \text{item submitted to database}) \\
&\times \Pr(\text{item submitted to database} \mid \text{item collected in field}) \\
&\times \Pr(\text{item collected in field})
\end{aligned}
$$

All but one of the components in this expression involve human, not algorithmic, issues:

• Pr(true match | potential match with item based on images) measures the concordance of physical evidence similarity (as determined by the firearms examiner through direct physical comparison) with similarity based on database images.
• Pr(potential match with item based on images | item in top K) measures whether the human firearms examiner can pick out a potential match when images are ranked as highly similar in a list of possible matches.
• Pr(item in top K | item in database) measures the ability of the algorithm to rank the item in the top K results.
• Pr(item entered in database | item submitted to database) measures the chance that the item was entered into the database as opposed to not being entered (e.g., caught in a backlog).
• Pr(item submitted to database | item collected in field) measures the chance that the item was submitted for further processing.
• Pr(item collected in field) measures whether the evidence was collected at the time of manufacture or sale (for an RBID) or whether it was found and recoverable at a crime scene (for a NIBIN-type database), and whether it was damaged or otherwise rendered unfit for analysis.

These are the major components that determine how well the overall ballistics identification system performs. The technical, algorithmic component of this expression, Pr(item in top K | item in database), is an important one; it is the focus of the major studies outlined in Section 4–E, the experimental work described in Chapter 8, and the balance of this section. It is important to remember, though, how that component fits in the whole system. Because Pr(true match) is a product, that single probability can be quite high, even 1, and yet a ballistics identification system could be judged a failure if any of the other components is small.

The discussion of overlap metrics in Section 8–B.3 suggests a way of framing the problem using a simple binomial model; the appendix to this chapter, Section 9–F, develops a model in fuller generality. Suppose one compares a reference casing with N guns in an image database; for simplicity, assume that there is one correct casing (gun) in the database that matches this reference exhibit and that all the other entries are nonmatches.
Also assume that the ballistics identification system yields a single list of ranked exhibits; this is tantamount to looking at only one type of marking on the casings, which is generally not advisable but is a useful simplification for these approximate calculations. Let the overlap metric be p, the probability that the similarity score for the correct casing will be smaller than that for a given nonmatch. Assume that all of the N comparisons are independent (see the appendix, Section 9–F, for more elaborate structures). If X is the number of casings in the database that yield a higher similarity score than the correct match, then X follows a binomial (N, p) distribution. One can use this to assess the likelihood of the correct match being in a top 10 list of ranked probable matches (akin, in this model, to flipping N coins, each with probability p of turning up tails, and calculating the probability of getting nine or fewer tails). This simple probability model can be used to make approximate statements on how good the identification system’s similarity scores and overlap metrics have to be in order to have effective identification. For instance, suppose that the database against
which a reference casing is to be compared has 10,000 elements; how small does p have to be in order to have the correct casing appear in the 10 highest rankings at least 99 percent of the time? In probabilistic terms, how small does p have to be so that Pr(X < 10) ≥ 0.99? As a rough calculation, the properties of the binomial distribution are such that if Np = 10, the probability of the matching casing being in the top 10 is only around 0.46. Therefore, as N gets very large, p has to be correspondingly small. In fact, p needs to be approximately 4/N to get the correct match in the top 10 rankings 99 percent of the time. To get in the top 10 rankings 90 percent of the time, p can be around 6.2/N. In a comparison database of 100,000 images/guns, p therefore needs to be on the order of 6.2 × 10⁻⁵ to have a 90 percent chance of the correct matching casing appearing in the top 10.

The estimated overlap metrics in Table 8-6 can be used to assess the feasibility of databases of different sizes. These specific metrics correspond to calculations using the analysis of three-dimensional topographic data by the National Institute of Standards and Technology (NIST), and not the current two-dimensional IBIS system, but they are instructive nonetheless because we found the three-dimensional system to perform comparably with IBIS. For a moderate database of size N = 100,000, the only estimated values of p that are small enough are those that are zero. However, one can see from Table 8-6 that, with the exception of breech face measurements on the NBIDE exhibit set, the overlap metrics are all too large to be adequate. Even if N is as small as 100, the success rate for top 10 lists for the DKT exhibit sets is still less than 0.5. The success rate would be only slightly higher (56 percent) for the NBIDE firing pins. The breech face measurements on the NBIDE exhibits stand out as being excellent.
Under the most optimistic scenario (grouping by casing), for a database of size N = 100,000, the success rate is about 90 percent. If, instead, there is grouping by guns, then the success rate is only 50 percent. Under the pessimistic scenario of a single group, p needs to be on the order of 6.2 × 10⁻⁵, so the estimate of p = 0.002 is over 30 times too large, despite being orders of magnitude smaller than any of the other estimates.

The above analysis was just for a single casing. When there are multiple firings from each gun, one can form separate matching and nonmatching distributions for each gun, resulting in a different p for each gun. In general, having more numerous and more refined groups will lead to more optimistic conclusions, while having fewer groups that are pooled will lead to more pessimistic conclusions. The reason is that success in a very large database demands a very small value of p, so casings that are not very distinguishable will tend to increase the estimated p of their member group to unacceptably high levels. Having a smaller group limits the damage done by a single casing.
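The top-10 figures quoted above can be checked numerically. The following is a minimal Python sketch, assuming, as the text does, that the number X of higher-scoring nonmatches is binomial with N trials and overlap metric p; the function names are hypothetical, and the bisection search is simply one convenient way to invert the probability, not a method taken from the report.

```python
import math

def binom_top_k_prob(n: int, p: float, k: int = 10) -> float:
    """P(X < k) for X ~ Binomial(n, p): the chance that fewer than k nonmatches
    outscore the true match, so that it appears in the top k."""
    return sum(math.comb(n, x) * p**x * (1.0 - p) ** (n - x) for x in range(k))

def max_overlap(n: int, target: float, k: int = 10) -> float:
    """Largest overlap metric p with P(X < k) >= target, found by bisection.
    P(X < k) decreases as p grows, so the acceptable p values form an interval [0, p*]."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if binom_top_k_prob(n, mid, k) >= target:
            lo = mid
        else:
            hi = mid
    return lo

if __name__ == "__main__":
    print(round(binom_top_k_prob(10_000, 10 / 10_000), 2))  # Np = 10: 0.46
    n = 100_000
    print(round(n * max_overlap(n, 0.99), 1))  # 4.1, i.e., p near 4/N
    print(round(n * max_overlap(n, 0.90), 1))  # 6.2, i.e., p near 6.2/N
    print(round(0.002 / max_overlap(n, 0.90)))  # 32: how far p = 0.002 exceeds the 90 percent requirement
```

Because N is large and p small, these binomial results are essentially the Poisson values with mean Np, which is why the thresholds scale as a constant divided by N.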
9–D Conclusion

Conclusion: A national reference ballistic image database of all new and imported guns is not advisable at this time.

Three lines of reasoning have particular salience for this conclusion. The first has to do with the general use and role of ballistic imaging technology. The current technology in use for automated toolmark comparison, based on two-dimensional greyscale images, can be useful for gross categorization and sorting of large quantities of evidence. However, it appears to be less reliable for distinguishing the extremely fine individual marks needed to make successful matches in RBIDs, where large numbers of exhibits on file would share gross class and subclass characteristics. Throughout the report, and particularly in Chapter 4, we make it clear that we view ballistic imaging as a form of computer-assisted firearms identification, and we advise against practices (like overreliance on “top 10” comparisons) that impute to ballistic imaging an unwarranted level of precision for identifying matches. The temptation to expect too much from a national RBID, that is, to expect frequent “hits” and investigative leads to points of sale, is misguided given that the event of a single, particular new gun being used in committing a crime is relatively rare. The difficulty in achieving matches in an RBID is compounded by the gross sameness, in class and subclass characteristics, of large segments of the database exhibits. Ballistic imaging can be an effective tool for screening and filtering, and can be 70–95 percent successful in finding same-gun matches using cartridge case markings, as Nennstiel and Rahm (2006b:28) concluded. This is very good performance, but De Kinder et al.
(2004) compellingly demonstrate that this performance can degrade in databases flooded with same-class-characteristic images; we saw much the same thing in our limited work entering exhibits in the New York CoBIS database (described in Chapter 8).

The second salient argument concerns the capacity of ballistic imaging systems to distinguish true matches from nonmatches, as described in Section 9–B.3 and Chapter 8. Basic probability calculations, under reasonable assumptions, suggest that the process of identifying a subset of possible matches that contains the true match with a specified level of certainty depends critically on as-yet-underived measures of similarity between and within gun types. The process may return too large a subset of candidates to be practically useful for investigative purposes.

We emphasize that we do not frame this argument strictly as a “breakdown,” or massive degradation in matching capability with database size. Pure reliance on a numeric breakdown argument maligns all forms of ballistic imaging, a national RBID most immediately, due to the large
number of guns involved. But such arguments would apply in short order to state RBIDs, to the national-level NIBIN crime scene evidence database, and, ultimately, to individual databases maintained by metropolitan police departments (particularly for popular caliber families). What we do comment on is the accuracy of making matches within some range of possible matches and with a specified level of probability; the experimental work described in Chapter 8 suggests that the existing imaging methodologies (including NIST’s three-dimensional-topography prototype) do not have the discriminatory power needed to reliably place true matches in the top rankings using imaging comparisons.

Though there is no special magic in the top 10 ranks, there is a practical limit to the number of potential matches that any human examiner or operator is likely to page through and consider in his or her work; although the existing methods can be made to work well, they simply do not work well enough to make a national RBID practical. De Kinder (2002a:202) reached a similar conclusion in his general assessment of implementing national RBIDs:

The goal of the [RBID] is to identify a cartridge case or bullet found at the crime scene. Let us try to evaluate the effort needed to find a cartridge case of caliber 9mm PARA in a relatively small ballistic fingerprinting database of 400,000 entries, containing a single cartridge case per firearm. A pre-selection on the caliber has to be performed first. For bullets, a further pre-selection on the general rifling characteristics can be performed. The general occurrence of this caliber is about 30%. [We] define the discriminating power of an automated comparison system as the percentage of the hit list you have to examine manually in order to have an acceptable probability of 99.99% of including the correct firearm.
The current computing time needed to perform the comparisons and set up a hit list is still acceptable, as it can be run overnight. A discriminating power of 1% corresponds to 120 cartridge cases. All of them have to be manually compared with the questioned cartridge case using a comparison microscope. This is a substantial task, as the traces on the cartridge cases will be much alike. . . . This number is, at its very best, linear in the number of firearms contained in the ballistic fingerprinting database. This means that higher performances of the comparison algorithm are needed in order to perform comparisons in a feasible way.

A common argument against a national RBID is the perceived ease with which such a database could be “defeated” by replacing firearms parts (like the firing pin) or taking deliberate action to alter the individual markings of firearms. Defeat is perhaps too strong a word, but the third salient point that has particular weight is a potentially much easier (and more likely unintentional) way of dampening an RBID’s ability to find matches: the
choice of ammunition used in shooting. The potentially large influence of ammunition type and variability is a significant source of error in identification. A standard, protocol type of ammunition could be specified in an RBID (as it is in NIBIN), but it may not correspond with the ammunition used in crime; the choice of protocol ammunition, or a requirement to use multiple ammunition types, could have significant financial implications for both ammunition and firearms manufacturers.

In addition to these three core arguments against a national RBID, other supplemental arguments contribute to our assessment that a national RBID is inadvisable. As indicated in Sections 9–B.1 and 9–B.2, too much remains unknown about the real costs of implementing collections for such a database in the context of the existing firearms manufacturing environment. Furthermore, the means for ensuring that the sample of casings included with a newly manufactured gun actually originated from that gun lies at the heart of the enterprise; maintaining the chain of custody of the test fires in order to provide a legal linkage is a daunting challenge.

De Kinder (2002a:199–200) adds another argument against a national RBID: by construction, the content of an RBID is not truly representative of the firearms used in crime, the set with which RBID entries would ultimately be compared. Specifically, De Kinder reports the results of a limited test in Belgium in which, for 1 year, police in one section of the country processed and imaged all ballistics evidence they acquired, whether crime-related or not. The “firearms not directly related to crime” included “firearms which are in illegal possession for failing to comply with the current firearms law and firearms which were proactively seized after family problems.” This type of test is substantially weaker than the creation of a pure RBID; in the U.S.
context, it would correspond to a relatively modest expansion of NIBIN’s scope rather than the imaging of all new and imported firearms. Still, the composition of the dataset after 1 year suggests a basic difficulty: the resulting set of images is inherently “bias[ed] towards other types of guns than those normally used at crime scenes.” That is, even when restricting searches by caliber and other demographic information, an RBID necessarily overrepresents some types of guns (e.g., those from smaller manufacturers, possibly more expensive and intricately machined guns) relative to their use in crime. The Maryland State Police Forensic Sciences Division (2003:9–10) made the same observation based on the first 3 years’ experience of the Maryland RBID, comparing the common makes of guns entered in the RBID with ATF gun trace statistics. In particular, several revolvers are among the most frequently traced guns in Maryland (including the most frequently traced gun, a Smith & Wesson .38 revolver), which is inherently problematic for RBIDs since “revolvers are less likely to leave cartridge casings at crime scenes than are pistols.”
9–E Implications for State Reference Ballistic Image Databases

Having concluded that a national RBID is inadvisable at this time, we turn to a natural follow-up question: what does this conclusion mean for the state-level RBIDs currently in operation in Maryland and New York, and for any that may be implemented by other states? Although the core arguments that can be made against a national RBID can be applied to a state RBID, we conclude that the smaller-scale state databases are critically important proving grounds for improvements in the matching and scoring algorithms used in ballistic imaging. Indeed, they provide an ideal setting for the continuing empirical evaluation of the underlying tenets of firearms identification in general. The state databases can be a critical, emerging testbed for research in ballistic imaging and firearms identification.

Early in ATF’s work with the IBIS platform, Masson (1997:42) observed that as ballistic image databases grew in size, the IBIS rankings tended to produce suggested linkages that might look promising on-screen, and might also be tricky to evaluate using direct microscopy:

As the database grew within a particular caliber, 9mm for instance, there were a number of known non-matched testfires from different firearms that were coming up near the top of the candidate list. When retrieving these known non-matches on the comparison screen, there were numerous two dimensional similarities. When using a comparison microscope, these similarities are still present and it is difficult to eliminate comparisons even though we know they are from different firearms.

Far from undermining the utility of the system, Masson (1997:43) argued that this finding presented a critical learning opportunity. “In the past, best examples of known nonmatched agreement were collected from casework and thus, surfaced sporadically;” in addition to the potential for generating hits, Masson suggested value in studying misses.
“Firearms examiners should take advantage of this current expanded database to fully familiarize themselves with the extent of similarities found in many non-identifications in order to hone their criteria for striae identification” because the “examiner’s power of discrimination can be heightened because of the experience.”

Even in the best of operational circumstances, RBIDs should not be expected to produce torrents of hits or completed matches. They are, at root, akin to detecting low-base-rate phenomena in large populations, and present particular difficulties because, by construction, such large populations contain a great many elements that are virtually identical in all but the tiniest details. A major reason that the current state databases have underperformed in generating hits is that they have been undersearched. As
put most bluntly by a critic of the current implementation, in a discussion of the MD-IBIS hit that yielded a criminal conviction: “If you don’t use the system . . . it isn’t going to work” (quoted in Butler, 2005). The utility of state-level RBIDs will depend on how often the databases are actually queried in the conduct of investigations and on how investigative leads are followed up. The design of the current databases, and the need to ensure a firewall from NIBIN data due to the legal restrictions on NIBIN content, have made the databases inconvenient to search: exhibits must be transported to specific facilities for acquisition and comparison. Accordingly, mechanisms for encouraging searches of state RBIDs by law enforcement agencies in the same state or region should be developed and the results evaluated. To the extent that law permits and arrangements can be made, broader research involving the merging and comparison of state-level RBID images with NIBIN-type evidence would also be valuable.

9–F Appendix: Models of Hypothesized System Performance

Throughout this appendix, we restrict the discussion to cartridge casings; however, the same problem formulation would apply to bullets. Suppose one has a database that consists of $N$ images of casings, where $N$ is a large number. These images may correspond to $D$ different types of (new) guns. For each gun type $d$, there are $n_d$ different images, from different guns of the same type or various gun and ammunition combinations, etc. So the database has a total of

$$N = \sum_{d=1}^{D} n_d$$

images. Consider now a newly acquired casing from a crime scene. One wants to compare the image of the new casing with the $N$ images in the database and find the best $K$ matches. The top $K$ matches will then be scrutinized by a firearms examiner, and a direct physical comparison will be made to verify any hits.
Assume that the database does in fact contain a casing fired from the particular crime gun. Then the statistical feasibility of the problem depends on whether the correct image will be among the top $K$ matches, when $K$ is a reasonably small number (top 10, top 50, or even top 100), even though $N$, the size of the database, is very large, on the order of millions. Specifically, some of the statistical questions of interest are:

1. What is the probability that the correct image from the database (the one that corresponds to the crime gun) will be in the top $K$? How does this probability decrease with $N$? What are the critical factors that affect it?
2. How large should $K = K(\alpha)$ be if we want to ensure that the correct image is in the top $K$ with probability at least $1 - \alpha$? How does this depend on the size of the database and other factors?
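A brute-force way to approach both questions, before any of the analysis that follows, is simulation. The sketch below is illustrative only: it assumes scores can be drawn from user-supplied distributions (the callables and all function names are hypothetical), draws the wrong-image scores independently, and estimates $P(T < K)$ and $K(\alpha)$ empirically, where $T$ counts the wrong images whose score beats the true match.

```python
import math
import random

def simulate_t(n_images, true_score, wrong_score, trials, seed=0):
    """Simulate T, the number of wrong images whose score exceeds the true-match score X_1.

    true_score and wrong_score are hypothetical callables that draw one score from
    F_1 and from a common F_j (j >= 2), respectively, using the given random.Random.
    Wrong-image scores are drawn independently of each other and of X_1.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(trials):
        x1 = true_score(rng)
        samples.append(sum(1 for _ in range(n_images - 1) if wrong_score(rng) > x1))
    return samples

def estimate_top_k_prob(samples, k):
    """Question 1: estimated P(T < K), the chance the true match is in the top K."""
    return sum(1 for t in samples if t < k) / len(samples)

def k_for_confidence(samples, alpha):
    """Question 2: smallest K whose estimated P(T < K) is at least 1 - alpha."""
    ordered = sorted(samples)
    needed = math.ceil((1 - alpha) * len(ordered) - 1e-9)  # samples that must fall below K
    return ordered[needed - 1] + 1

if __name__ == "__main__":
    same = lambda rng: rng.gauss(0.0, 1.0)  # hypothetical: true and wrong scores identically distributed
    ts = simulate_t(100, same, same, trials=2000)
    print(estimate_top_k_prob(ts, 10), k_for_confidence(ts, 0.05))
```

With identically distributed scores the true match's rank is uniform, so the first figure should be close to 10/100 = 0.10 and $K(0.05)$ close to 95; raising the mean of the true-match score pushes both toward their ideal values (probability 1 and $K = 1$). Simulation becomes expensive when $N$ is in the millions, which is one motivation for the analytic approximation below.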
9–F.1 A Simple Formulation

For a particular combination of image capture technology and algorithm, the comparison of a newly acquired casing with the $N$ images in the database yields comparison scores $X_1, \ldots, X_N$. (The scores themselves are functions of the comparison algorithm but are considered variable, and subject to a probability distribution, because of the variability in the markings of the newly acquired casing, because the arrival of a new casing can be seen as a draw from an underlying distribution, and because of variability in the image capture process.)

Assume throughout that a high score implies a good match; furthermore, as stated above, assume that there is a casing in the database that corresponds to the crime gun (so that there is a true or “right” match). To be specific, let $X_1$ be the score obtained for the “right” match. Suppose the scores $X_1, \ldots, X_N$ are independent. (See the end of this section for a discussion of this assumption.) Let $X_i$ be distributed according to $F_i(x)$, $i = 1, \ldots, N$. Furthermore, let

$$I_j = I[X_j > X_1]$$

denote the indicator of the event that the score from one of the wrong casings is higher than $X_1$, the right match. Note that the $I_j$'s are dependent since $X_1$ is common to all of them. Let

$$p_j = E(I_j) = P(X_j > X_1), \quad j = 2, \ldots, N.$$

One can compute $p_j$ using the expression

$$p_j = \int [1 - F_j(x)]\, dF_1(x) = \int F_1(x)\, dF_j(x).$$

The key random variable of interest for our problem is

$$T = \sum_{j=2}^{N} I_j,$$

the number of scores that are ranked higher than the true match $X_1$. The questions of interest can be answered if one can compute the distribution of $T$. For example, the probability that the score of the true casing is in the top $K$ matches is $P(T < K)$: that is, the probability that the total number of wrong matches scoring above it is strictly less than $K$.
Similarly, the question of how large $K$ should be to ensure that this probability is at least $1 - \alpha$ is answered by choosing $K$ so that $P(T < K) \ge 1 - \alpha$. Analyzing this distribution will also show how
the probability and $K = K(\alpha)$ vary with the size of the database and what other factors influence them. It is clear that they depend critically on the probabilities $p_j$. (Other important parameters are discussed below.)

If the $I_j$'s were independent, then in the simple case where all the $p_j$'s equal a common value $p$, $T$ would have a binomial distribution with parameters $N - 1$ and $p$. However, the $I_j$'s are not independent, because they share $X_1$; in this case, with a single $p$, $T$ has a correlated binomial distribution with a simple correlation structure. In our application, however, the $p_j$'s will all be different, and the distribution of $T$ is more complicated. But one can still write down expressions for the distribution of $T$. For example, the probability that $X_1$ is the top score is

$$P(T = 0) = \int \prod_{j=2}^{N} [1 - p_j(x)]\, dF_1(x),$$

where $p_j(x) = P(X_j > x)$. Expressions for $P(T = k)$ and $P(T \le k)$ can be written down similarly. However, one will have to resort to numerical or other kinds of approximation to compute the required probabilities. Since $N$ is very large, a normal approximation is the simplest and most natural. It is easy to see that

$$E(T) = \beta = \sum_{j=2}^{N} p_j.$$

For computing the variance, since the $I_j$'s are dependent (due to the common $X_1$), we have to take the covariances into account. The variance of $T$ is

$$\operatorname{Var}(T) = \gamma^2 = \sum_{j=2}^{N} \sum_{k=2}^{N} [p_{jk} - p_j p_k] = \sum_{j=2}^{N} p_j (1 - p_j) + 2 \sum_{j > k} [p_{jk} - p_j p_k],$$

where $p_{jk} = p_j$ if $j = k$ and

$$p_{jk} = P(X_j > X_1,\, X_k > X_1) = \int [1 - F_j(x)][1 - F_k(x)]\, dF_1(x), \quad j \ne k.$$

One can now approximate the distribution of $T$ by a normal distribution with mean $\beta$ and variance $\gamma^2$. Based on this, the probability of the correct match being in the top $K$ scores can be approximated as

$$P(T < K) \approx \Phi\left(\frac{K - \beta}{\gamma}\right).$$

Furthermore, to ensure that this probability of the correct match being in the top $K$ scores is at least $1 - \alpha$, i.e., $\Phi((K - \beta)/\gamma) \ge 1 - \alpha$, we must take

$$K \ge \beta + \gamma\, \Phi^{-1}(1 - \alpha).$$
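Once the $p_j$'s and $p_{jk}$'s are known, the normal approximation above reduces to a few sums. The following is a minimal sketch, assuming those probabilities have already been computed by some other means (the function and argument names are hypothetical); it evaluates $\beta$, $\gamma$, and the smallest integer $K$ satisfying the bound.

```python
import math
from statistics import NormalDist

def normal_approx_k(p, p_pair, alpha):
    """Normal approximation to T = sum of the indicators I_j.

    p      : list of p_j = P(X_j > X_1) for the wrong images (j = 2, ..., N)
    p_pair : hypothetical callable (a, b) -> p_jk = P(X_j > X_1, X_k > X_1)
             for two distinct positions a != b in the list p
    Returns (beta, gamma, K), K being the smallest integer >= beta + gamma * Phi^{-1}(1 - alpha).
    """
    beta = sum(p)                                    # E(T)
    var = sum(pj * (1.0 - pj) for pj in p)           # diagonal terms, j = k
    for a in range(len(p)):
        for b in range(a):                           # each pair j > k once, then doubled
            var += 2.0 * (p_pair(a, b) - p[a] * p[b])
    gamma = math.sqrt(var)
    k_bound = beta + gamma * NormalDist().inv_cdf(1.0 - alpha)
    return beta, gamma, math.ceil(k_bound)
```

If the indicators were uncorrelated ($p_{jk} = p_j p_k$), the variance would reduce to the binomial-type sum $\sum_j p_j(1 - p_j)$; the positive correlation induced by the shared $X_1$ inflates $\gamma$ and hence the required $K$. The double loop is $O(N^2)$ and is intended only for small illustrative inputs.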
The key factors underlying these are $\beta$ and $\gamma$, which depend on the $p_j$'s and the $p_{jk}$'s. To see more clearly what influences these $p_j$'s and $p_{jk}$'s, suppose the distributions of the $X_i$'s are all Gaussian, that is, $F_i(x)$ is $N(\mu_i, \sigma_i^2)$, $i = 1, \ldots, N$. (One can just as easily consider any other parametric distribution.)

9–F.2 Calculations and Insights

In the rest of this appendix we take the $X_i$'s to be independent and normal with mean $\mu_i$ and variance $\sigma_i^2$. Then
$$p_j = P(X_j > X_1) = \Phi\left(\frac{\mu_j - \mu_1}{\sqrt{\sigma_1^2 + \sigma_j^2}}\right), \quad j = 2, \ldots, N.$$
Furthermore,
$$p_{jk} = P(X_j > X_1, X_k > X_1) = \int \Phi\left(\frac{\mu_j - \mu_1 - \sigma_1 z}{\sigma_j}\right) \Phi\left(\frac{\mu_k - \mu_1 - \sigma_1 z}{\sigma_k}\right) \phi(z)\, dz,$$
where $\phi(z)$ is the standard normal density. These correspond to probabilities of quadrants of bivariate normal random variables and have to be calculated numerically.

We offer two general observations. First, the Gaussian case is much more general than it seems at first. The rankings of the scores are invariant under any monotone increasing transformation of the $X_i$'s, i.e., $I[X_j > X_1] = I[h(X_j) > h(X_1)]$ for any monotone increasing, continuous function $h(\cdot)$. Thus, assuming a lognormal distribution, for example, is equivalent to assuming a normal distribution.

Second, recall the assumption that the scores, $X_1, \ldots, X_N$, are independent. Since these are all matches to the same casing from the crime scene, a natural question is whether this will induce dependence among the $X_i$'s and, if so, how the assumption of independence will affect the results. Statistically speaking, what is the difference between treating the image of the crime scene casing as fixed versus random? It turns out that if the effect of the common source of dependence is the same on all the $X_i$'s, it does not matter. Specifically, suppose $X_i = Y_i + Z$ for $i = 1, \ldots, N$, where the $Y_i$'s are independent and $Z$ is the common source of dependence for the $X_i$'s due to the crime scene casing. Then, since $X_j - X_1 = Y_j - Y_1$, it is easy to see that $I[X_j > X_1] = I[Y_j > Y_1]$, where the $Y_i$'s are independent. The dependence can be more than additive, as long as it is additive up to a monotone increasing transformation.
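The cancellation argument can be checked numerically. The sketch below is an illustration under assumed, hypothetical values (true-match mean 1, non-match mean −1, unit variances, a shared term $Z$ with standard deviation 3, and $h = \exp$ as the monotone transformation). It computes the rank of the true match, $1 + \sum_{j \ge 2} I[X_j > X_1]$, from the $Y_i$'s, from $X_i = Y_i + Z$, and from $X_i = h(Y_i + Z)$, and reports whether the three ranks agree.

```python
import math
import random


def rank_of_first(scores):
    """Rank of scores[0] among all scores: 1 + #{j >= 2 : X_j > X_1}."""
    return 1 + sum(s > scores[0] for s in scores[1:])


def ranks_agree(seed, n=1000, z_sd=3.0):
    """Check that a common additive term Z (and a monotone h) leaves the rank unchanged."""
    rng = random.Random(seed)
    # Y_1 is the true match (mean 1), Y_2..Y_n are non-matches (mean -1): hypothetical values
    y = [rng.gauss(1.0, 1.0)] + [rng.gauss(-1.0, 1.0) for _ in range(n - 1)]
    z = rng.gauss(0.0, z_sd)                   # shared effect of the crime scene casing
    x_add = [yi + z for yi in y]               # X_i = Y_i + Z
    x_mono = [math.exp(yi + z) for yi in y]    # X_i = h(Y_i + Z) with h = exp
    return rank_of_first(y) == rank_of_first(x_add) == rank_of_first(x_mono)


print(all(ranks_agree(seed) for seed in range(20)))
```

It should print `True`: the shared term shifts every score, but it drops out of each comparison with $X_1$, so the rank is a function of the independent $Y_i$'s alone.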
More specifically, if $X_i = h(Y_i + Z)$ for a monotone increasing function $h$, then $I[X_j > X_1] = I[Y_j > Y_1]$, where the $Y_j$'s are independent. It is possible that the "effect" of the common source (i.e., the crime scene casing) is not the same on the different images, in which case the analysis will be more complicated. We will not deal with this case here.

For two cases, we compute the probabilities of interest under several scenarios to see how they vary with $N$ and the parameter values $\mu_j$'s and $\sigma_j$'s of the Gaussian distribution.

Case 1

We start with the simple case where there is only one gun type, $D = 1$, and all the images correspond to different guns of the same type. In some sense, this is the make-or-break case, since there has to be enough separation of the images that correspond to guns of the same type. One has the matching image $X_1$ from the crime scene gun and the others $X_2, \ldots, X_N$ that are all from different guns but of the same type. To keep things simple, assume that $X_2, \ldots, X_N$ all have the same distribution with parameters $\mu_2$ and $\sigma_2$. Let $\mu_1$ and $\sigma_1$ be the mean and standard deviation of the matching image. The computations depend only on $\mu_1 - \mu_2$, so one can assume without loss of generality that $\mu_1 = 0$. We consider different values for $N$ and $\Delta = \mu_1 - \mu_2$ in the calculations.

In this analysis, we address only the second question that is posed in the introduction: What are the values of $K = K(\alpha)$ needed to ensure a confidence level of at least $100(1 - \alpha)$ percent, that is, that a correct image is found in the top $K$ with at least the specified probability? The tables below give the number $K$ of matches we need to examine to ensure that the true casing is in the top $K$ for a given size of the database and parameter configurations.
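The question can also be explored by brute-force simulation, which avoids the normal approximation altogether. This is not the procedure behind the tables; it is a sketch assuming the Gaussian Case 1 setup above ($\mu_1 = 0$, non-matches with mean $-\Delta$), with a hypothetical function `simulate_k` and a deliberately small database (200 non-matches) so that it runs quickly. Since the true match is in the top $K$ exactly when $T \le K - 1$, the estimate is one more than the empirical $(1 - \alpha)$ quantile of $T$.

```python
import random
from math import ceil


def simulate_k(delta, n_wrong, sigma1, sigma2, alpha, reps=2000, seed=1):
    """Monte Carlo estimate of K(alpha) for Case 1 (a single gun type).

    The true match X_1 ~ N(0, sigma1^2) (mu_1 = 0); each of the n_wrong
    non-matching scores is N(-delta, sigma2^2), so delta = mu_1 - mu_2.
    """
    rng = random.Random(seed)
    t_values = []
    for _ in range(reps):
        x1 = rng.gauss(0.0, sigma1)
        # T = number of wrong matches that outscore the true match
        t_values.append(sum(rng.gauss(-delta, sigma2) > x1 for _ in range(n_wrong)))
    t_values.sort()
    # The true match is in the top K exactly when T <= K - 1, so K(alpha)
    # is one more than the empirical (1 - alpha) quantile of T.
    idx = ceil((1 - alpha) * reps) - 1
    return t_values[idx] + 1


print(simulate_k(2.0, 200, 0.0, 1.0, 0.10))   # sigma1 = 0
print(simulate_k(2.0, 200, 1.0, 1.0, 0.10))   # sigma1 = sigma2
```

Because it is a simulation, the estimate varies with `seed` and `reps`; still, setting $\sigma_1 = \sigma_2$ rather than $\sigma_1 = 0$ already inflates $K$ several-fold at this small size, which is the contrast between the optimistic and pessimistic scenarios.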
We also give $K$ corresponding to 50 percent even though a 50 percent confidence level would commonly be viewed as unacceptable; the main reason for giving it is that it corresponds to the mean of the random variable $T$. It provides a (conservative) lower bound to the value of $K$ under various assumptions about the variances of the $X_j$'s.

Optimistic Scenario. It turns out that the values of $K(\alpha)$ depend greatly on the ratio of $\sigma_1$ to $\sigma_2$, that is, the variability of the true match relative to that of the wrong matches. First take the extreme case where $\sigma_1 = 0$, i.e., $X_1$ has zero variance. Recall that one is interested in the random variables $I_j = I[X_j > X_1]$ and $T = \sum_{j=2}^{N} I_j$. If $X_1$ has zero variance, then the $I_j$'s are independent. Furthermore, in this special case where $X_2, \ldots, X_N$ have the same distribution, $T$ has a binomial distribution.
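Under the additional assumption $\sigma_2 = 1$, so that $\Delta$ is measured in units of the non-match standard deviation and $p = \Phi(-\Delta)$ (the value quoted for this case in the discussion of Table 9-3), the binomial mean $\beta = (N-1)p$ and standard deviation $\gamma = \sqrt{(N-1)p(1-p)}$ can be plugged into $K \ge \beta + \gamma\Phi^{-1}(1-\alpha)$. The sketch below does this and rounds up; `k_optimistic` is a hypothetical name, and this is a reconstruction rather than the code used for the report.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def k_optimistic(delta, n_wrong, alpha):
    """K(alpha) when sigma1 = 0 and sigma2 = 1: T ~ Binomial(n_wrong, Phi(-delta)).

    Uses the normal approximation K >= beta + gamma * Phi^{-1}(1 - alpha)
    with beta = n p and gamma = sqrt(n p (1 - p)), rounded up.
    """
    p = STD_NORMAL.cdf(-delta)            # P(X_j > X_1) for each wrong match
    beta = n_wrong * p                    # mean of T
    gamma = sqrt(n_wrong * p * (1 - p))   # standard deviation of T
    return ceil(beta + gamma * STD_NORMAL.inv_cdf(1 - alpha))


for delta, n_wrong in [(2, 1_000), (4, 100_000), (4, 1_000_000)]:
    print(delta, n_wrong, [k_optimistic(delta, n_wrong, a) for a in (0.5, 0.25, 0.10, 0.01)])
```

With this convention the three printed rows coincide with the corresponding rows of Table 9-2; other entries may differ by one because of rounding.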
TABLE 9-2 Values of K(α) for Various Configurations of N and α for the Optimistic Scenario

                         Confidence Level
Δ    N − 1 = n1 − 1    50%    75%    90%    99%
2    1,000              23     26     29     34
2    10,000            228    238    247    262
3    10,000             14     16     19     23
3    100,000           135    143    150    163
4    100,000             4      5      6      8
4    1,000,000          32     36     39     45
4    10,000,000        317    329    340    359
5    10,000,000          3      5      6      7
5    100,000,000        29     33     36     42

Table 9-2 gives the values of $K(\alpha)$ for various combinations of $N$ and $\Delta$ that might be of interest. For example, if $\Delta = \mu_1 - \mu_2 = 4$ and there are about 100,000 images from the same type of gun in the database, and one wants a 99 percent confidence level, then one needs to look at the top $K = 8$ matches. If $N$ increases to about 1,000,000, then one needs to look at the top $K = 45$ matches.

The situation considered here, in which the variance of $X_1$ is zero or very small relative to that of the other matches, is a very optimistic scenario. The required number of matches will be much larger when the variance of $X_1$ is of the same order of magnitude as that of the other $X_j$'s. We turn to this comparison next. But a caveat is in order first: the confidence levels in Tables 9-2 through 9-5 refer only to the probability of the true match being in the top $K$. They do not say anything about the correct one being actually identified in practice, which would depend on a firearms examiner reviewing the results of all $K$ matches and finding the correct one (retrieving the physical evidence for a direct comparison). This may or may not actually happen.

Pessimistic Scenario. This scenario considers exactly the same setup as before except that $\sigma_1 = \sigma_2$. The results depend only on the ratios, so one might as well take them to equal one. For the computations in Table 9-3, we used Monte Carlo simulation to approximate the probabilities
$$p_{jk} = P(X_j > X_1, X_k > X_1) = \int \Phi\left(\frac{\mu_j - \mu_1 - \sigma_1 z}{\sigma_j}\right) \Phi\left(\frac{\mu_k - \mu_1 - \sigma_1 z}{\sigma_k}\right) \phi(z)\, dz.$$
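The report used Monte Carlo simulation for these probabilities. As an illustration only, the sketch below instead evaluates the one-dimensional integral for $p_{jk}$ by the midpoint rule (a swapped-in technique), cross-checks it against a small Monte Carlo estimate, and then assembles $\operatorname{Var}(T) = (N-1)p(1-p) + (N-1)(N-2)(p_{jk} - p^2)$ for the normal approximation. It assumes $\mu_1 = 0$, $\mu_2 = -\Delta$, and $\sigma_1 = \sigma_2 = 1$, rounds $K$ up, and uses hypothetical function names; its output need not match Table 9-3 exactly, since that table rests on simulated values.

```python
import random
from math import ceil, sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal: STD.cdf is Phi, STD.pdf is phi


def p_jk_quadrature(delta, steps=4000, lo=-10.0, hi=10.0):
    """p_jk = integral of Phi(-delta - z)^2 * phi(z) dz, by the midpoint rule.

    Assumes mu_1 = 0, mu_j = mu_k = -delta and sigma_1 = sigma_j = sigma_k = 1.
    """
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        z = lo + (i + 0.5) * h
        total += STD.cdf(-delta - z) ** 2 * STD.pdf(z)
    return total * h


def p_jk_monte_carlo(delta, reps=100_000, seed=7):
    """Monte Carlo estimate of P(X_j > X_1, X_k > X_1) under the same assumptions."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x1 = rng.gauss(0.0, 1.0)
        xj, xk = rng.gauss(-delta, 1.0), rng.gauss(-delta, 1.0)
        hits += (xj > x1) and (xk > x1)
    return hits / reps


def k_pessimistic(delta, n_wrong, alpha):
    """Normal-approximation K(alpha) for one gun type with sigma_1 = sigma_2 = 1."""
    p = STD.cdf(-delta / sqrt(2))         # p_j = Phi(-delta / sqrt(2))
    q = p_jk_quadrature(delta)            # p_jk, identical for every pair j != k
    mean = n_wrong * p
    # Var(T): n_wrong variances plus n_wrong * (n_wrong - 1) covariances q - p^2
    var = n_wrong * p * (1 - p) + n_wrong * (n_wrong - 1) * (q - p * p)
    return ceil(mean + sqrt(var) * STD.inv_cdf(1 - alpha))


print(round(p_jk_quadrature(2.0), 4), round(p_jk_monte_carlo(2.0), 4))
print([k_pessimistic(2.0, 1_000, a) for a in (0.5, 0.25, 0.10, 0.01)])
print(k_pessimistic(2.0, 10_000, 0.01) / k_pessimistic(2.0, 1_000, 0.01))
```

The last line compares a tenfold larger database with the original one; a ratio close to 10 reflects the near-linear growth of $K$ in $N$ caused by the covariance terms.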
TABLE 9-3 Values of K(α) for Various Configurations of N and α for the Pessimistic Scenario

                         Confidence Level
Δ    N − 1 = n1 − 1    50%     75%     90%     99%
2    1,000              79     165     245     380
2    10,000            787   1,660   2,450   3,815
3    1,000              17      50      80     130
3    10,000            169     500     800   1,310
4    1,000               2      12      20      33
4    10,000             23     110     190     325
4    100,000           233   1,110   1,900   3,255
5    10,000              2      18      33      60
5    100,000            20     190     340     600
5    1,000,000         203   1,875   3,380   5,970
6    100,000             1      23      43      76
6    1,000,000          11     230     430     770
6    10,000,000        110   2,310   4,280   7,680
7    1,000,000           1      25      45      80
7    10,000,000          4     230     430     775
8    10,000,000          1      10      20      40

Even though the simulation error was less than $10^{-8}$, the error in the standard error of $T$ can be large when the database size $N$ is of the order of $10^6$ or bigger. Recall that there are roughly $N^2$ covariance terms. So there is large variability in the values of $K$ in Table 9-3 for large $N$, and for these cases, they should be interpreted only as providing approximate guidelines.

Several features of Table 9-3 are of interest. First, the values of $K$ are much larger than in Table 9-2. The reason for the larger values of $K$ is that the mean of $T$ is larger, since $p = \Phi(-\Delta/\sqrt{2})$ instead of $\Phi(-\Delta)$ as in the earlier case. Furthermore, the variance of $T$ is now much larger due to the positive correlation among the $I_j = I[X_j > X_1]$'s. This dependence gets larger with the ratio $\sigma_1/\sigma_2$, i.e., with the variance of $X_1$ relative to the others.

A particularly discouraging feature is that, for fixed $\Delta$ and $\alpha$, the values of $K$ scale up almost linearly in the size of the database $N$. In the independent case of Table 9-2, the standard deviations scaled up in proportion to $\sqrt{N}$. But here they scale up linearly due to the covariances. More specifically, there are $(N - 1)(N - 2)$ covariance terms, and these are about the same order as the variance of $I_j$, so the standard deviation of $T$ now increases linearly with $N$; this is troublesome as it leads to much larger values of $K$.

TABLE 9-4 Values of K(α) for Various Configurations of n1 − 1, n2, Δ1, Δ2, and α for the Optimistic Scenario

                                            Confidence Level
Δ1   n1 − 1       Δ2   n2             50%    75%    90%    99%
2    1,000        3    1,000           25     28     31     36
2    1,000        4    10,000          24     27     30     35
2    1,000        5    100,000         23     26     29     34
2    1,000        5    1,000,000       23     26     29     34
3    10,000       4    100,000         17     20     22     27
3    10,000       4    1,000,000       46     50     54     61
4    1,000,000    5    1,000,000       32     36     40     46
4    1,000,000    5    10,000,000      35     39     43     49

Case 2

We now consider situations in which there is more than one gun type in the database. The essence of the problem can be captured by just two types, so we restrict attention to this case. Again, assume that $X_1$ has mean $\mu_1$ and variance $\sigma_1^2$, all the $X_j$'s corresponding to the same gun type as $X_1$ have common mean $\mu_2$ and variance $\sigma_2^2$, and finally all the $X_j$'s corresponding to the second gun type have common mean $\mu_3$ and variance $\sigma_3^2$. Tables 9-4 and 9-5 give the values of $K = K(\alpha)$ for various values of $\Delta_1 = \mu_1 - \mu_2$, $\Delta_2 = \mu_1 - \mu_3$, $n_1$, and $n_2$. Table 9-4 corresponds to the optimistic scenario where the variance of $X_1$ is zero. Recall that the $I_j$'s are all independent in this case. Table 9-5 corresponds to the pessimistic case where the variance of $X_1$ is the same as the variance of the other $X_j$'s.

The calculations in Tables 9-4 and 9-5 suggest that, as in the simpler one-gun case, values of $K$ can quickly grow to levels of practical implausibility from the perspective of reviewing database comparison reports, particularly for low $\Delta$ values and less-clear separations between gun types. However, they also illustrate the importance of the degree of mean separation between the images from different gun types (akin to the discussion of overlap metrics in Section 9–C.3).
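In the optimistic two-type case the indicators are independent, so $T$ is the sum of two independent binomial counts and their means and variances simply add. The sketch below applies the same rounded-up normal approximation as in Case 1, assuming unit variances for both non-matching groups so that $p = \Phi(-\Delta_1)$ and $\Phi(-\Delta_2)$; the function name is hypothetical and this is a reconstruction, not the report's code.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD = NormalDist()


def k_two_types_optimistic(delta1, n1_wrong, delta2, n2, alpha):
    """K(alpha) with sigma1 = 0 and unit variances for both non-matching groups.

    T = T1 + T2 with independent T1 ~ Bin(n1_wrong, Phi(-delta1)) (same gun
    type as the evidence) and T2 ~ Bin(n2, Phi(-delta2)) (second gun type).
    """
    mean = var = 0.0
    for delta, n in ((delta1, n1_wrong), (delta2, n2)):
        p = STD.cdf(-delta)
        mean += n * p
        var += n * p * (1 - p)       # independent indicators: variances add
    return ceil(mean + sqrt(var) * STD.inv_cdf(1 - alpha))


for row in [(2, 1_000, 3, 1_000), (3, 10_000, 4, 100_000), (3, 10_000, 4, 1_000_000)]:
    print(row, [k_two_types_optimistic(*row, a) for a in (0.5, 0.25, 0.10, 0.01)])
```

For the three rows shown, this gives the same values as Table 9-4; the jump from the second row to the third shows how a large, only moderately separated second gun type raises $K$.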
TABLE 9-5 Values of K(α) for Various Configurations of n1 − 1, n2, Δ1, Δ2, and α for the Pessimistic Scenario

                                            Confidence Level
Δ1   n1 − 1       Δ2   n2             50%    75%    90%    99%
3    1,000        5    1,000           17     52     82    135
3    1,000        6    10,000          17     51     82    135
3    1,000        7    100,000         17     51     81    133
4    10,000       6    10,000          24    113    192    330
4    10,000       7    100,000         24    112    192    330
4    10,000       8    1,000,000       24    112    190    325
5    10,000       7    10,000           3     20     35     60
5    10,000       8    100,000          3     20     35     62
5    10,000       9    1,000,000        3     19     35     61
6    100,000      8    100,000          2     25     50     85
6    100,000      9    1,000,000        2     24     44     76
6    100,000      10   10,000,000       2     26     50     80
7    1,000,000    9    1,000,000        1     30     50     90
7    1,000,000    10   10,000,000       1     26     48     85

Notice in Table 9-5 that if $\Delta_2$ is 2 units bigger than $\Delta_1$ and $n_1 = n_2$, the values of $K$ in Table 9-5 are about the same as those in Table 9-3. A similar conclusion holds if $\Delta_2$ is 3 units bigger than $\Delta_1$ and $n_2 = 10 n_1$, or if $\Delta_2$ is 4 units bigger than $\Delta_1$ and $n_2 = 100 n_1$. So, for instance, the ability to detect matches in a relatively small database containing equal numbers of moderately distinct images ($\Delta_1 = 4$, $\Delta_2 = 6$; 10,000 each) is comparable to that when one small set of images ($\Delta_1 = 4$; 10,000) is flooded with 1,000,000 images that are vastly different in mean ($\Delta_2 = 8$).
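The effect of mean separation can be explored with the same normal approximation as in the pessimistic single-type case, now summing variances and covariances over both gun types (pairs within type 1, pairs within type 2, and pairs across types). The sketch assumes $\mu_1 = 0$ and unit variances for all groups, computes each $p_{jk}$ by midpoint-rule quadrature rather than the Monte Carlo used for the tables, and uses hypothetical function names; its numbers are indicative only.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD = NormalDist()


def q_pair(delta_a, delta_b, steps=4000, lo=-10.0, hi=10.0):
    """P(X_j > X_1, X_k > X_1) for X_j, X_k in groups with means -delta_a, -delta_b.

    Assumes mu_1 = 0 and sigma_1 = sigma_2 = sigma_3 = 1; midpoint rule in z.
    """
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        z = lo + (i + 0.5) * h
        total += STD.cdf(-delta_a - z) * STD.cdf(-delta_b - z) * STD.pdf(z)
    return total * h


def k_two_types_pessimistic(delta1, n1_wrong, delta2, n2, alpha):
    """Normal-approximation K(alpha) for two gun types with all sigmas equal to 1."""
    p1, p2 = STD.cdf(-delta1 / sqrt(2)), STD.cdf(-delta2 / sqrt(2))
    mean = n1_wrong * p1 + n2 * p2
    var = (
        n1_wrong * p1 * (1 - p1) + n2 * p2 * (1 - p2)                      # variances
        + n1_wrong * (n1_wrong - 1) * (q_pair(delta1, delta1) - p1 * p1)    # pairs within type 1
        + n2 * (n2 - 1) * (q_pair(delta2, delta2) - p2 * p2)                # pairs within type 2
        + 2 * n1_wrong * n2 * (q_pair(delta1, delta2) - p1 * p2)            # pairs across types
    )
    return ceil(mean + sqrt(var) * STD.inv_cdf(1 - alpha))


base = k_two_types_pessimistic(4.0, 10_000, 6.0, 0, 0.01)           # one gun type only
equal = k_two_types_pessimistic(4.0, 10_000, 6.0, 10_000, 0.01)     # delta2 = delta1 + 2, n2 = n1
flood = k_two_types_pessimistic(4.0, 10_000, 8.0, 1_000_000, 0.01)  # delta2 = delta1 + 4, n2 = 100 n1
print(base, equal, flood)
```

Setting `n2 = 0` recovers the single-type calculation, so the three printed values can be compared directly: well-separated second gun types add little to $K$, even when they are a hundred times more numerous.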