2
Data Fusion for Security Operations

WHAT IS DATA FUSION?

Data fusion for security operations is a state-estimation process based on data from multiple security systems or data sources.1 The states of greatest relevance to security are the threat levels (from, e.g., a bomb in baggage), although a larger set of

1

The results and definitions in this chapter derive from D.E. Brown. 2006. Data Fusion for Air Transportation Security, Technical Report 2006-3, Department of Systems and Information Engineering, University of Virginia, Charlottesville, February 14.



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.
Fusion of Security System Data to Improve Airport Security

states could be envisaged (such as the potential for an outside attack). The goal of data fusion is to increase the accuracy of the estimate.

Modern data fusion systems can involve prodigious amounts of information that must be rapidly extracted from data sources, processed, and transferred. The data sources may involve computer systems on the ground; systems on moving conveyors, vehicles, or aircraft; systems on specific sensors; and even systems in space. The large quantities of information produced by such systems may need to be processed rapidly to be effective. For transportation security, in some cases, this processing must be done in real time, sometimes in very short periods and across many interfaces. The way in which information is generated is also important. Proving the validity of algorithmically driven results versus that of analytically created results, and doing so in real time, is a formidable objective.

Some transportation security data fusion systems may involve information extracted from hundreds of thousands of files or records, the processing of results, and the selection and then the transfer of the proper information. And yet the developer may propose fielding such systems never having tested the extraction and transfer of more than a few files, and never having carried out the whole process end to end in real time. Also, unique simulators and emulators may need to be designed and built to exercise these systems in a realistic way during development. This work can require dozens of contractors and suppliers all working together as a team with clear, effective, and open communication.

Data fusion may have a significant impact on the performance of a system (see Figure 2-1). Even multiple looks with the same security system can provide improvements in metrics such as probability of detection.
As Figure 2-1 illustrates, combining multiple security systems can frequently overcome the ambiguity present in many situations and defeat attempts at deception.

FIGURE 2-1 Data fusion overview. The fusion of data from multiple systems can improve the results from any individual system.

Data fusion will most likely be incorporated both within security systems and as a separate fusion component implemented apart from individual security systems. Better performance (as measured by increased probability of detection and decreased probability of false alarm) can sometimes be achieved through parametric-data fusion, that is, the combining of data at the signal level rather than at the decision level. This report examines the issue of data fusion from the perspective of the actual security systems: baggage screening, checkpoints, and access control. To facilitate this discussion, the next section describes the steps in data fusion.

STEPS IN DATA FUSION

Typically, data fusion consists of three steps: data preparation, data association, and estimation or prediction. Data preparation means putting the data into a form that will enable fusion. One of the most important components of the preparation step is data registration. The data, which come from different sources, must have a common registration; that is, the data must be converted to the same view angles and the same spatial and temporal resolutions. For example, if a security system, such as baggage screening, detects a likely explosive in a bag, the location of the likely explosive needs to be conveyed to the secondary security system (e.g., hand searching the bag) in order to direct the search. For this to happen properly, the two security systems must share a common registration coordinate system, or their data cannot be combined. Registration involves both spatial and temporal resolution because change can occur in both bags and passengers (i.e., both can move, and bag contents can shift), and fusion must allow for such change if it is to be done effectively.
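The registration requirement described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the two systems differ by a planar rotation, a translation, and a fixed clock offset; the function names and calibration values are hypothetical and are not taken from the report or from any fielded system.

```python
import math

def register_point(x, y, angle_deg, dx, dy):
    """Rotate (x, y) by angle_deg about the origin, then translate by (dx, dy)."""
    a = math.radians(angle_deg)
    xr = x * math.cos(a) - y * math.sin(a)
    yr = x * math.sin(a) + y * math.cos(a)
    return xr + dx, yr + dy

def register_time(t_source, clock_offset):
    """Shift a source timestamp (in seconds) onto the common clock."""
    return t_source + clock_offset

# Hypothetical calibration: the screening system's frame is rotated 90 degrees
# and offset by (5, 2) relative to the search station's frame, and its clock
# runs 1.5 seconds ahead of the common clock.
alarm_xy = register_point(10.0, 0.0, angle_deg=90.0, dx=5.0, dy=2.0)
alarm_t = register_time(120.0, clock_offset=-1.5)
```

A real registration would also have to account for differing resolutions and for bag contents shifting between the two observations, which a fixed transform like this one does not capture.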
In addition to registration, the data preparation step also requires formally defining confidence intervals for data produced by each security system or source. No security system is perfect in its reporting, but it is not enough simply to recognize this fact. Effective fusion requires that the errors in the data be quantified. The fusion of results from two security systems has the potential to reduce the errors associated with each individual security system. However, to understand and exploit that reduction, it is necessary to know the amount of variance in the data as an input to fusion. With this knowledge, the fused output should have a quantified error rate that is less than that of any of the individual security systems. Other parts of data preparation include data cleaning and normalization. Cleaning removes obvious errors from the data. Normalization puts the data on common scales of measurement.

The second step in fusion, data association, looks for data that are linked. In baggage screening, this means looking for multiple security system results showing the presence of an explosive. For example, at check-in, a service agent might input observations of a passenger’s suspicious activity. Those results, associated with suspicious baggage-screening results, can be used to estimate the likelihood of terrorist activity by that particular passenger. Association provides hypotheses about linkages in the available data; typically, multiple hypotheses result from this processing step. Hence, algorithms for data association are computationally expensive and inexact, which means that one can only approximate the most likely linkages in the data. Data fusion cannot avoid or completely eliminate false positives or negatives.

The last step in fusion is estimation or prediction. Once data are associated, they might be used to estimate a current state or situation and to predict a future state. A common estimation or prediction problem in transportation security is that of estimating or predicting the probability that an object is an explosive. Another use might be to estimate the probability that an area or object is the target for an attack. Methods for estimating rely on parametric statistical modeling and have been advanced by developments in mixed-effects modeling.

COMPARISON OF DECISION-DATA FUSION AND PARAMETRIC-DATA FUSION

Data fusion is most easily, and typically, accomplished by taking the decision outputs from each security system and combining them into one global decision. While simple to implement, this approach, called decision-data fusion, has several shortcomings. An alternative approach combines the data from multiple sources and uses these data together to produce a state estimate. This approach, which the committee calls parametric-data fusion, has the potential to improve system performance, but it requires more extensive registration, normalization, cleaning, and error parameterization than does decision-data fusion. In this section, the committee discusses the performance characteristics of parametric-data fusion for transportation security, as compared with the simpler decision-data fusion method.

To illustrate decision- and parametric-data fusion, the committee discusses two hypothetical explosives-detection security systems whose outputs can be correlated. The committee has not made any assumptions about the applicability of these notional examples to current security systems or existing technology, and it has not performed a detailed statistical analysis of the issues.
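The error parameterization mentioned above, and the earlier point that a fused output can carry less error than either input once the input variances are known, can be illustrated with inverse-variance weighting. This is a standard textbook technique chosen here as an example; the report does not say it is the method used. The sketch assumes two independent, unbiased measurements of the same quantity, and the numbers are hypothetical.

```python
def fuse_inverse_variance(x1, var1, x2, var2):
    """Fuse two independent, unbiased estimates by weighting each with 1/variance.

    Returns the fused estimate and its variance. Under the independence
    assumption, the fused variance is smaller than either input variance.
    """
    w1 = 1.0 / var1
    w2 = 1.0 / var2
    fused = (w1 * x1 + w2 * x2) / (w1 + w2)
    fused_var = 1.0 / (w1 + w2)
    return fused, fused_var

# Hypothetical readings: system A reports 10.0 with variance 4.0,
# system B reports 12.0 with variance 1.0.
estimate, variance = fuse_inverse_variance(10.0, 4.0, 12.0, 1.0)
```

The fused estimate sits closer to the more precise reading, and the variance bookkeeping is only valid if the input variances were quantified during data preparation.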
Security System 1 reports integer values between 2 and 13 with an average of 4, and each value is converted into a probability of detection (PD) that is conditional on the data. Security System 2 reports real values between 11.7 and 72.8 with an average of 14.0. Figure 2-2 shows the response histograms and a response profile for each security system for a test set of size 31 with 12 detectable targets. Over this data set, the security systems have a correlation coefficient of 0.009. These data are simulated and do not represent the response histograms or response profiles of any known security systems.

The graphs in Figure 2-2 show the marginal distributions for the response from each hypothetical security system and are not conditioned on the presence or absence of a simulated explosive. Figure 2-3 shows the density of each security system response conditioned on the presence of a simulated explosive. These plots show that neither security system alone would be completely effective in detecting the presence of the simulated explosives over a range of test cases.

These security systems can be operated in one of five modes: with each one operating individually without fusion, by connecting the systems’ decision outputs (decision-data fusion) with AND or OR logic, or by combining their responses to produce a single fused probability using parametric-data fusion. The discussion below explains each of these modes and provides the associated receiver operating characteristic (ROC) curves for comparison.

FIGURE 2-2 Notional individual security system response histograms (top) and response profiles (bottom) for the test sample: Security System 1 and Security System 2.
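Because the two notional systems report on very different scales (2 to 13 versus 11.7 to 72.8), their responses would need normalization before being combined. The sketch below uses min-max scaling as one possible choice. Treating the reporting ranges quoted above as the scaling bounds is an assumption made for illustration; the report does not specify a normalization method.

```python
def min_max(value, low, high):
    """Rescale value from the interval [low, high] onto [0, 1]."""
    return (value - low) / (high - low)

# The quoted average responses of each notional system, placed on a common scale.
s1_avg = min_max(4, 2, 13)          # Security System 1: integers from 2 to 13
s2_avg = min_max(14.0, 11.7, 72.8)  # Security System 2: reals from 11.7 to 72.8
```

Other choices, such as standardizing by each system's mean and standard deviation, would serve the same purpose and may behave better when responses are heavily skewed.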

FIGURE 2-3 Conditional response profiles for each notional individual security system.

Individual Security Systems with No Fusion

The individual security system mode first uses a single security system to produce a response based on the sample input object; this response is then converted into a detection decision. The block diagram in Figure 2-4 shows this mode of operation. The decision function is normally part of the security system. It is separated out in the diagram because it is important to an understanding of the modes of operating multiple security systems. The output from the process is a detection decision with an associated PD.

FIGURE 2-4 Individual security system operational mode with no data fusion.

Security system manufacturers can set the threshold for detection; the threshold determines the probability of true positives (sensitivity) and the probability of false positives (1 - specificity).
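The effect of the detection threshold can be sketched as follows. The scores and labels are hypothetical and are not the committee's simulated data; the rule "alarm when the response is at or above the threshold" is also an assumption about how the decision function works.

```python
def rates_at_threshold(scores, labels, threshold):
    """Return (true positive rate, false positive rate) for a given threshold.

    labels: 1 for a threat item, 0 for a non-threat item.
    An item raises an alarm when its score is at or above the threshold.
    """
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    return tp / (tp + fn), fp / (fp + tn)

# Hypothetical responses from a single system.
scores = [2, 3, 5, 7, 9, 4, 6, 12]
labels = [0, 0, 1, 1, 1, 0, 0, 1]

tpr, fpr = rates_at_threshold(scores, labels, threshold=6)
```

Lowering the threshold can only raise or hold both rates, which is the trade-off a manufacturer fixes when it sets the threshold.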

Typically, the operating characteristics of security systems against these two measures are plotted as ROC curves. The ROC curves for the security systems in this example are shown in Figure 2-5. The dashed line indicates how a system that randomly makes a detection decision would perform, while the solid line indicates the performance of the sample system. Clearly, both of these security systems do better than randomly making a decision.

FIGURE 2-5 Receiver operating characteristic (ROC) curves for each security system (Security System 1 and Security System 2) for the test sample. Solid line: performance of the sample system; dashed line: performance of a system that randomly makes a detection decision.

An alternative way of looking at these data would be in a Bayes table (Figure 2-6), where the results of the tests are examined for true detections, missed detections, false positives, and true negatives, as shown.

FIGURE 2-6 Example of a Bayes table for examining test results.
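Both views described above, the ROC curve and the four-cell Bayes table, can be derived from the same scored test set. The sketch below uses hypothetical scores and labels (not the committee's simulated data) and assumes an alarm is raised when the response is at or above the threshold.

```python
def confusion_counts(scores, labels, threshold):
    """Count the four cells of a Bayes table at one threshold (label 1 = threat)."""
    counts = {"true_detection": 0, "missed_detection": 0,
              "false_positive": 0, "true_negative": 0}
    for s, y in zip(scores, labels):
        alarm = s >= threshold
        if y == 1:
            counts["true_detection" if alarm else "missed_detection"] += 1
        else:
            counts["false_positive" if alarm else "true_negative"] += 1
    return counts

def roc_points(scores, labels):
    """Sweep the threshold over every observed score, strictest first.

    Returns a list of (false positive rate, true positive rate) pairs,
    starting from the (0, 0) point where nothing raises an alarm.
    """
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        c = confusion_counts(scores, labels, t)
        tpr = c["true_detection"] / (c["true_detection"] + c["missed_detection"])
        fpr = c["false_positive"] / (c["false_positive"] + c["true_negative"])
        points.append((fpr, tpr))
    return points

# Hypothetical responses from a single system.
scores = [2, 3, 5, 7, 9, 4, 6, 12]
labels = [0, 0, 1, 1, 1, 0, 0, 1]

table = confusion_counts(scores, labels, threshold=6)
curve = roc_points(scores, labels)
```

Each point on the curve corresponds to one Bayes table; the chance line in Figure 2-5 is the diagonal on which the two rates are equal.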

While the differences are subtle, it is important to distinguish between a false negative and a missed detection. A false negative means that a threat item has been inappropriately identified as a nonthreat, whereas a missed detection means that the threat item has not been seen.

Decision-Data Fusion with AND or OR Logic

One of the simplest ways to combine more than one security system in support of decision making is through AND or OR logic. This decision-data fusion approach allows the operator to use the security systems as manufactured and to change out security systems as needed for maintenance or replacement. AND logic is illustrated in Figure 2-7. The input object is processed first by Security System 1; if this security system detects a threat, the input object is passed to Security System 2. If the second system also detects a threat, it will signal an alert. If the second system does not detect a threat, the item is cleared.

FIGURE 2-7 Decision-data fusion with AND logic.

This approach is designed to reduce the number of false positives. The ROC curve in Figure 2-8 supports this expectation for the committee’s example data. For a false positive rate, also called a false-alarm rate (FAR),2 of 0.2, or 20 percent, the combined security systems using decision-data fusion with AND logic have a true positive rate of about 0.8 (0.83 in Table 2-1); operating in stand-alone mode, Security Systems 1 and 2 achieved true positive rates of 0.45 and 0.67, respectively.

2

The FAR derives from putting the data (actual known cases) through the systems and then counting the number of times alerts were signaled on cases that were not threats. False negatives, where a threat item is inappropriately identified as a nonthreat, are calculated similarly.
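The AND logic described above can be sketched as a two-stage cascade. The thresholds and rates below are hypothetical, and the product rule in `and_rates` holds only under the added assumption that the two systems err independently; the report does not state that assumption for its example.

```python
def and_decision(score1, score2, thr1, thr2):
    """Alert only if both systems detect a threat.

    System 2 is consulted only when System 1 alarms, mirroring the serial
    flow described for AND logic.
    """
    if score1 < thr1:
        return False  # cleared by System 1; System 2 is never consulted
    return score2 >= thr2

def and_rates(pd1, far1, pd2, far2):
    """PD and FAR of the AND combination, assuming independent errors."""
    return pd1 * pd2, far1 * far2

# Hypothetical operating points for the two systems.
pd_and, far_and = and_rates(pd1=0.9, far1=0.3, pd2=0.8, far2=0.2)
```

At fixed thresholds, the AND combination lowers both the false-alarm rate and the detection rate. The gain visible on an ROC curve arises because the lower combined false-alarm rate lets each system run at a more permissive threshold while staying within the same overall FAR.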

FIGURE 2-8 Receiver operating characteristic (ROC) curve for the AND decision-data fusion for the combination of two notional security systems (solid line). The dashed line represents chance performance.

OR decision-data fusion logic simply combines the security systems so that detection by either security system will cause an alert, and it takes both security systems to clear an input object. The block diagram for this approach is shown in Figure 2-9, and the ROC curve based on the example data and security systems is shown in Figure 2-10. Notice that with the false positives (the FAR) limited to a notional value of 0.2, the OR decision approach does worse than each individual security system. The resulting probability of detection for OR decision-data fusion is 0.42, whereas for Security System 1 it is 0.45 and for Security System 2 it is 0.67. With the FAR held to 0.2, the OR decision-data fusion logic incurs increased missed detections. This example shows that simply assuming that a decision-data fusion approach will improve performance is not always correct. Before implementing fusion, the Transportation Security Administration (TSA) should perform the necessary analysis to ensure that the correct approach is selected.

FIGURE 2-9 Combining security systems with OR decision-data fusion logic.
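The OR logic described above can be sketched in the same way. The operating points are hypothetical, and the formulas in `or_rates` again assume that the two systems err independently, which the report does not state for its example.

```python
def or_decision(score1, score2, thr1, thr2):
    """Alert if either system detects a threat; both must clear the item."""
    return score1 >= thr1 or score2 >= thr2

def or_rates(pd1, far1, pd2, far2):
    """PD and FAR of the OR combination, assuming independent errors.

    An item is missed (or correctly cleared) only when both systems stay silent.
    """
    pd_or = 1.0 - (1.0 - pd1) * (1.0 - pd2)
    far_or = 1.0 - (1.0 - far1) * (1.0 - far2)
    return pd_or, far_or

# Hypothetical operating points for the two systems.
pd_or, far_or = or_rates(pd1=0.9, far1=0.3, pd2=0.8, far2=0.2)
```

At fixed thresholds, the OR combination raises both the detection rate and the false-alarm rate. Holding the combined FAR to a fixed budget therefore forces each system onto a stricter threshold, which is one way the detection rate at that FAR can end up below that of a single system.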

FIGURE 2-10 Receiver operating characteristic (ROC) curve for the OR decision-data fusion for the combination of two notional security systems (solid line). The dashed line represents chance performance.

Parametric-Data Fusion of Security Systems

As Figure 2-4 shows, each security system produces a response and a detection decision. It is important to note that parametric-data fusion combines the responses from each security system rather than combining their detection decisions, as data fusion based on AND or OR logic would. The parametric-data fusion process is illustrated in Figure 2-11. The input object is processed in some sequence by both security systems, but their response values are combined in a joint estimate of the probability of detection (correct classification).

FIGURE 2-11 Parametric-data fusion response values from two notional security systems. NOTE: PD, probability of detection.

As shown in the ROC curves in Figure 2-12, parametric-data fusion provides better results than the other modes in trading off true positives against false positives. Figure 2-12 shows that the fusion of the two security systems for the example data results in false positive rates of less than 0.2 and true positive rates of better than 0.8. By comparison, neither the AND combination logic (Figure 2-8) nor the OR combination logic (Figure 2-10) could achieve a 0.8 true positive rate without accepting something more than 0.2 in the rate for false positives. In general, parametric-data fusion produces better results than decision-data fusion over a large range of values.

FIGURE 2-12 Receiver operating characteristic (ROC) curve for the parametric-data fusion for the combination of two notional security systems (solid line). The dashed line represents chance performance.

The results of all the different fusion alternatives for this example are summarized in Figure 2-13, which shows all ROC curves on a single plot, and in Table 2-1. In the committee’s example, data fusion can improve on the performance of single security systems; however, the extent of the improvement depends on the type of fusion employed. Here, the AND logic for decision-data fusion provides more significant improvement than the OR logic does. OR decision-data fusion actually does worse over large portions of the error surface than does Security System 2 by itself. Parametric-data fusion provides the best performance over significant, but not all, regions of the error surface. When compared with AND decision-data fusion, parametric-data fusion provides improvements in the probability of detection, with only slight degradation in false positives.
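One way to picture the parametric-data fusion described above (Figure 2-11) is a Bayesian combination of the two raw responses into a single threat probability. The sketch below assumes Gaussian response densities that are conditionally independent given the presence or absence of a threat. That model, the parameter values, and the prior are all hypothetical illustration choices; the report does not specify the statistical model behind its example.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def fused_probability(r1, r2, params, prior_threat):
    """Posterior probability of a threat given both raw responses.

    params maps each of four keys to a (mean, std) pair describing one
    system's response density with or without a threat present.
    """
    like_threat = (gaussian_pdf(r1, *params["s1_threat"])
                   * gaussian_pdf(r2, *params["s2_threat"]))
    like_clear = (gaussian_pdf(r1, *params["s1_clear"])
                  * gaussian_pdf(r2, *params["s2_clear"]))
    numerator = prior_threat * like_threat
    return numerator / (numerator + (1.0 - prior_threat) * like_clear)

# Hypothetical conditional response models for the two notional systems.
params = {
    "s1_threat": (8.0, 2.0), "s1_clear": (4.0, 2.0),
    "s2_threat": (30.0, 10.0), "s2_clear": (14.0, 10.0),
}

p_high = fused_probability(8.0, 30.0, params, prior_threat=0.5)
p_low = fused_probability(4.0, 14.0, params, prior_threat=0.5)
```

A single threshold on the fused probability then replaces the two per-system thresholds. Unlike AND or OR logic, a strong response from one system can compensate for a borderline response from the other.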

FIGURE 2-13 Receiver operating characteristic (ROC) curves for different modes of operation: individual security systems without fusion, systems’ decision outputs combined with AND and OR logic, and systems’ responses combined with parametric-data fusion.

These data are also shown in Table 2-1. Column 2 of this table shows the probability of detection of each mode of operation at a fixed FAR of 0.20. Column 3 shows the minimal FAR achieved when the probability of detection is set to the values shown in column 2. Thus, it can be seen that parametric-data fusion provides an improvement of about 10 percent in the probability of detection over AND decision-data fusion (0.92 versus 0.83) when the FAR is set to 0.20. For these systems, the OR decision-data fusion approach makes things worse by reducing the probability of detection at the FAR of 0.20.

TABLE 2-1 Summary of Fusion Results for Different Modes of Operation for the Two Example Security Systems

Mode of Operation                 PD (FAR = 0.20)   Minimum Observed FAR
Security System 1 alone           0.45              0.20
Security System 2 alone           0.67              0.16
AND logic decision-data fusion    0.83              0.20
OR logic decision-data fusion     0.42              0.16
Parametric-data fusion            0.92              0.11

NOTE: PD, probability of detection; FAR, false-alarm rate.

The foregoing is just a simple example of how fusion may be used, and these results apply only to the notional security systems used for this data set. Increasing the complexity and changing the performance of the security systems would change the resulting ROC curves. In particular, the AND decision-data fusion approach does not always dominate the OR decision-data fusion approach. Nor is parametric-data fusion always dominant over most of the error surface.
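The comparison drawn from Table 2-1 can be checked with a few lines of arithmetic. The PD values below are copied from column 2 of the table; the variable names are illustrative only.

```python
# PD at FAR = 0.20, copied from column 2 of Table 2-1.
pd_at_far_020 = {
    "Security System 1 alone": 0.45,
    "Security System 2 alone": 0.67,
    "AND logic decision-data fusion": 0.83,
    "OR logic decision-data fusion": 0.42,
    "Parametric-data fusion": 0.92,
}

# Mode with the highest PD at the fixed FAR.
best_mode = max(pd_at_far_020, key=pd_at_far_020.get)

# Gain of parametric-data fusion over AND decision-data fusion.
and_pd = pd_at_far_020["AND logic decision-data fusion"]
absolute_gain = pd_at_far_020["Parametric-data fusion"] - and_pd
relative_gain = absolute_gain / and_pd
```

The gain is 0.09 in absolute terms, or roughly 11 percent relative to the AND result, which is consistent with the "about 10 percent" stated in the text.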

Figures 2-14 and 2-15 show ROC curves for the same two notional security systems but with random perturbations (e.g., Gaussian noise) added to their measurements. Notice that in Figure 2-15, the OR decision-data fusion approach dominates the AND approach. These results indicate that before implementing a fusion approach, the outputs from the security systems need to be analyzed to ensure that the most appropriate fusion approach is adopted.

FIGURE 2-14 Receiver operating characteristic (ROC) curves for random perturbations of security system measurements in different modes of operation: individual security systems without fusion, systems’ decision outputs combined with AND and OR logic, and systems’ responses combined with parametric-data fusion.
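The kind of sensitivity check described above, adding Gaussian noise to the measurements and recomputing the comparison, can be sketched as follows. The noise level, seed, and scores are hypothetical; the report does not give the noise parameters used for Figures 2-14 and 2-15.

```python
import random

def perturb(scores, sigma, seed=0):
    """Return a copy of scores with zero-mean Gaussian noise of std sigma added.

    A seeded generator keeps the perturbation reproducible between runs.
    """
    rng = random.Random(seed)
    return [s + rng.gauss(0.0, sigma) for s in scores]

# Hypothetical responses from one system, perturbed at a chosen noise level.
scores = [2, 3, 5, 7, 9, 4, 6, 12]
noisy_scores = perturb(scores, sigma=1.5, seed=42)
```

Repeating the ROC analysis on several perturbed copies shows whether the ranking of fusion approaches is stable or an artifact of one particular data set.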

FIGURE 2-15 Receiver operating characteristic (ROC) curves for random perturbations of security system measurements in different modes of operation: individual security systems without fusion, systems’ decision outputs combined with AND and OR logic, and systems’ responses combined with parametric-data fusion.

There are also cost considerations that must be addressed in the implementation of a security system data fusion solution. In particular, decision-data fusion largely avoids the additional data preparation that parametric-data fusion requires. In addition, software maintenance costs, and hence life-cycle costs, are lower for decision-data fusion than for parametric-data fusion.

Finding: Decision-data (versus parametric-data) fusion does not necessarily allow for the greatest improvements in throughput, reduction of false alarms, or improvements in probability of detection. Most TSA data fusion efforts in current programs employ decision-data fusion.

Recommendation 1: Before implementing a data fusion approach for a specific set of security systems, the TSA should perform a formal analysis to select the specific data fusion approach that would increase the detection rate, or that would raise throughput and/or reduce false alarms while maintaining the existing detection rate.