. "Appendix H: Data Mining and Information Fusion." Protecting Individual Privacy in the Struggle Against Terrorists: A Framework for Program Assessment. Washington, DC: The National Academies Press, 2008.
such searches were not being conducted today as extensions of standard investigative techniques.
These approaches have been criticized because they are relevant primarily to future events that have a nontrivial similarity to past events, and thus provide little leverage in anticipating terrorist activities that are qualitatively different from those carried out in the past. But even if this criticism is valid (and only research and experience will show whether it is), there is a definite and important benefit in being able to reduce the risk from known forms of terrorist activity. Forcing terrorists to use new approaches implies new training regimes, new operational difficulties, and new resource requirements, all of which complicate their planning and reduce the likelihood of successful execution.
The jury is still out on whether pattern-based data mining algorithms produced without the benefit of machine learning will be similarly useful, and in particular whether such techniques could discover subtle, novel patterns of behavior that indicate the planning of a terrorist event and that intelligence analysts would not have recognized a priori as such. Jonas and Harper (2006) refer to this kind of data mining as “pattern-based” data mining.[13] The distinction between subject-based and pattern-based data mining is important. Subject-based data mining is focused on terrorist activities that are either precedented (because analysts have some retrospective understanding of them) or anticipated (because analysts have some basis for understanding the precursors to such activities), while pattern-based data mining is focused on future terrorist activities that are unanticipated and unprecedented (that is, activities that analysts are not able to predict or anticipate).
Subject-based techniques have the advantage of being based on strongly predictive models. For example, being a close associate of someone suspected of terrorist activity and having similar connections to persons or groups of interest are strong predictors that a given person will also be of interest for further investigation. By contrast, pattern-based techniques, in the absence of a training set, are likely to have substantially less predictive power than the subject-based patterns chosen by counterintelligence experts based on their experience, and consequently to produce a very large false positive rate. (Indeed, one might expect such an outcome, since pattern-based techniques, by definition, seek to discover anomalous patterns that are not a priori associated with terrorist activity and therefore have no historical precedents to support them. Pattern-based techniques
[13] J. Jonas and J. Harper, “Effective counterterrorism and the limited role of predictive data mining,” pp. 1-12 in Policy Analysis, No. 584, Cato Institute, Washington, D.C., December 11, 2006.
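The concern about false positives raised in the text can be made concrete with a small base-rate calculation. The sketch below is illustrative only: the function name screening_outcomes and every number in it (population size, number of true targets, detection rate, and false positive rates) are hypothetical assumptions, not figures from this report. It compares a strongly predictive rule with a weakly predictive one applied to the same population.

```python
def screening_outcomes(population, true_targets, sensitivity, false_positive_rate):
    """Return (true_positives, false_positives, precision) for a screening rule.

    sensitivity: fraction of true targets that the rule flags.
    false_positive_rate: fraction of non-targets that the rule flags.
    precision: fraction of flagged people who are true targets.
    """
    non_targets = population - true_targets
    true_positives = sensitivity * true_targets
    false_positives = false_positive_rate * non_targets
    flagged = true_positives + false_positives
    precision = true_positives / flagged if flagged else 0.0
    return true_positives, false_positives, precision


# Hypothetical values chosen only for illustration; none come from the report.
POPULATION = 1_000_000
TRUE_TARGETS = 10
SENSITIVITY = 0.9

# A strongly predictive rule (assumed low false positive rate).
strong = screening_outcomes(POPULATION, TRUE_TARGETS, SENSITIVITY, 0.0001)
# A weakly predictive rule (assumed higher false positive rate).
weak = screening_outcomes(POPULATION, TRUE_TARGETS, SENSITIVITY, 0.01)

for label, (tp, fp, precision) in (("strong rule", strong), ("weak rule", weak)):
    print(f"{label}: true positives={tp:.1f}, "
          f"false positives={fp:.1f}, precision={precision:.4f}")
```

Under these assumed values, both rules detect the same number of true targets, but the weaker rule flags about a hundred times as many people who are not targets, so the share of flagged individuals who are actual targets falls sharply. Because true targets are rare, even a small loss of predictive power translates into a large number of false positives.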