are also, at their roots, tools for identifying correlations, and as such they do not provide insight into why a particular pattern may arise.)
Jonas and Harper (2006) identify three factors that are likely to have a bearing on the utility of data mining for counterterrorist purposes:
• The ability to identify subtle and complex data patterns indicating likely terrorist activity,
• The construction of training sets that facilitate the discovery of indicative patterns not previously recognized by intelligence analysts, and
• The high false positive rates that are likely to result from the problems in the first two bullets.
Several approaches might address this argument. For example, as mentioned above, it may be possible to develop training sets by broadening the definition of which patterns of behavior are of interest for further investigation, although doing so would raise the false positive rate. It may also be possible to reduce the rate of false positives to a manageable percentage by using a judicious mix of human analysis and different automated tools, but this is likely to be very resource-intensive. The committee does not know whether there are a large number of useful behavioral profiles or patterns that are indicative of terrorist activity.
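The arithmetic behind the false positive concern can be sketched with a short calculation. The figures below (population size, number of true targets, detection rate, and false positive rate) are hypothetical values chosen only for illustration. They are not drawn from Jonas and Harper or from the committee's analysis, and the function name `screening_outcomes` is likewise an assumption made for this sketch.

```python
def screening_outcomes(population, true_targets, sensitivity, false_positive_rate):
    """Return (true_positives, false_positives, precision) for a screening tool.

    sensitivity: fraction of true targets that the tool flags.
    false_positive_rate: fraction of non-targets that the tool flags.
    precision: fraction of all flagged individuals who are true targets.
    """
    non_targets = population - true_targets
    true_positives = sensitivity * true_targets
    false_positives = false_positive_rate * non_targets
    flagged = true_positives + false_positives
    precision = true_positives / flagged if flagged else 0.0
    return true_positives, false_positives, precision


# Hypothetical inputs, for illustration only (not from the source).
tp, fp, precision = screening_outcomes(
    population=1_000_000,      # records screened
    true_targets=10,           # actual persons of interest
    sensitivity=0.9,           # tool flags 90% of true targets
    false_positive_rate=0.01,  # tool flags 1% of everyone else
)

print(f"true positives:  {tp:.1f}")
print(f"false positives: {fp:.1f}")
print(f"share of flags that are true targets: {precision:.4%}")
```

Under these assumed numbers, a tool that wrongly flags only 1 percent of non-targets still produces roughly a thousand false alarms for every true target. This is why broadening the definition of patterns of interest, which tends to raise the false positive rate, can quickly overwhelm the human analysts who must follow up on each flag.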
In addition to these issues, a variety of practical considerations are relevant, including the paucity of data, the often-poor quality of primary data, and errors arising from linkage between records. (Section H.2 discusses additional issues in more detail.)