"Appendix H: Data Mining and Information Fusion." Protecting Individual Privacy in the Struggle Against Terrorists: A Framework for Program Assessment. Washington, DC: The National Academies Press, 2008.
applied. For example, handwritten text can now be treated as data: automatic interpretation of handwriting has advanced to the point that the U.S. Postal Service automatically reads and sorts more than 80 percent of handwritten addresses every day. Substantial progress has also been made on a different kind of problem, namely how to represent the information in a photograph efficiently in digital form, which is feasible because every photograph contains considerable redundancy. It is now possible to detect and locate faces in digital images automatically and, in some restricted cases, to identify a face by matching it against a database.
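The redundancy point can be made concrete with a toy example. The sketch below is a minimal illustration, not a method used by any program discussed in this report: it applies run-length encoding to a hypothetical row of grayscale pixel values, so that long stretches of identical values collapse into a few (value, count) pairs and decoding recovers the row exactly. Practical image formats such as JPEG rely on more elaborate transforms; the helper names and pixel values here are assumptions chosen only for illustration.

```python
from itertools import groupby


def run_length_encode(row):
    """Collapse consecutive repeated pixel values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(row)]


def run_length_decode(pairs):
    """Expand (value, count) pairs back into the original pixel row."""
    return [value for value, count in pairs for _ in range(count)]


# Hypothetical 8-bit grayscale scanline: bright sky, a dark edge, bright sky again.
row = [255] * 12 + [30] * 3 + [255] * 9
encoded = run_length_encode(row)
print(encoded)                                      # [(255, 12), (30, 3), (255, 9)]
print(len(row), "pixels ->", len(encoded), "runs")  # 24 pixels -> 3 runs

# The encoding is lossless: decoding reproduces the row exactly.
assert run_length_decode(encoded) == row
```

Real photographs rarely contain runs this clean, which is why practical codecs exploit redundancy in other ways, but the principle (fewer symbols for the same information) is the same.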
This new world of greatly increased data collection and novel approaches to data representation and mathematical modeling has been accompanied by the development of powerful database technologies that make these massive amounts of collected data easier to access. These include technologies for handling various nonstandard data structures, such as representations of networks linking units of interest, and tools for handling the newer forms of information touched on above. A question not addressed here, but one of considerable importance and a difficult challenge for the agencies responsible for counterterrorism in the United States, is how best to represent massive amounts of very disparate kinds of data in linked databases so that all data elements relevant to a specific query can be easily and simultaneously accessed, contrasted, and compared.
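The report leaves open how best to link disparate data, but the basic idea of answering one query across several sources plus a network of links can be sketched in a few lines. This is a minimal in-memory illustration under a strong assumption that real data rarely satisfy: every source already carries a clean, shared unit identifier. The sources, field names, unit IDs, and helper functions (`build_index`, `build_graph`, `query`) are all hypothetical.

```python
from collections import defaultdict

# Hypothetical records from two unrelated sources; each carries a shared unit ID.
source_a = [
    {"unit": "U1", "field": "address", "value": "12 Elm St"},
    {"unit": "U2", "field": "address", "value": "9 Oak Ave"},
]
source_b = [
    {"unit": "U1", "field": "account_opened", "value": "2007-03-01"},
    {"unit": "U3", "field": "account_opened", "value": "2006-11-15"},
]
# Hypothetical network: undirected links between units of interest.
links = [("U1", "U2"), ("U2", "U3")]


def build_index(**sources):
    """Map each unit ID to every record about it, tagged with its source name."""
    index = defaultdict(list)
    for name, records in sources.items():
        for rec in records:
            index[rec["unit"]].append((name, rec["field"], rec["value"]))
    return index


def build_graph(edges):
    """Build adjacency sets for an undirected network of units."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def query(unit, index, graph):
    """Return all records for a unit plus the records of its direct neighbors."""
    neighbors = sorted(graph.get(unit, set()))
    return {
        "records": index.get(unit, []),
        "neighbors": {n: index.get(n, []) for n in neighbors},
    }


index = build_index(source_a=source_a, source_b=source_b)
graph = build_graph(links)
print(query("U1", index, graph))
```

A single lookup returns everything both sources hold about a unit together with its linked neighbors; the hard problems the report alludes to (matching records without a shared identifier, scale, and data quality) are exactly what this toy version assumes away.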
Even with these new database management tools, data retention still outpaces effective use in many application areas. A commonly expressed concern is that people are “drowning in data but starving for knowledge” (Fayyad and Uthurusamy[1] refer to this phenomenon as “data tombs”). The gap may result from several disconnects: collecting the wrong data, collecting data of insufficient quality, framing the problem incorrectly, failing to develop the proper mathematical models, or lacking (or failing to use) an effective database management and query system. Although these problems do arise, a growing number of application areas are discovering novel ways in which mathematical modeling, using large amounts and new kinds of information, can address difficult problems.
Various related fields, referred to as knowledge discovery in databases (KDD), data mining, pattern recognition, machine learning, and information or data fusion (and their various synonyms, such as knowledge extraction and information discovery), are developing rapidly and are providing new and newly adapted tools, such as neural networks,
[1] U. Fayyad and R. Uthurusamy, “Evolving data mining into solutions for insights,” Communications of the ACM 45(3):28-31, 2002.