Data Mining

The proliferation of data from sensors and intelligence gathering is overwhelming to humans. The computing activity known as data mining uses statistical and artificial intelligence techniques to extract useful information from databases of ever-expanding size, where manual interpretation of the data is impossible. The data mining task includes automatic or semiautomatic analysis of data to extract the information contained in operationally relevant patterns. Individuals engaged in data mining require knowledge of computer science, large-database management, and statistics, as well as relevant subject matter expertise. For instance, extracting useful associations from telephone chatter on a foreign battlefield requires knowledge of the language and local customs. Data mining has been used extensively in civilian settings, including market analysis, customer behavior, human genetics, spatial analysis of geophysical data, and even high-energy physics experiments. While the field is expanding very rapidly, each use of machine learning must be grounded in a deep understanding of the subject domain.

Network science, in particular dynamic link analysis, is a rapidly developing area related to data mining that is emerging as a distinct, multidisciplinary field. The combinatorial complexity of networks has led to alternative statistical approaches that go beyond static analysis. The Internet and more specialized communications systems are highly dynamic. Understanding the effects of those dynamics will be key to addressing significant problem areas such as needle-in-a-haystack searches, detection of anomalous behavior, defense against cyber threats, and development of offensive cyber capabilities.
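As a rough illustration of what detecting anomalous behavior in a dynamic network can mean, the sketch below counts each node's distinct contacts per time window and flags nodes whose current count is far above their own history. It is a minimal example under stated assumptions, not a method described in this report: the edge format `(timestamp, source, target)`, the function names, the window length, and the threshold are all hypothetical choices made for illustration.

```python
from statistics import mean, stdev


def degree_by_window(edges, window):
    """Count distinct contacts per node in each time window.

    edges: iterable of (timestamp, source, target) tuples (assumed format).
    Returns a dict mapping (window_index, node) to a contact count.
    """
    contacts = {}
    for t, a, b in edges:
        w = t // window
        contacts.setdefault((w, a), set()).add(b)
        contacts.setdefault((w, b), set()).add(a)
    return {k: len(v) for k, v in contacts.items()}


def flag_anomalies(edges, window, current, threshold=3.0):
    """Flag nodes whose contact count in window `current` is unusually high.

    A node is flagged when its current count exceeds its historical mean by
    more than `threshold` standard deviations. The deviation is floored at
    1.0 (an illustrative choice) so that a perfectly steady history does not
    turn every small change into an alarm.
    """
    degrees = degree_by_window(edges, window)
    nodes = {node for (_, node) in degrees}
    flagged = []
    for node in sorted(nodes):
        history = [degrees.get((w, node), 0) for w in range(current)]
        if len(history) < 2:
            continue  # not enough history to estimate a spread
        mu, sigma = mean(history), stdev(history)
        now = degrees.get((current, node), 0)
        if now > mu + threshold * max(sigma, 1.0):
            flagged.append(node)
    return flagged


# Hypothetical traffic: "a" talks only to "b" for four windows,
# then contacts six different nodes in the fifth window.
edges = [(0, "a", "b"), (10, "a", "b"), (20, "a", "b"), (30, "a", "b")]
edges += [(40 + i, "a", peer) for i, peer in enumerate("bcdefg")]

flagged = flag_anomalies(edges, window=10, current=4)
```

A static snapshot of the same graph would show only that "a" has six neighbors; it is the comparison against the node's own history that marks the change as unusual.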


As the military, and society generally, have become dependent on information systems, communications, and computing, cybersecurity has become a critical capability. Even the cyber vulnerabilities of some civil infrastructure threaten assured operations outside military theaters. Military concerns about cybersecurity are not limited to military-owned infrastructure. It is in the interest of the military that the civilian STEM workforce be knowledgeable about the best information assurance techniques.

Cybersecurity research challenges include ensuring the integrity of data, controlling access to sensitive information, making data accessible when needed, protecting privacy, preventing intrusion, preventing access to data that is unencrypted while it is being processed, and managing degraded information systems so that they still serve priority mission needs effectively. In addition, it is a challenge to know whether combining multiple sources of data increases the sensitivity of the merged data, as when, for example, the personal identity associated with a record might be inferred.
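The point about merged data can be made concrete with a small sketch. Assume two hypothetical data sets: a set of records with names removed, and a separate public roster that includes names. Neither identifies anyone's sensitive attribute on its own, but joining them on shared fields (here `zip`, `birth_year`, and `sex`, all invented for illustration) can attach a name to a record. All data, field names, and function names below are assumptions.

```python
from collections import Counter

# Hypothetical "de-identified" records: names removed, other fields kept.
medical = [
    {"zip": "20001", "birth_year": 1975, "sex": "F", "diagnosis": "asthma"},
    {"zip": "20001", "birth_year": 1982, "sex": "M", "diagnosis": "diabetes"},
    {"zip": "20002", "birth_year": 1975, "sex": "F", "diagnosis": "flu"},
    {"zip": "20002", "birth_year": 1975, "sex": "F", "diagnosis": "migraine"},
]

# Hypothetical public roster: names present, no sensitive attribute.
roster = [
    {"name": "A. Example", "zip": "20001", "birth_year": 1975, "sex": "F"},
    {"name": "B. Example", "zip": "20001", "birth_year": 1982, "sex": "M"},
    {"name": "C. Example", "zip": "20002", "birth_year": 1975, "sex": "F"},
]

# Fields shared by both sources that can act as a join key.
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def key(record):
    """Return the tuple of quasi-identifier values for a record."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def reidentify(sensitive, public):
    """Link names to sensitive records when the join key is unique in both sources."""
    sensitive_counts = Counter(key(r) for r in sensitive)
    public_counts = Counter(key(p) for p in public)
    unique = {key(r): r for r in sensitive if sensitive_counts[key(r)] == 1}
    return {
        p["name"]: unique[key(p)]
        for p in public
        if key(p) in unique and public_counts[key(p)] == 1
    }


linked = reidentify(medical, roster)
```

In this toy data, two of the three people on the roster are linked to a diagnosis. The third is not, because two records share the same combination of fields; that ambiguity is what protects the individual, and it can disappear as more sources are merged.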

The cybersecurity STEM workforce will need to apply new approaches in algorithms, hardware and software architectures, and the design and engineering of complex, secure systems. This is particularly complicated by the fact that education and training programs outside the intelligence and military communities address only defensive cybersecurity. It is incumbent on the intelligence community to continue to explore ways to partner with industry and with educational institutions to provide the STEM workforce a strong background in effective approaches to cybersecurity.

Cloud Computing

A recent development in computing technology is the centralization of storage and heavy-duty computing capabilities in locations separate from the user’s PC. In many ways, this development is reminiscent of the early days of computing, when a user’s desk had only a terminal and all storage and computing took place on a mainframe computer located somewhere else in the building. The difference between the old and the new lies in the communication protocols and bandwidth that are now available. Cloud computing, as opposed to using a large central mainframe, relies on sharing common hardware resources, such as memory and CPUs, that are accessed via the Internet.

The driver for cloud computing is the need to get users’ applications loaded and running faster and at considerably lower cost, with reduced local maintenance and higher reliability of resources, including servers, storage, and networks. With the availability of handheld devices such as smart phones and notepad computers, cloud computing is a grow-

The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.