3
PREDICTING IMPROVISED EXPLOSIVE DEVICE ACTIVITIES (WORKSHOP 2)

The second workshop focused on basic research for predicting improvised explosive device (IED) activities and consisted of unclassified plenary lectures and breakout sessions. The workshop started with presentations to provide participants with a context for the discussions about IEDs and lessons that can be learned from different disciplines and contexts. They were delivered by experts in law enforcement, computer science, statistics, mathematics, and remote sensing. The breakout sessions gave participants a chance to explore the kinds of data needed to predict IED activities and kinds of basic research that would enable the handling, priority-setting, and delivery of such data; that would allow leveraging of human expertise in data interpretation; and that might lead to procedures for analyzing mixed, complex, noisy, or incomplete data. The workshop concluded with a discussion of potential high-impact research.

THREAT DETECTION: THROUGH THE EYES OF PRACTITIONERS

Kathleen Kiernan (The Kiernan Group) spoke about some lessons from law enforcement that can be applied to a counter-IED effort. To emphasize why the law-enforcement perspective is applicable, she observed that although not every criminal is a terrorist, every terrorist is a criminal. Terrorist and criminal groups use similar methods as they recruit, learn their craft, finance operations, obtain and conceal contraband and weapons, disguise intentions, and disguise themselves with fraudulent identification.

Kiernan pointed out the lack of communication between local law enforcement and federal security agencies. A main obstacle is that most law-enforcement officers lack the level of security clearance needed to obtain and exchange information. She suggested that a database that contains low-level security information would allow the law-enforcement community to access information to help in its mission and decrease the communication divide between the two groups. She noted that it would be preferable to minimize the amount of restricted information and to control access to it.

The training and skill sets of law-enforcement officers could be helpful to military causes, and partnering could leverage the strengths of both groups. Some in the law-enforcement community have developed datasets on the methods used by gangs. Given the similarity between the methods used by gangs and those used by some terrorist and insurgent groups, such datasets could yield important lessons for the counter-IED effort. And those datasets could be made available to researchers more easily than datasets on terrorist or insurgent methods and could be valuable proxy datasets on which researchers







could test theories and models of social dynamics and behavior. Examples of existing sources of such information include the Department of Homeland Security's Study of Terrorism and Responses to Terrorism (START) Global Terrorism Database (GTD),4 the Department of Homeland Security's Homeland Security Information Network (HSIN),5 the National Counterterrorism Center's Worldwide Incident Tracking System (WITS),6 the Department of Justice's State & Local Anti-Terrorism Training (SLATT) program,7 the Federal Bureau of Investigation's Law Enforcement Online (LEO),8 the International Association of Chiefs of Police (IACP),9 the Open Source Center's OpenSource.gov,10 and the Institute for the Study of Violent Groups (ISVG).11

Law-enforcement professionals learn their craft on the street by dealing with human behavior. They learn to detect nuances of deception and to adapt rapidly to criminals' ever-changing tactics, behavior, and technologies. A byword on the street is "JDLR": "just doesn't look right". The ability of law-enforcement officers to detect anomalies, if translated to the counterinsurgency or counterterrorism context, would be helpful in detecting IED-related anomalies. Kiernan noted that many of the military personnel who are best able to detect changes and evade IEDs in Iraq are former law-enforcement officers.

Kiernan closed by saying that we need to move to the next level in the counter-IED effort. Research on networks would be helpful in identifying the crucial elements and operatives within criminal, terrorist, and insurgent groups, which is important when working against those organizations. Just as catching the easy-to-catch criminal (the small-time street dealer) is less valuable than catching those masterminding a criminal operation, catching an IED organization's bomb-maker is more useful than catching a person who is paid a small amount to emplace an IED.

OVERVIEW OF TOLL-FRAUD DETECTION

Daryl Pregibon (Google, Inc.) spoke about his experience with telephone-fraud detection.
He discussed the characteristics of fraudsters (those who commit telephone-toll fraud), how to identify them, and how lessons from his experience might be useful in IED detection and prevention.

4. Department of Homeland Security: START Global Terrorism Database. http://www.start.umd.edu/data/gtd/. Accessed July 15, 2008.
5. Homeland Security Information Network. http://www.dhs.gov/xinfoshare/programs/gc_1156888108137.shtm. Accessed July 15, 2008.
6. National Counterterrorism Center: Worldwide Incident Tracking System. http://wits.nctc.gov/. Accessed July 15, 2008.
7. Department of Justice: State & Local Anti-Terrorism Training. http://www.iir.com/slatt/. Accessed July 15, 2008.
8. Federal Bureau of Investigation: Law Enforcement Online. http://www.fbi.gov/hq/cjisd/leo.htm. Accessed July 15, 2008.
9. International Association of Chiefs of Police. http://www.theiacp.org/. Accessed July 15, 2008.
10. Open Source Center. www.opensource.gov. Accessed July 15, 2008.
11. Institute for the Study of Violent Groups. http://www.isvg.org. Accessed September 16, 2008.

Fraudsters want free service for personal use or for resale. To commit fraud, they use a wide array of technologies to exploit weaknesses at the interfaces of technology and the technical barriers to information transfer. They also take advantage of the tendency for data obtained in one form by one organization to remain in that form and location. Information does not flow easily through the interfaces between business and residential telephone networks, between landlines and cellular telephones, or between voice and data networks, and that makes these interfaces vulnerable to fraud.

Fraudsters adapt to prevent or delay discovery; the cycle of adaptation is similar to that in an IED campaign. Fraudsters migrate to the telecommunication provider that is easiest to do business with, that is, the provider that is easiest to defraud. Similarly, insurgents and terrorists tend to attack the targets that are the easiest to strike, although sometimes they choose targets for their symbolic importance or for other reasons.

The problem of fraud detection in telephone networks is an example of how massive datasets can be analyzed. A large telephone-service provider covers 100 million to 1 billion identities, of which 50-500 million are active each day. The population of telephone identities is also dynamic: there might be 50,000-500,000 new and canceled users each day. That information and call data (such as the origin, destination, date, time, and length of a call) are collected and analyzed and are helpful in toll-fraud detection.

Two main kinds of data analysis are used in fraud detection: anomaly detection and link-based methods. Anomaly detection uses the call history to build a signature for each caller that models how the caller acts. A customer's calls are then compared with his or her signature to scan for abnormal behaviors that may indicate a fraudster.
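The signature idea can be sketched in a few lines of Python. This is an illustrative toy, not the production systems Pregibon described: the only feature modeled is call duration (tracked with Welford's online mean/variance update), and the history minimum and z-score cutoff are invented.

```python
# Illustrative sketch of signature-based anomaly detection for call records.
# A "signature" here is just a running mean/variance of call durations per
# caller; real systems model many more features (time of day, destinations).
import math
from collections import defaultdict

class CallerSignatures:
    def __init__(self, threshold=3.0):
        self.threshold = threshold                       # z-score cutoff for "abnormal"
        self.stats = defaultdict(lambda: [0, 0.0, 0.0])  # count, mean, sum of squared deviations

    def update(self, caller, duration):
        # Welford's online algorithm: the signature is updatable call by call.
        n, mean, m2 = self.stats[caller]
        n += 1
        delta = duration - mean
        mean += delta / n
        m2 += delta * (duration - mean)
        self.stats[caller] = [n, mean, m2]

    def is_anomalous(self, caller, duration):
        n, mean, m2 = self.stats[caller]
        if n < 10:                     # too little history to judge
            return False
        std = math.sqrt(m2 / (n - 1))
        if std == 0:
            return duration != mean
        return abs(duration - mean) / std > self.threshold

sigs = CallerSignatures()
for d in [60, 55, 62, 58, 61, 59, 63, 57, 60, 62]:   # consistent behavior
    sigs.update("alice", d)
print(sigs.is_anomalous("alice", 61))    # False: fits the signature
print(sigs.is_anomalous("alice", 600))   # True: large deviation
```

A real signature would track many features at once (time of day, destinations, international versus domestic), but the shape is the same: update the signature with every call and flag calls that deviate sharply from it.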
The method works well because all calls are scanned and factored into a caller's signature. However, low-level fraud often goes undetected, as does subscription fraud (when new accounts are set up with the intent to deceive).

Link-based methods analyze calling networks to detect fraudulent behavior. A caller's "most-contacted" list (outgoing and incoming calls) is used to build a network for each person. It can be used to link a caller to other fraudsters or recently terminated accounts because, although fraudsters may change telephone numbers, they often do not change whom they call (such as friends and family). The networks can also be used to link two people on the basis of a common third party; however, this can sometimes confuse a fraudster with his or her family members because family members are often closely related within a network.

Those who engage in an IED campaign and fraudsters share many characteristics, including the following:

• They work to exploit gaps in whatever system they are infiltrating.
• They use a wide variety of technologies.
• They adapt to delay their discovery.
• They attack at the weakest point.

However, for toll fraudsters communication is an end in itself, whereas for terrorists and insurgents it is a means to an end. Moreover, fraudsters tend to be greedy and impatient. The same cannot be said of terrorists and insurgents, or at least of those who are not quickly caught.
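The link-based idea described above (comparing "most-contacted" sets across accounts) can likewise be sketched with set overlap. The accounts, numbers, and threshold below are invented; the point is only that a new account whose contact set heavily overlaps a terminated account's set is suspicious.

```python
# Illustrative sketch of a link-based check: compare a new account's
# "most-contacted" set with that of a known terminated (or fraudulent)
# account. Fraudsters change numbers but rarely change whom they call.
def contact_overlap(a, b):
    """Jaccard similarity between two contact sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def likely_same_person(new_contacts, old_contacts, threshold=0.5):
    # Heavy overlap with a terminated account suggests the same person
    # has reappeared under a new number.
    return contact_overlap(new_contacts, old_contacts) >= threshold

terminated = {"555-0101", "555-0102", "555-0103", "555-0104"}
new_acct = {"555-0101", "555-0102", "555-0103", "555-0199"}
print(likely_same_person(new_acct, terminated))  # True: overlap 3/5 = 0.6
```

The family-member confusion noted above shows up directly here: relatives share much of the same contact set, so a threshold on overlap alone cannot distinguish a returning fraudster from a sibling.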

Pregibon noted that detecting IED-related activities and communication is clearly much harder than detecting toll fraud. However, although current fraud-detection technologies would have to be adapted to be useful, they have been developed to analyze large, dynamic sets of data and have demonstrated scalability and utility. One important lesson from the toll-fraud detection problem is that data constitute a key enabler. That suggests that collecting and analyzing as many relevant data as possible and searching for patterns will maximize the chance of detecting potential IED attacks.

DEPLOYING WIRELESS SENSOR NETWORKS FOR ENVIRONMENTAL SENSING

Alex Szalay (Johns Hopkins University) discussed how lessons learned from his experience deploying wireless sensor networks for environmental sensing might be useful in countering IEDs. The nature of scientific computing is changing. One consequence is that data are increasingly available as a result of the development of successive generations of inexpensive sensors. Large quantities of data provide an opportunity to understand the system being measured but also make it challenging to extract knowledge. As the need to process larger and larger datasets grows, new methods are needed. There is no single, well-accepted public solution for dealing with large datasets (100-1,000 terabytes; 1 terabyte = 10^12 bytes). Because counter-IED surveillance datasets may also be large, the same processing challenges may apply.

Szalay suggested that in developing strategies to handle the influx of data, it is helpful to consider the main steps taken in producing a scientific journal article: (1) acquire data, (2) process and calibrate data, (3) transform and load data, (4) organize data to facilitate the analysis algorithm, (5) analyze data, and (6) publish.
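Steps (3)-(5) are where most data movement can occur, and executing the computation inside the database rather than exporting raw records can be illustrated with Python's built-in sqlite3 (the table and readings below are invented for illustration):

```python
# Sketch of "bringing the analysis to the data": compute an aggregate
# inside the database so only small result rows cross the interface.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?)",
                 [("s1", 1.0), ("s1", 3.0), ("s2", 10.0), ("s2", 14.0)])

# In-database analysis: the database returns two summary rows.
in_db = conn.execute(
    "SELECT sensor, AVG(value) FROM readings GROUP BY sensor ORDER BY sensor"
).fetchall()
print(in_db)  # [('s1', 2.0), ('s2', 12.0)]

# Client-side alternative (what in-database analysis avoids): every raw
# row is transferred, then aggregated in application code.
rows = conn.execute("SELECT sensor, value FROM readings").fetchall()
```

With four rows the difference is invisible; with the 100-1,000 terabyte datasets Szalay described, shipping summary rows instead of raw records is the difference between a feasible query and an impossible download.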
A review of that breakdown suggests that one approach to accommodating large volumes of data is to bring the analysis to the data (steps 3-5) rather than vice versa. If computations are executed within a database, the volume of data that needs to be transferred is minimized; this can reduce the cost and time of downloads and minimize data truncation. A cluster of databases can create and send computationally inexpensive "bricks" (subsets of the datasets) to components of the cluster for analysis. That approach is being used at Johns Hopkins University (JHU) with the GrayWulf database cluster, which has a capacity of over 1 petabyte (1 petabyte = 10^15 bytes).

Szalay discussed other projects under way at JHU that must extract information from massive datasets, including the Sloan Digital Sky Survey (SDSS), which can be thought of as the genome project for the cosmos; the PAN-STARRS project, which tries to detect asteroids that may strike Earth and is currently the largest astronomy database in the world; and the Life Under Your Feet project, which uses a wireless sensor network to measure, for example, carbon dioxide emission from soil.

A unique feature of the SDSS project is the public availability of its database, SkyServer, which allows the project to tap into distributed computing power. SkyServer has seen 930,000 unique users. In contrast with the roughly 10,000 astronomers worldwide, that number indicates the level of public interest in the project. In fact, a key discovery was made by a schoolteacher who accessed the database. There are obvious

security concerns in making IED-related data available to the public, but the SkyServer experience demonstrates the potential of increasing the number of people reviewing data to increase the likelihood of anomaly identification.

For IED detection and surveillance, it is conceivable that outdoor, distributed sensor networks that send wireless signals for analysis may be used. Szalay's experience with sensors in and outside the laboratory provides valuable, practical lessons on wireless sensor networks. Before deployment, any sensor system should undergo extensive testing, including finding the limits of instrument robustness and performing field tests, to maximize the effectiveness of data collection and transmission. Factors that may be problematic include hardware failures, background noise, and natural events, such as rain. Data from wireless systems should be compared or fused with data from external sources to increase confidence in the observations. Szalay also advised collecting as many data in as many forms as possible, particularly in transient or singular systems, where there is only one chance to collect real-time data but the data can be analyzed as many times as necessary. Szalay prefers to organize the data first rather than perform data-processing at the collection node; this approach puts extra strain on the nodes but increases the longevity of the data library.

DATA FUSION: AN ENABLER FOR IMPROVED IED PREDICTION

Pramod Varshney (Syracuse University) spoke about the role that data fusion could play in enabling improved IED prediction. Data fusion is the acquisition, processing, and synergistic combination of data gathered by various sources and sensors to improve the understanding of a phenomenon or to introduce or enhance intelligence and system control functions. An example of data fusion is the processing performed by the human brain, which naturally fuses sensory information to make inferences about the environment.
Most data-fusion models are based on Bayesian networks, but other graphical models that incorporate probability theory are also used. As an introduction to the topic, Varshney described Figure 3.1, a conceptual framework for data fusion. In this elementary model, source prescreening allocates data to various fusion stages. Level one processing, or object refinement, consists of data alignment, tracking, association, and identification. In level two, or situation refinement, inferences regarding relationships among objects, events, and a priori information are made by using contextual information and meanings. In level three, which in IED prediction can be thought of as threat refinement, inferences regarding future threats are made on the basis of computation and knowledge of the adversary. Level four, process refinement, is the monitoring and control of the fusion process and of sensor and resource allocation. Database management allows the storage of information and should accommodate rapid-retrieval requirements. The human-computer interface is where communication between a user and a computer takes place, including directives from the user and alerts from the computer.

Figure 3.1 A conceptual framework for data fusion. ©1997 IEEE. NOTE: Reproduced with permission from Hall, D.L., and J. Llinas. 1997. An introduction to multisensor data fusion. Proc. IEEE 85(1):6-23.

Research has extended that basic framework to five levels rather than four, in which preprocessing is identified as the zeroth level and threat refinement is considered an impact assessment. The fifth level, added by Blasch and Plano (2003) and building on human-factors work by Endsley (1995), is user refinement. The new level allows adaptive data to be retrieved and displayed in support of decision-making. With respect to application, models of data fusion can be used to integrate and analyze disparate data from different sources (Xiaotao and Bir 2005; Smith and Srivastava 2004; Iyengar, Varshney, and Damarla 2007).

Data fusion is particularly relevant to information obtained from wireless sensor networks. Such networks integrate a large number of low-cost, computationally limited processors with flexible interfaces for networking of various sensors. The network can feed the data to a fusion center. However, issues with networking and signal processing and other system constraints must be addressed.

Data fusion is multidisciplinary in that its value is found in analyzing data from multiple sources. Developing methods that can take advantage of the variety of data types and databases available will require a large cooperative effort. Many challenges in fusing data in the IED context remain, including the treatment of social-network information and human intelligence information within databases.
Current modeling of IED networks and the human terrain may not be advanced enough for appropriate training of the data-fusion models, and this may lead to spatiotemporal inference problems. In addition, the IED problem is global, dynamic, and complex, and data are being collected with many timescales, levels of accuracy, and formats. Treatment of uncertainty in particular and the handling of dependent information must be addressed. Other technical challenges in the IED context include setting up queries to achieve an actionable result and deciding whether it is better to analyze data in or outside the database. And it must be determined whether machine-learning methods can be used in a distributed setting.
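As a minimal concrete illustration of the probabilistic machinery that underlies such fusion models, consider fusing two independent Gaussian estimates of the same quantity by precision weighting. This one-variable sketch is not the multilevel framework described above, and the sensor values are invented.

```python
# Minimal illustration of probabilistic fusion: combine two independent,
# noisy Gaussian estimates of the same quantity by precision weighting.
def fuse(mu1, var1, mu2, var2):
    """Return the fused (mean, variance) of two Gaussian estimates."""
    w1, w2 = 1.0 / var1, 1.0 / var2       # precisions (inverse variances)
    var = 1.0 / (w1 + w2)                 # fused variance is always smaller
    mu = var * (w1 * mu1 + w2 * mu2)      # precision-weighted mean
    return mu, var

# Suppose an acoustic sensor and an imaging sensor estimate the same
# range to a target (meters), with different confidences:
mu, var = fuse(100.0, 25.0, 110.0, 100.0)
print(mu, var)   # approximately 102.0 and 20.0
```

The fused mean leans toward the more precise sensor, and the fused variance is smaller than either input variance; that reduction in uncertainty from combining sources is the basic payoff that the multilevel fusion frameworks scale up.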

One participant suggested that market-sentiment data, used in economics, could potentially yield lessons useful in characterizing psychologic, social, and cultural data.

VLADIMIR LEFEBVRE'S REFLEXIVE CONTROL THEORY AND IEDS

Jonathan Farley (California Institute of Technology) explained how reflexive-control theory, originally developed by Vladimir Lefebvre, could be used in a counter-IED context. In traditional mathematical approaches, a person trying to deter IED activities (the decision-maker) is passive and thus does not take into account his or her ability to preemptively influence an adversary's actions. In contrast, reflexive control presumes that the decision-maker not only predicts an adversary's actions but at least partially determines those actions through his or her own actions. A central goal is to develop methods to influence the adversary's decision-making process by manipulating the adversary's perception of reality. Thus, reflexive-control theory models the adversary's decision-making process rather than simply predicting the adversary's next move.

As an example, consider the following border-security scenario. A decision-maker receives information that an attack will be attempted in one of three locations. Presumably, the attack will be aimed at the location where there is the perceived greatest likelihood of success. There are three possible levels of secret, nonpublic troop deployment: high, medium, and low. And there are three levels of "shows of force" (public troop deployments): high, medium, and low. The problem is how best to deploy a finite number of troops to protect the anticipated attack locations and influence the adversary's decision about where to attack. That is, how should a decision-maker secretly deploy his or her forces, and what should the concurrent shows of force be? Reflexive-control theory provides a framework for solving that problem.
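The border-security scenario can be given a toy formalization. This is an invented illustration, not Lefebvre's actual formalism: the locations, vulnerability numbers, and force-level values are all made up. The adversary is assumed to attack wherever perceived vulnerability (base vulnerability minus the public show of force) is greatest, while actual damage depends on the secret deployment; the defender searches over both allocations.

```python
# Toy model of shaping an adversary's choice with public shows of force
# while protecting with secret deployments. All numbers are invented.
from itertools import permutations

LOCATIONS = ["north", "center", "south"]
BASE_VULN = {"north": 0.9, "center": 0.6, "south": 0.4}
LEVELS = {"high": 0.5, "medium": 0.3, "low": 0.1}

def plan(secret, show):
    # Adversary attacks the location that LOOKS most vulnerable.
    target = max(LOCATIONS,
                 key=lambda loc: BASE_VULN[loc] - LEVELS[show[loc]])
    # Actual damage there is reduced by the SECRET deployment.
    return target, BASE_VULN[target] - LEVELS[secret[target]]

best = None
for secret_levels in permutations(["high", "medium", "low"]):
    for show_levels in permutations(["high", "medium", "low"]):
        secret = dict(zip(LOCATIONS, secret_levels))
        show = dict(zip(LOCATIONS, show_levels))
        target, damage = plan(secret, show)
        if best is None or damage < best[0]:
            best = (damage, target, secret, show)
```

In this toy instance the best plan makes a high show of force at the most threatening location while secretly concentrating defense at the location toward which the adversary is steered, which is the flavor of result reflexive control aims for: the defender chooses where the attack happens rather than merely predicting it.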
Anticipated attack locations are modeled by assigning each location a difficulty level and a risk level. Each location is evaluated to determine the probability that it will be the next target.

As another example, suppose that in a given geographic area there are three community centers that are friendly (blue in Figure 3.2) to a decision-maker and two that are unfriendly (red). A diffusion model of public opinion provides insight into the best way to allocate public-relations resources to achieve maximal goodwill in the community. According to the diffusion model in Figure 3.2, and given that public-relations resources can be allocated to only three of the five community centers, having good relations with A, D, and E will generate the most goodwill in the community.

The value of such an approach is determined in part by how well the diffusion model captures the diffusion of public opinion. Improvements are needed in this area, including the development of more realistic network models and models that account for people's behaving dynamically. Diffusion models should be tested in real situations. The spread and control of public opinion may follow some of the same patterns as disease outbreaks, so a research partnership with epidemiologists could be valuable.
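A diffusion model of the kind described here can be sketched as repeated local averaging on a graph. The five-center network below is invented (it is not the network of Figure 3.2), and goodwill is compared after a fixed number of rounds, so the score measures how quickly goodwill spreads under each allocation rather than its long-run limit.

```python
# Toy diffusion-of-goodwill model on an invented five-center network.
# Resourced ("seed") centers are held fully friendly; every other center
# repeatedly takes the average of its neighbors' friendliness. Trying
# every 3-of-5 allocation finds the one that spreads goodwill fastest.
from itertools import combinations

NEIGHBORS = {"A": ["B", "C"], "B": ["A", "C", "D"], "C": ["A", "B", "E"],
             "D": ["B", "E"], "E": ["C", "D"]}

def goodwill(seeds, rounds=5):
    value = {n: (1.0 if n in seeds else 0.0) for n in NEIGHBORS}
    for _ in range(rounds):
        value = {n: (1.0 if n in seeds else
                     sum(value[m] for m in NEIGHBORS[n]) / len(NEIGHBORS[n]))
                 for n in NEIGHBORS}
    return sum(value.values()) / len(value)   # mean community friendliness

best = max(combinations(sorted(NEIGHBORS), 3), key=goodwill)
```

Even in this toy, the best seed set depends on the network structure, not just on how many centers are resourced, which is Farley's point: the quality of the recommendation is only as good as the network model behind it.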

Figure 3.2 Diffusion model demonstrating how reflexive-control theory can be used to identify the best allocation of resources to achieve a specific goal (increase in blue); the two panels show 54% friendly and 74% friendly outcomes. Courtesy of Jonathan Farley.

Farley also discussed other mathematical approaches that could be useful in the IED problem, including formal concept analysis and the theory of partially ordered sets.

STATISTICAL SIGNAL PROCESSING FOR IED DISCOVERY

Alfred Hero (University of Michigan) discussed the application of statistical signal processing to IED discovery. Many potential sources of signals are relevant to IED detection, such as video, aerial photographs, sequences of wireless personal-digital-assistant signals, and Internet binary signals. Such a collection of signals implies a complex, high-dimensional problem space. The signals may be mixed, including signals that are continuous and discrete and signals that are stationary and nonstationary; these variations pose special challenges for analysis. In addition, the IED problem occurs in an adversarial environment in which the agents being measured may detect that they are being measured and adjust their behavior accordingly. The latter issue makes signal processing particularly challenging in the case of IEDs. However, predictive models have been developed for other complex, high-dimensional signal-processing problems, for example, in telecommunication, "electronic nose" sensor arrays, Internet traffic, and genetic networks, and the results may be applicable to IED discovery.

Developing an analytic method for discovery in a complex system requires observations from the field and relevant contextual information to build a function that can be used to estimate or predict the state of the system. (Is the situation normal, or has an anomaly occurred?) The challenge is to design the function, also known as a predictor, so that it minimizes the chances of error in the result.
Functions are derived from datasets, and the fundamental analytic constraints imposed by the datasets must be observed. Any developed function or model must be iteratively "trained" and tested on appropriate datasets. During the training period, the developer identifies all the

parameters necessary to model the behavior of a dataset adequately. During the test period, the developer runs the model against another, similar dataset to ensure that the model is not limited to one system. If the function or model has too many parameters or degrees of freedom, it may appear to accommodate any dataset but be inaccurate. If the model has too few parameters to fit the data accurately, a bias may be introduced into the analysis. Thus, any proposed model should be run against multiple surrogate datasets similar to the actual test data to check for inaccuracies.

Anomaly detection requires that statistically significant deviations from the normal baseline be reliably identifiable, or "predictable". That is a particular challenge when one is faced with a shifting, noisy baseline, and any reliable model must incorporate some form of training to accommodate such changes. One way to address that is to use a hierarchic Bayes model to test the function. A prediction is tested against a series of functions that have been increasingly conditionally constrained to test the limits and accuracy of prediction. Each successive function describes the model system more narrowly by imposing conditions on the analysis that have been derived from known contextual information. By performing this type of analysis on datasets from the same system that have been acquired at different times (and thus have different baselines), one can identify the key components for accurately modeling data across shifting baselines.

For example, detecting anomalies in Internet traffic data has an important parallel to IED detection. Internet traffic has a constant baseline shift: at no two times will the volume of Internet traffic be exactly the same. Similarly, detecting a cell-phone call that is used to detonate an IED is difficult because the traffic of cell-phone calls within a network will almost certainly also have a constant baseline shift.
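The shifting-baseline problem can be made concrete with a small online detector: an exponentially weighted moving average (EWMA) tracks the drifting "normal" level, and observations far outside the tracked spread are flagged. This is only an illustrative sketch; the smoothing constant and cutoff are invented, and real traffic models are far richer.

```python
# Sketch of anomaly detection against a drifting baseline. The EWMA
# baseline follows slow trends, so a gradual rise in traffic is not
# flagged, while a sudden spike is.
def ewma_anomalies(series, alpha=0.1, k=4.0):
    """Return indices whose deviation from the running EWMA baseline
    exceeds k times the running average absolute deviation."""
    baseline = series[0]
    spread = 1.0                      # initial guess at typical deviation
    flagged = []
    for i, x in enumerate(series[1:], start=1):
        dev = abs(x - baseline)
        if dev > k * spread:
            flagged.append(i)         # anomaly: do not absorb into baseline
        else:
            baseline = (1 - alpha) * baseline + alpha * x
            spread = (1 - alpha) * spread + alpha * dev
    return flagged

# Slowly rising traffic with one spike at index 30.
data = [100 + 0.5 * t for t in range(60)]
data[30] = 500.0
print(ewma_anomalies(data))  # [30]
```

Because the baseline is updated online with every non-anomalous point, the detector adapts as the nominal distribution drifts, which is exactly the requirement stated here for cell-phone and Internet traffic.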
This constantly shifting baseline requires that methods to construct a reliable predictor use online training that allows for changes in the baseline and the underlying nominal distribution.

Other approaches can be used for anomaly detection. One is based on the use of level sets: a geometric entropy minimization method, an adaptive nonparametric method based on a class of entropic graphs called K-point minimal spanning trees (Hero 2007), is used. Another is based on dimensional estimation with entropic graphs (Carter, Raich, and Hero 2007).

Network tomography12 is another field of research with potential applicability to IED discovery. Rabbat and co-workers (2006; 2005) explored the problem of identifying the topology of a telephone network by using observations in the network, and Justice and Hero (2006) addressed the problem of tracking a suspect through an unknown network by using a Bayesian hierarchic prior model that accounts for changes in topology. Similar approaches have been applied to gene-pathway reconstruction (Rabbat, Figueiredo, and Nowak 2007). The main objective is to discover signaling pathways, sequences of transcription factors for gene expression that are expressed in a time-dependent way. Because pathways are not known a priori, they are estimated by applying a stimulus and observing a response. That type of interaction has parallels to social-network settings in IED detection.

Electronic-nose sensor arrays address a similar issue that arises when the signal response from a diverse array of sensing elements is used to train the array to detect a

12. Network tomography is the field of inferential network monitoring, in which internal characteristics of a network are inferred by using information derived from end-point data.

specific chemical. Such methods as linear discriminant analysis have been applied to this type of problem with the advantages that training can minimize errors and that validation is based on the model's predictive power for other datasets (Feldhoff, Saby, and Bernadet 1999). Other methods used to analyze an array of inputs include principal-component analysis (Pardo et al. 2006). More advanced methods of pattern recognition, including probabilistic and artificial neural networks, nearest-neighbor classification, binary recursive classification, maximal-margin classification, bootstrap aggregation, and machine-learning decision-tree sampling (for example, random-forest classification), might be applicable to classification of quantifiable data on the human terrain.

Research in signal processing has the potential to improve counter-IED capabilities. The complexity of data collected in studying the IED problem means that the observations will probably be inadequate to develop a fully predictive model, and tradeoffs will have to be made between the richness of expression of a model and overfitting of the model on incomplete datasets. Incomplete data also mean that preanalysis will be helpful in allocating resources to collect data. Additional challenges posed by the IED-detection problem include the need for a low-dimensional feature space to keep the problem manageable and the timeliness required because relevant information must be used quickly, before the situation on the ground changes.

BREAKOUT SESSION DISCUSSION

After the plenary session, workshop participants engaged in a series of breakout-group discussions to identify possible research opportunities. Participants were assigned to groups that mixed government representatives and academic researchers. To the extent possible, each group included a broad array of expertise.
Each discussion group was chaired by a member of the organizing committee and lasted 1 hour and 15 minutes, after which participants reconvened in a plenary session to discuss the groups' findings. The discussion topics were

• What data are needed/desired to predict IED activities, and what basic research avenues would enable the handling, prioritization, and delivery of such data?
• What research is needed to allow leveraging of human expertise in data interpretation?
• What research opportunities might lead to procedures to better analyze mixed, complex, noisy, incomplete data?

The final session of the workshop built on the talks and breakout sessions. Participants were invited to provide feedback on overarching themes and critical research subjects highlighted during the workshop.

Workshop participants represented a variety of fields of study, so different views and perspectives were expressed during the breakout discussions and plenary sessions. What follows is a general description of issues, questions, and research subjects highlighted by the reporting members of the breakout groups.
Data to Predict Improvised Explosive Device Activities and Basic Research to Enable the Handling, Priority-Setting, and Delivery of Data

Participants noted that research questions determine what data need to be collected (such as why one group would use IEDs and another would not and how IED use varies among groups). Thus, the question of which data to collect is a question of priorities, and participants noted that categorizing and indexing data that already exist in preparation for analysis would be an important step toward identifying what data are most useful for predicting IED activities.

Participants considered the difficulties in determining whether general or specific questions would be most useful. The study of IEDs and their use is multifaceted and touches on many fields of study, and it is important to define the parameters for analysis carefully to achieve the desired research outcome. In addition, the multidisciplinary nature of the analysis means that communication between researchers and those collecting and aggregating the data is critical to avoiding confusion. For example, it was noted that the definition of a dataset differs among fields.

The environment where IEDs are used is constantly changing, so data-collection methods would ideally be robust, adaptable, and easily integrated into current protocols. It would be convenient if data-collection methods and tools worked well with the methods and tools that soldiers and others on the ground are already using. As noted by Kiernan, data-collection techniques used by law-enforcement organizations could be studied and compared with the types of data and collection methods available in a military setting.

Assuming that military personnel can be used as sensors to collect data, practical problems exist in data handling. In this context, soldiers would not be used to perform social research. Rather, the goal is better use of the data already collected by soldiers in their work.
In some cases, the questions asked by military personnel could be tailored to achieve both the tactical goals and the social-research goals, but it is understood that a soldier's primary job is not gathering social-research data. Data are likely to be acquired in many forms (verbal, audio, video, Word documents, handwritten notes, and so on), and there is no standard method for integrating heterogeneous data sources. Real-time translation capabilities for digital, print, and audio media are also lacking. Developing such tools will improve the ability to quantify and analyze information provided by soldiers in theater.

In using data to anticipate IED activities, signal-to-noise issues are important. These measurements are made in a civilian environment, and it is challenging to differentiate suspicious activity from the myriad innocuous tasks that a population performs every day. For example, persistent surveillance assets will produce large datasets that contain a great deal of day-to-day background activity. Enhanced methods for modeling systems and networks would help to identify anomalous events that stand out from the noise.

The nature of the environment in which the data are acquired may lead to incomplete datasets. Modeling may help to compensate for errors resulting from those incomplete datasets and may reduce the analysis required by identifying the most and least valuable portions of the collected data. Modeling may also help to develop proxy datasets for testing of analytic methods.
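The proxy-dataset idea above can be illustrated with a minimal sketch: synthetic daily activity counts with a handful of injected bursts, plus a deliberately naive threshold detector. The rates, the burst model, and the 3-sigma rule are all assumptions made for illustration; as noted earlier, real-world baselines shift and would require online methods.

```python
# Hedged sketch of a "proxy dataset": synthetic daily activity counts with a
# few injected bursts, plus a deliberately naive detector. The rates, the
# burst model, and the 3-sigma rule are invented for illustration.
import numpy as np

rng = np.random.default_rng(1)

days = 365
background = rng.poisson(lam=20, size=days)           # routine daily activity
signal = np.zeros(days, dtype=int)
burst_days = rng.choice(days, size=5, replace=False)  # rare anomalous bursts
signal[burst_days] = rng.poisson(lam=40, size=5)      # sharp lift over baseline
observed = background + signal

# Naive global threshold; a shifting real-world baseline would demand
# online methods instead of a fixed 3-sigma rule.
mu, sigma = observed.mean(), observed.std()
flagged = np.where(observed > mu + 3 * sigma)[0]
print("injected burst days:", sorted(burst_days.tolist()))
print("flagged days:       ", sorted(flagged.tolist()))
```

A withheld copy of `burst_days` is the ground truth against which an analytic method would be scored, which is the role a sanitized proxy dataset could play for the research community.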
Methods for handling large, heterogeneous datasets must be developed for this type of analysis, including programs that can systematically characterize and filter data. New visualization and mapping techniques for viewing data could be developed to make interpretation and analysis of data easier.

Participants were concerned about the lack of available IED-related data for research purposes. There was a discussion of the potential of open-source databases as research tools. For example, could one study the use of everyday technology, such as cellular telephones and the Internet, and correlate it with IED events by using open, unclassified databases? Would it be possible to create an unclassified wiki type of database, similar to ones that have been used in astronomy research, that would enable citizens to assist in labeling IED events and identifying trends? Some basic research subjects are promising, such as developing improved methods of image identification or modeling of networks and informal financial systems, but application of the basic research to IED activities will require access to pertinent datasets.

Research Needed to Leverage Human Expertise in Data Interpretation

This discussion touched on one of the same issues as the first breakout session: the potential utility of automated prefiltering and preliminary analysis of data. Other research subjects of interest included new methods of data acquisition and aggregation; challenges in detecting anomalies in video streams; developing effective methods of data presentation and visualization for analysts and decision-makers; improving understanding, modeling, and training of human analytic abilities; and human-directed and automated gathering of information.

In many cases, the information required to anticipate IED activities must be acquired in a hostile environment.
Modeling that environment may help to identify the most pertinent data to collect, determine the most effective means of collecting them, and help to interpret them. For example, the attitude of the local population toward a particular situation and toward the individual or group collecting the information will probably affect the data. Understanding motivating factors and issues that affect a population's response to specific activities would be useful in creating an environment favorable for data collection.

One key source of data is the soldiers and civilians on the ground in an area where IED activities are taking place. These "sensors" are human, so the quality of the information they provide is dictated by human abilities and the environment in which it is collected. Some people are better than others at identifying anomalous events and activities, what Kiernan referred to as the ability to notice when something "just doesn't look right" (JDLR). A system of data collection that relies on soldiers in the field may benefit from research on the characteristics of exceptional observers, including studies of experienced law-enforcement personnel. Some initial questions could be these: What visual cues do soldiers use to find IEDs? How do police officers identify JDLR situations? Can these skills be trained? How can the skills of the best observers be used to train others? A better understanding of why some people are skilled observers might make it possible to test personnel for relevant abilities and to place them in positions where they would be the most effective. It could also help to improve training programs to raise the skill level of ordinary observers.

A number of research challenges are pertinent to information acquisition and aggregation. Whether data have been collected by people and aggregated or collected with a remote surveillance device, such as a video camera, nearly real-time priority-setting of information and rapid processing and interpretation of the information are required if a system designed to predict IED activities is to be worthwhile. Information from reports provided by soldiers on patrol may need to be correlated with biometric data, cellular-telephone data, video of an event, external documents, and information from interrogation of suspects. Processing those data in such a way as to make it possible to cross-reference and search through all the material is a distinct challenge. Workshop participants were particularly struck by the difficulties inherent in the processing of video and image data. Current methods of analyzing such data are inadequate.

Once a dataset has been acquired, it is necessary to validate it. Validation metrics for large, complex datasets are still being developed. Surrogate datasets available in the open literature may be of use for developing metrics and models prior to use in predicting IED activities. Participants once again noted the importance of making relevant datasets public for use in basic research.

Any acquired dataset, whether small and homogeneous or large and heterogeneous, must be reliably searchable. Tools for accommodating varied content types (such as video, audio, and text) would enhance the correlation of events and data. Real-time translation and interpretation of digital and nondigital media would also be useful. Methods of analyzing data for predicting IED activities must be able to highlight events that rise only slightly above the background noise of day-to-day living.
Participants felt that development of automated prefiltering systems—perhaps informed by studies of the techniques used by human analysts—could substantially reduce the background noise and improve the chances of identifying suspicious activities. That may involve developing visualization methods to ease the job of the analysts or simply developing a comprehensive, searchable database. Visualization itself can play many roles, from helping to identify anomalous events to simply presenting data in a form that lets analysts and decision-makers interpret information faster or focus their attention on particularly interesting portions or aspects of the data. Visualization is often a convenient way to highlight the layers of an analysis and allow investigators to "dig" into the data by looking through the overlaid levels. In addition, methods of visualization can be customized to the problem at hand by, for example, tailoring color schemes to highlight relevant pieces of information; this could be useful in interpreting data when a quick response is required.

Humans are the best anomaly detectors currently available for some types of data. Understanding analysts' methods of correlating and interpreting data could assist in the development of analytic programs, in the improvement of visualization methods by identifying elements that require specific attention, and in the development of training methods for other analysts and personnel. Such studies could also identify tasks that are best accomplished by automated systems. Development of analytic systems that mimic an analyst's abilities to perceive changes and patterns would be valuable. Can intuition and the "Eureka!" moments be mapped and modeled? Participants also discussed the possibility of developing mixed-initiative systems, whereby an analyst could coordinate the real-time collection activities of an automated data-collection system and tailor the searches and collection to focus on narrow or broad criteria as needed.

Many aspects of human analytic ability could be studied in greater detail. For example, humans are good at adapting to errors in data (such as an incorrect address on a document) and ignoring some errors as irrelevant to the overall analysis. However, although an analyst's intuition and analytic ability are valuable, mistakes will happen. A robust automated analytic system needs to be able to work smoothly around or correct human errors. Valuable lessons may be learned by studying the analytic processes of professionals who are required to analyze and interpret complex data quickly, such as air-traffic controllers, stock traders, and emergency personnel. Military personnel who are experts at detecting the visual cues indicative of an IED would also be a pool of people to study. Studies of these other fields could also yield important information on the effect of state of mind on analysis. The effects of stress, emotions, pharmaceuticals, stimulants, fatigue, boredom, and the like on a person's ability to process and manage data are unclear. Is there an optimal physical and emotional state for analyzing information? The effects of learning and experience on analytic ability are also important.

Finally, when considering the groups that use IEDs, participants felt that it was important to model and understand the environment in which they operate so that effective predictive tools can be developed. That requires mapping and modeling community support, methods of adaptation in the face of stress, formal and informal movement of funds, the structure of cells, and the effect of interference on the network.
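A toy example of the kind of network mapping just described: a wholly invented two-cell structure whose bridging "broker" is exposed by betweenness centrality, and whose removal (one crude stand-in for interference with the network) fragments it. The structure, node names, and interpretation are all hypothetical.

```python
# Hedged sketch of network modeling: a wholly invented two-cell structure.
# Betweenness centrality exposes the broker that bridges the cells, and
# removing it (a crude stand-in for "interference") fragments the network.
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("a1", "a2"), ("a2", "a3"), ("a1", "a3"),   # cell A (fully connected)
    ("b1", "b2"), ("b2", "b3"), ("b1", "b3"),   # cell B (fully connected)
    ("broker", "a1"), ("broker", "b1"),          # sole bridge between cells
    ("financier", "broker"),                     # informal funds channel
])

bc = nx.betweenness_centrality(G)
target = max(bc, key=bc.get)          # structurally most critical node

H = G.copy()
H.remove_node(target)
print("most central node:", target)
print("components before/after removal:",
      nx.number_connected_components(G), "->",
      nx.number_connected_components(H))
```

Real research would have to model dynamic, uncertain networks rather than this static toy, as participants emphasized.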
Research Opportunities to Analyze Mixed, Complex, Noisy, or Incomplete Data

In the third breakout session, participants were asked to consider research that might lead to procedures to better analyze mixed, complex, noisy, or incomplete data. Included in the question were the ideas of what research is needed to develop an improved capability to fuse data in a computationally reasonable way and what concepts and methods need to be developed to allow the integration of diverse forms of data. The value of developing robust methods for combining quantitative and qualitative data in the study of IED activities was repeatedly raised during the discussion, and participants wished to develop metrics for analysis of both types of data.

The goal of data fusion in this context is to assist in the prediction, identification, and, ideally, prevention of IED activities. Data fusion should produce either descriptive results (such as improved visualization) or predictive results. Ideally, it involves combining spatial and temporal information with demographic, social, and behavioral information. Data analysis needs to allow for tracking of heterogeneous information (such as video, interviews, documents, and census data) and timescales (for example, continuous video stream vs. cumulative monthly activity reports) to identify correlations. For example, the asynchronous nature of the planning and implementation of IED attacks presents a particular challenge. Devices may be built and placed long before they are detonated. IED organizations will alter their tactics in response to counter-IED efforts.
Effective models need to accommodate both variable factors (such as changing tactics of red and blue forces and political and social changes) and invariable factors (such as location and the desire for detonation) and be adaptive to remain relevant. Data fusion may also require synthesis between datasets of different sizes, such as cellular-telephone records and suspect interviews. Developing methods for sorting various kinds of data—regardless of size, source, or type—into geospatial-temporal bins could be one important step in the fusion and analysis process. A universal format for translating heterogeneous data into a common framework for analysis would also be helpful.

In the fusing of data from multiple sources, it is important to understand the sensitivity and robustness of the fusion method in the face of errors or uncertainty in the original data. How confident can an analyst be in the data once they have been "translated" into a more useful form? It would also be helpful to develop an understanding of the relative strengths and weaknesses of different methods of uncertainty analysis for this type of data analysis. Video and other data are likely to be compressed for transfer, storage, and analysis. To what extent can data of different types be compressed before necessary information is lost? A related question concerns sampling: how many samples must be collected to have a particular level of confidence in the data, whatever the type? How can incomplete datasets be used for analysis, and how much uncertainty would such a dataset introduce?

As noted in the previous section, data that have been collected must be searchable. Current methods of data mining are inadequate for managing multiple forms of data. One specific example is the case of video searching. There has been some success in video identification of specific features, such as license plates and faces, but substantial challenges in event tracking and identification remain.
For example, it is sometimes difficult to identify an IED blast automatically with video. In part, that may be due to the overwhelming noise and clutter in video. Methods to filter out some of the noise before analysis and to identify events and patterns in a series of images would be useful. Systems should also be able to handle errors in the data and still allow accurate searching and analysis. A study of the human ability to filter out minor errors in data without consequence might inform this research.

Another example of the need to develop data-mining methods that can handle multiple forms of data is the reports filed by patrols in theater. The reports can be filed weekly or biweekly, and this quickly leads to the creation of a very large number of files. Typically, the files are in Microsoft PowerPoint format and may include text, images, video, and audio data. Methods that can efficiently search these many files, which may contain many forms of data, will be valuable.

One common suggestion was that preanalysis of data by automated systems might assist in data fusion and analysis. Because some parameters must be placed within an automated system, a method of identifying the "important" elements of a dataset is necessary. Participants noted that some people make high-risk decisions with little information in noisy environments every day (for example, emergency-room personnel, air-traffic controllers, and poker players). Some work has been done in decision theory to study such systems, and the research may yield some value in developing filters for human- and instrument-derived data on IED activities. Participants also felt that network research and operations research may offer a great deal in addressing these challenges.
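One small piece of the file-search problem above can be sketched as a toy inverted index. Extracting text from real PowerPoint, audio, or video files is the hard, unsolved part; here it is stubbed out with plain strings, and the file names and report text are invented.

```python
# Hedged sketch of one piece of the search problem: a toy inverted index.
# File names and report text are invented; extracting text from real
# PowerPoint, audio, or video files is the hard part and is stubbed out here.
from collections import defaultdict

reports = {
    "patrol_w12.pptx": "Wire spotted near culvert, no detonation",
    "patrol_w13.pptx": "Routine market activity, nothing unusual",
    "interview_007.txt": "Subject described burying a device near the culvert",
}

index = defaultdict(set)
for name, text in reports.items():
    for token in text.lower().split():
        index[token.strip(",.")].add(name)

def search(term):
    """Return the report files whose extracted text contains the term."""
    return index.get(term.lower(), set())

print(search("culvert"))
```

Even this trivial index cross-references an interview transcript with a patrol slide deck, which is the kind of correlation across file formats the participants wanted at scale.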
EMERGING THEMES

The discussions at the workshop were wide-ranging, but a few research subjects were mentioned often enough to be considered themes:

• Collection, handling, and preprocessing of data.
• Availability of data for researchers.
• Improvement in and automation of data analysis.
• Characterization of electronic and social networks.
• Addressing the types, validity, and completeness of and noise in datasets.

Collection, Handling, and Preprocessing of Data

A common theme in the workshop discussions was the collection, complexity, and methods of handling and treating the breadth of data relevant to predicting IED activities. Given the broad variety of data sources that are relevant to predicting IED activities, research on combining structured and unstructured data will be particularly valuable. For example, methods need to be developed to enable data from sensors (which may have varied temporal and spatial resolution) to be combined with intelligence and other information. Because the human dimension of IED campaigns is so important, an integrated approach could be beneficial in developing such methods, bringing together such fields as econometrics, engineering, psychology, and anthropology. In addition to research on combining data, research is needed on how to search for content in video and audio files and how to search many files that are created with common programs (such as Microsoft Word and PowerPoint).

Interpretation of IED-related data is complex and requires that researchers have a way of placing the results of data analysis in the context of the environment in which the data were collected. Participants felt that the modeling tools presented during the talks were indicative of the potential for researchers in disparate fields to contribute to the development of such models.
That may result in the development of better models to assist in the filtering of data and identification of anomalies and in the further development of formal models for interpreting social-network behavior.

Availability of Data for Researchers

The availability of data and the ability of researchers to test models and hypotheses against data were of major concern to workshop participants. Proxy data are useful, but it would be helpful to have a sanitized dataset that is representative of field data and that can be used to test what a patrol might look for with potentially available sensors. Such a dataset could be made available to the research community and used in a competition, with a portion of the dataset withheld to determine the competition winner. Additional data could be made available by bringing together multidisciplinary groups that are sent to training centers to collect data and return home fairly quickly to analyze the data and propose research. That approach was successfully followed during World War II to engage operations researchers and make data available to them. However, it would not provide historical data and would have to be conducted in such a way as to avoid interference with training.

Improvement in and Automation of Data Analysis

One of the best tools for detecting anomalies in a dataset is a human being. It is important to understand and quantify the processes used by people in making high-risk decisions on the basis of incomplete or inconsistent information. Data peculiar to the IED problem may be classified or otherwise unavailable to researchers, but other contexts can be examined fruitfully, such as the decision processes of air-traffic controllers, stock traders, and meteorologists. Research in decision theory could also focus on adversarial learning and adversarial modeling.

Research in cognitive psychology will also be useful. Some people are skilled at picking out objects or detecting changes or anomalies. Similarly, some law-enforcement personnel are able to discriminate quickly between normal and criminal behavior. Research that helps to identify behavioral attributes or metrics that enhance that ability would be useful in expanding our understanding of human information-processing capabilities and could help to improve training and data-filtration methods. In addition, research in human perception, visualization of data, and presentation of results in a user-friendly manner to aid in decision-making is important. Such research could include neuroscience and could investigate techniques for enhancing cognition. Research to enhance human-computer (mixed-initiative) decision-making will also be valuable.

Characterization of Electronic and Social Networks

IED campaigns are generally conducted by groups, and the groups form networks.
Research that enhances our ability to model networks while taking into account uncertainty and the fact that the networks are dynamic could be valuable because it could further our understanding of how to influence the structure and behavior of networks. Participants noted that the methods of modeling telecommunication activity, genetic networks, reflexive theory, and others demonstrate the variety of ways in which similar problems have been addressed in different fields. A multifaceted, multidisciplinary effort in network modeling, perhaps incorporating game theory and efforts in sociology, could be useful.

Addressing the Types, Validity, and Completeness of and Noise in Datasets

The reliance of effective analysis on complete, accurate data was highlighted many times during the workshop. Data on IED activities are generally collected in adversarial, civilian environments. That can lead to incomplete datasets because of the difficulty of collecting data consistently and of collecting data with large, highly variable background signals and noise. In addition, data may be acquired in any number of forms—including audio, video, handwritten notes, and measurements from wireless sensors—and may need to be fused to provide a complete picture of a situation. For such data to be used effectively in developing predictive models, they must be accurate. However, verification of data acquired in the field, such as data from human intelligence, may be difficult. Basic research in signal processing, data fusion, and system modeling could provide tools for addressing those issues.
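As one closing illustration of the fusion theme, the sketch below sorts invented heterogeneous reports into coarse geospatial-temporal bins and flags cells corroborated by more than one source type, a crude stand-in for the verification problem just noted. The grid size, time window, coordinates, and records are all assumptions made for illustration.

```python
# Hedged closing sketch: sorting invented heterogeneous reports into coarse
# geospatial-temporal bins and flagging cells corroborated by more than one
# source type. Grid size, time window, and all records are assumptions.
from collections import defaultdict

records = [
    {"lat": 33.31, "lon": 44.37, "day": 3,  "source": "patrol"},
    {"lat": 33.32, "lon": 44.36, "day": 5,  "source": "video"},
    {"lat": 33.31, "lon": 44.37, "day": 6,  "source": "tip"},
    {"lat": 34.10, "lon": 43.90, "day": 40, "source": "patrol"},
]

def cell(rec, grid=0.05, window=7):
    """Map a record to a (lat-bin, lon-bin, week) cell."""
    return (round(rec["lat"] / grid), round(rec["lon"] / grid),
            rec["day"] // window)

sources_per_cell = defaultdict(set)
for rec in records:
    sources_per_cell[cell(rec)].add(rec["source"])

# Independent source types landing in the same cell are one crude form of
# cross-verification for field data.
corroborated = {c: s for c, s in sources_per_cell.items() if len(s) >= 2}
print(corroborated)
```

Binning by cell before fusing is one way to make heterogeneous records comparable, though the choice of grid and window sizes would itself be a research question.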