
Suggested Citation:"4 Models and Methods Relevant to NGA." National Academies of Sciences, Engineering, and Medicine. 2016. From Maps to Models: Augmenting the Nation's Geospatial Intelligence Capabilities. Washington, DC: The National Academies Press. doi: 10.17226/23650.


4

Models and Methods Relevant to NGA

Chapter 3 described ways to think about the models, data, analysis, and computation necessary for a model-based investigation. This chapter covers the committee charge (see Box 1.1), including a description of types of models and methods (Task 1), their relevance to the National Geospatial-Intelligence Agency (NGA) (Task 2), the state of the art (Task 3), and actions and research needed to make them more useful for geospatial analysis (Tasks 4 and 5). The chapter begins with a discussion of how the committee addressed each task. The remainder of the chapter is divided into sections on different classes of models and methods, each of which covers all of the tasks.

Given the breadth of national security and humanitarian challenges under NGA’s purview, it could be argued that dozens of models and analysis methods, each with important variants, are potentially relevant to NGA. It is not possible or useful to discuss every one of them to address Tasks 1 and 2. Instead, the committee focused on broad categories of models and methods that connect directly to NGA’s mission and that would help address the two example intelligence scenarios provided by NGA. With regard to the first, NGA’s mission is to produce geospatial intelligence by assessing and visually depicting physical features and geographically referenced activities on Earth. To extend this task to the modeling realm, NGA will need models of human behavior (activities), set within an environmental context (physical features), to develop scenarios and make predictions, as well as techniques for integrating, analyzing, and verifying geographically referenced data in a model-based investigation. These needs place a premium on the following types of models and methods:

Models of physical processes that affect human activities (e.g., weather and water flow);
Social system models of human behavior in a geospatial context;
Models of combined physical and social systems;
Inverse methods to infer uncertain model parameters from measurements of the real-world system;
Spatial statistics, data mining, and machine learning to discover trends, patterns, and associations in disparate data; and
Spatial network analysis to examine how patterns of relations affect behavior at the individual to state level.


The sections below give examples of how these models and methods would contribute to answering the questions posed in the intelligence scenarios provided by NGA:

Megacities (greater than 10 million people): How will worldwide urbanization trends affect regional political, economic, and security environments?
Chinese Water Transfer Project: How do agriculture and energy production and consumption change over time? How and where will populations, including rural communities, shift?

Task 3—a description of the current state of the art in models and methods, including features and scales captured by the model, accuracy, reliability, predictability, uncertainty characterization, and computational requirements—was difficult to address for two reasons. First, each of these factors is a product not only of a model, but also of the particular context in which it is being used. As discussed in Chapter 3, the key questions driving the investigation will influence what models are used, the processes and features represented in the model, the type and accuracy of data needed, how or whether the model’s fidelity to the real-world system will be assessed, and the computational requirements. Thus, a description of the state of the art for an individual model is unlikely to apply to all variants and applications of that model, and even less likely to apply to a category of models. Second, many of the factors specified in Task 3 (e.g., scales and predictability) do not apply to analysis methods, and other useful factors (e.g., availability of software, training support, and data issues) are not specified. Consequently, the committee developed a common set of state-of-the-art factors for all of the models and methods discussed in this chapter, and often provided ranges or examples to describe them.

For Task 4, the committee considered actions NGA could take to use or adapt existing models, given the educational profile of NGA analysts (NRC, 2013), NGA’s experience with modeling, the difficulty of developing or reusing relevant models and methods, the availability of software and code, and the level of training support available. The objective was to identify a short list of ideas for making existing models more useful to NGA, not to produce a comprehensive action plan. Many of the actions could be undertaken in collaboration with partners in universities, federal agencies, and private companies. NGA already has relationships with a number of universities that have strong programs in geospatial science (see Box 4.1), some of which also have experience in models and methods discussed in this chapter.

Research funded by NGA offers another clue about NGA analysts’ knowledge and experience and also provides guidance on future research and development needed to use the models and methods for geospatial intelligence (Task 5). For example, spatial and temporal analyses have been major research themes for NGA for at least the past decade, and algorithms for data-intensive computing, particularly for image analysis, have been a research theme for the past 5 years.¹ Consequently, NGA likely has reasonable capacity and connections with outside experts in these areas. In contrast, space-time modeling and predictive models have only recently become research areas for NGA, and so capacity and connections will likely have to be developed.

PHYSICAL PROCESS MODELS

The physical system serves as the environment in which the social system evolves. Physical process models are executable descriptions of our understanding of atmospheric, oceanic, hydrologic, geologic, and other physical systems. Many of these processes lend themselves to geospatial analysis. Physical process models developed or used by NGA include those used to generate high-resolution representations of Earth’s magnetic and gravitational potential (Pavlis et al., 2012; see Figure 1.2). While grounded in the simulation of specific natural phenomena, physical process models often also supply information on the impact of environmental dynamics on human infrastructure, activities, and demographics. For example, the megacities intelligence question (see Box 1.2) would likely require models of environmental changes that could stress urban populations, such as sea-level rise and increases in summer temperatures. The Chinese water transfer questions would likely require a large-scale model of the hydrologic system in China to predict surface flow, subsurface flow, and water availability under different water diversion scenarios.

___________________

¹ See NGA Academic Research Program Symposium programs for 2005–2015.

BOX 4.1 NGA Partnerships with Universities

NGA has established relationships with dozens of colleges and universities, including historically black colleges and universities, for recruiting and continuing education purposes (NRC, 2013). In addition, NGA has selected a large number of universities as Centers of Academic Excellence in Geospatial Science as a means of cultivating relationships and partnerships.^a These include the following:

Alabama A&M University
Arizona State University
Delta State University
Fayetteville State University
George Mason University
Mississippi State University
Northeastern University
Ohio State University
Pennsylvania State University
Roane State Community College
University of Alabama
University of Maine
University of South Florida
University of Texas, Dallas
University of Utah
U.S. Air Force Academy
U.S. Military Academy

__________________

^aSee NGA Academic Research Program Symposium programs for 2005–2015.

Many physical process models simulate fluids governed by the Navier-Stokes and continuity equations, which express conservation of momentum and mass, respectively. These equations are solved numerically using finite-difference, finite-volume, finite-element, or spectral discretizations. Accurate and representative observations of the natural system are critical both for creating the physical process models themselves and for setting the correct initial and boundary conditions that constrain the physical processes in a model-based investigation.
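The full Navier-Stokes equations are far beyond a short example, but the core discretization idea can be shown on a much simpler conservation law. The sketch below is an illustrative assumption, not a model used by NGA or the committee. It solves 1-D advection-diffusion of a tracer, ∂c/∂t + u ∂c/∂x = κ ∂²c/∂x², with a finite-volume scheme on a periodic grid. Fluxes are computed at cell faces, so whatever leaves one cell enters its neighbor, and total mass is conserved to round-off. The name `advect_diffuse` and all parameter values are hypothetical.

```python
import numpy as np


def advect_diffuse(c0, u, kappa, dx, dt, n_steps):
    """Explicit finite-volume solver for dc/dt + u dc/dx = kappa d2c/dx2.

    Periodic 1-D domain, first-order upwind advection (requires u >= 0) and
    central-difference diffusion. Each face flux is subtracted from one cell
    and added to its neighbor, so total tracer mass is conserved to round-off.
    """
    if u < 0:
        raise ValueError("this sketch assumes u >= 0 (upwind from the left)")
    if u * dt / dx > 1.0 or kappa * dt / dx**2 > 0.5:
        raise ValueError("time step violates the explicit stability limits")
    c = np.asarray(c0, dtype=float).copy()
    for _ in range(n_steps):
        c_left = np.roll(c, 1)  # neighbor i-1, wrapping periodically
        # Flux through the left face of every cell (between cells i-1 and i)
        flux_left = u * c_left - kappa * (c - c_left) / dx
        # The right face of cell i is the left face of cell i+1
        flux_right = np.roll(flux_left, -1)
        c -= dt / dx * (flux_right - flux_left)
    return c


# Usage: a Gaussian tracer pulse carried halfway around a unit domain
x = np.linspace(0.0, 1.0, 100, endpoint=False)
dx = x[1] - x[0]
c0 = np.exp(-((x - 0.3) / 0.05) ** 2)
c = advect_diffuse(c0, u=1.0, kappa=1e-3, dx=dx, dt=0.005, n_steps=100)
print(f"mass before {c0.sum() * dx:.6f}, after {c.sum() * dx:.6f}")
print(f"peak moved from x=0.30 to x={x[np.argmax(c)]:.2f}")
```

Real process models use higher-order schemes, multiple dimensions, and many coupled fields. The stability checks in this sketch, however, illustrate the same cost driver: a finer grid forces a smaller time step.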

Physical process models can be large and highly nonlinear, and they may couple multiple processes over a wide range of space and time scales (e.g., Figure 4.1). Large, complex physical process models are often expensive to develop and run. In some cases, it may be sufficient to run reduced-order models, which use theoretical approaches to derive a simplified version of the full process model (Berkooz et al., 1993; Mignolet and Soize, 2008; Moore, 1981). Reduced-order models are intended to provide adequate approximations to high-fidelity models at significantly lower cost and time-to-solution.

A reduced-order model need not be faithful to the full spatiotemporal dynamics of the high-fidelity model; it need only capture the essential structure of the mapping from the input parameters to the outputs of interest. The most popular approach to model reduction is to reduce the state dimension and the state equations using projection-based methods (Benner et al., 2015; Chinesta et al., 2016). Such methods are most successful for linear or weakly nonlinear models with few parameters. However, it remains challenging to construct efficient and capable reduced-order models that can handle complex nonlinear dynamical models and that are faithful over a high-dimensional parameter space.
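To illustrate projection-based reduction, the sketch below uses proper orthogonal decomposition (POD) with Galerkin projection, one widely used method of this kind, applied to a deliberately simple linear stand-in. The "high-fidelity" model is a 200-state finite-difference diffusion equation. Snapshots from one full run are compressed by a singular value decomposition (SVD), and the dynamics are projected onto the leading r modes. The model, the helper names, and the parameter values are assumptions chosen for illustration.

```python
import numpy as np


def diffusion_operator(n, kappa=1.0):
    """Full-order model: n-point finite-difference Laplacian on (0, 1), zero boundary values."""
    dx = 1.0 / (n + 1)
    A = np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
    return kappa * A / dx**2


def integrate(A, x0, dt, n_steps):
    """Backward-Euler time stepping of dx/dt = A x; returns the states as columns."""
    M = np.eye(A.shape[0]) - dt * A
    X = np.empty((A.shape[0], n_steps + 1))
    X[:, 0] = x0
    for k in range(n_steps):
        X[:, k + 1] = np.linalg.solve(M, X[:, k])
    return X


def pod_rom(A, snapshots, r):
    """POD basis plus Galerkin projection.

    The r leading left singular vectors of the snapshot matrix form the reduced
    basis V; the reduced operator V^T A V acts on r coefficients instead of n states.
    """
    U, _, _ = np.linalg.svd(snapshots, full_matrices=False)
    V = U[:, :r]
    return V, V.T @ A @ V


# Usage: reduce a 200-state diffusion model to a handful of modes
n, dt, n_steps = 200, 1e-3, 200
grid = np.arange(1, n + 1) / (n + 1)
A = diffusion_operator(n)
x0 = np.exp(-((grid - 0.3) / 0.1) ** 2)
X_full = integrate(A, x0, dt, n_steps)  # the "expensive" high-fidelity run

for r in (3, 20):
    V, A_r = pod_rom(A, X_full, r)
    X_rom = V @ integrate(A_r, V.T @ x0, dt, n_steps)  # cheap reduced run, lifted back
    err = np.linalg.norm(X_rom - X_full) / np.linalg.norm(X_full)
    print(f"r = {r:2d}: relative error {err:.2e}")
```

The example works well because the model is linear and has no varying parameters. As the paragraph notes, the hard cases are strongly nonlinear models and high-dimensional parameter spaces, where a basis built from one run may not generalize.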

Reduced-order models or emulators, which replace the computational process model with a response surface model, are also used to speed up inverse or sensitivity analyses that require many model runs. The emulator is trained on an ensemble of physical process model runs and creates a response surface mapping model inputs to model output (Marzouk and Najm, 2009; O’Hagan, 2006). Emulators can be used to predict model outcomes at untried input settings, allowing more thorough inverse or sensitivity analyses to be carried out.
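A Gaussian process is a common choice of emulator in the O’Hagan tradition. The sketch below trains one on a small ensemble of runs of a stand-in "expensive" function with a single input. It then predicts both a mean and an uncertainty at untried inputs. Several choices are simplifying assumptions: the function `expensive_model`, the squared-exponential kernel, and the fixed hyperparameters. In practice, hyperparameters are usually estimated from the ensemble, for example by maximum likelihood.

```python
import numpy as np


def _as_columns(X):
    """Treat a 1-D array as n points with one input parameter each."""
    X = np.asarray(X, dtype=float)
    return X[:, None] if X.ndim == 1 else X


def rbf_kernel(X1, X2, length_scale, variance):
    """Squared-exponential covariance between rows of X1 and rows of X2."""
    sq_dist = (np.sum(X1**2, axis=1)[:, None] + np.sum(X2**2, axis=1)[None, :]
               - 2.0 * X1 @ X2.T)
    return variance * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)


class GaussianProcessEmulator:
    """Gaussian-process response surface with fixed (not fitted) hyperparameters."""

    def __init__(self, length_scale=0.5, variance=1.0, jitter=1e-6):
        self.length_scale, self.variance, self.jitter = length_scale, variance, jitter

    def fit(self, X, y):
        self.X = _as_columns(X)
        y = np.asarray(y, dtype=float)
        self.y_mean = y.mean()
        K = rbf_kernel(self.X, self.X, self.length_scale, self.variance)
        self.L = np.linalg.cholesky(K + self.jitter * np.eye(len(self.X)))
        # alpha = K^{-1} (y - mean), computed with two triangular solves
        self.alpha = np.linalg.solve(self.L.T, np.linalg.solve(self.L, y - self.y_mean))
        return self

    def predict(self, X_new):
        """Return the emulator mean and standard deviation at untried inputs."""
        K_s = rbf_kernel(_as_columns(X_new), self.X, self.length_scale, self.variance)
        mean = self.y_mean + K_s @ self.alpha
        v = np.linalg.solve(self.L, K_s.T)
        var = self.variance - np.sum(v**2, axis=0)
        return mean, np.sqrt(np.maximum(var, 0.0))


def expensive_model(theta):
    """Hypothetical stand-in for a costly simulation with one input parameter."""
    return np.sin(3.0 * theta) + 0.5 * theta


# Usage: train on a 12-run ensemble, then query untried settings cheaply
theta_train = np.linspace(0.0, 2.0, 12)
emulator = GaussianProcessEmulator().fit(theta_train, expensive_model(theta_train))
theta_new = np.linspace(0.05, 1.95, 50)
mean, sd = emulator.predict(theta_new)
print("max emulator error:", np.max(np.abs(mean - expensive_model(theta_new))))
```

The predicted standard deviation is small near training runs and grows far from them. This behavior helps decide where additional full-model runs would be most informative.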

FIGURE 4.1 Characteristic spatial and temporal scales of Earth system processes. SOURCE: NAC (1986).

State of the Art

Some physical processes, such as turbulence and heat and fluid flow, are relatively well understood and modeled due to a wealth of observation and theory. Other physical processes, such as subsurface dynamics, clouds, and ocean biogeochemistry, remain a challenge to model because of a lack of observations. This is a particular problem for highly detailed simulations of flows or other processes that vary at fine spatial or temporal scales (e.g., groundwater flows and precipitation). In addition, there are uncertainties regarding how any natural system responds to forcings and feedbacks.

Climate models are mature for the questions and scales they were designed to address and are a good example of the state of the art in physical process models. The models are complex (involving multiple processes [see Figure 3.4] and multiple types and scales of observations), are computationally demanding, and typically require a dedicated research center or a large community of researchers to develop, validate, and run. A global research community has emerged to simulate past, present, and future climate under consistent scenarios of climate drivers, such as greenhouse gas concentrations, volcanic and manmade aerosols, and solar strength. Although the response of individual models to climate forcings varies, uncertainty is typically reduced and characterized by using multimodel ensembles in which every model has run the same scenario (e.g., Figure 3.5). One of the greatest uncertainties in climate model projections concerns which emissions scenario will match future societal choices about energy, transportation, agriculture, and other factors (IPCC, 2013).
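The multimodel ensemble idea can be sketched with synthetic data. Here each hypothetical "model" equals an assumed true trend plus its own bias and noise. The ensemble mean and the across-model spread are then computed. The numbers are invented for illustration. One property does hold generally: by the triangle inequality, the ensemble mean's root-mean-square error (RMSE) can never exceed the average RMSE of the individual models. It is not guaranteed to beat the best model, and the spread understates uncertainty when models share structural errors.

```python
import numpy as np


def ensemble_summary(runs):
    """Multimodel ensemble statistics for one scenario.

    runs: array of shape (n_models, n_times); every model is driven by the same scenario.
    Returns the ensemble mean and the across-model standard deviation (spread).
    """
    runs = np.asarray(runs, dtype=float)
    return runs.mean(axis=0), runs.std(axis=0, ddof=1)


def rmse(a, b):
    """Root-mean-square difference between two series."""
    return float(np.sqrt(np.mean((np.asarray(a) - np.asarray(b)) ** 2)))


# Usage with synthetic data: each "model" = truth + its own bias + its own noise
rng = np.random.default_rng(42)
years = np.arange(2000, 2051)
truth = 0.02 * (years - 2000)  # hypothetical warming trend (deg C)
runs = truth + rng.normal(0, 0.3, size=(8, 1)) + rng.normal(0, 0.1, size=(8, years.size))
mean, spread = ensemble_summary(runs)
print("ensemble-mean RMSE:", rmse(mean, truth))
print("average single-model RMSE:", np.mean([rmse(r, truth) for r in runs]))
```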

Substantial efforts have been made to downscale large-scale climate simulation results to the regional or local scales that are more relevant to decision making (Kotamarthi et al., 2016). The goal of downscaling is to recover the realistic high-frequency spatial and temporal variability of the real world that the coarser information lacks. There are three primary downscaling approaches:

Simple, which adds trends in the coarse-scale data to existing higher-resolution observations (Giorgi and Mearns, 1991);
Statistical, which relates large-scale features of the coarse data to local phenomena using regression methods, typology classification schemes, or variance generators that add realistic high-frequency variability to the coarse data (Wilby et al., 2004); and
Dynamical, which uses the coarse information as input to high-resolution computer models to dynamically simulate phenomena at much finer temporal and spatial scales.
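The two less complex approaches can be sketched in a few lines. The "simple" method is shown in its common delta-change form: the change simulated on the coarse grid is added to fine-scale observations. The statistical method is reduced to a single linear regression fitted over a training period. All function names, grids, and values are hypothetical. Precipitation is often handled with a multiplicative (ratio) change factor rather than the additive one shown here. Dynamical downscaling requires a full regional model, so it is not sketched.

```python
import numpy as np


def delta_downscale(obs_fine, coarse_hist, coarse_future, factor):
    """'Simple' (delta-change) downscaling.

    The coarse-grid change (future minus historical) is copied onto every fine
    cell inside each coarse cell and added to the fine-scale observed
    climatology. obs_fine must be the coarse grid refined by `factor`.
    """
    change = np.asarray(coarse_future, float) - np.asarray(coarse_hist, float)
    change_fine = np.kron(change, np.ones((factor, factor)))  # block replication
    if change_fine.shape != np.shape(obs_fine):
        raise ValueError("obs_fine must be the coarse grid refined by `factor`")
    return np.asarray(obs_fine, float) + change_fine


def fit_linear_downscaler(coarse_predictor, local_obs):
    """Statistical downscaling: least-squares fit of local = a + b * coarse."""
    x = np.asarray(coarse_predictor, float)
    design = np.column_stack([np.ones_like(x), x])
    (a, b), *_ = np.linalg.lstsq(design, np.asarray(local_obs, float), rcond=None)
    return a, b


# Usage: observed fine-scale temperatures on a 4 x 4 grid under a 2 x 2 model grid
obs_fine = np.array([[10.0, 12.0, 8.0, 9.0],
                     [11.0, 13.0, 7.0, 8.0],
                     [9.0, 10.0, 6.0, 7.0],
                     [9.5, 10.5, 6.5, 7.5]])
coarse_hist = np.array([[11.0, 8.0], [9.75, 6.75]])
coarse_future = coarse_hist + np.array([[2.0, 1.5], [1.0, 0.5]])
fine_future = delta_downscale(obs_fine, coarse_hist, coarse_future, factor=2)
print(fine_future)

a, b = fit_linear_downscaler([1.0, 2.0, 3.0, 4.0], [3.5, 5.0, 6.5, 8.0])
print(f"local = {a:.2f} + {b:.2f} * coarse")
```

Both sketches assume that the historical relationship between coarse and local scales still holds in the future. The next paragraph identifies this assumption as a weakness for scenarios that differ from past observations.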

Each downscaling approach has its own advantages and disadvantages. The less complex methods are simple, fast, and inexpensive to calculate, but they may produce inaccurate results, particularly for future scenarios that may differ from historic observations (Gutmann et al., 2014). Dynamical downscaling is more complex and expensive, but it has the potential to generate high-resolution information over a wider range of extrapolative scenarios. An example of a downscaling application is illustrated in Figure 4.2.

FIGURE 4.2 Mean annual precipitation downscaled (upper) from NCEP-NCAR Reanalysis (lower) helps capture the regional influences of highly variable terrain, such as the Rocky Mountains. NOTE: BCSDm = bias-corrected spatial disaggregation applied at a monthly time step, the statistical method used for downscaling. NCAR = National Center for Atmospheric Research; NCEP = National Centers for Environmental Prediction. SOURCE: Gutmann et al. (2014).
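The bias-correction step in BCSD-style methods is commonly implemented as quantile mapping. In this technique, each model value is replaced by the observed value that has the same rank within its historical distribution. The sketch below is a minimal empirical version, assuming plotting positions of (i + 0.5)/n and clamping at the extremes. It is not the exact procedure used by Gutmann et al. (2014), and the function name and synthetic data are hypothetical.

```python
import numpy as np


def quantile_map(model_hist, obs_hist, model_values):
    """Empirical quantile mapping (bias correction).

    Each model value is converted to its non-exceedance probability within the
    model's historical distribution, then replaced by the observed value with
    the same probability. np.interp clamps values outside the historical range
    to the extremes, a known limitation of this simple version.
    """
    m = np.sort(np.asarray(model_hist, float))
    o = np.sort(np.asarray(obs_hist, float))
    p_m = (np.arange(m.size) + 0.5) / m.size  # plotting positions, model quantiles
    p_o = (np.arange(o.size) + 0.5) / o.size  # plotting positions, observed quantiles
    probs = np.interp(model_values, m, p_m)
    return np.interp(probs, p_o, o)


# Usage: a synthetic model that runs 3 degrees too cold and is too variable
rng = np.random.default_rng(0)
obs_hist = rng.normal(25.0, 2.0, 1000)
model_hist = rng.normal(22.0, 3.0, 1000)
corrected = quantile_map(model_hist, obs_hist, model_hist)
print(f"model mean {model_hist.mean():.2f} -> corrected {corrected.mean():.2f}; "
      f"observed {obs_hist.mean():.2f}")
```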


The state of the art in physical process models, particularly weather and climate models, is summarized in Box 4.2.

BOX 4.2 State of the Art in Physical Process Models

Space and time scales: The scales simulated by the family of physical process models cover subatomic to galactic spatial scales and near-instantaneous to geologic time scales. Specific physical process models and methods are generally designed to simulate specific processes and features within a limited space and time scale (see Figure 4.1) and are rarely valid outside of those scales.
Fidelity: Physical process models are designed to balance fidelity against the available computing power, the time available to carry out the simulation, and the complexity of the system being modeled. Models range from simple, low-fidelity models that can be run in seconds on a low-end tablet computer to very complex, high-fidelity simulations that take months to run on the fastest supercomputers.
Accuracy and precision: In physical process models, accuracy refers to how closely the simulation matches the average behavior of the observed system, whereas precision is a measure of the variance. A prediction that tomorrow’s temperature will be less than 200°F is surely accurate but imprecise, whereas a prediction that it will be 23.456789°F is precise but likely to be inaccurate. The accuracy of many types of physical process models, such as numerical weather models, has been improving over time because of constant advancements in scientific knowledge and technology (Bauer et al., 2015). Both accuracy and precision are taken into account in measures of model prediction skill.
Predictions and scenarios: The predictability limit for an individual forecast of the highly nonlinear weather system is about 2 weeks. Climate prediction currently aims for useful skill at lead times of a season to a decade. This is to be achieved (a) through better knowledge and observations of ocean thermodynamics, the principal driver of the system at those scales, and (b) by forecasting lower-precision features, such as wet or dry trends over broad areas, rather than precise temperatures or rainfall amounts at specific locations (Slingo and Palmer, 2011).
Uncertainty analysis support: In physical process models, uncertainty characterization is a method for conveying the uncertainties that are inherent when simulating continuous environments with discrete grid points and time steps, using imperfect observations and models. Because physical process models are increasingly being used in decision-making contexts, more emphasis is being placed on quantifying model uncertainty in a manner that makes the data more usable and actionable.
Validation and assessment support: Process models are validated through detailed comparisons of the accuracy and precision of the simulations relative to observations of the physical system being simulated.
Computational requirements: State-of-the-art physical process models have matched their computational requirements to the rapid increase in computational capability over the past two decades. Petascale (10¹⁵ floating-point operations per second [FLOPS]) computers became firmly established in 2014, and exascale (10¹⁸ FLOPS) architectures are currently being designed.
Data requirements: Physical model data output ranges from insignificant to overwhelmingly large, even in installations with dedicated automated high-performance mass storage systems. Substantial model output is publicly available, and much of it uses standardized data and metadata formats to improve interoperability.
Difficulty to develop: Low-fidelity models of simple physical systems can be trivial to develop, whereas high-fidelity models of complex systems require years of effort by large teams of researchers.
Reuse: Physical process models are usually designed to be reused extensively.
Software/code availability: Although software and codes for physical process models developed in academia are often open source, most physical process models developed for commercial, classified, or emerging research applications require establishing contractual relationships to acquire or use.
Training support: Highly variable. While many physical process models include sufficient documentation on their use and application, others do not.
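The Box 4.2 distinction between accuracy and precision maps onto standard verification statistics. In the sketch below, the mean error (bias) stands in for accuracy and the standard deviation of the error stands in for precision. The mean squared error (MSE) combines both, because MSE = bias² + error variance. The MSE skill score compares a forecast against a reference such as climatology. This is one common way to operationalize these terms, assumed here for illustration. The function name and data are hypothetical, and operational centers use a wider set of scores.

```python
import numpy as np


def skill_summary(forecast, observed, reference):
    """Verification statistics for a set of forecasts against observations.

    bias     : mean error, one common measure of (in)accuracy
    error_sd : standard deviation of the error, one common measure of (im)precision
    mse      : mean squared error, equal to bias**2 + error_sd**2
    msss     : MSE skill score relative to a reference forecast such as climatology
               (1 = perfect, 0 = no better than the reference, < 0 = worse)
    """
    f, o, r = (np.asarray(a, dtype=float) for a in (forecast, observed, reference))
    err = f - o
    mse = np.mean(err**2)
    return {
        "bias": err.mean(),
        "error_sd": err.std(),  # ddof=0, so the MSE decomposition is exact
        "mse": mse,
        "msss": 1.0 - mse / np.mean((r - o) ** 2),
    }


# Usage: four daily temperature forecasts scored against a climatological reference
observed = [0.0, 2.0, 2.0, 4.0]
forecast = [1.0, 2.0, 3.0, 4.0]
climatology = [2.0, 2.0, 2.0, 2.0]
scores = skill_summary(forecast, observed, climatology)
print(scores)
```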


How to Make Useful for NGA

Rather than attempting to build in-house expertise in all relevant physical processes, NGA could leverage existing expertise in other organizations, either by becoming a user of model results or by becoming a partner in teams experienced in designing, carrying out, and analyzing physical process simulations. Vast amounts of process model output data are readily available, although much of it would require additional context to be useful for NGA applications, and much would have to be downscaled to the regional and local scales most relevant to NGA questions. In addition, some features of process model simulations may be reliable and useful in new applications. For example, framing the analysis in terms of risk (a function of vulnerability, exposure, and hazard) can be useful for examining the impact of physical processes on human systems. Finally, some process modeling teams develop benchmark scenarios (e.g., future emissions trajectories for climate models), and they may be willing to work with NGA to develop, run, and interpret scenarios tailored to NGA’s specific interests. In all of these situations, NGA will need to invest time identifying domain experts and collaborating with them to design new simulations or scenarios, to select existing model output appropriate to the NGA question under consideration, to minimize uncertainties associated with downscaling, or to understand the strengths and weaknesses of the models in NGA scenarios.
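Risk framing can be made concrete with a simple index. The text states only that risk is a function of vulnerability, exposure, and hazard. The multiplicative form R = H × E × V used below is one common convention, adopted here as an assumption; other studies use weighted sums or ranked indices. In this sketch, hazard is the probability of the event, for example taken from a downscaled process model run. Exposure is the population at risk, and vulnerability is the expected share harmed. All numbers are hypothetical.

```python
import numpy as np


def risk_index(hazard, exposure, vulnerability):
    """Multiplicative risk index, R = H * E * V, evaluated cell by cell.

    hazard        : probability of the hazardous event (0-1), e.g. from a process model
    exposure      : people or assets in harm's way (e.g. population per district)
    vulnerability : expected fraction of the exposed who would be harmed (0-1)
    Returns the expected number of people or assets harmed.
    """
    h, e, v = (np.asarray(a, dtype=float) for a in (hazard, exposure, vulnerability))
    for name, arr in (("hazard", h), ("vulnerability", v)):
        if np.any((arr < 0.0) | (arr > 1.0)):
            raise ValueError(f"{name} must lie in [0, 1]")
    if np.any(e < 0.0):
        raise ValueError("exposure must be non-negative")
    return h * e * v


# Usage: three districts (all numbers hypothetical)
heatwave_probability = [0.10, 0.30, 0.30]     # hazard, from a downscaled climate run
population = [1_000_000, 200_000, 1_000_000]  # exposure
share_vulnerable = [0.20, 0.50, 0.10]         # vulnerability (e.g. elderly, no cooling)
risk = risk_index(heatwave_probability, population, share_vulnerable)
print(risk)
```

The first district has the lowest hazard but still carries substantial risk because of its large exposed population. This is the kind of insight the risk framing is meant to surface.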

The results of physical process models are increasingly being adapted for use in geospatial tools such as geographic information systems (GISs), which could facilitate their use in geospatial intelligence. The capabilities and sophistication of geospatial technologies as well as the large size of the GIS community have prompted many physical process modeling groups to ensure that their model results can be integrated into the rapidly proliferating suite of open and commercial GIS tools. Physical process model data can be made GIS compatible by using controlled vocabularies, standardized conventions for time and geolocation, and metadata. Once georeferenced physical process model data are in GIS-ready formats, they can easily be overlaid on data describing human systems, such as populations, cities, infrastructure, landforms, or social entities. This capability enables interactive data exploration, analysis, visualization, and distribution, all of which would improve delivery of usable model information to a broad range of users and uses (Wilhelmi et al., 2016). An example of a GIS analysis of climate model results is shown in Figure 4.3.
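The sketch below shows what "GIS compatible" can mean in practice. Gridded model output is converted to GeoJSON (RFC 7946), an open format that most GIS tools read. Each cell center carries a controlled-vocabulary variable name, units, and an ISO 8601 timestamp. Several choices are illustrative assumptions: using the Climate and Forecast (CF) standard name `air_temperature`, representing cells as points rather than polygons, and the function name and values. Large grids are more often exchanged as netCDF or GeoTIFF.

```python
import json


def grid_to_geojson(lats, lons, values, standard_name, units, time_iso):
    """Convert gridded model output (cell centers) to a GeoJSON FeatureCollection.

    values[i][j] is the value at (lats[i], lons[j]). GeoJSON (RFC 7946) orders
    coordinates as [longitude, latitude] in WGS 84 decimal degrees.
    """
    features = []
    for i, lat in enumerate(lats):
        for j, lon in enumerate(lons):
            features.append({
                "type": "Feature",
                "geometry": {"type": "Point", "coordinates": [lon, lat]},
                "properties": {
                    "standard_name": standard_name,  # controlled-vocabulary term
                    "units": units,
                    "time": time_iso,                # ISO 8601, UTC
                    "value": values[i][j],
                },
            })
    return {"type": "FeatureCollection", "features": features}


# Usage: a 2 x 2 patch of near-surface air temperature from a hypothetical model run
geojson = grid_to_geojson(
    lats=[29.5, 30.0], lons=[-95.5, -95.0],
    values=[[305.1, 304.8], [303.9, 304.2]],
    standard_name="air_temperature", units="K", time_iso="2050-07-15T18:00:00Z",
)
text = json.dumps(geojson, indent=2)  # write `text` to a .geojson file for a GIS
print(text[:200])
```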

FIGURE 4.3 “Beat the Heat in Houston,” a Web-based tool that integrates temperature data with information about cooling centers and Centers for Disease Control and Prevention–based recommendations for at-risk populations. SOURCE: Courtesy of Jennifer Boehnert, National Center for Atmospheric Research.


NGA-Funded Research and Development Areas

Geospatial intelligence is based on analysis of the environmental context, relevant natural and human factors, and potential threats and hazards.² Because so many physical processes are relevant, and because the linkages between physical process models and downstream impacts are often highly nonlinear, NGA will be challenged to determine which physical process models are sufficiently important to develop and maintain for geospatial intelligence applications. Moreover, the important physical process models will depend on the intelligence application, and their importance can only be unambiguously assessed after a complete end-to-end system has been constructed. That said, some areas do lend themselves to NGA research and development, such as the following:

Precision real-time weather predictions to support applications such as (a) ground and aerial insertion, mission supply, and disaster relief missions in complex natural environments, or (b) source location, dispersion rate, and damage estimations from the release of hazardous materials in densely populated urban environments;
Improved simulations to anticipate regional extreme events and environmental threats—such as droughts, relentless heat (sustained high heat and humidity) events, disease vector precursors, or rapid sea-level rise—that could lead to large-scale social unrest, disruption, or migration;
System emulators and reduced-order models of large complex physical systems—such as climate, water, or chemical processes—to allow rapid exploration of scenarios of interest to NGA;
Methods to rapidly and inexpensively assess the importance of physical processes to intelligence applications;
Design of robust frameworks, couplers, and application program interfaces to include physical process models in intelligence applications; and
Refinement of physical models to facilitate their combination with social system models to gain additional understanding and potentially predictive capability of relevant coupled physical–social system stresses and behaviors.

To help determine what physical process models are needed, NGA could survey its analysts and customers on gaps in data, information, knowledge, and capability. NGA could then build the in-house disciplinary knowledge and expertise to develop models or use available physical process data needed to fill the gaps. Partnering with other agencies that have relevant experience or mission responsibilities would likely speed the development of NGA’s in-house capabilities.

SOCIAL SYSTEM MODELS

Social system models are used to understand human behavior and to make decisions in both idealized and real-world settings. Many of these models provide a means for understanding the social consequences of the diverse feedbacks inherent in a complex social system. Using these models to reason about how different courses of action will affect behavior or how different scenarios are likely to unfold will often be more important to NGA than forecasting. For example, the megacities intelligence question (see Box 1.2) would likely require models to determine what changes in social systems (e.g., financial, cultural, ethnic, religious, and health) may trigger political, economic, or security problems across different geoterrains. The Chinese water transfer questions would likely require social system model scenarios of how affected populations are likely to respond to dam construction and involuntary migration.

A variety of social system models can be made useful for NGA, but those that most naturally lend themselves to reasoning about geospatially embedded social systems are (1) system-level models and technical graphical methods (e.g., Bayesian influence models and Petri nets) that support modeling of system-level flows and influences and (2) agent-based approaches that support modeling of emergent population behavior. These types of models support what-if reasoning. They can help NGA think through possible social responses to events embedded in particular geographic conditions, frame arguments, describe the interplay of complex processes, explain a behavioral response to a physical process, or discuss alternative courses of action. These classes of models are described below.

___________________

² See www.nga.mil/ProductsServices/GEOINTAnalysis.

System-level models. In system-level models, such as system dynamics models (Forrester, 1961; Sterman, 2000), causal loop diagrams or influence networks are used to convey how one part of the system influences another (the “wiring diagram”). Geospatial factors are incorporated in system-level models in three ways. First, direct causal and influence relations can be represented. For example, a reduction in rainfall reduces crop yields and increases food prices, and the resulting food shortage leads to more illness and emigration. Second, geographical differences in sociopolitical influence can be captured in models of different locations (e.g., drought may be modeled as leading to emigration in one city and to more wells being built in another city). Third, changes in geographical resources can be represented in terms of stocks or the likelihood of events (e.g., level of deforestation or likelihood of building a well). The wiring diagram is often the most valuable aspect of a system-level model. It gives the developer and the end user a sense of system behavior, evokes discussion, and creates common understanding. The simulation results themselves are not always credible because of (1) insufficient theory, data, or time to instantiate the model and test the connectors between components, or (2) uncertainties in the functional form of the relation between one stock and another in the model.
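The rainfall example above can be written as a minimal system dynamics model. It has two stocks, a food store and a population, plus flows and auxiliary variables along the causal chain rainfall → harvest → food stock → shortage → price, illness, and emigration. The sketch integrates the stocks with Euler steps. Every parameter value and functional form is a hypothetical assumption. This illustrates the point made above: the wiring diagram is usually more defensible than the specific numbers a simulation produces.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Params:
    """All values are hypothetical and chosen only for illustration."""
    yield_per_mm: float = 200.0           # tonnes harvested per mm of monthly rainfall
    need_per_person: float = 0.02         # tonnes of food needed per person per month
    buffer_months: float = 3.0            # months of stored food regarded as secure
    base_price: float = 1.0               # relative food price with a secure supply
    price_sensitivity: float = 2.0        # relative price rise at total shortage
    base_emigration: float = 0.001        # fraction of population leaving per month
    emigration_sensitivity: float = 0.02  # extra monthly emigration at total shortage
    base_illness: float = 0.005           # monthly illness rate with a secure supply
    illness_sensitivity: float = 0.05     # extra illness rate at total shortage


def simulate(rainfall_mm, months=60, population=1e6, p: Optional[Params] = None, dt=1.0):
    """Euler integration of two stocks (food, population) linked by the causal chain
    rainfall -> harvest -> food stock -> shortage -> price, illness, emigration."""
    if p is None:
        p = Params()
    food = p.need_per_person * population * p.buffer_months  # start with a secure stock
    history = {"population": [], "shortage": [], "price": [], "illness_rate": []}
    for _ in range(months):
        demand = p.need_per_person * population
        shortage = max(0.0, 1.0 - food / (demand * p.buffer_months))  # 0 = secure, 1 = empty
        harvest = p.yield_per_mm * rainfall_mm                     # inflow to the food stock
        consumption = min(demand, food / dt)                       # outflow, limited by stock
        emigration = population * (p.base_emigration + p.emigration_sensitivity * shortage)
        food += dt * (harvest - consumption)
        population -= dt * emigration
        history["population"].append(population)
        history["shortage"].append(shortage)
        history["price"].append(p.base_price * (1.0 + p.price_sensitivity * shortage))
        history["illness_rate"].append(p.base_illness + p.illness_sensitivity * shortage)
    return history


# Usage: compare a normal-rainfall scenario with a drought scenario
normal = simulate(rainfall_mm=100.0)
drought = simulate(rainfall_mm=50.0)
print(f"population after 5 years: normal {normal['population'][-1]:,.0f}, "
      f"drought {drought['population'][-1]:,.0f}")
print(f"peak relative food price under drought: {max(drought['price']):.2f}")
```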

Agent-based models. In agent-based models (Bonabeau, 2002; Gilbert, 2008), heterogeneous agents are placed in a context and group-level outcomes emerge from the interactions among these agents as they operate under various rules of behavior, cognitive processes, and modes of learning. Agent-based models fall into three major classes: cognitive, population, and network based. Cognitive models use only a handful of actors and focus on the detailed behavior of the actors engaged in specific tasks. The modeling frameworks often take into account differences in how data are sensed (e.g., by touch or sight), memory considerations, or time for precognitive and cognitive functions. Population-based models use large numbers of actors with simple information-processing behaviors, and they concentrate on emergent group-level phenomena. The modeling frameworks often take into account differences in the resources, goals, capabilities, or information available to the agents. Agent-based dynamic-network models (Carley et al., 2009) use a moderate number of actors with moderately complex cognitive functioning and a position in one or more social networks. What the actors know is therefore related to whom they interact with, and the distribution of action and information changes as the social networks evolve, and vice versa. Because agent-based dynamic-network models incorporate social cognition (knowledge about groups, generalizations based on group membership, and presence of a generalized other), their predictions about social behavior are more realistic than those of other agent-based models. For all agent-based models, there is a tradeoff between the number of agents and the fidelity with which each agent’s cognition and social position are modeled.
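The co-evolution of knowledge and networks in dynamic-network models can be sketched as a toy model. Agents interact with a probability proportional to the knowledge they share (homophily), and each interaction transfers a fact. As agents learn, their similarities change, and so do their future partners. This is loosely inspired by the homophily mechanism in constructural models but is a simplified assumption, not the actual rules of Carley et al. (2009). The function name, parameters, and the 0.01 baseline are hypothetical.

```python
import numpy as np


def run_dynamic_network(n_agents=30, n_facts=40, steps=50, seed=0):
    """Toy knowledge-diffusion model in which the interaction network co-evolves with knowledge.

    Each step, every agent picks a partner with probability proportional to the
    number of facts they share (plus a small baseline so strangers can meet) and
    learns one fact the partner knows that it does not. As knowledge spreads,
    similarity, and therefore interaction probabilities, change.
    """
    rng = np.random.default_rng(seed)
    knowledge = rng.random((n_agents, n_facts)) < 0.1  # True = agent knows the fact
    initial = knowledge.copy()
    ties = np.zeros((n_agents, n_agents))              # cumulative interaction counts
    for _ in range(steps):
        k = knowledge.astype(float)
        weights = k @ k.T + 0.01                       # shared facts + baseline
        np.fill_diagonal(weights, 0.0)                 # no self-interaction
        for i in range(n_agents):
            j = rng.choice(n_agents, p=weights[i] / weights[i].sum())
            ties[i, j] += 1
            unknown_to_i = np.flatnonzero(knowledge[j] & ~knowledge[i])
            if unknown_to_i.size:
                knowledge[i, rng.choice(unknown_to_i)] = True
    return initial, knowledge, ties


# Usage
initial, final, ties = run_dynamic_network()
print(f"facts known: {initial.sum()} -> {final.sum()}; "
      f"distinct ties formed: {np.count_nonzero(ties)}")
```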

Geospatial factors in agent-based models can be accounted for in three ways. First, agents can have geographically specific behavioral rules (e.g., agents representing Europeans might have a different caloric intake and response to violence than agents representing people in the heart of Africa). Second, types of agents may be differentiated by geographical properties (e.g., agents representing global companies might have a global reach, whereas agents representing individual workers might have only a city-level reach). Third, the agents can be designed to move through a virtual landscape that represents the physical geography and to behave differently at different locations in that landscape (e.g., they can get water only at a well). The landscape is often represented as a grid or a toroid (a grid whose opposite edges wrap around) over which agents move, and maps may be overlaid. Models that use actual maps and direct agents to move accordingly, such as along roads, typically assume the social network is fixed or dictated by location. In contrast, models that have better representations of the communication space (including the digital landscape) often have primitive representations of the physical landscape.
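The third approach, a virtual landscape, can be sketched in a few lines of Python. In this hypothetical example, agents take random steps on a toroidal grid, where modulo arithmetic wraps each edge onto the opposite one, and they can drink only when standing on a well cell. The grid size, well locations, and random-walk rule are all illustrative assumptions.

```python
import random

def walk_to_water(width=20, height=20, wells=((5, 5), (14, 12)),
                  n_agents=30, steps=200, seed=7):
    """Agents random-walk on a toroidal grid and can drink only at well cells.
    Returns final positions and the number of drinks each agent obtained."""
    rng = random.Random(seed)
    wells = set(wells)
    positions = [(rng.randrange(width), rng.randrange(height)) for _ in range(n_agents)]
    drinks = [0] * n_agents
    moves = [(0, 1), (0, -1), (1, 0), (-1, 0)]
    for _ in range(steps):
        for k, (x, y) in enumerate(positions):
            dx, dy = rng.choice(moves)
            # Modulo wraps each edge onto the opposite edge: a toroidal landscape.
            x, y = (x + dx) % width, (y + dy) % height
            positions[k] = (x, y)
            # Location-dependent behavior: water is available only at a well.
            if (x, y) in wells:
                drinks[k] += 1
    return positions, drinks

positions, drinks = walk_to_water()
print(sum(drinks), "drinks in total across all agents")
```

Replacing the random step with movement along a road network taken from an actual map would move this sketch toward the map-based models described above.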

It is often more complicated to set up an agent-based model than a system-level model. Moreover, whereas system-level models can provide insight even at the wiring diagram level, agent-based models generally need to be instantiated and virtual experiments need to be run to gain insight for model-based reasoning. The lengthy development and validation process means that agent-based models will be most useful for activities with a multiyear time horizon.
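The following Python sketch illustrates what running a virtual experiment involves. A hypothetical stochastic spreading model is replicated across random seeds for several parameter values, and it is the distribution of outcomes, summarized here by its mean and standard deviation, that supports reasoning. The model `toy_spread` and its parameters are assumptions chosen only to keep the example self-contained.

```python
import random
import statistics

def toy_spread(p, n=500, steps=20, seed=0):
    """Hypothetical stochastic model: each step, every informed agent contacts
    one random agent and informs them with probability p. Returns the final
    informed share of the population."""
    rng = random.Random(seed)
    informed = {0}
    for _ in range(steps):
        newly_informed = set()
        for _ in range(len(informed)):
            if rng.random() < p:
                newly_informed.add(rng.randrange(n))
        informed |= newly_informed
    return len(informed) / n

def virtual_experiment(p_values, replications=30):
    """Replicate the model across seeds for each parameter value and
    summarize the distribution of outcomes as (mean, standard deviation)."""
    summary = {}
    for p in p_values:
        outcomes = [toy_spread(p, seed=r) for r in range(replications)]
        summary[p] = (statistics.mean(outcomes), statistics.stdev(outcomes))
    return summary

for p, (mean, sd) in virtual_experiment([0.1, 0.3, 0.5]).items():
    print(f"p={p}: mean informed share {mean:.2f} (sd {sd:.2f})")
```

Insight comes from comparing the summaries across parameter values, not from any single run. A real experimental design would typically vary several parameters at once.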

State of the Art

A great deal has been learned about the many cognitive biases held by individuals and social groups, social constraints on communication, and the effect of incentives and sanctions on the distribution of goods and activities, all of which are important to consider in models of geoembedded social activity. More than 100 cognitive biases, hundreds of biases in the formation of network ties, and the factors that constrain actors' access to information have been identified. A large number of factors that influence the existence and strength of relations among actors, and the conditions under which those relations affect decisions, have also been identified. Social system models are becoming increasingly complex and are better representing human behavior (Weinberger, 2011). The realism of models in the cognitive tradition is being improved by imbuing the agents with social cognition. Active areas of research include the following:

Deviations between actual behavior and normative standards;
Elicitation of human preferences, beliefs, desires, aspirations, and so on needed to construct realistic models of human behavior;
Deviations between actor decisions and behaviors when acting independently or in a group;
Understanding how the social context or structure of the group affects group or societal outcomes; and
Understanding how to represent uncertainty in an individual model and in combined subsystem models.

The current state of the art in social system models is summarized in Box 4.3. Social system models are complicated to build, are data greedy, and operate at such a transdisciplinary level that underlying theories may not exist. These complexities mean that most models are used for reasoning rather than forecasting, and that there is little validation and little to no model reuse. On the other hand, building models is facilitated by the availability of toolkits. System-level model toolkits are highly developed, and multiple commercial products (e.g., iThink and Stella for system dynamics models) are in use. Simple models, on which more detailed ones can be built, have been developed. The models can be built to output data directly to various statistical packages for analysis. The methodology has been documented extensively, and many practitioners find that simply creating the model “wiring diagram” is sufficient for explaining how the parts of the system work together. Toolkits for agent-based modeling are not as mature, but they facilitate model development and support integration with the common statistical platforms and network analysis tools used for analysis, validation, and system testing. Common toolkits for the more cognitive models include ACT-R and SOAR, and toolkits for the least cognitive models include Repast, MASON, and NetLogo.
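Toolkits such as iThink and Stella let modelers draw stocks and flows graphically and then integrate them numerically. Written directly in Python, the same idea might look like the sketch below: one stock (a population) with one inflow (births) and one outflow (deaths), advanced with Euler's method. The population framing and the rates are hypothetical.

```python
def population_stock(initial=1000.0, birth_rate=0.03, death_rate=0.01,
                     years=10.0, dt=0.25):
    """Minimal system dynamics model: one stock (population) with an inflow
    (births) and an outflow (deaths), integrated with Euler's method.
    Returns the stock value at every time step, including the start."""
    stock = initial
    trajectory = [stock]
    for _ in range(round(years / dt)):
        births = birth_rate * stock      # inflow, people per year
        deaths = death_rate * stock      # outflow, people per year
        stock += dt * (births - deaths)  # Euler step: net flow times dt
        trajectory.append(stock)
    return trajectory

print(round(population_stock()[-1], 1))
```

Smaller `dt` values bring the Euler result closer to the continuous solution `initial * exp((birth_rate - death_rate) * years)`, at the cost of more steps.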

BOX 4.3 State of the Art in Social System Models

Space and time scales: The methodologies used for social system models (e.g., system dynamics models of human behavior) can be used at a wide range of scales. Some models do not consider space; others take into account spatial scales ranging from a small area around a single actor to the entire globe. Likewise, some models do not consider time; others operate at temporal scales ranging from a few nanoseconds (precognitive) to centuries. Spatiotemporal scale tends to be correlated with the number of actors being modeled, particularly for agent-based models. Cognitive agent-based models cover the least space and time, typically what is needed for a single task, whereas more general agent-based models and system dynamics models tend to be used for global or multiyear problems. Many models do not represent time or space in a “strong” way, and so they provide outcomes only in relative terms (e.g., increasing or decreasing) and present the possible order of actions, but not the times at which they occur.
Fidelity: The fidelity of a social system model depends on the phenomena being modeled and the effort invested in the model. The methodologies support any level of fidelity. Because most models are built to demonstrate or explain a general phenomenon, they have relatively low fidelity on all dimensions except the one(s) of critical concern to the modeler or end user. Most models used for forecasting are partly validated, are tightly tied to some data streams, and have often been tuned at least once to a historical event.
Accuracy and precision: The accuracy and precision of these models depend on the underlying theory being modeled and the level of data used to instantiate the model. Most current models are relatively imprecise and tend to be more accurate about general processes and groups than about specific events or people.
Predictions and scenarios: Social system models are used not for prediction in the classic sense but rather for suggesting the landscape of possible future scenarios and the relative likelihood of various outcomes. When the scope of the model is narrow and sufficiently multidisciplinary teams are involved in model development, it is usually possible to match the space of possibilities generated by the model with the frequency of events that occur.
Uncertainty analysis support: Few of the modeling methodologies support automatically tracking uncertainty and the propagation of uncertainty through the model. In general, uncertainty is handled by running a large virtual experiment and then examining the distribution of the results. Consequently, analysts talk about the robustness of the results to changing parameters. When the underlying process is not well understood, the process is commonly modeled as random, or as a set of alternative processes which are then compared.
Validation and assessment support: Most social system models violate the assumptions on which validation theory is based (e.g., stationarity of process), and so validation methods worked out for physical systems do not apply to social system models. For example, tuning the inputs and processes to generate outputs that match a historical case generally yields an overtuned model that cannot be used for adaptive actors. The level and type of validation for social system models depend on the purpose of the model. Most social system models never receive more than face validation because they are typically used for reasoning rather than prediction. Validation and assessment are generally easier when the models are built so that the inputs and outputs are in formats that can be used by standard social network tools and statistical packages (e.g., CSV format).
Computational requirements: System dynamics and agent-based models can generate terabytes of data. When large numbers of virtual experiments are run, distributed processing systems (e.g., cloud computing or a Condor cluster) are useful. A laptop is sufficient for small models and experiments. Currently, few models can be turned into Web applications because of both data and processing demands.
Data requirements: It commonly takes more time (sometimes an order of magnitude more) to collect data to instantiate or test a model than it does to code the model, particularly for models used for detailed assessment. Data are often drawn from multiple heterogeneous sources, and the diverse data streams need to be fused. The amount of data generated by the models depends on factors such as the number and cognitive fidelity of the actors modeled, the number of social outcomes tracked, and the number of time periods simulated. It ranges from a handful of values in cognitive agent-based models to hundreds of thousands in population-based models, and it often exceeds the amount of empirical data that could be collected from the real world on the same topic.
Difficulty to develop: High. Social system models generally take 1 month to 1 year to develop, although many of the findings from traditional work can be applied in a few minutes. Instantiating the models with data and validating them requires at least an equal amount of time.
Reuse: Most models are built once and never reused.
Software/code availability: Multiple modeling toolkits exist for many classes of models. No toolkits have all the findings regarding cognitive biases, social network biases, or game-theoretic considerations built in.
Training support: Textbooks on modeling now exist, but most cover only one type of social science model, such as system dynamics models.
Data-to-simulated results: Many models are paper-only designs and are never actually built, and so no simulated results are generated. A paper model, often referred to as a wiring diagram, is used to illustrate what factors may be influencing others. Such models are commonly used to support reasoning.
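A wiring diagram can also be treated as data. One common way to do this, sketched here in Python as an assumption rather than a method prescribed in this report, is a signed directed graph in which each arrow carries +1 or -1. Multiplying the signs along every path from a driver to an outcome yields the kind of relative, increasing-or-decreasing answer such models provide. The factors and signs in `DIAGRAM` are hypothetical and only loosely inspired by the water-availability connectors discussed in this chapter.

```python
# Hypothetical wiring diagram: DIAGRAM[cause][effect] = +1 (same direction)
# or -1 (opposite direction).
DIAGRAM = {
    "water_availability": {"crop_yield": +1, "waterborne_disease": -1},
    "crop_yield": {"rural_jobs": +1, "food_prices": -1},
    "waterborne_disease": {"health": -1},
    "food_prices": {"health": -1},
    "rural_jobs": {"migration_to_cities": -1},
}

def path_signs(diagram, source, target, path=None):
    """Yield the net sign of every simple path from source to target."""
    path = path or [source]
    for nxt, sign in diagram.get(source, {}).items():
        if nxt in path:
            continue  # skip nodes already on this path (feedback loops)
        if nxt == target:
            yield sign
        else:
            for rest in path_signs(diagram, nxt, target, path + [nxt]):
                yield sign * rest

def qualitative_effect(diagram, source, target):
    """Summarize how an increase in source moves target, in relative terms."""
    signs = set(path_signs(diagram, source, target))
    if not signs:
        return "no effect"
    if signs == {1}:
        return "increase"
    if signs == {-1}:
        return "decrease"
    return "ambiguous"  # opposing pathways; magnitudes need a quantitative model

print(qualitative_effect(DIAGRAM, "water_availability", "health"))
```

With this diagram, more water raises health through both the disease and the food-price pathways, so the call returns "increase". When pathways disagree, the sketch reports "ambiguous", which is exactly where a paper model stops being sufficient and an instantiated model is needed.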

How to Make Useful for NGA

For NGA, social system models are most useful for understanding how diverse social and physical subsystems interact to effect new outcomes. The accuracy of these models depends on the correctness of the assumptions, theories, and available data, and on the process description of the connectors between different subsystems (e.g., a description of how a change in water availability affects health or job availability). Teams composed only of computer scientists, engineers, and mathematicians will not have a sufficient understanding of the social processes, of which theories are valid or untested, of what data are available, or of the current state of the art in modeling the relevant phenomena (Medina and Hepner, 2015). Consequently, the model development team may need to be quite large and include experts in multiple areas, such as geography, history, psychology, organization science, economics, and sociology. NGA has few scientists with a background in these areas. For system-level models, which must be tuned to provide accurate forecasts, data analysts, statisticians, individuals trained in experimental design, and a host of specialists for each critical connector process may also be required. Given these considerations, NGA might be best served by utilizing the expertise and skills of modelers outside of the agency.

For NGA analysts who need to model human behavior, an understanding of the limitations of simple traditional models (e.g., simple game theory and rational actor models) as well as basic training in cognitive biases, network biases, and the role of incentives and sanctions would be invaluable. Such training would help these modelers understand the limits of current knowledge and the areas of uncertainty in modeling geohuman activity, and gain the vocabulary for working with other social system modelers.

Finally, for social system models to be valuable to NGA, they need to be developed within the space of geospatial data. Huge volumes of disparate data are often required to effectively model the system of concern. For example, modeling the potential impact of drought on a population requires data on water levels, water use, laws governing water use, population location, growth, and movement, and other data. Given the disparate models and methods used in the different domains of interest to NGA, greatly enhanced data interoperability capabilities will be essential for social system modeling. Particular needs include the ability to search across all NGA data holdings using natural language searches, standardized metadata to enable fast searches and cross correlations, and the ability to bring multiple data streams into a single analysis platform.
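The value of standardized metadata for bringing several streams into one analysis can be sketched in Python. In this hypothetical example, each stream declares which of its fields holds the region, the time, and the value, so records with different schemas can be joined on a shared (region, year) key without hand editing. The stream names, field names, and numbers are invented for illustration. The sketch also assumes that every stream already uses the same region identifiers, which real data fusion rarely gets for free.

```python
# Hypothetical streams; each carries metadata naming its key and value fields.
WATER = {
    "name": "water_level_m", "region_field": "basin", "time_field": "yr",
    "value_field": "level_m",
    "records": [
        {"basin": "R1", "yr": 2014, "level_m": 3.2},
        {"basin": "R1", "yr": 2015, "level_m": 2.7},
        {"basin": "R2", "yr": 2015, "level_m": 4.1},
    ],
}
POPULATION = {
    "name": "population", "region_field": "district", "time_field": "year",
    "value_field": "count",
    "records": [
        {"district": "R1", "year": 2015, "count": 52000},
        {"district": "R2", "year": 2015, "count": 18000},
    ],
}

def fuse_streams(streams):
    """Join streams into one table keyed on (region, time), using each
    stream's metadata to locate the relevant fields."""
    fused = {}
    for stream in streams:
        for rec in stream["records"]:
            key = (rec[stream["region_field"]], rec[stream["time_field"]])
            fused.setdefault(key, {})[stream["name"]] = rec[stream["value_field"]]
    return fused

def complete_rows(fused, names):
    """Keep only keys observed in every named stream, sorted by key."""
    return {k: v for k, v in sorted(fused.items()) if all(n in v for n in names)}

rows = complete_rows(fuse_streams([WATER, POPULATION]), ["water_level_m", "population"])
print(rows)
```

Only the 2015 rows survive, because the 2014 water reading has no matching population record. Surfacing such gaps is one reason the fusion step deserves explicit tooling.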

NGA-Funded Research and Development Areas

Social system models are useful because they account for the human use of space, the impact of geographic features on sociocultural behavior, and the impact of sociocultural behavior on the built environment and the geography. However, the utility of these models is limited to providing general high-level guidance, supporting model-based reasoning, and demonstrating geosocial behavior in specific settings. The following core research would improve the utility of social system models for NGA:

Developing new techniques and procedures for increasing the ease of developing, testing, using, and reusing social system models. A particular need is to develop a new theory of validation for social system models.
Developing an infrastructure for combining models at different levels of resolution and the basic theories, methods, and algorithms for moving between these models at different levels. This simulation testbed needs to support the incorporation, running, and comparison of models built using different paradigms as well as the associated statistical and network analytic tools. A key will be developing a common representation scheme for temporal, spatial, and group features at different levels that can be used with both system dynamics and agent-based models.
Developing standards for representing spatial information and for collecting and fusing geosocial data.
Developing data sets at different temporal, spatial, and group levels of granularity that can be used by modelers outside NGA to develop tools and techniques of value to NGA. Such open-access data would allow the broader modeling community to work on geospatial-related social issues.
Improving understanding of how human behavior is constrained or enabled by the geography of the natural and built environment.
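One elementary building block for moving between levels of spatial resolution is sketched below in Python. It is an illustrative assumption, not a method proposed by the committee. Fine-grid values are summed into coarse cells, and coarse totals are split back onto fine cells in proportion to a weight layer, in the spirit of dasymetric mapping. Both directions conserve the total, which matters when, for example, agent counts from one model feed a coarser system-level model.

```python
def aggregate(fine, factor):
    """Sum a fine grid into coarse cells of factor x factor fine cells.
    Assumes both grid dimensions are divisible by factor."""
    rows, cols = len(fine), len(fine[0])
    coarse = [[0.0] * (cols // factor) for _ in range(rows // factor)]
    for i in range(rows):
        for j in range(cols):
            coarse[i // factor][j // factor] += fine[i][j]
    return coarse

def disaggregate(coarse, weights, factor):
    """Split each coarse total over its fine cells in proportion to a
    fine-resolution weight layer (e.g., a hypothetical built-up-area layer)."""
    block_weight = aggregate(weights, factor)
    rows, cols = len(weights), len(weights[0])
    fine = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            total = coarse[i // factor][j // factor]
            w = block_weight[i // factor][j // factor]
            # Proportional split; fall back to an even split where a block has
            # no weight, so the coarse total is still conserved.
            fine[i][j] = total * weights[i][j] / w if w > 0 else total / factor ** 2
    return fine

print(aggregate([[1, 2], [3, 4]], 2))
```

For example, `aggregate([[1, 2], [3, 4]], 2)` returns `[[10.0]]`. Real multiresolution coupling would also need rules for time steps and group membership, which this sketch does not attempt.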

Intelligence questions are time sensitive, and so decreasing the amount of time or personnel required for model development, testing, and use is critical if social system models are to be used more routinely for geospatial intelligence. Key approaches to decrease time and effort include simulation testbeds and methods for making social system models reusable, automating the empirical instantiation of social system models, and conducting sensitivity analysis. Other necessary advances would be enabled by developing a comprehensive representation scheme for geographic factors and shareable geographic data for instantiating the models using these representational forms. Challenge problems for developing social system models using these common representations and shareable data could be beneficial here. Finally, basic research that would improve NGA’s modeling capability over the long term includes understanding how geographic factors influence the development of social networks and communications among actors, including covert actors, and how cognitive biases influence the perception of space.
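Sensitivity analysis, one of the approaches listed above, can start as simply as the one-at-a-time scheme sketched here in Python. Each parameter is nudged up and down by a fixed relative step while the others stay at baseline, and the resulting elasticity indicates how strongly the output responds. The `water_demand` model and its baseline values are hypothetical. One-at-a-time designs miss interactions between parameters, so global methods are often preferred once a model is nontrivial.

```python
def water_demand(population, per_capita, efficiency):
    """Hypothetical model: demand grows with population and per-capita use
    and falls as distribution efficiency improves."""
    return population * per_capita / efficiency

def oat_sensitivity(model, baseline, rel_step=0.1):
    """One-at-a-time sensitivity analysis.

    Each parameter is perturbed by +/- rel_step (relative) with the others held
    at baseline; the central difference gives an elasticity, i.e. the
    approximate percentage change in output per percentage change in input.
    """
    y0 = model(**baseline)
    elasticities = {}
    for name, value in baseline.items():
        up = dict(baseline, **{name: value * (1 + rel_step)})
        down = dict(baseline, **{name: value * (1 - rel_step)})
        elasticities[name] = (model(**up) - model(**down)) / y0 / (2 * rel_step)
    return elasticities

BASELINE = {"population": 1000.0, "per_capita": 0.2, "efficiency": 0.8}
print(oat_sensitivity(water_demand, BASELINE))
```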

COUPLED PHYSICAL–SOCIAL SYSTEM MODELS

Many geospatial intelligence investigations will involve multiple physical and social subsystem models and processes that interact with one another. Subsystem models and processes that depend on one another may be coupled to understand the interactions between social and physical systems at different locations. The coupling can be either one way, in which only one model supplies data to the other, or two way, in which both models exchange data (see Appendix A). One-way coupling is simpler to implement and its results are easier to analyze, but two-way coupling tends to yield more realistic results because it captures feedbacks between the subsystems. One advantage of coupling is that simpler, less expensive models of subsystems (reduced-order models) can be substituted for almost any full model, enabling faster testing and execution of the coupled system. Coupling is increasingly being used to defray the cost of developing large models, to facilitate adaptability and expansion, to leverage the strengths of multiple modeling technologies, and to reduce reliance on any one system developer.
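The difference between one-way and two-way coupling can be illustrated with two toy subsystem models in Python. A physical model (a reservoir water balance) and a social model (a population that out-migrates when water is scarce) exchange data each simulated year. In one-way coupling, the population drives withdrawals but never sees the reservoir; in two-way coupling, it responds to scarcity. All rules and numbers are hypothetical. Either step function could be swapped for a cheaper reduced-order version, because the models interact only through the exchanged quantities.

```python
CAPACITY = 100.0  # hypothetical reservoir capacity

def reservoir_step(storage, inflow, withdrawal):
    """Physical subsystem: annual water balance, clipped to [0, CAPACITY]."""
    return min(max(storage + inflow - withdrawal, 0.0), CAPACITY)

def population_step(population, scarce, growth=0.02, out_migration=0.05):
    """Social subsystem (hypothetical rule): grow normally, but lose people
    to out-migration in years when water is scarce."""
    return population * (1 - out_migration) if scarce else population * (1 + growth)

def run(two_way, years=30, inflow=8.0, use_per_person=0.1, scarcity_level=0.3):
    """Run the coupled system; returns (storage, population) for each year."""
    storage, population = 60.0, 80.0
    history = []
    for _ in range(years):
        # Social -> physical (both modes): population sets the withdrawal.
        storage = reservoir_step(storage, inflow, use_per_person * population)
        # Physical -> social (two-way only): population sees the reservoir state.
        scarce = two_way and storage / CAPACITY < scarcity_level
        population = population_step(population, scarce)
        history.append((storage, population))
    return history

print("one-way final (storage, population):", run(two_way=False)[-1])
print("two-way final (storage, population):", run(two_way=True)[-1])
```

With these assumed numbers, the one-way run drains the reservoir to zero while the population keeps growing. The two-way run keeps storage well above zero because out-migration reduces demand, a feedback that one-way coupling cannot represent.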

Physical and social system models are increasingly being coupled to understand and simulate current and future behaviors and interactions of humans and their environment at various locations and spatiotemporal scales. For example, the megacities intelligence question (see Box 1.2) would likely require coupled physical process–social system models to examine how an urban population may respond to an environmental stressor, such as a heat wave, water shortage, vector-borne disease, or air pollution. Coupled models can focus on individual sectors, such as the transportation system and its effect on the environment,³ the effect of climate on the energy demands of buildings,⁴ or the effects of climate on agriculture (e.g., Figure 2.10) or other ecosystems. More recently, efforts are being made to couple sector-specific models. For example, the PRIMA project links energy-system models, infrastructure models, and regional climate models.⁵ Because many problems are spatially heterogeneous, the models being coupled may depend on the specific location of interest.

More complicated interactions and feedbacks among physical and social system processes may be captured in integrated assessment models. Such models combine multiple features of human cultural, religious, and political domains; economic, financial, energy, transportation, and food systems; and the natural world (see Figure 4.4). The Chinese water transfer intelligence question (see Box 1.2) would likely require integrated assessment models to examine the complex interactions among water, agriculture, and energy production and consumption in China. Integrated assessment models can be used to identify vulnerabilities between different systems or regions, and their outputs are frequently used as drivers for other models, such as climate models, conflict models, and regional impacts models.

FIGURE 4.4 Schematic of a coupled human–Earth system model. SOURCE: DOE (2009).

___________________

³ For example, see the GREET model, https://greet.es.anl.gov.

⁴ See the BEND model, http://prima.pnnl.gov/regional-building-energy-demand.

⁵ See http://prima.pnnl.gov.

State of the Art

Coupled models range from fairly basic mathematical forms to complicated network and agent-based formulations that simulate human movement and contact. The state of the art depends on the specific model. A good example of the state of the art is the energy community’s integrated assessment models, which simulate large-scale aspects of and trends in technology adoption, economic activity, population growth, and physical system constraints or resources, and are designed to explore different scenarios and outcomes. The state of the art in integrated assessment models is summarized in Box 4.4.

BOX 4.4 State of the Art in Integrated Assessment Models

Space and time scales: From countries to regions (collections of countries) and years to decades. A few models operate on a subnational scale for particular sectors (e.g., agriculture and land use). Specific integrated assessment models and methods are generally designed to simulate specific processes and features within a limited spatial and temporal scale and are rarely valid outside of those scales.
Fidelity: The fidelity of integrated assessment models lies in their representation of interactions between different sectors of the economy. These models often use reduced-form models of an individual sector, for example, representing large groups of actors with a single representative agent.
Accuracy and precision: The accuracy of integrated assessment models is not well quantified.
Predictions and scenarios: The models explore scenarios of how the world might unfold under different conditions rather than provide predictions because of high uncertainty about the future.
Uncertainty analysis support: Sensitivity analysis and multimodel comparison are commonly used to characterize uncertainty, although recent efforts have focused on more formal uncertainty characterization techniques (e.g., Butler et al., 2014).
Validation and assessment support: The community has largely focused on understanding future dynamics and trends, and it only recently began a process of formal model validation over a historical period.
Computational requirements: Integrated assessment models are computationally inexpensive; a single simulation can often be run on a laptop in a matter of hours, although increases in spatial, temporal, and process resolution are raising the computational expense. Efforts to characterize uncertainty, which may involve hundreds to thousands of simulations, require larger computers and computing clusters. Certain model configurations may also increase computational expense. For example, compared to integrated assessment models with a simple climate component, the integrated Earth System Model (Collins et al., 2015) uses a higher-resolution climate component, resulting in a 10⁵-fold increase in computational expense.
Data requirements: Integrated assessment models require large sets of internally consistent data, including information on energy production, consumption, agriculture, land use and land cover, emissions, and the economy. Global data, divided into countries or regions, are necessary. This community is moving toward freely accessible and interoperable data.
Difficulty to develop: High. Most integrated assessment models have been developed by large interdisciplinary teams over years to decades.
Reuse: High. These models are used by multiple researchers for numerous projects and studies.
Software/code availability: Variable. For many integrated assessment models, software and code are limited to those researchers employed by the developers. Some models are now open source.^a Additionally, some organizations, such as the Global Trade Analysis Project and the Energy Technology Systems Analysis Program, provide software and data used to develop models for a fee.
Training support: Variable. For many integrated assessment models, training support is limited to those researchers employed by the developers. However, some of these models are moving toward a community-based approach, and offer support through annual tutorials, listservs, and online documentation.^a
Data-to-simulated results: These models were designed to transform data into simulated results; however, the process by which this occurs differs by model.

__________________

^aSee http://www.globalchange.umd.edu/models/gcam.

How to Make Useful for NGA

A number of model-based investigations of interest to NGA will require the use of multiple models. Developing expertise and skills in using different classes of models (e.g., process and empirical models) and experience in combining models and comparing their outputs will likely be useful. For NGA analysts who need to work with physical–social system models, it is important to understand that uncertainties associated with the specifications of human behavior often far exceed those associated with the specifications of physical systems. Thus, improving the performance of coupled physical–social system models may best be achieved by improving the social and behavioral aspects of the model.

NGA analysts can learn to run existing coupled models, such as integrated assessment models. However, significant expertise would be required to develop the models further or to interpret the model results. It may be useful for NGA to develop a catalog of scenarios or types of analyses that are scientifically supported and that can be modified for future model-based investigations. In such cases, NGA will need to understand the strengths and weaknesses of integrated assessment models and how to expand the scope of analysis, including designing self-consistent scenarios and developing strategies for analyzing coupled results. Working with developers of integrated assessment models—who have experience coupling models from different disciplines, with different resolutions, and built for different purposes—would likely be helpful.

NGA-Funded Research and Development Areas

The following areas of research could improve the usefulness of coupled physical–social system models to NGA:

Improved representation of the system components being modeled,
Methods for formal verification and validation of model results against NGA-relevant benchmarks and test cases, and
Formal uncertainty quantification techniques.

System components in coupled models are often derived for a specific purpose, and it may be necessary to improve their representations for NGA use. For example, because integrated assessment models were developed to analyze long-term climate change and its mitigation, the models typically represent the world in 5- to 15-year time steps, capturing long-term trends but not interannual variability (Krey, 2014). The questions of interest to NGA may require shorter time steps and better representation of short-term phenomena. Formal verification and validation of coupled models is nontrivial because of the level of interaction and communication between the different elements. In integrated assessment models, the heavy dependence on scenario assumptions and boundary conditions makes independent verification difficult. Finally, there are significant uncertainties surrounding the future evolution of human and natural systems. While some uncertainty techniques (e.g., scenario analysis) are widely used, the use of formal uncertainty techniques is somewhat nascent.
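The following small Python illustration of the time-step issue uses an invented annual rainfall series. Averaging the same years into 5-year steps, the shortest step mentioned above, hides two dry years that an annual model would capture. The series, the threshold, and the helper names are hypothetical.

```python
# Hypothetical annual rainfall (mm) for 15 years, with dry years in years 3 and 8.
ANNUAL = [100, 104, 62, 101, 98, 103, 99, 58, 102, 105, 97, 101, 100, 96, 104]
DROUGHT_THRESHOLD = 70  # assumed level below which a year counts as dry

def block_means(series, step):
    """Average consecutive blocks of `step` values, as a coarse time step would."""
    return [sum(series[i:i + step]) / len(series[i:i + step])
            for i in range(0, len(series), step)]

def count_below(series, threshold):
    """Number of time steps whose value falls below the threshold."""
    return sum(1 for v in series if v < threshold)

annual_dry = count_below(ANNUAL, DROUGHT_THRESHOLD)                      # 2 dry years
five_year_dry = count_below(block_means(ANNUAL, 5), DROUGHT_THRESHOLD)   # 0 dry steps
print(block_means(ANNUAL, 5), annual_dry, five_year_dry)
```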

INVERSE METHODS

A forward model, such as the physical process or social system models discussed above, requires input parameters to produce outputs (i.e., predictions). These input parameters may describe spatially distributed initial conditions, boundary conditions, and source terms, as well as model coefficients, physical constants, or even model structure. Rarely are all of these input parameters known in advance. Typically, system observations (i.e., data recorded at various space-time locations) are required either to estimate these parameters or to constrain their uncertainty so that model-based results are more realistic. The task of inferring these unobserved input parameters from system observations is called solving the inverse problem.

There are two main challenges in solving inverse problems. The first is that the forward model is often computationally demanding to run, making it time consuming to search through the input parameter settings to find values that are consistent with the data. The second is that many inverse problems are ill posed, meaning that many different sets of parameter values may be consistent with the data and their noise. This characteristic, together with uncertainties in the data and the model, means that uncertainty in the solution is a common feature of inverse problems.
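Ill-posedness is easy to see in a deliberately simple case. In the hypothetical Python forward model below, the output depends only on the product of two parameters, so an exhaustive grid search finds several parameter pairs that fit noise-free synthetic data exactly. The model, grid, and data are assumptions. With a computationally demanding simulator, even this brute-force search would become impractical, which is the first challenge noted above.

```python
def forward(a, b, x):
    """Hypothetical forward model: the output depends on a and b only via a * b."""
    return [a * b * xi for xi in x]

def misfit(a, b, x, data):
    """Sum of squared differences between model predictions and observations."""
    return sum((p - d) ** 2 for p, d in zip(forward(a, b, x), data))

X = [1.0, 2.0, 3.0]
DATA = forward(2.0, 3.0, X)              # synthetic, noise-free data from a=2, b=3
GRID = [0.5 * i for i in range(1, 13)]   # candidate values 0.5, 1.0, ..., 6.0

# Exhaustive search: each forward run is cheap here, but not for real simulators.
FITS = [(a, b) for a in GRID for b in GRID if misfit(a, b, X, DATA) < 1e-9]
print(FITS)
```

`FITS` contains six pairs, from (1.0, 6.0) to (6.0, 1.0), all with a * b = 6. The data alone cannot distinguish among them, so extra information (a prior or a penalty) is needed to select one.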

Inverse methods provide systematic frameworks for using data (i.e., system observations) to reduce uncertainties in model parameters, bringing the model output closer to the real-world system. As such, they are fundamental to any modeling endeavor. They are most commonly applied to computationally demanding physical process models, such as hydrologic models, in which observations of flow and pressure, taken at various spatial locations over time, are used to estimate the porosity and permeability of an aquifer. The Chinese water transfer intelligence question (see Box 1.2) would likely require inverse methods to estimate or constrain key parameters of a large-scale hydrologic model (e.g., spatially varying permeability, flow rates, and evaporation) to produce plausible predictions of water availability as a function of location throughout China.

Inverse methods may be categorized as (a) simultaneous, in which complete model runs are combined with the full set of observations to estimate all unknown parameters at once, or (b) dynamic, in which model runs over smaller time intervals are combined sequentially with additional observations to estimate model parameters and produce predictions, conditional on the data observed up to the current time. Dynamic (sequential) inverse methods, often called data assimilation methods, have for the most part been developed to estimate unknown parameters or the state of a dynamic system over time. They include the Kalman filter (Kalman and Bucy, 1961), the extended Kalman filter (Anderson and Moore, 2012), the ensemble Kalman filter (Evensen, 2009), the particle filter (Liu and Chen, 1998), and a wide variety of related approaches. Simultaneous inverse methods have been used to estimate unknown parameters (or states, source terms, etc.) in both transient and steady-state systems (Tarantola, 2005). The various dynamic and simultaneous methods offer different strengths and weaknesses regarding computational cost, model fidelity, data completeness, size of the parameter vector, and other factors. Figure 4.5 shows a simultaneous (left) and a dynamic (right) inverse method applied to the same inverse problem.
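The Kalman filter, the first of the dynamic methods listed above, alternates between a prediction step through the model and an update step that blends in each new observation. A minimal scalar version is sketched below in Python. It assumes a random-walk state model with Gaussian process noise variance `q` and measurement noise variance `r`. The parameter values are illustrative and are not drawn from any cited application.

```python
def kalman_1d(measurements, x0=0.0, p0=1.0, q=0.01, r=0.25):
    """Scalar Kalman filter for a random-walk state.

    State model:       x_t = x_{t-1} + w_t,  w_t ~ N(0, q)
    Measurement model: y_t = x_t + v_t,      v_t ~ N(0, r)
    Returns a list of (state estimate, estimate variance) after each update.
    """
    x, p = x0, p0
    estimates = []
    for y in measurements:
        # Predict: the random-walk model keeps the mean and inflates the variance.
        p = p + q
        # Update: the Kalman gain weighs the prediction against the new datum.
        k = p / (p + r)
        x = x + k * (y - x)
        p = (1.0 - k) * p
        estimates.append((x, p))
    return estimates

for x, p in kalman_1d([4.8, 5.3, 5.1, 4.9]):
    print(f"estimate {x:.3f}, variance {p:.4f}")
```

The ensemble Kalman filter and the particle filter follow the same predict-update cycle but represent the uncertainty with samples, which lets them handle nonlinear models and large state vectors.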

Inverse methods can be either deterministic or probabilistic. The estimates in Figure 4.5 are probabilistic, showing plausible state reconstructions. The Bayesian paradigm is most commonly used for the probabilistic approach. It seeks to statistically characterize the probability of all sets of parameter values that are consistent with the data, the model, and any prior knowledge of the unknown parameters. In contrast, deterministic inverse methods seek a parameter setting that results in the best match to the data, typically with some penalty on the parameters to render the solution unique (at least locally).
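The two paradigms can be compared directly on a one-parameter linear problem, sketched here in Python with hypothetical data. The deterministic estimate minimizes squared misfit plus a quadratic penalty (Tikhonov regularization). The probabilistic estimate applies Bayes' rule on a grid, assuming Gaussian measurement noise with standard deviation `SIGMA` and a zero-mean Gaussian prior with standard deviation `TAU`. In this linear-Gaussian case, the penalized estimate with penalty weight `SIGMA**2 / TAU**2` coincides with the posterior mean, but only the Bayesian version also reports an uncertainty (the posterior standard deviation).

```python
import math

# Hypothetical observations of y = m * x + noise.
X_OBS = [1.0, 2.0, 3.0, 4.0]
Y_OBS = [2.1, 3.9, 6.2, 7.8]
SIGMA = 0.5  # assumed measurement-noise standard deviation
TAU = 2.0    # assumed prior standard deviation of m (prior mean 0)

def tikhonov_estimate(x, y, lam):
    """Deterministic: argmin over m of sum (y - m*x)^2 + lam * m^2, in closed form."""
    return sum(xi * yi for xi, yi in zip(x, y)) / (sum(xi * xi for xi in x) + lam)

def grid_posterior(x, y, sigma, tau, grid):
    """Probabilistic: evaluate the posterior of m on a grid via Bayes' rule and
    return its mean and standard deviation."""
    log_post = []
    for m in grid:
        log_lik = -sum((yi - m * xi) ** 2 for xi, yi in zip(x, y)) / (2 * sigma ** 2)
        log_prior = -m ** 2 / (2 * tau ** 2)
        log_post.append(log_lik + log_prior)
    top = max(log_post)
    weights = [math.exp(lp - top) for lp in log_post]  # shift by max to avoid underflow
    total = sum(weights)
    probs = [w / total for w in weights]
    mean = sum(m * p for m, p in zip(grid, probs))
    sd = math.sqrt(sum((m - mean) ** 2 * p for m, p in zip(grid, probs)))
    return mean, sd

GRID = [i / 1000 for i in range(4001)]  # candidate values of m in [0, 4]
m_hat = tikhonov_estimate(X_OBS, Y_OBS, lam=SIGMA ** 2 / TAU ** 2)
post_mean, post_sd = grid_posterior(X_OBS, Y_OBS, SIGMA, TAU, GRID)
print(f"penalized estimate {m_hat:.4f}; posterior mean {post_mean:.4f} +/- {post_sd:.4f}")
```

Grid evaluation is practical only for a handful of parameters. Larger problems typically turn to sampling methods such as Markov chain Monte Carlo, which is one reason the choice of method depends so strongly on the size of the parameter vector.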

In general, the choice of inverse method depends on the system being modeled (e.g., dynamic or static), what is being estimated (e.g., a few parameters or a million-dimensional state vector), available data (e.g., diversity, accuracy, volume, and velocity), computational considerations, and properties of the forward model (computational demands, processes it captures, and derivative information). It is common to tailor an inverse method to the features of the problem at hand.

State of the Art

FIGURE 4.5 Comparison of simultaneous (left) and dynamic (right) inverse methods to estimate a one-dimensional state over time. The model allows many plausible evolutions of the state (position) over time (gray lines). At time points t₁ = 25, t₂ = 50, t₃ = 75, and t₄ = 100, a measurement is taken, giving the location of the object (with uncertainty) at that time. The simultaneous method uses complete model trajectories along with the entire set of observations to produce plausible trajectories of the object given the model and all of the data (red lines). The dynamic method estimates plausible trajectories one time interval at a time, combining the model, the trajectories retained from previous intervals, and the new data point. Candidate trajectories over the interval from tᵢ to tᵢ₊₁ are produced by extending plausible trajectories from the previous interval (blue lines). Only trajectories that are compatible with the new data point at time tᵢ₊₁ (red lines) are kept. This process repeats for each new time interval, giving plausible state estimates conditional on the data up to time tᵢ₊₁.

Simultaneous inverse methods. Simultaneous inverse methods, both deterministic and probabilistic, are most commonly applied to physical system models, such as geophysical models. Deterministic approaches to inverse problems are typically formulated as penalized (i.e., “regularized”) nonlinear least-squares optimization problems, in which the data misfit function (i.e., the squared difference between model predictions and observed data) forms the least-squares part of the objective function, to which the penalty is added. The state of the art depends on the nature of this objective function and on the availability of derivative information from the forward model. When the objective function is nonsmooth, optimization methods specialized to noisy functions (e.g., certain direct search methods or simulated annealing) can be used. However, these methods quickly become prohibitively expensive as the parameter dimension grows and as the forward model becomes more computationally expensive to run (e.g., Earth system models).
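As a minimal sketch of the penalized least-squares idea, the example below assumes a linear forward model G (a hypothetical Gaussian blurring matrix) so that the regularized problem has a closed-form solution; the nonlinear case described in the text replaces the closed form with an iterative optimizer. It recovers a parameter vector from noisy, indirect data and compares the result with an unpenalized fit.

```python
import numpy as np

def tikhonov_solve(G, d, alpha):
    """Minimize ||G m - d||^2 + alpha * ||m||^2 for a linear forward model G.

    The minimizer satisfies the regularized normal equations
    (G^T G + alpha I) m = G^T d.
    """
    n_params = G.shape[1]
    lhs = G.T @ G + alpha * np.eye(n_params)
    return np.linalg.solve(lhs, G.T @ d)

# Hypothetical forward model: each datum is a Gaussian-weighted blur of the
# parameters, which makes the unregularized problem severely ill conditioned.
n = 50
x = np.linspace(0.0, 1.0, n)
G = np.exp(-((x[:, None] - x[None, :]) ** 2) / (2 * 0.05 ** 2))
G /= G.sum(axis=1, keepdims=True)

m_true = np.sin(2 * np.pi * x)
rng = np.random.default_rng(1)
d = G @ m_true + rng.normal(0.0, 0.01, size=n)   # small observation noise

m_reg = tikhonov_solve(G, d, alpha=1e-3)
m_naive = np.linalg.lstsq(G, d, rcond=None)[0]   # no penalty; small singular values amplify noise
err_reg = np.linalg.norm(m_reg - m_true)
err_naive = np.linalg.norm(m_naive - m_true)
print(f"regularized error: {err_reg:.3f}, unregularized error: {err_naive:.3g}")
```

The penalty weight alpha trades fidelity to the data against stability; in practice it is chosen by criteria such as the discrepancy principle or cross-validation rather than fixed by hand as here.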

When the underlying objective function is smooth and gradients of the function with respect to the model parameters are available, powerful gradient-based numerical optimization methods, often based on Newton’s method or its variants, may be employed. These methods can often converge at a cost, measured in forward model solutions, that is independent of the parameter dimension. If gradients are not available, they can be approximated using finite differencing, generated by automatic differentiation, or obtained by developing an “adjoint” of the forward model (Marchuk, 1995). Of these three, the adjoint method is the most computationally efficient (a single linearized, adjoint model solution yields the gradient with respect to all parameters), but it can be difficult to retrofit to legacy codes. Finite differencing is too expensive for large numbers of parameters, because it requires an additional forward solution for each parameter, and it is often inaccurate for highly nonlinear models. Automatic differentiation is attractive because it requires only the forward code as input, though applying it to very complex geoscience models is often problematic (exceptions do exist, such as the Massachusetts Institute of Technology general circulation model, MITgcm). Developing methods for obtaining adjoints of (regularized) nonsmooth problems or legacy codes, as well as automatic differentiation methods that apply to very complex codes, are active areas of research.
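The efficiency argument for adjoints can be illustrated on a hypothetical linear forward model in which the state u solves A u = m (think of a discretized steady diffusion equation with source m) and the data are a subset of the state, d = B u. For the misfit J(m) = ½‖B u(m) − d‖², one forward solve plus one adjoint solve with Aᵀ gives the whole gradient, whereas finite differencing needs one extra forward solve per parameter. The operator, observation pattern, and sizes are assumptions for illustration.

```python
import numpy as np

n = 40
# Hypothetical forward operator: 1-D discrete Laplacian plus a small shift (invertible).
A = 2.1 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
obs_idx = np.arange(0, n, 5)          # observe every fifth state value
B = np.eye(n)[obs_idx]

rng = np.random.default_rng(2)
m_true = rng.normal(size=n)
d = B @ np.linalg.solve(A, m_true)    # synthetic observations

def misfit(m):
    u = np.linalg.solve(A, m)         # forward solve
    r = B @ u - d
    return 0.5 * r @ r

def adjoint_gradient(m):
    u = np.linalg.solve(A, m)              # one forward solve
    r = B @ u - d
    return np.linalg.solve(A.T, B.T @ r)   # one adjoint solve: dJ/dm = A^{-T} B^T r

def fd_gradient(m, h=1e-6):
    base = misfit(m)
    grad = np.zeros_like(m)
    for i in range(m.size):                # one extra forward solve per parameter
        step = np.zeros_like(m)
        step[i] = h
        grad[i] = (misfit(m + step) - base) / h
    return grad

m0 = np.zeros(n)
g_adj = adjoint_gradient(m0)
g_fd = fd_gradient(m0)
print("max |adjoint - finite difference|:", np.max(np.abs(g_adj - g_fd)))
```

Here the adjoint route costs two linear solves regardless of n, while the finite-difference loop costs n + 1; for models with millions of parameters that difference decides feasibility.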

Estimating uncertainties in inverse problems typically involves more than just finding the optimal parameter values. State-of-the-art probabilistic inverse methods use the Bayesian paradigm for statistical modeling, producing a posterior probability distribution for the unknown parameters. This posterior distribution describes the probability of the parameters given the data, the model, and any prior knowledge about the model parameters. The red lines in Figure 4.5 are draws from the posterior distribution of the state given the data. For nonlinear forward models, the posterior distribution does not have a standard form, so computing even its mean or variance is nontrivial; this difficulty drives much of the research in this area.

Markov chain Monte Carlo (MCMC) methods are commonly employed to produce samples from the posterior probability distribution. MCMC algorithms range from simple and general (e.g., the Metropolis-Hastings algorithm) to high dimensional and derivative based (e.g., Metropolis-adjusted Langevin algorithms; see Brooks et al., 2011, for a review of methods). The basic approaches are amenable to nonstandard models, even those with discontinuities and regime changes. However, they generally require the forward model calculation to be computed quickly, and they may have difficulty with high-dimensional parameter vectors. Recently developed MCMC methods can leverage derivative information from the forward model, efficiently producing posterior samples, even for high-dimensional parameter vectors. Such methods have proven effective in physical process models that also produce first-, second-, and even third-order derivative information in the course of the forward model run. These highly specific MCMC algorithms are generally developed in concert with the computational forward model and require a substantial amount of expertise in modeling and high-performance computing.
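A random-walk Metropolis-Hastings sampler, the simplest of the MCMC algorithms mentioned above, needs only the ability to evaluate the forward model and an unnormalized log posterior. The sketch below assumes a hypothetical scalar problem with normal likelihood and prior; the forward model is deliberately the identity so that the exact posterior mean is available to check the samples.

```python
import numpy as np

def log_posterior(theta, data, obs_sd=1.0, prior_sd=10.0):
    """Unnormalized log posterior: normal likelihood times normal prior."""
    prediction = theta                     # forward model (identity, for checkability)
    log_lik = -0.5 * np.sum((data - prediction) ** 2) / obs_sd ** 2
    log_prior = -0.5 * theta ** 2 / prior_sd ** 2
    return log_lik + log_prior

def metropolis_hastings(log_post, start, n_steps, step_sd, rng):
    samples = np.empty(n_steps)
    current, current_lp = start, log_post(start)
    for i in range(n_steps):
        proposal = current + rng.normal(0.0, step_sd)   # symmetric random-walk proposal
        proposal_lp = log_post(proposal)
        # Accept with probability min(1, posterior ratio); because the proposal is
        # symmetric, its density cancels in the Metropolis-Hastings ratio.
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        samples[i] = current
    return samples

rng = np.random.default_rng(3)
data = rng.normal(2.0, 1.0, size=20)
draws = metropolis_hastings(lambda t: log_posterior(t, data), 0.0, 20000, 0.5, rng)
posterior_draws = draws[2000:]                      # discard burn-in
exact_mean = data.sum() / (data.size + 1.0 / 10.0 ** 2)
print(posterior_draws.mean(), exact_mean)
```

Each step costs one forward model evaluation, which is why these basic samplers require a fast forward model; with a 20-dimensional or larger parameter vector, the random-walk proposal also mixes slowly.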

Sequential inverse (data assimilation) methods. The original Kalman filter still influences the research frontier in dynamic inverse methods; it yields an analytic Bayesian solution when the forward model is linear and the errors follow a normal distribution. As with simultaneous inverse methods, the difficulties arise from nonlinearities in the forward model and from nonnormal errors, in both the model evolution and the observations. Research is driven by the ever-growing parameter or state space dimensions that arise from increased model resolution, by evolving data products (more data, arriving more rapidly), and by the computational challenges associated with large-scale models.
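For the linear, normal case just described, the Kalman filter reduces to a few lines. The sketch below assumes a hypothetical scalar random-walk state observed directly with noise; the process and observation variances q and r are illustrative.

```python
import numpy as np

def kalman_filter(observations, q, r, x0=0.0, p0=1.0):
    """Scalar Kalman filter for x_t = x_{t-1} + w_t, y_t = x_t + v_t.

    w_t ~ N(0, q) is model (process) noise and v_t ~ N(0, r) is observation noise.
    Returns the filtered means and variances after each observation.
    """
    x, p = x0, p0
    means, variances = [], []
    for y in observations:
        # Forecast step: propagate the estimate through the (linear) model.
        p = p + q
        # Analysis step: blend forecast and observation using the Kalman gain.
        k = p / (p + r)
        x = x + k * (y - x)
        p = (1.0 - k) * p
        means.append(x)
        variances.append(p)
    return np.array(means), np.array(variances)

rng = np.random.default_rng(4)
truth = np.cumsum(rng.normal(0.0, 0.1, size=100))
obs = truth + rng.normal(0.0, 0.5, size=100)
est, var = kalman_filter(obs, q=0.01, r=0.25)
print("RMSE of raw observations:", np.sqrt(np.mean((obs - truth) ** 2)))
print("RMSE of filtered estimate:", np.sqrt(np.mean((est - truth) ** 2)))
```

Because everything stays normal, the mean and variance fully describe the posterior at each step; the extensions discussed next exist precisely because this no longer holds once the model or errors depart from these assumptions.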

Perhaps the most common and successful deterministic data assimilation method is 4D-Var (Ide et al., 1997), which assimilates data over a moving four-dimensional (space-time) observation window. It uses derivative information to produce best estimates of large-scale parameter (state) vectors, ingesting new data to refine the estimates at various intervals by iteratively solving a simultaneous inverse problem over the moving observation window. Because 4D-Var requires derivative information about the response of the model, it is applicable only to systems with smooth input–output maps, such as atmospheric and other geophysical systems. Uncertainty can be estimated using a normal approximation based on the data misfit function used in the 4D-Var procedure.
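To show the structure of the 4D-Var cost function without the machinery of a real forecast model, the sketch below assumes a hypothetical scalar linear model x_{k+1} = a·x_k and estimates only the initial state x₀ within one observation window (the strong-constraint form of 4D-Var). Operational systems minimize the same kind of cost iteratively with adjoint-computed gradients; here the cost is quadratic in x₀, so the minimizer has a closed form, and the inverse Hessian gives the normal approximation to its uncertainty.

```python
import numpy as np

def four_d_var_scalar(y, obs_steps, a, xb, b_var, r_var):
    """Strong-constraint 4D-Var for the toy model x_{k+1} = a * x_k.

    Minimizes J(x0) = (x0 - xb)^2 / (2 b_var)
                      + sum_k (y_k - a**k * x0)^2 / (2 r_var)
    over the initial state x0, where y_k is observed at step obs_steps[k].
    """
    h = a ** np.asarray(obs_steps, dtype=float)   # sensitivity of x_k to x0
    y = np.asarray(y, dtype=float)
    precision = 1.0 / b_var + np.sum(h ** 2) / r_var   # Hessian of J
    x0_hat = (xb / b_var + np.sum(h * y) / r_var) / precision
    return x0_hat, 1.0 / precision                # estimate and its approximate variance

def cost(x0, y, obs_steps, a, xb, b_var, r_var):
    h = a ** np.asarray(obs_steps, dtype=float)
    misfit = np.sum((np.asarray(y, dtype=float) - h * x0) ** 2) / (2 * r_var)
    return (x0 - xb) ** 2 / (2 * b_var) + misfit

a, x0_true = 0.95, 1.5
steps = [1, 3, 5, 8]                               # observation times within the window
rng = np.random.default_rng(5)
y = x0_true * a ** np.array(steps) + rng.normal(0.0, 0.1, size=len(steps))
x0_hat, x0_var = four_d_var_scalar(y, steps, a, xb=1.0, b_var=0.5, r_var=0.01)
print(f"analysis x0 = {x0_hat:.3f} +/- {np.sqrt(x0_var):.3f}")
```

The first term ties the estimate to a prior "background" xb; the second measures misfit to every observation in the window through the model dynamics, which is what makes the method four-dimensional.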

Ensemble-based methods for dynamic inverse problems (Evensen, 2009; Ott et al., 2004; Tippett et al., 2003) can leverage parallelism to estimate very high-dimensional parameter or state vectors and their uncertainty without the need for derivative information from the forward model. These methods have been used successfully in physical process models (e.g., weather or ocean modeling), although their efficacy for social system models is largely unexplored.
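The sketch below shows the analysis (update) step of a stochastic ensemble Kalman filter with perturbed observations, one of several ensemble variants. It assumes a linear observation operator H and normal observation errors; the state size, ensemble size, and numbers are hypothetical. No derivatives of the forward model are needed: the forecast covariance is estimated from the ensemble itself, and each member could be propagated in parallel.

```python
import numpy as np

def enkf_analysis(ensemble, y, H, R, rng):
    """Stochastic EnKF update with perturbed observations.

    ensemble: (n_members, n_state) forecast ensemble
    y: (n_obs,) observations; H: (n_obs, n_state); R: (n_obs, n_obs)
    """
    n_members = ensemble.shape[0]
    anomalies = ensemble - ensemble.mean(axis=0)
    P = anomalies.T @ anomalies / (n_members - 1)       # sample forecast covariance
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)         # Kalman gain from the ensemble
    # Each member assimilates its own perturbed copy of the observations.
    y_perturbed = y + rng.multivariate_normal(np.zeros(len(y)), R, size=n_members)
    innovations = y_perturbed - ensemble @ H.T
    return ensemble + innovations @ K.T

rng = np.random.default_rng(6)
forecast = rng.normal(0.0, 1.0, size=(5000, 2))   # two-variable state, prior N(0, I)
H = np.array([[1.0, 0.0]])                        # only the first variable is observed
R = np.array([[1.0]])
analysis = enkf_analysis(forecast, np.array([2.0]), H, R, rng)
print(analysis.mean(axis=0), analysis.var(axis=0, ddof=1))
```

With a large ensemble, the update approaches the exact Kalman result (here a mean near 1 and variance near 0.5 for the observed variable); operational systems use far smaller ensembles and add localization and inflation to control sampling error.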

Ensemble-based approaches give approximate draws from the posterior distribution. If the unknown parameter or state vector is small and the forward model can run sufficiently quickly, more exact sequential Monte Carlo methods (Liu and Chen, 1998; Ristic et al., 2004) may be applicable. The dynamic example in Figure 4.5 uses one of these methods. Although limited by the size of the parameter or state vector that can be accommodated, sequential Monte Carlo approaches can handle nonlinear forward models that exhibit rapidly changing behavior and nonnormal error distributions—properties that apply to a number of social system models.
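As a minimal sketch of a sequential Monte Carlo method, the bootstrap particle filter below propagates a cloud of particles through a nonlinear model, weights them by the likelihood of each new observation, and resamples. The model, the heavy-tailed (Student-t) observation errors, the noise levels, and the particle count are all hypothetical, chosen to exhibit the nonlinearity and nonnormal errors that the Kalman filter cannot handle exactly.

```python
import numpy as np

def transition(x):
    """Hypothetical nonlinear state model (deterministic part)."""
    return x + 0.5 * np.sin(x)

def bootstrap_particle_filter(obs, n_particles, proc_sd, obs_scale, obs_df, rng):
    particles = rng.normal(0.0, 1.0, size=n_particles)   # draws from the initial prior
    estimates = []
    for y in obs:
        # Propagate every particle through the model with process noise.
        particles = transition(particles) + rng.normal(0.0, proc_sd, size=n_particles)
        # Weight by the Student-t observation likelihood (up to a constant).
        z = (y - particles) / obs_scale
        log_w = -0.5 * (obs_df + 1.0) * np.log1p(z ** 2 / obs_df)
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        estimates.append(np.sum(w * particles))            # posterior-mean estimate
        # Multinomial resampling: keep particles in proportion to their weights.
        idx = rng.choice(n_particles, size=n_particles, p=w)
        particles = particles[idx]
    return np.array(estimates)

rng = np.random.default_rng(7)
T, proc_sd, obs_scale, obs_df = 60, 0.2, 0.3, 3.0
truth = np.empty(T)
x = 0.5
for t in range(T):
    x = transition(x) + rng.normal(0.0, proc_sd)
    truth[t] = x
obs = truth + obs_scale * rng.standard_t(obs_df, size=T)
est = bootstrap_particle_filter(obs, 2000, proc_sd, obs_scale, obs_df, rng)
print("obs RMSE:", np.sqrt(np.mean((obs - truth) ** 2)))
print("filter RMSE:", np.sqrt(np.mean((est - truth) ** 2)))
```

The cost grows with the number of particles needed to cover the state space, which is why these methods are limited to small parameter or state vectors.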

Ongoing research in numerical data assimilation tools for dynamic systems focuses on (a) estimation for cases where the computational cost of the forward problem is high, thus making the use of a large number of iterations or a large ensemble size unrealistic, and (b) the development of more reliable ways to quantify the uncertainty associated with estimates.

The state of the art in inverse methods is summarized in Box 4.5.

BOX 4.5 State of the Art in Inverse Methods

Space and time scales: The space and time scales that inverse methods can inform depend on the resolution of both the forward model and the data sources used in the analysis. The space and time scales of the model and data should be comparable.
Fidelity: The fidelity of results from analyses using inverse methods depends on the fidelity of the forward model and on the quality of the data ingested in the analysis. Model fidelity is an especially important consideration when analysis results describe uncertainty in future scenarios or predictions.
Accuracy and precision: One of the goals of inverse analyses is to produce some characterization of accuracy or precision of the results. The accuracy or precision of the results depends on a number of factors, including the properties of the forward model, the quality of the data, and experience with previous results.
Predictions and scenarios: Deterministic inverse methods produce a single “best estimate” for a prediction; probabilistic inverse methods seek to describe the uncertainty in possible outcomes via error bars, probability distributions, or ensembles of possible outcomes or scenarios.
Uncertainty analysis support: Probabilistic inverse methods quantify the uncertainty in the inverse solution, in both simultaneous and dynamic settings. Deterministic and filtering methods can estimate uncertainty using ensembles or Gaussian approximations.
Validation and assessment support: Validation and model assessment are common elements of inverse methods. How well predictions from inverse methods will fare in new settings depends on the representativeness of both the forward model and the data used for estimation in the new scenario.
Computational requirements: Inverse methods can be applied to small forward models that run quickly on a laptop or to larger models on supercomputers. Some specialization of the inverse methodology is often required to deal with the properties of the forward model. Generally, inverse methods can be hundreds to millions of times more expensive to solve than the corresponding forward model.
Data requirements: Inverse methods require data to estimate model parameters and model structure. The information contained in the data about the parameters dictates how ill posed the inverse problem is, and thus what kind of regularization or prior information should be used, and what additional data sources may improve the results.
Difficulty to develop: Easy (for interfacing simple models with black-box-based inverse methods) to difficult (for developing adjoint-based implementations for derivative-based inverse methods with custom inverse solvers, which requires expertise in optimization theory, adjoint methods, and Bayesian statistics).
Reuse: Inverse methods must be supplied with both method- and problem-specific information, including smoothness penalties, prior information, observations and their uncertainties, and forward models and their uncertainties. Derivative-based methods (as opposed to black box) must be further supplied with derivatives of the misfit between model predictions and observations with respect to the model parameters.
Software/code availability: Open-source and commercial software for solving inverse problems using a variety of methods is widely available. Useful guides for available optimization software include the Optimization Decision Tree^a and the DAKOTA software.^b Data assimilation software, using ensemble-based methods, is also available,^c but requires substantial expertise to implement.
Training support: Textbooks, conferences, and short courses on inverse problems and methodology abound.
Data to simulated results: Given that the forward model and data sources are available, implementation time and effort range from hours (to interface a simple model with a black-box optimization or sampling method) to a few years (to develop an adjoint/gradient-based method with a custom inverse solver for a complex model).

__________________

^aSee http://plato.asu.edu/sub/pns.html.

^bSee https://dakota.sandia.gov.

^cSee http://www.image.ucar.edu/DAReS/DART/; https://math.la.asu.edu/~eric/letkf.

How to Make Useful for NGA

A wide variety of inverse methods exists for making model-based results more like the real-world system being investigated. To help narrow down the choices, NGA will first have to determine which models (or families of models) are likely to be needed for its investigations, and then determine which types of inverse methods are most relevant, what external collaborations are necessary, and what NGA expertise needs to be developed. Because many NGA investigations are likely to be exploratory, seeking plausible outcomes of complex systems, probabilistic, ensemble-based approaches for dynamic systems are likely to be particularly relevant. For these investigations, NGA could do the following:

Become savvy users of data products and analysis results produced by inverse methods. Results might include satellite data products, weather forecasts, or hydrologic forecasts produced by domain area experts using relevant inversion methods. In these cases, it will be important to understand how to use the data products, how uncertainties were characterized, and product limitations.
Run or implement inverse methodology developed by outside experts. Inverse methods are available for many mature physical system models (e.g., the Weather Research and Forecasting model⁶ can be run in data assimilation mode). Because the inverse methodology is tailored to the specific model, and is often very involved, it will likely be necessary to partner with experts to refine the model-based predictions and understand the strengths and limitations of the results.

NGA Research and Development Needed

The development and use of formal inverse methods for social system models is a research frontier; advances in this direction have the potential to directly benefit NGA’s model-based investigations. For example, the megacities intelligence question (see Box 1.2) would likely benefit from the development of inverse methods that use observations to constrain the state of financial, health, transportation, and other urban social systems over time. Specific research and development efforts that could prove beneficial for NGA include the following:

Developing research partnerships with appropriate collaborators to develop and carry out inverse methods for social and coupled social–technical or social–physical system models and data;
Facilitating the development of inverse methodology for constraining the plausible states of social system simulation models to be consistent with available data up to the current time; and
Facilitating the development of inverse methodology to integrate the diverse forms of data that NGA uses and collects (e.g., satellite data, sensor data, geospatial data, and open-source data).

Advancing inverse methodology research for social system models will require partnerships with researchers knowledgeable about inverse methodology and social system models. It will also be important for NGA to work in partnership with these researchers to ensure that advances in this field are geared to NGA applications.

Finally, the diversity of NGA-relevant data may pose research challenges for inverse methods. If NGA wishes to develop inverse methodology that integrates a variety of different data sources, then research may be needed to understand how best to adapt existing dynamic, probabilistic (i.e., data assimilation) tools to its needs.

___________________

⁶ See wrf-model.org.

SPATIAL STATISTICS, DATA MINING, AND MACHINE LEARNING

Like inverse methods, empirically based models and analysis approaches seek to combine models with data to better understand real-world systems. However, with empirical approaches the emphasis shifts to the data, informing empirical models using techniques from statistics, data mining, and machine learning. The underlying empirical model is typically selected for its simplicity, parsimony, flexibility, ability to handle vast amounts and varieties of data, or ability to effectively exploit the available computational architecture. Figure 4.6 compares an inverse method and an empirical method (Kriging) for estimating the space-time history of a process. In this example, the inverse method used a process model to produce a more accurate reconstruction, which also produced more accurate predictions of the process in the future. However, the inverse method took far more time and computation than the empirical method, which did not use a process model at all. With more data, the accuracy of the empirical method would increase. Moreover, if a suitable process model is not available because the system dynamics are unknown, an empirical model could still produce an estimate of the process.
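Kriging, in its simplest form, is the best linear predictor under an assumed covariance function. The sketch below implements simple kriging (known zero mean) in one dimension with a squared-exponential covariance; the covariance form, length scale, nugget, and sample data are assumptions for illustration, whereas in practice they would be estimated from the data (e.g., via a fitted variogram).

```python
import numpy as np

def sq_exp_cov(a, b, sill=1.0, length=0.2):
    """Squared-exponential covariance between 1-D location arrays a and b."""
    return sill * np.exp(-0.5 * ((a[:, None] - b[None, :]) / length) ** 2)

def simple_kriging(x_obs, z_obs, x_new, nugget=1e-6, **cov_kwargs):
    """Simple kriging (known zero mean): predictions and kriging variances."""
    C = sq_exp_cov(x_obs, x_obs, **cov_kwargs) + nugget * np.eye(len(x_obs))
    c0 = sq_exp_cov(x_obs, x_new, **cov_kwargs)      # observation-to-target covariances
    weights = np.linalg.solve(C, c0)                  # one column of weights per target
    pred = weights.T @ z_obs
    sill = cov_kwargs.get("sill", 1.0)
    var = sill - np.sum(weights * c0, axis=0)         # simple-kriging variance
    return pred, var

x_obs = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
z_obs = np.sin(2 * np.pi * x_obs)
x_new = np.linspace(0.0, 1.0, 11)
pred, var = simple_kriging(x_obs, z_obs, x_new)
for x, p, v in zip(x_new, pred, var):
    print(f"x={x:.1f}  prediction={p:+.3f}  kriging variance={v:.3f}")
```

No process model appears anywhere: the predictions rest entirely on the data and the assumed spatial covariance, and the kriging variance grows with distance from the observations.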

Empirical methods combine system observations, empirical models, and algorithms for estimation and prediction. They are mainly in the purview of the overlapping fields of statistics, data mining, and machine learning. Rather than attempt to disentangle these fields, this section discusses classes of problems, methodology, and applications that are likely to be useful to NGA’s modeling endeavor. For example, the megacities intelligence question (see Box 1.2) would likely require empirical methods to discover crime hotspots, increases in unemployment, degradation of neighborhoods, and other urbanization trends that could trigger political, economic, or security problems. The Chinese water transfer questions would likely require models to detect changes in water availability arising from policy decisions on dam building, agriculture, and coal production.

FIGURE 4.6 Comparison of estimates of a synthetic space-time process using an empirical method (combining system observations with an empirical model) and an inverse method (combining system observations with a physical process model). While the inverse method produces a slightly more accurate reconstruction than the empirical method, it requires far more time and computational effort. SOURCE: Adapted from Higdon (2006).

Empirical models and methods can be divided into supervised and unsupervised learning approaches. Supervised learning approaches entail a wide variety of regression and classification techniques. In such settings, a large collection of observations is used, each containing features X describing the observation, and an outcome response Y, which can be numeric (for regression problems) or categorical (for classification problems). These methods use the observation pairs (X,Y) to estimate a model that predicts the response Y′ given a new descriptor vector X′. Hence, the resulting “model” is empirically derived. Examples likely to be of interest to NGA include methods for image classification and methods for prediction using spatially or temporally distributed data. In some cases, the structure of the model is prespecified (e.g., linear regression, classification and regression trees, or neural nets), using the data to estimate model parameters. In other cases (e.g., network models or deep learning methods for image classification), the structure of the underlying empirical model is also estimated from the data. In data-rich settings, it is common to assess the accuracy of such regression and classification models by using a “holdout” data set where the predictor inputs X are made available to the algorithm, but the actual outcome response is withheld.
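The holdout idea can be sketched in a few lines, assuming synthetic data from a linear model with three features (the coefficients and noise level are invented for illustration): fit on one part of the data, then score predictions on observations whose responses the fitting step never saw.

```python
import numpy as np

rng = np.random.default_rng(8)
n, p = 500, 3
X = rng.normal(size=(n, p))
beta_true = np.array([1.5, -2.0, 0.5])
y = 0.7 + X @ beta_true + rng.normal(0.0, 1.0, size=n)

# Random split: 80 percent for training, 20 percent held out.
perm = rng.permutation(n)
train, test = perm[:400], perm[400:]

def design(features):
    """Add an intercept column to the feature matrix."""
    return np.column_stack([np.ones(len(features)), features])

coef, *_ = np.linalg.lstsq(design(X[train]), y[train], rcond=None)
y_pred = design(X[test]) @ coef                  # holdout responses are not used in fitting
holdout_rmse = np.sqrt(np.mean((y[test] - y_pred) ** 2))
print("estimated coefficients:", np.round(coef, 2))
print("holdout RMSE:", round(holdout_rmse, 3))   # should be near the noise level of 1.0
```

A random split like this assumes the observations are independent; with spatial or temporal dependence, a holdout set should instead be separated in space or time, or the score will be optimistic.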

In unsupervised learning, collections of observations are made, detailing information X for each observation. This information X is used (without any outcome or label variable Y) to identify structure, anomalies, changes, and relationships in the collection of observations. Such approaches may find salient correlations in high-dimensional data and connections between data points that humans would not think to look for. Because unsupervised methods do not require labeled training data, they can be especially useful for exploring large, unstructured data sets. One popular unsupervised learning method is cluster analysis, which is used to find hidden patterns or groupings in data. Other popular methods detect anomalies, hotspots, and changes. Successful applications of unsupervised learning include computer vision, speech recognition, robot control, text mining, and fraud analysis (Alpaydin, 2014; Kou et al., 2004; Sebastiani, 2002; Sung and Poggio, 1998).
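Cluster analysis can be illustrated with k-means, one common clustering algorithm among many (the text does not prescribe a particular one). The sketch below alternates between assigning points to the nearest center and moving each center to the mean of its points; the synthetic two-group data and the choice k = 2 are assumptions for illustration, and no labels are used anywhere in the fitting.

```python
import numpy as np

def kmeans(points, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: label each point with its nearest center.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each center to the mean of its assigned points
        # (keep the old center if a cluster happens to be empty).
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

rng = np.random.default_rng(9)
group_a = rng.normal([0.0, 0.0], 0.3, size=(100, 2))
group_b = rng.normal([3.0, 3.0], 0.3, size=(100, 2))
points = np.vstack([group_a, group_b])
centers, labels = kmeans(points, k=2)
print(np.round(centers, 2))
```

k-means needs the number of clusters in advance and favors compact, round groups; density-based or hierarchical clustering may suit irregular spatial patterns better.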

For NGA, empirical methods that account for temporal dependence, spatial and space-time dependence, and hierarchical structure are of particular relevance. Methodologies to account for these dependence features and structure include time series (Box et al., 2015; West and Harrison, 1999), space and space-time methods (Banerjee et al., 2014; Cressie and Wikle, 2015; Shekhar et al., 2011, 2015; Zhou et al., 2014), and Bayesian hierarchical models (Gelman and Hill, 2006; Gelman et al., 2013). Specialized methods are required for these situations, because most commonly available software assumes data cases are drawn independently and from a common distribution. Blind application of methods that assume data are independent will typically give biased predictions and can grossly understate the uncertainty. Appropriate empirical methods must use the available data to model the spatial, temporal, and hierarchical dependencies in the system, and they require a familiarity and expertise both in the methods being used and the system being investigated.
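A small calculation shows why ignoring dependence understates uncertainty. Assuming the data follow a stationary AR(1) process with lag-1 autocorrelation ρ (an assumption made here for illustration), the variance of the sample mean is inflated by roughly (1 + ρ)/(1 − ρ) relative to the independent-data formula; equivalently, n correlated observations carry about as much information about the mean as n(1 − ρ)/(1 + ρ) independent ones.

```python
import numpy as np

def standard_errors_of_mean(z):
    """Naive (independence) vs AR(1)-adjusted standard error of the mean."""
    z = np.asarray(z, dtype=float)
    n = z.size
    centered = z - z.mean()
    rho = np.sum(centered[1:] * centered[:-1]) / np.sum(centered ** 2)  # lag-1 autocorrelation
    naive_se = z.std(ddof=1) / np.sqrt(n)
    n_eff = n * (1.0 - rho) / (1.0 + rho)                                # effective sample size
    adjusted_se = z.std(ddof=1) / np.sqrt(n_eff)
    return naive_se, adjusted_se, rho, n_eff

# Simulate a strongly autocorrelated series (rho = 0.8), e.g., a sensor reading over time.
rng = np.random.default_rng(10)
n, rho_true = 2000, 0.8
z = np.empty(n)
z[0] = rng.normal()
for t in range(1, n):
    z[t] = rho_true * z[t - 1] + np.sqrt(1 - rho_true ** 2) * rng.normal()

naive, adjusted, rho_hat, n_eff = standard_errors_of_mean(z)
print(f"estimated rho={rho_hat:.2f}, effective n={n_eff:.0f} of {n}")
print(f"naive SE={naive:.4f}, adjusted SE={adjusted:.4f}")
```

With ρ = 0.8 the adjusted standard error is about three times the naive one; spatial and space-time dependence produce the same effect, though the correction then depends on a fitted covariance model rather than a single ρ.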

State of the Art

Empirical methods and software to specify and implement them have been driven by applications. The most common and generally useful traditional methods (e.g., regression and classification) have long been targets for software development and, hence, have mature implementations that run on a wide variety of computational architectures, ranging from a laptop (e.g., R, SciPy, and MATLAB), to a cluster (e.g., SAS), to data-intensive computing machines running the Hadoop environment (e.g., Salford Systems, Mahout, and Spark MLlib). These mature implementations contain assumptions, such as standard input formats (rectangular data files), common error variance, and independence between data cases.

Mature software is less likely to be available for more complex applications, such as those accounting for specialized model structure (e.g., space-time dependence, models that change with spatial location, and hierarchical relationships in the data) and data features (e.g., multiple, disparate data sources; missing data; large amounts of data; streaming data; outliers; and biases that depend on the data source). For some methods, community-supplied user packages in R, Python, and MATLAB may fill the need. For example, many popular software packages implement spatial statistical and spatial data-mining methods, such as spatial summarization, object tracking, trajectory analysis, hotspot detection, finding spatial outliers, colocations, and location predictions to identify spatial patterns. Methods for spatial summarization include K-Main Routes (Oliver et al., 2014), and methods for hotspot detection include scan statistics and significant ring detection. These approaches are available as R libraries as well as in specialized software, such as CrimeStat (U.S. Department of Justice) and SaTScan (National Cancer Institute).
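To illustrate hotspot detection with a scan statistic, the sketch below implements a simplified version of Kulldorff's Poisson spatial scan on a grid of cells: it places circular windows of several radii around every cell and scores each by a log-likelihood ratio comparing the case rate inside the window with the rate outside. The grid, counts, and populations are hypothetical, and a real analysis (as in SaTScan) would also assess significance by Monte Carlo replication, which is omitted here.

```python
import numpy as np

def poisson_scan(counts, population, max_radius=2):
    """Return (log-likelihood ratio, center cell, radius) of the most likely cluster."""
    counts = np.asarray(counts, dtype=float)
    population = np.asarray(population, dtype=float)
    rows, cols = np.indices(counts.shape)
    total_cases, total_pop = counts.sum(), population.sum()
    best = (0.0, None, None)
    for r0 in range(counts.shape[0]):
        for c0 in range(counts.shape[1]):
            dist = np.hypot(rows - r0, cols - c0)
            for radius in range(max_radius + 1):
                inside = dist <= radius
                c = counts[inside].sum()                                 # observed cases in window
                e = total_cases * population[inside].sum() / total_pop   # expected under uniform risk
                if c <= e:
                    continue                                             # score only elevated windows
                llr = c * np.log(c / e)
                outside = total_cases - c
                if outside > 0:
                    llr += outside * np.log(outside / (total_cases - e))
                if llr > best[0]:
                    best = (llr, (r0, c0), radius)
    return best

counts = np.ones((6, 6))
counts[2, 3] = 20                      # hypothetical hotspot
population = np.full((6, 6), 100.0)    # equal population in every cell
llr, center, radius = poisson_scan(counts, population)
print(f"most likely cluster centered at {center}, radius {radius}, LLR={llr:.1f}")
```

Dividing by population matters: a cell with many cases is flagged only if its count is high relative to the number of people at risk there.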

On the other hand, the chances of finding ready-made software for a new NGA application are slim. In such cases, methods will have to be developed and tailored to the application at hand. Developing methodology and software is commonly carried out in high-level programming languages, such as R, Python, and MATLAB, or in specialized environments, such as GeoDa (Arizona State University) and STAGE (Joint Warfare Analysis Center). Algorithms for estimation and inference will also have to be developed.

A more recent development in empirical, probabilistic modeling is the emergence of probabilistic programming systems.⁷ These systems allow the user to specify a customized model, and the system then automatically carries out the computations required to use the data to estimate model parameters and to produce predictions with the appropriate uncertainty. For example, both JAGS and Stan⁸ allow the user to develop rather general hierarchical and spatial models without having to design a custom Markov chain Monte Carlo scheme to produce inference results. Such emerging software has the potential to speed up and facilitate the development of stylized empirical methods for nonstandard NGA applications, albeit on standard computing architectures. Software development for spatial methods that leverage modern data-intensive computing remains an open research topic.
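The kind of hierarchical model that JAGS or Stan lets a user write down can be illustrated, in its simplest conjugate form, without either tool. Assume a normal hierarchical model in which each region j has an observed mean ȳⱼ with known sampling variance σⱼ², and the regional true values are drawn from N(μ, τ²), with μ and τ² held fixed (a simplification; a full Bayesian analysis in a probabilistic programming system would estimate them too). The posterior mean for each region then shrinks its own estimate toward μ, more strongly where its data are noisier. The regional values below are hypothetical.

```python
import numpy as np

def partial_pooling(y_bar, sigma2, mu, tau2):
    """Posterior means and variances of region effects in a normal hierarchical model.

    y_bar[j] ~ N(theta[j], sigma2[j]),  theta[j] ~ N(mu, tau2), with mu and tau2 fixed.
    """
    y_bar = np.asarray(y_bar, dtype=float)
    sigma2 = np.asarray(sigma2, dtype=float)
    precision = 1.0 / sigma2 + 1.0 / tau2
    post_mean = (y_bar / sigma2 + mu / tau2) / precision
    return post_mean, 1.0 / precision

# Hypothetical survey estimates for four regions; the last two rest on few responses.
y_bar = np.array([0.52, 0.47, 0.70, 0.30])
sigma2 = np.array([0.0004, 0.0004, 0.01, 0.01])
post_mean, post_var = partial_pooling(y_bar, sigma2, mu=0.5, tau2=0.0025)
for y, m, v in zip(y_bar, post_mean, post_var):
    print(f"raw={y:.2f}  pooled={m:.3f}  sd={np.sqrt(v):.3f}")
```

Well-measured regions keep estimates close to their own data, while sparsely measured regions borrow strength from the group; combining sources of differing quality relies on this same mechanism.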

Spatiotemporal data management and analytics in modern data-intensive computing architectures have become increasingly important to empirical methods (Eldawy and Mokbel, 2015; Wang et al., 2014). Extensive progress has been made on achieving user-friendly access and interaction with spatiotemporal big data; developing innovative analytics for a variety of geospatial applications, such as traffic prediction (Bast et al., 2014) and disease diffusion prediction (Sadilek et al., 2012); and developing novel data-mining methods for complex trajectories and networks based on massive streaming and sensor data (Tang et al., 2012).

The state of the art in empirical models and methods is summarized in Box 4.6.

How to Make Useful for NGA

To be useful to NGA, the rather large body of methodology available from data mining, statistics, and machine learning will have to be aligned to NGA data sources, applications, and computational resources. This alignment might best be explored and guided using pilot studies, likely in partnership with researchers from academia and industry, so that NGA can better understand strengths and weaknesses of different empirical methods for NGA applications. Focusing on (1) methods and approaches, (2) software and computational infrastructure for their implementation, and (3) data processing and curation could help NGA better exploit available methods in conjunction with computational resources and also better understand what future research directions to pursue. In all of these cases, new and continued partnerships with industry, national laboratories, and university research centers with appropriate cyberinfrastructure and security would help ensure that NGA-relevant systems, tools, and software continue to be developed and refined.

Currently available methods and approaches in statistics, machine learning, and data mining are clearly useful for NGA investigations. However, the standard use cases for which most of these methods were designed may not be directly applicable to geospatial data. Bayesian hierarchical models—particularly ones that link spatially connected data, data at different levels of resolution and aggregation, or disparate data sources—seem particularly useful for NGA. For example, poll predictions of FiveThirtyEight,⁹ which combine information from different polls, each with their own spatial coverage and biases, are based on these concepts. Bayesian hierarchical models

___________________

⁷ See probabilistic-programming.org.

⁸ See https://sourceforge.net/projects/mcmc-jags; http://mc-stan.org.

⁹ See http://fivethirtyeight.com/tag/2016-presidential-election.

BOX 4.6 State of the Art in Empirical Models and Methods

Space and time scales: These are generally determined by the data on which they are based, as well as the goal of the analysis. Empirical approaches are especially well suited for applications with abundant data.
Fidelity: This is generally determined by the data on which they are based. Fidelity can range from low (using highly aggregated and noisy data) to high (using accurately observed, high-resolution data that are sensitive to all of the subsystem processes and spatial detail involved in their generation).
Accuracy and precision: This is a function of the data and what is being estimated or predicted. Generally, more aggregated quantities can be estimated more accurately. For example, yearly national carbon emissions might be estimated to within 10 percent, but weekly emissions from a coal plant may only be estimated to within 100 percent. Systematic biases in data and observations are often inherited by the empirical methods that make use of the data.

Predictions and scenarios: Empirical analyses are typically developed to produce predictions and/or scenarios based on past data. How well empirical models and methods will predict in new settings depends on how representative the data used to train the models are to the new scenario.
Uncertainty analysis support: Most supervised empirical analysis methods come with approaches for quantifying uncertainty in the resulting predictions and other inferences. In spatial and space-time settings, estimation of uncertainties is more challenging because autocorrelation and spatial dependence must be accounted for. The quality of estimated uncertainties is often assessed by comparison with past data, when available.
Validation and assessment support: Empirical approaches are data driven; hence the processes of model validation and assessment are typically part of the analysis. Such assessments typically assume that past performance of the model reflects future performance. Model validation and assessment for extrapolative settings remains an open research topic.
Computational requirements: Some approaches can be developed and fit using a laptop, although demand for compute- and/or data-intensive resources is increasing. In some cases, the large volumes of data are preprocessed using large-scale computational resources, prior to more involved empirical analyses. A current challenge is adapting spatial methods to modern data-intensive computing architectures to handle large-scale geospatial data.
Data requirements: Empirical models require data to estimate model parameters and model structure. The properties of data (e.g., size, type, cadence, resolution, and accuracy) influence what empirical modeling methods are likely to be useful for a given scenario. Many models and methods must be adapted to use large amounts of data and data-intensive computing resources.
Difficulty to develop: Many empirical methods are available in high-level software languages, such as R and Python, and so development is largely restricted to preparing data. Special considerations, such as large data volume or specialized spatial or temporal structure, typically require stylized models and estimation procedures, and additional time and expertise to develop.
Reuse: Many approaches can be reused in new settings, although the models will have to be retrained using data from the new setting. In addition, considerable effort is often required to clean and preprocess data for empirical models and methods.
Software/code availability: Open-source and licensed software is available for fitting and developing empirical models. More specialized methods require in-house development.
Training support: Textbooks on data mining, statistics, and machine learning are available; and conferences and short courses are plentiful.
Data to simulated results: Once models have been fit to the data, simulation of outcomes or data is generally straightforward. Assessing the quality and accuracy of the simulated results is more involved.
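The box notes that spatial autocorrelation complicates uncertainty estimation and that model quality is usually judged against past data. One common safeguard, offered here as an assumption rather than a method the report prescribes, is spatial block cross-validation: whole grid blocks are held out together so that test points are not near-duplicates of nearby training points. A minimal sketch, with a hypothetical function name and square blocks of an assumed size:

```python
from collections import defaultdict

def spatial_block_folds(points, block_size, n_folds):
    """Assign (x, y) points to cross-validation folds by spatial block.

    Points in the same square block of side `block_size` always share a fold,
    so held-out points are not immediate spatial neighbors of training points.
    Returns a list of (train_indices, test_indices) pairs.
    """
    blocks = defaultdict(list)
    for i, (x, y) in enumerate(points):
        key = (int(x // block_size), int(y // block_size))
        blocks[key].append(i)
    # Deterministic round-robin assignment of sorted block keys to folds.
    fold_of_block = {key: k % n_folds for k, key in enumerate(sorted(blocks))}
    folds = []
    for f in range(n_folds):
        test = sorted(i for key, idx in blocks.items() if fold_of_block[key] == f for i in idx)
        test_set = set(test)
        train = [i for i in range(len(points)) if i not in test_set]
        folds.append((train, test))
    return folds
```

Random (nonspatial) folds would tend to place neighboring, autocorrelated points on both sides of the split, which can make past-data validation look more optimistic than performance in a new region.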

incorporating output from social or physical system models are less common and require additional development. Other useful methods include (a) clustering and other unsupervised and deep learning methods for finding structure in volumes of data too large to analyze with a human in the loop and (b) methods for detecting change footprints, spatial hotspots, and anomalies.
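The simplest form of hotspot detection bins events into grid cells and flags cells with unusually high counts. The sketch below is a deliberately simple stand-in, not the spatial scan statistic used by SaTScan or the routines in CrimeStat; the function name, the mean-plus-k-standard-deviations rule, and the default k = 2 are illustrative assumptions.

```python
import statistics
from collections import Counter

def grid_hotspots(events, cell_size, nx, ny, k=2.0):
    """Flag grid cells whose event count is unusually high.

    events: iterable of (x, y) inside [0, nx*cell_size) x [0, ny*cell_size).
    A cell is flagged if its count exceeds mean + k * standard deviation,
    computed over all nx * ny cells (empty cells included).
    """
    counts = Counter((int(x // cell_size), int(y // cell_size)) for x, y in events)
    all_counts = [counts.get((i, j), 0) for i in range(nx) for j in range(ny)]
    mean = statistics.mean(all_counts)
    sd = statistics.pstdev(all_counts)
    threshold = mean + k * sd
    return sorted(cell for cell, c in counts.items() if c > threshold)
```

For example, eight events in the center cell of a 3 x 3 grid and one in a corner flag only the center cell. Unlike scan statistics, this rule ignores neighboring cells and makes no significance adjustment for testing many cells.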

The software and computational infrastructure available for carrying out such empirical analyses may not align well with NGA needs. It is important for NGA to understand the limitations of currently available software (e.g., data volume that can be handled and computational time required for analyses) as well as new software for use on data-intensive computing architectures (e.g., cyberGIS, Hadoop, PySpark, and SparkR). Pilot projects and case studies would allow NGA to evaluate software implementations of empirical methodology and determine their applicability to NGA applications and data. Software for spatial statistics and spatial data mining methods (e.g., hotspot detection via SaTScan or CrimeStat) could be tested with U.S. Department of Defense and NGA data sets and use cases.

New methodology that leverages data-intensive computational architecture is just coming online. For problems exhibiting high spatial complexity, emerging tools (e.g., GIS Tools for Hadoop and SpatialHadoop) and the development of cyberGIS capabilities for exploiting high-performance and cloud computing can facilitate the integration of rich spatiotemporal data, analytics, models, and visualization. These approaches have proven useful for knowledge discovery in a number of domains, including geospatial intelligence, agriculture, hydrology and water resources, coupled physical–social systems, emergency management, geophysics, econometrics, and urban studies (Anselin and Rey, 2012; Wang and Zhu, 2008).

Data processing and curating approaches will be needed to align NGA’s geospatial data and empirical modeling efforts. It will be necessary to experiment with approaches for data selection, cleaning, wrangling, and other preprocessing required to prepare data for these methods. These preprocessing steps, often demanding in their own right, can have a substantial impact on the results. It may also be necessary to develop data curation technology or methods to facilitate searching, handling, extracting, and using NGA data.

NGA-Funded Research and Development Areas

Research needs will likely become more apparent as the universe of potential NGA applications is aligned with available methodology and technology. The special nature and variety of NGA’s data sources (e.g., satellite, sensor, traditional and nontraditional geospatially indexed, social media, open source and classified, and statistical) will drive research needs in supervised and unsupervised learning approaches. Many challenging problems that NGA faces require in-depth integration of diverse domain-specific and geospatial models with spatiotemporal data management and analytics. How to achieve this integration across various computational and spatial scales (Cao et al., 2015; Das Sarma et al., 2012) requires substantial research. Specific research directions that could serve NGA’s future needs include the following:

Developing methodology to combine diverse, disparate data for inference and decision making, including methodology to combine predictions or results of different approaches (e.g., expert judgment, neural net-based classifier);
Developing capabilities for accessing and formatting disparate data in ways that enable analysis products to be generated quickly for decision making;
Advancing unsupervised learning approaches for NGA-specific data and applications;
Developing new parallel formulations of spatial database management systems (e.g., SQL/OGIS standards), spatial statistics, and spatial data-mining tasks on current platforms (e.g., GPU, clusters, HDFS, MapReduce, Spark) and on future advanced computing and cyberGIS platforms for modeling work;
Developing a testbed of common geospatial intelligence analysis tasks (e.g., detection and anticipation of spatial anomalies, hotspots, and patterns of life), proxy data sets (e.g., trajectories, maps, and satellite imagery) of different sizes, and computational intensity metrics (e.g., efficiency and scalability); and
Developing novel spatial and spatiotemporal methods that can take direct advantage of advanced data-intensive computational resources, such as formulating spatial processes as parallel processes that can be mapped naturally to parallel computing architecture (NASEM, 2016b; Tang and Wang, 2009).
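Formulating a spatial computation as parallel work usually means splitting it into independent "map" steps over data partitions and an associative "reduce" step that merges partial results. The sketch below shows that structure for gridded event counts; the function names are hypothetical, and the partitions run sequentially here, whereas a real deployment (e.g., on Spark or a cluster) would dispatch them to separate workers.

```python
from collections import Counter
from functools import reduce

def map_partition(events, cell_size):
    """Map step: count events per grid cell within one data partition."""
    return Counter((int(x // cell_size), int(y // cell_size)) for x, y in events)

def reduce_counts(a, b):
    """Reduce step: merge two partial counts (associative and commutative)."""
    return a + b

def parallel_style_grid_count(partitions, cell_size):
    # Each map_partition call depends only on its own partition, so the calls
    # could run on separate workers; here they simply run one after another.
    partials = [map_partition(p, cell_size) for p in partitions]
    return reduce(reduce_counts, partials, Counter())
```

Because the merge is associative and commutative, the result does not depend on how events are partitioned or in which order partial counts arrive, which is what makes the formulation safe to distribute.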

The research directions given above should help NGA identify promising empirical analysis methods and then adapt them to NGA investigations. These adaptations may focus on extending methodological approaches designed without considering spatial or temporal autocorrelations. They may also leverage new and rapidly evolving data-intensive and high-performance computing capabilities for two key tasks: (1) carrying out scalable geospatial modeling, based on advanced spatial and space-time analyses, and (2) integrating large-scale geospatial databases.

SPATIAL NETWORK ANALYSIS

Human sociocultural activity and the results of that activity are constrained by network dependencies. For example, individuals are influenced by those with whom they interact, organizations are constrained by their transaction networks, and countries are constrained by their alliances. Network analysis models that take into account these dependencies are a core technology for cultural geographic assessment and media analytics, a current area of emphasis for NGA. In addition, NGA’s megacities intelligence question (see Box 1.2) would likely require social network models to analyze agreements and trade networks among organizations that could help or hinder a response to natural, economic, or political disasters (e.g., Zhong et al., 2014). The Chinese water transfer questions would likely require social media analytics to determine which neighborhoods and which groups are likely to strongly resist migration.

Models and methods that take network dependencies into account are referred to as social network analysis, social media analytics, network science, link analysis, dynamic network analysis, or high-dimensional network analysis. They are a flexible class of graphical and statistical models that explain behavior in terms of the relations among entities. Simple social network analysis models focus on people and how they are related, such as through friendship, financial, or collegial ties. Dynamic network analysis and other high-dimensional variants include many classes of entities (e.g., people, organizations, ideas, resources, and locations) and many classes of ties (e.g., communicates with, borders, resides at). Network analysis models support reasoning at multiple levels (e.g., among people, organizations, or countries) and communications assessments for diverse media (e.g., social media analytics). High-dimensional and dynamic network analysis models can also be used to assess (a) change over time or space, (b) the spatiotemporal constraints on human activity within a specific environment, or (c) the effect of a change in constraints across different geographic regions (e.g., Kas et al., 2012; Medina and Hepner, 2011; Van Holt et al., 2012). Finally, social network tools can be used to support course-of-action assessment and, in certain cases, prediction, both of which are important for geospatial intelligence applications. For example, social influence models can be used to predict outcomes such as adoption of technology, formation of opinions, and change in belief. However, these models take spatial factors into account only by altering the strength of ties between actors to reflect distance, and by developing different social influence models for different regions.
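The way social influence models fold in space, by weakening ties with distance, can be sketched with a DeGroot-style averaging model. This is an illustrative choice of influence model, and the decay form 1 / (1 + d / scale), the self-weight, and the function name are assumptions rather than anything specified in the report.

```python
import math

def distance_weighted_influence(opinions, ties, coords, scale=1.0, self_weight=1.0, steps=10):
    """DeGroot-style opinion updating with distance-decayed tie strengths.

    opinions: initial opinion (float) per actor; ties: set of undirected (i, j)
    pairs; coords: (x, y) location per actor. Tie weight is 1 / (1 + d / scale),
    an assumed decay. Each step replaces an actor's opinion with the weighted
    average of its own opinion and its neighbors' opinions.
    """
    n = len(opinions)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        weights[i][i] = self_weight
    for i, j in ties:
        w = 1.0 / (1.0 + math.dist(coords[i], coords[j]) / scale)
        weights[i][j] = weights[j][i] = w
    x = list(opinions)
    for _ in range(steps):
        x = [sum(weights[i][j] * x[j] for j in range(n)) / sum(weights[i]) for i in range(n)]
    return x
```

With two tied actors holding opinions 0 and 1, one update moves both to 0.5 when they are co-located, but only to 0.2 and 0.8 when they are three distance units apart, so distant ties slow convergence. Space enters only through the weights, which is the limitation the paragraph describes.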

Increasingly, network modeling is being combined with other spatial statistical and machine learning techniques to help analysts reason about the content of large, high-dimensional network data at different temporal, spatial, and group levels of resolution. Use of machine learning and language technology adds the capability of a learning or training process. The emergence of spatial network analysis (e.g., Figure 4.7), which combines spatial and network reasoning, has led to new techniques, including a method for assessing information loss across different levels of resolution (Olson and Carley, 2008), identification of the locations from which actors tweet using social network information, and techniques for moving between trail data (e.g., who was where when) and networks (Davis et al., 2008; Merrill et al., 2015).
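Moving from trail data to a network can be as simple as linking actors who were at the same place at about the same time. The sketch below assumes fixed time bins and a hypothetical function name; published techniques may use sliding windows, spatial tolerances, or probabilistic co-presence instead.

```python
from collections import defaultdict
from itertools import combinations

def trails_to_network(trails, window):
    """Build a weighted co-presence network from trail records.

    trails: iterable of (actor, location, time) records ("who was where when").
    Two actors are linked when they appear at the same location within the same
    time bin of width `window`; edge weights count such co-presences.
    Fixed, non-overlapping time bins are a simplifying assumption.
    """
    present = defaultdict(set)  # (location, time bin) -> actors seen there
    for actor, location, t in trails:
        present[(location, int(t // window))].add(actor)
    edges = defaultdict(int)
    for actors in present.values():
        for a, b in combinations(sorted(actors), 2):
            edges[(a, b)] += 1
    return dict(edges)
```

For instance, with a window of 10 time units, actors A and B seen at a market at times 1 and 2 are linked, but C, seen there at time 30, is not; C is linked to A only through a later shared visit elsewhere.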

FIGURE 4.7 Heat map of Afghanistan in which each sector is colored by the average degree centrality of the actors of interest who were geotagged within it. The bright red sectors are those whose actors were, on average, more connected to other actors. SOURCE: Based on an unpublished study of Afghanistan using data drawn from open-source data by Kathleen M. Carley and members of the CASOS team, Carnegie Mellon University.
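The quantity mapped in Figure 4.7, average degree centrality of geotagged actors per sector, can be computed as sketched below. The normalization by n − 1 and the function name are assumptions; the study behind the figure may have used a different normalization or tool.

```python
from collections import defaultdict

def average_degree_by_sector(edges, sector_of):
    """Average normalized degree centrality of geotagged actors, per sector.

    edges: iterable of undirected (a, b) actor pairs.
    sector_of: dict mapping each geotagged actor to a map sector.
    Degree centrality is the actor's number of distinct neighbors divided by
    (n - 1), where n is the number of actors appearing in the network.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    totals, counts = defaultdict(float), defaultdict(int)
    for actor, sector in sector_of.items():
        centrality = len(neighbors.get(actor, ())) / (n - 1) if n > 1 else 0.0
        totals[sector] += centrality
        counts[sector] += 1
    return {s: totals[s] / counts[s] for s in totals}
```

Actors without a geotag contribute to their contacts' degrees but not to any sector average, which is one way sparse geotagging can bias such a map.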

State of the Art

In the past, traditional network models focused on only one or two types of nodes (e.g., just people), a small number of nodes (i.e., fewer than 30), and a single time period. Such models, and the tools that support them, are the most common use of network modeling today. They can be useful to NGA for examining and reasoning about regional heterogeneity (e.g., models showing which organizations work together in different cities to support disaster response). Modern network analysis models, however, are more comprehensive and may have many types of nodes, have large numbers of nodes, and cover many time periods. These modern models and associated tools are more useful, because they can represent and use a wider range of geospatial data.

Network analysis models can be developed rapidly and are often used to assess open-source data (including social media), human intelligence, social intelligence, and survey and simulation results. Some data are also gathered from subject-matter experts (e.g., which leader is hostile to which). Data from multiple sources can be merged as long as the node IDs are matched. This is important because many sources of network data contain few, if any, spatial data. For example, few surveys track the location of the actors, only a small fraction of social media data are geotagged, and spatial data are often obscured (e.g., locations are made up or indeterminate) or intentionally hidden (e.g., cyberattacks spoof IP addresses). Thus, data cleaning, link inference, and merger of multiple data sources are often needed to infer more of the location information.
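Merging on matched node IDs lets a source with locations fill in spatial information missing from a source without them. A minimal sketch, assuming IDs have already been reconciled across sources and using a hypothetical record layout; real fusion would also need entity resolution and provenance tracking.

```python
def merge_sources(sources):
    """Merge network data from several sources keyed on shared node IDs.

    sources: list of dicts with optional keys "edges" (iterable of (a, b)
    pairs) and "locations" (dict node ID -> place).
    Returns the union of undirected edges, a location per node taken from the
    first source that reports one, and the set of nodes whose sources disagree.
    """
    edges, locations, conflicts = set(), {}, set()
    for src in sources:
        for a, b in src.get("edges", ()):
            edges.add(tuple(sorted((a, b))))
        for node, place in src.get("locations", {}).items():
            if node not in locations:
                locations[node] = place
            elif locations[node] != place:
                conflicts.add(node)
    return edges, locations, conflicts
```

Recording conflicting locations, rather than silently overwriting them, keeps obscured or spoofed locations visible for later cleaning.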

While it is well recognized that social networks are spatially embedded, relatively little is known about how that embedding constrains and enables behavior (e.g., Barthélemy, 2011). Individuals are more likely to interact with those nearby, even if they have access to social media and the Internet. Population density within a region affects the sense of anomie, and interactions tend to atrophy as distance increases. However, there is no comprehensive source of all such findings.

Active areas of network modeling research of particular relevance to NGA include the following:

High-dimensional and dynamic network analysis, with particular attention to n-mode clustering techniques;
New scalable routines and/or approximation techniques for large dense networks, particularly when they take into account spatial relations;
Linking social networks with other networks and/or node attributes (e.g., spatial network analysis, where the nodes have connections to each other and positions on maps); and
Assessing the robustness of measures for filling in missing data, and the inference of missing links (e.g., using temporal and spatial dependencies to infer social interactions, and using social interactions to infer geographic location).
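Using social interactions to infer geographic location can start from a simple majority vote over geotagged contacts, which relies on the tendency to interact with people nearby. The sketch below is one basic heuristic, not a method from the cited studies; the function name and the alphabetical tie-breaking are assumptions.

```python
from collections import Counter, defaultdict

def infer_locations(edges, known):
    """Guess locations of untagged actors from their geotagged contacts.

    edges: iterable of undirected (a, b) pairs.
    known: dict actor -> location for the geotagged minority.
    Each untagged actor gets the most common location among its tagged
    neighbors (ties broken alphabetically); actors with no tagged neighbors
    are left out.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    guesses = {}
    for actor, nbrs in neighbors.items():
        if actor in known:
            continue
        votes = Counter(known[n] for n in nbrs if n in known)
        if votes:
            best = max(votes.values())
            guesses[actor] = min(loc for loc, v in votes.items() if v == best)
    return guesses
```

Any robustness assessment of such inferred locations would need to account for the same missing-data problems the list above raises for links.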

Most metrics for assessing network models and community detection algorithms have been optimized for large, sparse data sets, and are quite scalable. These are useful when, for example, assessing the communication networks in social media in different countries. Metrics and community detection algorithms for dense networks, such as shared hashtag networks and high-dimensional networks, are still in their infancy and tend to scale poorly. In addition, specialized visualization tools and metrics for large-scale social media data, particularly Twitter, as well as new techniques for creating networks from streaming data are starting to appear (e.g., Hannigan et al., 2013).
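Shared hashtag networks become dense because every pair of users of the same hashtag is linked, so one popular hashtag alone produces a fully connected cluster. The sketch below builds such a network and computes density (the fraction of possible undirected edges present); the function names are hypothetical.

```python
from itertools import combinations

def density(n_nodes, edges):
    """Fraction of possible undirected edges that are present."""
    possible = n_nodes * (n_nodes - 1) / 2
    return len(edges) / possible if possible else 0.0

def shared_hashtag_network(posts):
    """Link users who used at least one hashtag in common.

    posts: iterable of (user, hashtag) pairs.
    Every pair of users sharing a hashtag becomes an edge, so a single
    popular hashtag creates a clique among its users.
    """
    users_by_tag = {}
    for user, tag in posts:
        users_by_tag.setdefault(tag, set()).add(user)
    edges = set()
    for users in users_by_tag.values():
        edges.update(combinations(sorted(users), 2))
    return edges
```

With five users posting one hashtag and a sixth posting another, the network already has 10 of 15 possible edges (density of about 0.67), and edge counts grow quadratically with the number of users per hashtag.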

The vast majority of network metrics and algorithms are focused on finding critical nodes, critical links, and groups; characterizing the topology; comparing and contrasting networks; and identifying change over time or predicting one network from other information or prior networks. However, data are not always sufficient to carry out these tasks. For example, metrics designed to identify critical nodes are sensitive to missing data, and in some situations, as little as 25 percent missing data has led to an 80 percent reduction in accuracy (Borgatti et al., 2006; Frantz et al., 2009). Few tools provide support for uncertainty assessment. Data collection strategies and the technologies that generate data can also create biases that influence what can be identified using network techniques. For example, snowball sampling from a single source (as might be done with cell phone data) can overemphasize the importance of the phone owners. The importance of an original tweeter can be overemphasized, because Twitter links both the tweet and any retweets to the original tweeter.
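Sensitivity of critical-node metrics to missing data can be probed by repeatedly deleting a random share of edges and checking whether the most central node changes. The sketch below uses degree as the centrality measure and uniform random edge loss; both choices, and the function names, are illustrative assumptions, and real missingness is rarely uniform (as the sampling biases above show).

```python
import random
from collections import Counter

def top_degree_node(edges):
    """Node with the highest degree (ties broken by node label)."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return min(deg, key=lambda n: (-deg[n], n)) if deg else None

def top_node_stability(edges, missing_fraction, trials=200, seed=0):
    """Share of trials in which the top-degree node survives random edge loss.

    Each trial drops round(missing_fraction * len(edges)) edges uniformly at
    random, imitating unobserved ties, and checks whether the most central
    node in the degraded network matches the one in the full network.
    """
    rng = random.Random(seed)
    edges = list(edges)
    reference = top_degree_node(edges)
    keep = len(edges) - round(missing_fraction * len(edges))
    hits = sum(top_degree_node(rng.sample(edges, keep)) == reference for _ in range(trials))
    return hits / trials
```

A stability value well below 1 at a plausible missing fraction warns that the identified critical node may be an artifact of incomplete collection.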

Spatial network measures and techniques are beginning to emerge. The capability to visualize networks on maps is well established (e.g., Olson and Carley, 2009) and toolkits are now available in three packages (Palantir, ArcGIS, and ORA). Many social media and cyberattack collection tools also show some network data on maps, but those tools are proprietary and are not used for analysis. Measures and techniques designed for spatial networks include “spatial” centrality measures, guidance for using links (e.g., when to use link inversion) when representing spatial distance, and network-based measures for assessing autocorrelation (variants of Moran’s I and Geary’s C). New bipartite spectral algorithms for clustering spatial and nonspatial data simultaneously are in the experimental stage. Tools for engaging in spatial network analysis are often built from scratch by researchers. Some tools are available in ORA and R, but ArcGIS offers only the most basic network metrics.
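A network-based variant of Moran's I uses ties in place of spatial adjacency as the weights. The sketch below uses binary symmetric weights (w_ab = w_ba = 1 for each tie), which is one common choice; weighted or row-standardized variants exist, and the function name is hypothetical. It assumes at least one tie and non-constant values.

```python
def network_morans_i(values, edges):
    """Moran's I for a node attribute, using network ties as weights.

    values: dict node -> numeric attribute.
    edges: iterable of undirected (a, b) ties; each gives w_ab = w_ba = 1.
    I = (N / W) * sum_ij w_ij z_i z_j / sum_i z_i**2, with z = x - mean(x)
    and W the sum of all weights. Positive I means tied nodes tend to have
    similar values; negative I means dissimilar values.
    """
    n = len(values)
    mean = sum(values.values()) / n
    z = {v: x - mean for v, x in values.items()}
    cross, total_weight = 0.0, 0.0
    for a, b in edges:
        cross += 2 * z[a] * z[b]  # counts both w_ab and w_ba
        total_weight += 2
    denom = sum(zi * zi for zi in z.values())
    return (n / total_weight) * cross / denom
```

On a four-node chain a–b–c–d, values 1, 1, 5, 5 give I = 1/3 (similar neighbors), while alternating values 1, 5, 1, 5 give I = −1.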

Analysts can now use suites of interoperable tools in a data-to-model workflow to conduct rapid ethnographic analyses and understand the lay of the land. Figure 4.8 shows a typical workflow in which the technologies used are interoperable, with the output from one modeling or analysis tool feeding into the next. An interoperable tool suite, rather than an integrated system, is used because of the rapid rate of change in the application program interfaces, the evolution of language technology, and the need to interface with data and reporting tools on different platforms or networks. Many of these workflows are instantiated in frameworks like Ozone or, more commonly, through the use of Python scripts.

FIGURE 4.8 Illustrative tool chain for network modeling. Data are collected from various sources, cleaned to remove nonrelevant information, and then information is extracted for the analysis.

In this workflow, data are collected from various sources. For example, special technologies may be used to download news articles and social media data. The key challenge is designing the search strategy so that the information of interest is part of the extracted data stream, the data stream does not contain so much irrelevant information that it overwhelms the analysis tools, and the timeframe for the collected data is appropriate. Special training and experience are generally needed to develop such queries. Next, the data are cleaned to remove irrelevant content (e.g., tweets by bots) and are often fused. Specialized tools are used for bot detection, deduplication, and topic identification. Finally, information needed for the analysis is extracted from the raw text data and metadata. Advanced language technology models are used to tokenize the raw text, remove unique or low-information concepts, and identify bigrams, actors, and locations. Sentiment analyzers may also be run and the sentiment attached to the tweet or the tweeter. The use of parallel processing and semiautomated tools such as NetMapper and ORA has reduced the time it takes to carry out these steps to as little as two weeks (Carley et al., 2012a).
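The cleaning and extraction stages can be sketched in a few functions. This is a toy illustration under stated assumptions, not the behavior of NetMapper or ORA: bot detection is replaced by a hypothetical list of flagged accounts, deduplication matches exact (case-insensitive) text, the stopword list is illustrative, and "low-information" concepts are those seen fewer than `min_count` times in the corpus.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "at", "on"}  # illustrative

def clean(posts, bot_accounts):
    """Drop posts from flagged bot accounts and exact duplicate texts."""
    seen, kept = set(), []
    for author, text in posts:
        key = text.strip().lower()
        if author in bot_accounts or key in seen:
            continue
        seen.add(key)
        kept.append((author, text))
    return kept

def tokenize(text):
    """Lowercase word tokens with stopwords removed."""
    return [t for t in re.findall(r"[a-z#@']+", text.lower()) if t not in STOPWORDS]

def extract(posts, min_count=2):
    """Link authors to concepts (tokens and bigrams of adjacent remaining
    tokens) that occur at least min_count times across the corpus."""
    per_post = []
    for author, text in posts:
        toks = tokenize(text)
        concepts = toks + [f"{a}_{b}" for a, b in zip(toks, toks[1:])]
        per_post.append((author, concepts))
    freq = Counter(c for _, concepts in per_post for c in concepts)
    links = Counter()
    for author, concepts in per_post:
        for c in set(concepts):
            if freq[c] >= min_count:
                links[(author, c)] += 1
    return links
```

The resulting author–concept links form a two-mode network that can be loaded into network analysis tools; actor and location extraction would require named-entity recognition, which this sketch omits.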

The current state of the art in spatial network analysis is summarized in Box 4.7.

How to Make Useful for NGA

Network analysis models are likely to be useful when NGA is concerned with how patterns of relations affect behavior, whether that behavior is at the individual, group, organization, or state level. These models are also useful for characterizing and reasoning about various types of networks, such as transportation, communication, and infrastructure networks. Because the basic methods are already well developed, NGA could quickly increase its modeling capability in network analysis (a) by having staff with strong spatial network competence and (b) by obtaining spatial network data to use for training and for developing more advanced methods.

BOX 4.7 State of the Art in Spatial Network Analysis

Space and time scales: Most network models do not consider space or time, although spatial and network regression techniques are used with temporal data. A few tools provide support for spatial and temporal networks. In these cases, the scale of the data determines the scales used in the models because the methodologies can support any space or time scale.
Fidelity: Network models are empirically driven and typically have reasonably high fidelity. However, the model fidelity depends on the data. For an investigation focused on interaction, for example, the model fidelity is higher if phone call data are used and lower if data on co-presence at events are used.
Accuracy and precision: These depend on the data used to build the model, the amount of missing data, and the level of inference needed to place actors or resources in specific situations.
Predictions and scenarios: Network models are often part of a scenario description. What-if analyses using comparative statistics, in which things are moved, locations barred, or nodes added or removed, are used for prediction. Temporal trends in spatial or network data, and certain metrics in network models (e.g., cognitive demand for predicting emergent leaders) are also used for prediction.
Uncertainty analysis support: This is difficult, with existing tools providing limited technical support.
Validation and assessment support: All existing metrics in commercial off-the-shelf software have been validated. A few new metrics in open-source software have not been validated. It may be possible to use bootstrapping techniques to assess the robustness of model results against missing data.
Data requirements: Network data are increasingly available from social and printed media, sensors, and surveys. Only a few tools (e.g., ORA, some of the igraph routines in R) support analysis of networks with more than 100,000 nodes and their spatial relations. Many of the more interesting measures cannot be parallelized to run in a MapReduce framework. Approximation and incremental techniques are being developed. Processing large data sets is often made possible by using the graphics cards in desktop machines. As the number of nodes (network size), the fraction of possible edges that are nonzero (density), and the number of networks increase, memory limits may be reached and disk processing may be too slow. In those cases, analysis is done in a distributed fashion or by using scripting techniques in batch mode or with special hardware. For most modern machines, these limits tend to be reached when there are more than 10⁷ nodes, or in large networks with densities greater than 0.4, or for more than 1,000 large networks.
Difficulty to develop: Low. Network models can be developed in minutes to hours. Spatial network models take longer. Increasingly, text-mining techniques are used to build network models from open-source texts.
Reuse: Modeling technology has unlimited reuse. Specific instantiated models are often reused.
Software/code availability: Multiple commercial off-the-shelf toolkits are available and some open-source tools also exist. A few tools have limited support for both network and spatial data.
Training support: Textbooks are available on developing and assessing network models, collecting data to support them, and interpreting visualizations. Many conferences, special short courses, and train-the-trainer programs are run on a regular basis.
Data-to-simulated results: Many network tools are designed to work in data-to-simulation workflows. Simple diffusion or robustness simulations are part of some network modeling tools.

Few analysts have strong skills in network or spatial network analysis. If they are trained in GIS capabilities, they likely have little understanding of the strengths and limitations of network methods, and may use only the most basic measures and make mistakes (e.g., applying measures to multimode data without separating the modes). If they are trained in social networks or network science, they likely have little understanding of the strengths, limitations, and advances in spatial analytics. In this case, they may limit themselves to creating and comparing different networks for different regions and may not consider autocorrelation issues.

Consequently, developing expertise and skills in network analysis models, particularly in dynamic network analysis and high-dimensional network analytics, would likely be helpful. NGA modelers with this expertise could help integrate information from different sources, assess social media, and examine issues of resiliency, power, and influence, which are valuable for assessing regional differences. Developing expertise and skills in spatial network analysis would support assessments of the impact of logistical constraints, spatial environment, and spatial autocorrelation on social behavior. Skills in using existing tools to place networks on maps, assess the spatial distribution of ties, and identify the geographic span of key network members would also be valuable.

In addition to hiring staff with the necessary knowledge and skills, NGA could develop in-house expertise. Network and spatial network analytic and visualization capabilities could be taught at both an introductory and a more advanced level in the NGA College. In addition, NGA could send more personnel to university-based programs that train the trainer in network analytics for social media, high-dimensional network data, and spatial network analysis. Most training programs conducted as executive education or conference seminars are focused on traditional network analysis and do not cover spatial or temporal issues. Thus, NGA analysts may need to go both to sessions on traditional network analytics and to sessions on spatiotemporal network analytics.

Although simple spatial network data sets (e.g., air traffic flow) are available through Carnegie Mellon University, developing more relevant data sets would be invaluable for training. In addition, applications of spatial network models could be improved or refined using a number of different methods for analyzing and fusing data. For NGA, tools to search and fuse classified and open-source data would increase the speed with which these modeling technologies could be applied in new settings. For example, a simple but valuable technology would add spatial coordinates based on place names in a format that could be imported into network tools.
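The "simple but valuable technology" described above, attaching coordinates to place names in an importable format, might look like the sketch below. The gazetteer entries and their coordinates are hypothetical placeholders, the matching rule (case-insensitive, trimmed exact match) is an assumption, and the output is a generic CSV node-attribute table rather than the specific import format of any particular network tool.

```python
import csv
import io

# Hypothetical gazetteer with placeholder coordinates; a real one would come
# from an authoritative place-name database.
GAZETTEER = {
    "alpha town": (34.50, 69.20),
    "beta village": (31.60, 65.70),
}

def geocode_nodes(node_places, gazetteer=GAZETTEER):
    """Write a CSV node-attribute table with columns node, place, lat, lon.

    node_places: dict node ID -> place name as it appears in the source text.
    Unmatched names get empty coordinates so they can be reviewed, not dropped.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["node", "place", "lat", "lon"])
    for node, place in sorted(node_places.items()):
        lat, lon = gazetteer.get(place.strip().lower(), ("", ""))
        writer.writerow([node, place, lat, lon])
    return out.getvalue()
```

Exact matching misses spelling variants and transliterations, which are common for place names in open-source text, so a production version would need fuzzy matching and disambiguation among places that share a name.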

NGA-Funded Research and Development Areas

Network analysis models would be useful for a wide range of NGA applications. However, the current techniques have three key limitations. First, the availability of spatiotemporal network data is growing, but most methods cannot cope with space, time, and groups simultaneously. Second, many of the existing technologies support description more than prediction. Third, spatially tagged data are sparse, and even when the data exist, theories about geographic constraints on networks and network constraints on geography are fairly weak. NGA-funded research and development areas that could help overcome these limitations include the following:

Developing joint spatiotemporal-network techniques for assessing and visualizing the relations among the entities and activities of interest in high-dimensional data (e.g., autocorrelation visualization techniques);
Developing joint spatial network models to predict the spread of information, technology, and activities; the engagement of entities in activities of interest in regions of interest, given existing social networks; and the development of networks and activities of interest, given geographic constraints such as transportation and communication barriers;
Developing spatiotemporal-network-based sentiment-mining techniques for informing predictive spatial network models;
Developing data sets with multiple levels of spatial and social network data for use in developing theories and testing metrics;
Commissioning review papers that build a compendium of findings about the relation between spatial factors and network factors; and
Improving training in joint spatial network analytics and visualization.

Joint spatiotemporal-network techniques are needed because spatial statistical models and methods are limited to isotropic Euclidean spaces, which are inadequate both for spatiotemporal data (e.g., trajectories) and for spatial network data (e.g., crime reports in urban areas). In addition, network clustering and prediction techniques rarely take spatial or temporal distances into account. Specific research areas include (a) accounting for spatial autocorrelation or coverage when assessing centrality (e.g., actors that have more ties in more locations may be more critical than those with more ties in one location), (b) assessing when network structures migrate between spatial regions (e.g., as can happen when a gang moves to a new city but retains its network structure), and (c) developing techniques for assessing networks where the ties represent physical distance and for using spatiotemporal information to help identify covert networks. A good starting point would be to survey past uses of network analysis models that address dynamic spatial situations, such as air traffic monitoring or hostility networks.
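The coverage idea in (a), that ties spread across many locations may matter more than many ties in one place, can be made concrete with a simple ranking. This score is an illustrative assumption, not an established centrality measure: actors are ordered by the number of distinct locations their neighbors occupy, with ordinary degree as the tiebreaker.

```python
from collections import defaultdict

def spatial_coverage(edges, location_of):
    """Rank actors by the geographic spread of their ties.

    edges: iterable of undirected (a, b) ties.
    location_of: dict actor -> location.
    Returns (actor, distinct neighbor locations, degree) tuples sorted so that
    actors whose ties reach more locations come first, then by degree.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    rows = []
    for actor, nbrs in neighbors.items():
        places = {location_of[n] for n in nbrs if n in location_of}
        rows.append((actor, len(places), len(nbrs)))
    return sorted(rows, key=lambda r: (-r[1], -r[2], r[0]))
```

Under this score, an actor with two ties in two cities outranks a hub with three ties in a single city, the reverse of a plain degree ranking.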

Spatial network models that predict the spread of ideas, beliefs, activities, and technologies are needed because existing diffusion models in spatial analytics and network analytics cannot be integrated with one another, are unlikely to yield the same predictions, and have insufficient predictive capability. Research on spatial-network diffusion models, particularly reusable models, would improve forecasts of changes in human geography.
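One way to see what a joint spatial-network diffusion model involves is a linear threshold model in which a neighbor's influence decays with physical distance. The sketch below is illustrative only; the decay form, threshold, and function name are assumptions, and it is not offered as a validated predictive model.

```python
import math

def spatial_threshold_diffusion(ties, coords, seeds, threshold=0.5, scale=1.0, max_steps=50):
    """Linear-threshold diffusion in which tie influence decays with distance.

    ties: set of undirected (i, j) actor pairs; coords: list of (x, y).
    An actor adopts once the distance-weighted share of its neighbors that
    have adopted reaches `threshold`; weight 1 / (1 + d / scale) is assumed.
    Returns the adopter sets after each step (step 0 = seeds).
    """
    n = len(coords)
    nbrs = [dict() for _ in range(n)]
    for i, j in ties:
        w = 1.0 / (1.0 + math.dist(coords[i], coords[j]) / scale)
        nbrs[i][j] = w
        nbrs[j][i] = w
    adopted = set(seeds)
    history = [set(adopted)]
    for _ in range(max_steps):
        new = {
            i for i in range(n)
            if i not in adopted and nbrs[i]
            and sum(w for j, w in nbrs[i].items() if j in adopted) / sum(nbrs[i].values()) >= threshold
        }
        if not new:
            break
        adopted |= new
        history.append(set(adopted))
    return history
```

On a three-actor chain where actor 1 sits next to actor 0 but far from actor 2, seeding actor 0 spreads adoption to everyone in two steps, whereas seeding the distant actor 2 spreads nothing: the same network yields different outcomes once geography is included.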

Extracting sentiment for large geolocated groups will be challenging. Current sentiment miners characterize the sentiment in a document by counting its positive and negative words, usually in English. The results can be misleading, however, because they do not indicate what the sentiment is directed toward, and they do not account for negation or sarcasm. NGA would gain the greatest value by developing new sentiment assessment techniques that refine the sentiment based on the geographical and network features of the community, such as the confluence of information from time zone, location, local events, language, structure of local groups, and tendency to self-identify in networks through linguistic cues.
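The limitation of word counting is easy to reproduce. The following sketch implements a bare positive-minus-negative word count using a tiny hypothetical lexicon; it assigns the same score to a statement and its negation, gives a positive score to a sarcastic sentence, and says nothing about what the sentiment targets. Real lexicons are far larger, but the counting logic shown here has the same blind spots.

```python
import re

# Hypothetical miniature lexicons for illustration only.
POSITIVE = {"good", "great", "happy", "safe"}
NEGATIVE = {"bad", "angry", "unsafe", "afraid"}


def word_count_sentiment(text):
    """Return (#positive words - #negative words), ignoring context."""
    tokens = re.findall(r"[a-z']+", text.lower())
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return pos - neg


print(word_count_sentiment("The new dam is good"))            # 1
print(word_count_sentiment("The new dam is not good"))        # 1 (negation missed)
print(word_count_sentiment("Great, another water shortage"))  # 1 (sarcasm missed)
```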

SUMMARY AND CONCLUSIONS

NGA will need a wide variety of models and analytical methods to improve its geospatial intelligence capabilities. For example, models and methods needed to answer the megacities intelligence question provided by NGA (How will worldwide urbanization trends affect regional political, economic, and security environments?) include the following:

Physical process models of environmental changes that could stress urban populations, such as sea-level rise and increases in summer temperatures;
Social system models to determine what changes in social systems (e.g., financial, cultural, ethnic, religious, and health) may trigger political, economic, or security problems across different geoterrains;
Coupled physical process–social system models to examine how an urban population may respond to an environmental stressor, such as a heat wave, water shortage, vector-borne disease, or air pollution;
Inverse methods that use observations to constrain the state of financial, health, transportation, and other urban social systems over time;
Empirical methods to discover crime hotspots, increases in unemployment, degradation of neighborhoods, and other urbanization trends that could trigger political, economic, or security problems; and
Social network models to analyze agreements and trade networks among organizations that could help or hinder a response to natural, economic, or political disasters.

Models and methods needed to answer the Chinese water transfer intelligence questions (How do agriculture and energy production and consumption change over time? How and where will populations, including rural communities, shift?) include the following:

A large-scale physical process model of the hydrologic system in China to predict surface flow, subsurface flow, and abundance of water under different water diversion scenarios;
Inverse methods to estimate or constrain key model parameters of a large-scale hydrologic model (e.g., spatially varying permeability, flow rates, and evaporation) to produce plausible predictions of water availability as a function of location throughout China;
Integrated assessment models to examine the complex interactions among water, agriculture, and energy production and consumption in China;
Empirical models to detect changes in water availability arising from policy decisions on dam building, agriculture, and coal production;
Social system model scenarios of how affected populations are likely to respond to dam construction and involuntary migration; and
Social media analytics to determine which neighborhoods and which groups are likely to strongly resist migration.

Steps NGA can take to develop the sophisticated modeling and analysis capability needed to address these types of questions are summarized below.

Develop or Adapt Models or Methods at NGA

Data-driven models and analysis methods are amenable to near-term development, because NGA analysts already have some relevant knowledge and experience, the methodology is established, and software and training support are available. In particular, NGA’s experience with spatial and temporal analysis provides a foundation for developing or adapting spatial statistics, data mining, and machine learning methods. Methods that are especially promising for NGA include (a) Bayesian hierarchical models that link spatially connected data, data at different levels of resolution and aggregation, or disparate data sources; (b) clustering and other unsupervised and deep learning methods for finding structure in data volumes too large to analyze with a human in the loop; and (c) methods for detecting change footprints, spatial hotspots, and anomalies. In addition, NGA’s growing emphasis on human geography provides a foundation for developing network analysis models to examine how patterns of relations affect behavior.

For both types of analyses, the basic methods are well established, and software and user support (e.g., textbooks, conferences, and special short courses) are readily available. However, some additional development and training are required to adapt these methods for geospatial data and NGA use cases. In addition, software and algorithms for data-intensive computing will likely have to be developed for spatial statistics and spatial data-mining methods. Training in network or spatial network analysis could be offered at the NGA College or obtained from university-based programs.
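As an illustration of hotspot detection, the sketch below computes the Getis-Ord Gi* statistic, a standard local measure of spatial clustering, for event counts on a regular grid. The choice of Gi*, the 3-by-3 binary neighborhood (including the cell itself), and the function name are assumptions made for illustration; operational use would also need a weights scheme suited to the data and a correction for multiple testing.

```python
import numpy as np


def getis_ord_gi_star(grid):
    """Gi* z-scores for a 2-D grid of counts with 3x3 binary weights."""
    x = np.asarray(grid, dtype=float)
    n = x.size
    mean = x.mean()
    s = x.std()  # population standard deviation, as in the Gi* formula
    rows, cols = x.shape
    z = np.zeros_like(x)
    for i in range(rows):
        for j in range(cols):
            window = x[max(i - 1, 0):min(i + 2, rows),
                       max(j - 1, 0):min(j + 2, cols)]
            w_sum = window.size  # binary weights: sum(w) == sum(w**2)
            numerator = window.sum() - mean * w_sum
            denominator = s * np.sqrt((n * w_sum - w_sum ** 2) / (n - 1))
            z[i, j] = numerator / denominator
    return z


grid = np.ones((7, 7))
grid[2:5, 2:5] = 10.0  # a 3x3 cluster of high counts
z = getis_ord_gi_star(grid)
print(np.round(z, 2))
```

Values above about 1.96 mark cells whose neighborhoods hold more events than expected under spatial randomness at roughly the 5 percent level, before any multiple-comparison adjustment.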

Collaborate with Outside Experts

NGA will need partners to help develop, adapt, and use more sophisticated models and methods (e.g., process models, coupled models, agent-based models, inverse methods, and spatial network models) as well as geospatial models that are not well supported by cutting-edge computational infrastructure. A substantial part of any group’s capability in sophisticated modeling is learned through partnerships, apprenticeships, and collaborations. Such collaboration could take many forms, including being a partner in the team developing a model or extending its use to other applications, a user of a team’s model or method, or a user of the resulting data products. Regardless, NGA will need to identify domain experts who can design models or scenarios relevant to NGA, run the model, interpret the results, or help NGA find useful existing model output. To use these models or model results effectively, NGA will need to understand their strengths and limitations for the geospatial intelligence task at hand.

Finding partners for NGA modeling efforts will not be trivial because of the classified nature of the work, the wide and changing variety of experts needed, and the need to nurture long-term relationships. Models of complex systems are typically developed by multidisciplinary teams with in-depth knowledge and experience in the scientific disciplines and computational capabilities relevant to the task at hand. However, bringing together diverse experts, who would learn from each other in the context of NGA’s priorities, could contribute to major breakthroughs in NGA-relevant problems. Major research universities, as well as organizations with which NGA has established relationships (e.g., defense and intelligence agencies, national laboratories, private-sector contractors, and NGA centers of academic excellence in geospatial science), may be a starting point for finding experts and modeling teams for NGA modeling efforts.

Fund Research and Development

Investments in research and development could strengthen NGA’s modeling capabilities in the years ahead. Where to focus these investments depends on what models and analysis methods are proving most useful for geospatial intelligence. Potential research areas are summarized below.

Extending the use of models to NGA-relevant situations. NGA investigations will likely make new demands on models, using them in settings for which they were not originally designed, or at least not thoroughly tested. Examples include precise, near-real-time wind, wave, or weather predictions to support troop deployment or disaster relief, and estimates of dispersion and damage from the release of hazardous materials in urban environments. In addition, such models may need to be combined with social system models to help decision makers prepare for social unrest, disruption, or migration. Substantial research is required to adapt physical process models, social system models, and combined physical–social system models to deal more reliably with these less common settings.

Improving understanding of human behavior. Social system models are only beginning to surpass expert judgment. Advancing their development, and the development of combined physical–social system models, requires fundamental research to improve understanding of human behavior. Promising areas of research include studies aimed at understanding how human behavior is constrained or enabled by the geography of the natural and built environment, including how geographic factors influence the development of social networks and communications among actors, and how cognitive biases influence the perception of space.

Speeding model development, testing, and run time. Intelligence questions are often time sensitive, so research advances that speed up model development, testing, or run time could prove beneficial to NGA. Model development could be accelerated through research aimed at facilitating the combination of existing subsystem models for NGA investigations. Development time and run time could both be reduced through research and development of accurate reduced-order models or emulators that effectively reproduce the results of computationally intensive models. Developing simulation testbeds could aid all of these efforts and also facilitate assessments of model accuracy and speed.
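The emulator idea can be illustrated with a few lines of NumPy: run the expensive model at a small number of design points, fit a cheap surrogate, and query the surrogate instead. Here the expensive model is a hypothetical stand-in function, and a cubic polynomial serves as the surrogate for simplicity; practical emulators often use Gaussian processes or physics-based reduced-order models instead.

```python
import numpy as np
from numpy.polynomial import Polynomial


def expensive_model(rainfall_mm):
    """Hypothetical stand-in for a slow simulation (e.g., a runoff model)."""
    return 100.0 * (1.0 - np.exp(-rainfall_mm / 200.0))


def build_emulator(model, design_points, degree=3):
    """Run `model` at a few design points and fit a polynomial surrogate."""
    x = np.asarray(design_points, dtype=float)
    y = np.array([model(xi) for xi in x])
    # Least-squares fit; Polynomial.fit rescales x internally for stability.
    return Polynomial.fit(x, y, degree)


design = np.linspace(0.0, 400.0, 8)  # only 8 runs of the costly model
emulator = build_emulator(expensive_model, design)

# Check the surrogate against the full model on a dense grid.
test_points = np.linspace(0.0, 400.0, 201)
max_error = np.max(np.abs(emulator(test_points) - expensive_model(test_points)))
print(f"max emulator error on [0, 400] mm: {max_error:.3f}")
```

Checking the surrogate against the full model, as in the last lines, is the kind of accuracy assessment a simulation testbed would support at scale.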

Methodological research and development tailored to NGA-relevant models. The models developed or adapted for NGA purposes will have to be accompanied by customized methods that combine these models with data. Methodology for inversion, exploration of plausible outcomes or scenarios, quantification of prediction uncertainties, and model assessment will be required to bring model-based results more in line with available measurements. Such methodology is particularly needed for social system and physical–social system models. Possible directions in this area include development of inverse methods for constraining the plausible states of social system models to be consistent with data, and development of methods for formal verification and validation of model results against NGA-relevant benchmarks and test cases. In addition, research on how to adapt existing inverse methods to integrate the diverse forms of data that NGA collects and uses (e.g., satellite, sensor, geospatial, and open source) would be beneficial for all types of models.
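A minimal example of an inverse method for a social system model is rejection sampling in the style of approximate Bayesian computation: draw candidate parameters from a prior, run the model, and keep only the parameters whose output lies close to the observations. The logistic migration model, its parameter names, the uniform prior, and the tolerance below are all hypothetical, and the observations are synthetic, generated from the same model.

```python
import numpy as np


def simulate_share(rate, steps=10, initial=0.05):
    """Hypothetical logistic model of the share of a population that has migrated."""
    share = [initial]
    for _ in range(steps - 1):
        s = share[-1]
        share.append(s + rate * s * (1.0 - s))
    return np.array(share)


def rejection_abc(observed, prior_low, prior_high, n_draws=5000,
                  tolerance=0.02, seed=0):
    """Keep prior draws whose simulated trajectory is within `tolerance` RMSE of the data."""
    rng = np.random.default_rng(seed)
    draws = rng.uniform(prior_low, prior_high, n_draws)
    accepted = []
    for rate in draws:
        simulated = simulate_share(rate, steps=len(observed))
        rmse = np.sqrt(np.mean((simulated - observed) ** 2))
        if rmse < tolerance:
            accepted.append(rate)
    return np.array(accepted)


# Synthetic observations generated with a true rate of 0.6.
observed = simulate_share(0.6)
posterior = rejection_abc(observed, prior_low=0.0, prior_high=2.0)
print(len(posterior), posterior.mean(), posterior.min(), posterior.max())
```

The accepted draws approximate the set of rates consistent with the data; their spread gives a rough measure of how tightly the observations constrain the model.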

Methodological research and development tailored to NGA data sources and needs. Research could facilitate the development of empirical methodology that is tailored to the data used by NGA. Examples include developing methods to combine disparate data or results from different approaches and more accurately represent their uncertainty to support inference and decision making, and methods to cope with data that have spatial, temporal, and network components (e.g., to assess the activities of a terrorist cell over time). A related need is for sentiment-mining techniques that characterize the sentiment in a document based on the geographical and network features of the community, such as location, local events, language, structure of local groups, and tendency to self-identify in networks. Research is also needed to develop spatial and spatiotemporal methods that use advanced data-intensive computational resources, for example by formulating spatial processes as parallel processes that map naturally onto parallel computing architectures.
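One of the simplest ways to combine disparate estimates while tracking their uncertainty is inverse-variance weighting. The sketch below assumes the estimates are independent, unbiased measurements of the same quantity with known standard errors; those assumptions often fail for real intelligence data, and relaxing them is part of what more sophisticated combination methods would need to do. The function name and example values are invented for illustration.

```python
import math


def inverse_variance_combine(estimates):
    """Combine (value, standard_error) pairs by inverse-variance weighting.

    Assumes independent, unbiased estimates of the same quantity.
    Returns (combined_value, combined_standard_error).
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    value = sum(w * v for w, (v, _) in zip(weights, estimates)) / total
    return value, math.sqrt(1.0 / total)


# Hypothetical population estimates (thousands) from two sources.
satellite = (120.0, 20.0)  # e.g., derived from imagery
survey = (100.0, 10.0)     # e.g., from a field survey
value, se = inverse_variance_combine([satellite, survey])
print(f"{value:.1f} +/- {se:.2f}")  # 104.0 +/- 8.94
```

The more precise survey dominates the combined value, and the combined standard error is smaller than either input's, which holds only if the independence and unbiasedness assumptions are met.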