The various approaches employed in land change models (LCMs) have emerged from multiple disciplinary traditions, and future progress in improving LCMs will likewise draw on multidisciplinary developments. At this point in the development of LCMs, as a category of models that bridges and couples the dynamics and processes of social and natural systems, the diversity of model types has served a number of scientific and practical goals. The growing demands on these models within sustainability science, and for integration with environmental models, require that the community of model builders and users take advantage of a number of opportunities to advance both the theoretical and empirical grounding of models. We identify opportunities in modeling that derive from the potential for better models, better use of data and cyberinfrastructure, better community infrastructure for LCM developers and users, and better use of best practices in model evaluation.
Advancement of Process-Based Models
The approaches on the pattern end of the pattern-process spectrum will continue to provide useful service in a number of scientific and practical settings, as outlined in Chapter 2. However, a number of new model applications that involve developing and evaluating innovative land-based policies, exemplified by payments for ecosystem services (PES) and REDD+ strategies for curbing deforestation and land degradation while providing income for forest-dependent communities, demand a stronger process (i.e., theoretical) basis for
LCMs. The theory, data, and methods needed for developing process-based models are arguably less well developed than those needed for models based on patterns. Modeling approaches are required that can be used to evaluate how these policies will influence human behavior and in turn affect land cover and human well-being (Nelson et al., 2009). For example, PES schemes pose a challenge for modeling because they alter the economic incentives that shape current land use behavior and thus land trajectories. Also, evaluations of REDD+ policies require an estimation of baseline emissions, e.g., the amount of deforestation that would happen in the absence of a new policy. While these baselines have tended to use machine learning and statistical models (i.e., data-based models) to project past rates of deforestation, process-based baselines could include demographic and economic changes that might affect future deforestation (Huettner et al., 2009). Additionally, evaluations of new policies require models that represent behavioral responses to new incentives or constraints, both of which require some understanding and representation of process. Because such policies tend to address only a few of a multitude of interconnected outcomes (e.g., numerous ecosystem services and human well-being factors), better process representation is needed to evaluate possible unintended consequences of such policies.
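The data-based baselines just described can be as simple as extrapolating a historical trend. The sketch below (all figures hypothetical) fits an ordinary least squares line to past forest-cover observations and projects it forward; a process-based baseline would instead model the demographic and economic drivers behind that trend.

```python
# Hypothetical illustration: a data-based REDD+ baseline that projects
# future deforestation by extrapolating the historical trend, with no
# representation of demographic or economic process.

def fit_linear_trend(years, forest_area):
    """Ordinary least squares fit of forest area against time."""
    n = len(years)
    mean_t = sum(years) / n
    mean_a = sum(forest_area) / n
    cov = sum((t - mean_t) * (a - mean_a) for t, a in zip(years, forest_area))
    var = sum((t - mean_t) ** 2 for t in years)
    slope = cov / var                 # change in forest area per year
    intercept = mean_a - slope * mean_t
    return slope, intercept

def baseline_projection(years, forest_area, future_years):
    slope, intercept = fit_linear_trend(years, forest_area)
    return [intercept + slope * t for t in future_years]

# Invented observations: forest area (kha) declining 5 kha/yr.
obs_years = [2000, 2005, 2010]
obs_area = [1000.0, 975.0, 950.0]
projected = baseline_projection(obs_years, obs_area, [2015, 2020])
print(projected)  # → [925.0, 900.0]
```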
Because they account for the underlying decision-making processes of agents that determine land change outcomes, process-based models can be used to advance understanding of how human actors respond to changing environmental, economic, or policy conditions and to simulate policy scenarios of the impacts of a hypothetical policy change on land use outcomes. Process representations are particularly important when modeling complexity in land-change processes in which feedbacks can arise from interactions that are both within and between the socioeconomic and biophysical systems.
Despite meaningful advances, more work is needed to further develop process-based models that are consistent with theory, empirically verifiable, and useful for policy. Many efforts to date have focused on one or at most two of these goals, and few if any have accomplished all three. For example, structural econometric models are derived from economic theory and estimated using real-world data, but they are largely static and can incorporate only limited forms of spatial heterogeneity. Agent-based models are often specified using real-world data, can incorporate many more forms of agent and spatial heterogeneity, and are designed to step through time, but they are often ad hoc in their representation of market and other mechanisms and often lack the empirical or theoretical grounding for some of the assumptions that are necessary to operationalize a given model. As outlined in Chapter 2, it is possible to develop spatial equilibrium economic models that incorporate some form of dynamics and to develop agent-based models that are consistent with microeconomic foundations. In some ways, these different approaches to process-based modeling are converging, and there are additional gains to be had from continuing to work toward narrowing the gap. For example, because of their added flexibility, agent-based models can
be useful in testing the maintained assumptions of economic structural models by comparing model predictions from long-run spatial equilibrium with a short-run equilibrium subject to additional constraints, such as incomplete information or borrowing limits. The process of reproducing the results of analytical models with a computational model is sometimes referred to as “docking” and has been shown to be a useful way to build agent-based models that relax assumptions while building on solid theoretical principles (Brown et al., 2004).
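A minimal sketch of what docking can look like in practice, under an assumed linear land market: the closed-form equilibrium of an analytical model is reproduced by an iterative, agent-style price-adjustment (tâtonnement) routine, which can then be extended with constraints the analytical model cannot handle. All functions and parameters here are illustrative.

```python
# Hypothetical "docking" exercise: an iterative price-adjustment process
# is checked against the closed-form equilibrium of a simple linear land
# market (demand P = a - b*Q, supply P = c + d*Q).

def analytical_equilibrium(a, b, c, d):
    q = (a - c) / (b + d)
    return q, a - b * q            # (quantity of land traded, price)

def tatonnement(a, b, c, d, p0=1.0, step=0.1, iters=10_000):
    """Adjust price toward excess demand until the market clears."""
    p = p0
    for _ in range(iters):
        demand = (a - p) / b       # land demanded at price p
        supply = (p - c) / d       # land offered at price p
        p += step * (demand - supply)
    return p

a, b, c, d = 100.0, 2.0, 10.0, 1.0
q_star, p_star = analytical_equilibrium(a, b, c, d)
p_sim = tatonnement(a, b, c, d)
print(round(p_star, 3), round(p_sim, 3))  # both ≈ 40.0
```

Once the computational model reproduces the analytical result, constraints such as borrowing limits can be added to the iterative version and their effect on the equilibrium examined.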
By representing agents’ behaviors and their behavioral responses to policy, process models permit researchers to generate and compare predictions of land changes under baseline and alternative policy scenarios. The quality of such scenarios and predictions is limited by the maintained assumptions and process details in the models. Research on cognitive processes demonstrates substantial heterogeneity among agents in terms of their formation of values, preferences, attitudes, and norms and how these preferences are modified by environmental change (Meyfroidt, 2012). Additionally, theory and empirical research on forward-looking behavior underscores the importance of accounting for heterogeneous expectations over future outcomes that influence agents’ decisions in the current period (Irwin and Wrenn, 2013). Incorporating these new theoretical insights is critical for improving the structural validity and predictive capability of LCMs, and requires improved model formulations and better data on individual agents and their decision-making processes over time and at spatial scales commensurate with the individual agents. We address the issue of data availability in the section on opportunities in observation.
Cross-Scale Integration of Land Change Models
A major goal of the environmental science community is to develop a predictive, process-level understanding of the interactions of land change dynamics with climate; ecosystem biodiversity; and the cycling of water, carbon, and nutrients. The need for this understanding is manifested at scales ranging from parcels to the globe and is a central element of recommendations from scientific and policy groups, such as the Grand Challenges in Environmental Science by the National Research Council (NRC, 2001). This challenge emphasizes both research to elucidate the primary feedbacks among socioeconomic, geophysical, and ecosystem processes critical to the coupling of land, water, and ecosystem change, and the ability to reconstruct historical trajectories and forecast future scenario trajectories. To make advances on these goals, two types of model coupling are required: coupling of LCMs at multiple scales, and coupling LCMs with other types of models. This subsection addresses the former; the next subsection addresses the latter.
Globalization is coupling global- with local-scale drivers of land change, and land use decisions are increasingly driven by factors in distant markets in addition to local-scale factors (Erb et al., 2009; Seto et al., 2012). There is a growing separation between the locations of production and consumption of land-based commodities, including carbon stocks. Consumers outsource their land use to other regions or countries, and a virtual land trade develops. In addition, land use is affected by remittances sent by migrants, the specific organization of global commodity value chains, channels of foreign investment in land, the transfer of market or technological information to producers via a diversity of networks (from farmer associations to the Internet and cell phones), and the development and promotion of niche commodities that target narrow but wealthy market segments with high-value goods produced in limited quantities. Modeling such teleconnections (or telecoupling) and network interactions is a major challenge that requires analytical methods that link multiregion input-output models with regional- and local-scale models of land-based decision making (Würtenberger et al., 2006) and representations of social networks.
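The multiregion input-output linkage mentioned above can be sketched in a few lines: final demand located in one region propagates through interindustry linkages (the Leontief system x = Ax + d) and translates into land requirements elsewhere. The coefficients below are invented for illustration.

```python
# Illustrative sketch of a multiregion input-output linkage: final
# demand drives gross output via the Leontief system x = A x + d, and
# land-per-output coefficients convert output into land requirements.

def leontief_output(A, demand):
    """Solve x = A x + d for gross output x by fixed-point iteration
    (converges when the spectral radius of A is below 1)."""
    n = len(demand)
    x = list(demand)
    for _ in range(200):
        x = [demand[i] + sum(A[i][j] * x[j] for j in range(n))
             for i in range(n)]
    return x

# Two region-sectors; A[i][j] = input from i needed per unit output of j.
A = [[0.1, 0.2],
     [0.3, 0.1]]
land_per_unit = [0.5, 2.0]     # ha of land embodied per unit of output
final_demand = [10.0, 5.0]     # consumption located in distant markets

x = leontief_output(A, final_demand)
land_use = [coef * xi for coef, xi in zip(land_per_unit, x)]
```

In a teleconnections analysis, the `land_use` vector would then be passed to regional- or local-scale models that allocate those land requirements spatially.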
While the sector-based approaches outlined in Chapter 2 provide a framework for modeling interregional flows of capital, materials, and people, they represent entire sectors within entire regions as single representative agents. Such representations preclude incorporating an understanding of how heterogeneous decision-making strategies affect demand and supply of products and inputs (including land), and how interactions among actors within a given sector or region might produce particular patterns of production or consumption (Rounsevell et al., 2012). For example, the emergence of a cluster of activity within a region, like that of the information technology industry in the Silicon Valley of California or manufacturing in Shenzhen, China, can produce efficiencies and increasing returns to investment, and can create demands for land that are not represented in aggregate models, or are represented in quantities that do not reflect actual demand (Arthur, 1994). Spatially disaggregated economic and agent-based models provide a means to represent heterogeneity and interactions, but they have not yet been developed at scales that permit representation of global-scale flows. Although it is possible that such models could be developed, parameterized, and simulated at global scales, it is also possible that a scaffolding of modeling approaches, which specifies which models are used to pass different kinds of information among different scales of representation, will be more efficient and effective for representing global-to-local interactions. Possible directions include combining the aggregate and finer-scale models to link feedbacks and interactions at the finer scale within the context of global-scale flows, or using experiments at the fine scale to evaluate nonlinear dynamics that emerge and represent important sensitivities of results at the coarser scale.
Cross-Scale Integration of LCMs with Other Earth System Models
For years, models of a variety of environmental processes have taken land cover and land use as inputs to condition model parameters or set internal fluxes
or states, and they generate information that may in turn condition land management decisions and feed back to land change. Examples include models of Earth surface processes, such as the exchange of water, energy, carbon, and nutrients with the atmosphere, that are critical to weather and climate prediction (e.g., Bondeau et al., 2007; Lawrence et al., 2011); watershed models that generate flow, nutrients, and sediment to receiving water bodies (e.g., Ray et al., 2010; Bulygina et al., 2012); and models of ecological patch dynamics of growth, succession, and disturbance (e.g., Desai et al., 2007; Thompson et al., 2011). Coupling these models with LCMs improves the ability to predict and understand the direct and indirect effects of land management decisions and policies on the trade-offs, at short to long time scales, among ecosystem services (e.g., food and fiber production, water regulation, maintenance of biodiversity, and carbon storage) (Lapola et al., 2010; Nelson et al., 2009; Wiley et al., 2010). In the short term, or in areas not experiencing significant change, land change is typically treated as a static model input and coupling with dynamic LCMs is not required. However, interest in long-term forecasting requires some ability to couple models to represent feedbacks between environmental and land change dynamics. The ability to set up and carefully specify different scenarios facilitates the development of verifiable models that provide utility to policy makers and decision makers.
From a systems perspective, the degree and completeness of coupling of environment and land change processes needed within a model (or set of linked models) is dependent on which processes are included as endogenous dynamics, and which are prescribed as exogenous drivers and boundary conditions. More comprehensive models, representing more endogenous processes, may be required to evaluate and forecast trade-offs and interactions between different ecosystem services and land change, both in the short term and over the multidecadal time scales envisioned in climate change mitigation and adaptation. Examples of situations where land change produces trade-offs include those between carbon sequestration and freshwater supply from land conversion to plantation forestry and natural regrowth (e.g., Farley et al., 2005) and between low-density zoning for watershed protection and septic-derived nitrogen loading (e.g., Shields et al., 2008). Current LCMs use a range of simple to complex methods to estimate biophysical outcomes or consequences of land change, such that choices about model parsimony, comprehensiveness, and complexity affect the richness of environmental information that can be provided for decision makers. In any coupled model, a balance of the different components in terms of degree of complexity and data demands is preferred to promote representation of critical feedbacks.
Numerous research and operational environmental models use land use and land cover (LULC) as one-way inputs to determine model parameters or set internal fluxes or states. LULC are typically used as categorical variables with class-specific attributes that are used to generate model parameters. This is done either by assignment from look-up tables by LULC category or by
specifying class-specific equations that may use additional ancillary information (e.g., remote sensing radiances). Well-known examples include the use of LULC to assign or calculate properties including surface albedo, impervious area, or leaf area index (LAI). Human behavior (e.g., irrigation, fertilization, harvesting, and conservation practices) may also be set by land classes, or it can be attributed separately using associated demographic and economic information. The distinction, and confusion, between land use and land cover as inputs to these models is important and represents an area where data advances, described below, can improve the effectiveness of models. While land-cover inputs to biophysical models are common, the dynamic coupling of LCMs to these models is less so.
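A look-up-table assignment of the kind described above can be sketched as follows; the parameter values are illustrative placeholders, not drawn from any particular model.

```python
# Minimal sketch of the look-up-table approach: LULC categories map to
# class-specific biophysical parameters. All values are illustrative.

LULC_PARAMS = {
    "forest":   {"albedo": 0.12, "impervious_frac": 0.0, "lai": 5.0},
    "cropland": {"albedo": 0.20, "impervious_frac": 0.0, "lai": 3.0},
    "urban":    {"albedo": 0.15, "impervious_frac": 0.7, "lai": 1.0},
}

def parameters_for(lulc_map):
    """Assign parameters to each grid cell from its LULC class."""
    return [[LULC_PARAMS[cell] for cell in row] for row in lulc_map]

grid = [["forest", "urban"],
        ["cropland", "forest"]]
params = parameters_for(grid)
print(params[0][1]["impervious_frac"])  # → 0.7
```

Class-specific equations would replace the fixed values with functions of ancillary inputs such as remotely sensed radiances.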
Opportunities for environmental processes to provide one-way inputs to LCMs also exist but have been less commonly employed. At the scale of individual land patches, a sequence of land cover classes develops along an ecological successional trajectory following disturbance or other forms of land conversion (e.g., agricultural abandonment, timber harvest, or fire). These trajectories may follow simple, rule-based succession rules (e.g., through Markov chains). However, there is significant potential to better couple process-based growth and succession models to represent both the processes of environmental change and feedbacks with land use and management. A set of models ranging from growth and yield curves through complex community ecological and biogeochemical models (e.g., Biome-BGC, CENTURY, ED, LANDIS) were designed to simulate ecosystem dynamics within prescribed species or life-form classes, or to simulate ecological population dynamics. Modeled changes in ecological patch attributes such as standing biomass, water and nutrient availability, or habitat quality can then be used to condition land conversion processes. These models can also be sensitive to local edaphic or microclimate conditions, such that patch-specific trajectories can vary in space and time rather than following a domain-wide set of common, prescribed rules. As such, the heterogeneous information conditioning patch- or parcel-scale conversion can support more spatially variable and environmentally coupled LCMs.
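A rule-based succession trajectory implemented as a simple Markov chain, as mentioned above, might look like the following; the states and annual transition probabilities are invented for illustration.

```python
# Illustrative patch-scale succession as a Markov chain; transition
# probabilities are hypothetical, not calibrated to any ecosystem.

import random

# Annual transition probabilities between cover states after disturbance.
TRANSITIONS = {
    "bare":   {"bare": 0.2, "grass": 0.8},
    "grass":  {"grass": 0.6, "shrub": 0.4},
    "shrub":  {"shrub": 0.7, "forest": 0.3},
    "forest": {"forest": 1.0},            # absorbing climax state
}

def step(state, rng):
    """Draw next year's cover state from the transition row."""
    r = rng.random()
    cum = 0.0
    for nxt, p in TRANSITIONS[state].items():
        cum += p
        if r < cum:
            return nxt
    return state

def simulate_patch(years, rng, start="bare"):
    trajectory = [start]
    for _ in range(years):
        trajectory.append(step(trajectory[-1], rng))
    return trajectory

traj = simulate_patch(60, random.Random(42))
```

A process-based alternative would replace the fixed transition probabilities with rates conditioned on simulated biomass, water, or nutrient availability for each patch.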
A more complete analysis of interactions between altered biogeochemical, hydrologic, and other ecosystem services and land change would incorporate two-way coupling through feedbacks. These feedbacks could be realized through altered supply and demand of critical ecosystem resources (e.g., water, food, and carbon credits) or from regulatory response to degraded air or water quality. While more comprehensive models would have the advantage of endogenizing these feedbacks, current understanding and ability to represent these interactions within coupled models are subject to high uncertainty, especially over longer time scales. Full two-way coupled models, in which land and environmental systems coevolve, are beginning to emerge. Schaldach and Priess (2008) reviewed a set of models that have been used to endogenize and couple environmental and land change processes. Model structures range from loose coupling with information passing between separate models (e.g. Claessens et al., 2009) to more tightly
coupled models in which common variables are processed by different modules or through unified equation sets, providing close feedback between environmental and socioeconomic processes associated with land change dynamics. Maintaining dynamics through the full system, rather than prescribing either the land change or environmental components of a coupled model, necessarily increases model comprehensiveness and complexity; a challenge, then, is to manage code complexity by prioritizing which processes and feedbacks to include, simplifying the component models, and employing more sophisticated informatics. Other challenges include the need to determine how an output from one model relates to the input of another, the lack of standard scales in coupling (and associated aggregation/disaggregation problems), and the assessment and management of uncertainty propagated from one model to another (e.g., Pijanowski et al., 2011).
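Loose coupling via information passing can be sketched as a simple alternation between component models, each reading the other's most recent output. Both components below are toy placeholders, not real LCM or hydrologic models, and all numbers are invented.

```python
# Schematic "loose coupling" via information passing: each year a toy
# LCM converts land based on water availability, and a toy hydrologic
# model updates water availability from the resulting land cover.

def land_change_step(cover, water_index):
    """Convert forest to cropland faster when water is plentiful."""
    conversion = 0.05 * water_index * cover["forest"]
    return {"forest": cover["forest"] - conversion,
            "cropland": cover["cropland"] + conversion}

def hydrology_step(cover):
    """Crude proxy: water regulation declines as forest cover is lost."""
    total = cover["forest"] + cover["cropland"]
    return cover["forest"] / total

cover = {"forest": 80.0, "cropland": 20.0}
water = hydrology_step(cover)
history = []
for year in range(10):
    cover = land_change_step(cover, water)   # LCM reads hydrology output
    water = hydrology_step(cover)            # hydrology reads new cover
    history.append((round(cover["forest"], 2), round(water, 3)))
```

In a tightly coupled model, the two functions would instead share state within one module or equation set rather than exchanging outputs once per time step.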
One approach to implementing full coupling between LCMs and environmental process models would be to operationalize the conceptual model developed by the Integrative Science for Society and Environment (ISSE) (Collins et al., 2008), which links human outcomes and activities with ecosystem functions through the identification and management of ecosystem services. As an example, a set of land models (e.g., the Patuxent Land Model, Everglades Land Model) combined land valuation and conversion across different uses with simulation modules for coupled water, carbon, and nutrient cycling. Land cover class has a first-order impact on ecosystems by setting model parameters. Because land valuation can respond to simulated ecosystem components, such as the productivity of agricultural land, a local feedback on decision making develops at the parcel level, such that less productive agricultural land will have a greater probability of being developed. Extension of these models to quantify values of ecosystem services, and measures of human well-being (e.g., Costanza, 2000), developed at landscape to regional levels as a result of the coupled dynamics, could serve as a basis for completing the major feedback loop in the ISSE conceptual model.
A number of LCMs can include this form of local feedback of ecosystem to parcel-level decisions on land conversion and, by extension, influence neighborhood-scale patterns. Larger-scale feedbacks of ecosystem processes to socioeconomic decision making can occur from runoff quantity and quality, which can initiate institutional responses and regulatory constraints on development and economic activity, or from land surface–atmosphere exchange through simulated emissions of heat, vapor, greenhouse gases, and other pollutants. This type of feedback would be a cumulative one derived from aggregated ecosystem patch behavior at larger regional to global levels. Such a feedback would also need to be endogenized within the model at these larger scales through representation of institutional actors that respond to observed changes by constraining land management, including land conversion and economic activities. To represent the appropriate spatial units at every scale and across different kinds of processes,
one option is to use a “class containment hierarchy” in which fine-resolution land patches are explicitly connected and progressively contained and linked within larger-scale units (e.g., hillslopes, subcatchments), defined as connected component regions that maintain class-specific common attributes and process dynamics. Hierarchical frameworks linking processes over multiple scales can be used to resolve fine- to larger-scale interactions.
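A class containment hierarchy of this kind can be sketched as a tree of nested spatial units whose patch-level attributes aggregate upward through the containment links; the structure and numbers below are illustrative.

```python
# Illustrative "class containment hierarchy": patches nest within
# hillslopes, which nest within a subcatchment, and attributes
# aggregate upward through the containment links.

class Unit:
    def __init__(self, name, children=None, biomass=0.0):
        self.name = name
        self.children = children or []
        self.biomass = biomass            # set directly only on patches

    def total_biomass(self):
        """Aggregate patch-level attributes up the hierarchy."""
        if not self.children:
            return self.biomass
        return sum(child.total_biomass() for child in self.children)

patches = [Unit("patch1", biomass=12.0), Unit("patch2", biomass=8.0)]
hillslope_a = Unit("hillslope_a", children=patches)
hillslope_b = Unit("hillslope_b", children=[Unit("patch3", biomass=5.0)])
subcatchment = Unit("subcatchment", children=[hillslope_a, hillslope_b])

print(subcatchment.total_biomass())  # → 25.0
```

Cumulative feedbacks of the kind described above would read aggregated quantities at the subcatchment or regional level and pass constraints back down to the contained patches.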
Bridging LCMs with Optimization and Design-Based Approaches
The land change models reviewed in Chapter 2 are described as positive models that seek to explain and predict changes in land use and land cover using either a process-based or a pattern-based modeling approach. In contrast, policy makers can also benefit from a normative evaluation of these predicted outcomes: Given a choice among a set of possible policies or designs, which policy will generate a landscape pattern that is “best” in some sense for society? Various evaluation methods have been developed in urban planning, geography, economics, and related disciplines to assess alternative land use or land cover outcomes. The challenge is to develop and use LCMs at scales relevant to design, to connect design to patterns on the landscape, and to use optimization approaches together with the LCM approaches we describe in Chapter 2. For example, one study linked an optimization model with a cellular automaton simulation to generate future projections that were optimized for, and evaluated against, specific planning objectives (Ward et al., 2003). Other work has used optimization approaches based on LCMs coupled with watershed models to help identify and locate land uses that reduce watershed impacts (Tang et al., 2005; Maringanti et al., 2009). An important challenge is that the optimization approaches widely used in the spatial sciences, from land use and ecosystem service planning (e.g., Polasky et al., 2008; Roetter et al., 2005; Seppelt and Voinov, 2002; Stewart et al., 2004) to site selection for businesses (Church and Murray, 2009), are extremely computationally intensive.
Marxan, widely used conservation planning software, provides a good example of a normative, optimization-based approach to evaluating land use alternatives. The software is designed to solve complex conservation planning problems in landscapes and seascapes (Watts et al., 2009). Whereas earlier versions of Marxan focused mainly on the optimal allocation of reserved areas for nature conservation, later versions were extended with zones, providing land use zoning options in geographical regions for biodiversity conservation. The software allows any parcel of land to be allocated to a specific zone. Each zone then has its own actions, objectives, and constraints, with the flexibility to define the contribution of each zone to achieving targets for prespecified features (e.g., species or habitats). The objective is to minimize the total cost of implementing the zoning plan while ensuring a variety of conservation and land use objectives are achieved. In one application, Wilson et al. (2010) used Marxan
to prioritize investments in alternative conservation strategies in East Kalimantan (Indonesian Borneo).
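The Marxan-style objective, minimizing total cost while meeting representation targets for each feature, can be illustrated with a toy parcel-selection problem. Marxan itself uses simulated annealing; the greedy cost-effectiveness heuristic below is only a stand-in, and all data are invented.

```python
# Toy version of the reserve-selection objective: choose parcels so that
# representation targets for each feature are met at minimum total cost.
# A greedy heuristic stands in for Marxan's simulated annealing.

parcels = {
    "p1": {"cost": 4.0, "features": {"wetland": 3.0, "forest": 0.0}},
    "p2": {"cost": 3.0, "features": {"wetland": 1.0, "forest": 2.0}},
    "p3": {"cost": 5.0, "features": {"wetland": 0.0, "forest": 4.0}},
}
targets = {"wetland": 3.0, "forest": 2.0}

def greedy_zoning(parcels, targets):
    selected, shortfall = [], dict(targets)
    candidates = dict(parcels)
    while any(v > 1e-9 for v in shortfall.values()) and candidates:
        def score(item):
            # Target-relevant feature gain per unit cost.
            name, info = item
            gain = sum(min(info["features"][f], shortfall[f])
                       for f in shortfall)
            return gain / info["cost"]
        name, info = max(candidates.items(), key=score)
        if score((name, info)) == 0:
            break                          # targets cannot be met
        selected.append(name)
        for f in shortfall:
            shortfall[f] = max(0.0, shortfall[f] - info["features"][f])
        del candidates[name]
    return selected, sum(parcels[n]["cost"] for n in selected)

plan, cost = greedy_zoning(parcels, targets)
```

The multi-zone extension described above would assign each parcel to one of several zones, each with its own cost and feature contributions, rather than a binary select/ignore choice.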
Integrated Valuation of Ecosystem Services and Tradeoffs (InVEST) is a spatially explicit software-based tool developed by the Natural Capital Project (http://www.naturalcapitalproject.org) that provides a means of comparing trade-offs among ecosystem services by quantifying the value of natural capital in biophysical and economic terms. The modeling process starts with stakeholder-defined scenarios of LULC changes in the study region of interest. Given these scenarios as inputs, InVEST calculates the changes in targeted ecosystem services (e.g., biodiversity conservation, water quality, and commodity production levels) for each scenario. The approach provides a means of quantifying ecosystem services in a spatially explicit manner and analyzing trade-offs among alternative scenarios or policy options, including how payments for ecosystem services can alleviate trade-offs in which private markets result in an insufficient provision of ecosystem services by landowners. For example, Goldstein et al. (2012) applied InVEST to evaluate the environmental and financial implications of alternative land use development plans for the largest private landholder in Hawaii, Kamehameha Schools. They examined the implications of these alternative land use scenarios for multiple ecosystem services, including biofuel feedstocks, food crops, forestry, livestock, and residential development. They predicted the changes in these ecosystem services for each land use scenario and then used observed prices or parameter estimates of nonmarket values from the literature to translate changes in ecosystem services into monetary benefits and costs. They found, for example, that diversifying agriculture can generate additional financial returns and contribute to climate change mitigation through increased carbon storage, but that trade-offs exist between carbon storage and water quality.
Based on this information, the private landholder developed a land use plan to meet private financial goals that also generated societal benefits through climate change mitigation, improved food security, and rural economic development. These calculations in InVEST could also be thought of as objectives to be attained in an optimization process.
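The scenario-to-services-to-values chain described above can be sketched schematically; the service coefficients and prices below are invented and do not correspond to InVEST's actual models.

```python
# Schematic of scenario valuation: each scenario's LULC composition is
# translated into ecosystem-service quantities and then into monetary
# terms. All coefficients and prices are invented for illustration.

SERVICE_COEFFS = {            # service units per ha by LULC class
    "forest":   {"carbon": 100.0, "crops": 0.0},
    "cropland": {"carbon": 10.0,  "crops": 5.0},
}
PRICES = {"carbon": 2.0, "crops": 30.0}   # $ per service unit

def value_scenario(areas_ha):
    """Return (service totals, monetary value) for one LULC scenario."""
    services = {s: 0.0 for s in PRICES}
    for lulc, area in areas_ha.items():
        for service, per_ha in SERVICE_COEFFS[lulc].items():
            services[service] += per_ha * area
    value = sum(PRICES[s] * q for s, q in services.items())
    return services, value

baseline = {"forest": 100.0, "cropland": 0.0}
diversified = {"forest": 80.0, "cropland": 20.0}

_, v_base = value_scenario(baseline)
_, v_div = value_scenario(diversified)
```

Comparing `v_base` and `v_div`, and the underlying service totals, exposes the kind of carbon-versus-production trade-off discussed in the Hawaii example; in an optimization framing, these valuations would become the objectives to be attained.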
Clear synergies can be achieved from integrating positive LCMs with normative methods, such as Marxan and InVEST, and with traditional design approaches, which could provide a means for generating and evaluating land change scenarios and policy or management mechanisms that meet specific goals. Positive models have the capacity to explain or predict land change patterns and processes that are associated with specific trends or policy changes, while normative models elucidate the trade-offs associated with predicted outcomes and identify desirable outcomes considering those trade-offs. When used in combination, LCMs and normative approaches could be used in analyses of the trade-offs relative to multiple objectives generated by the change. Process-based LCMs and normative approaches together can provide meaningful guidance to policy makers regarding the potential benefits and costs of a policy by describing the predicted
effects of a policy on land change. Data-based LCMs integrated with normative approaches can also be used to explore how projected land changes compare to some specified objectives and possible ideal outcomes (Seppelt et al., 2013). Because many existing optimization models include spatial patterns as explicit objectives, combined analyses of positive and normative models might be useful to fully incorporate the shape, size, distribution, and connection among different land units into the analyses of land system dynamics and trade-offs. Advances in combining positive and normative approaches will likely require adaptations of both, extending optimization approaches, like Marxan, to include dynamic land systems and adapting LCMs to couple more directly with the environmental and other models that are used to evaluate and quantify outcomes, using tools like InVEST.
The second set of opportunities lies not in the availability of data per se, but in how the enormous quantity of new data can be incorporated into LCMs, how land change modelers can learn about and adapt modeling approaches to use these new data sets, and how land change modelers can help inform the development of image-processing algorithms and data collection schemes that can generate products for the next generation of land change models. This opportunity identifies both gaps in the availability of data for LCMs, particularly those limiting the characterization of processes of land change, and ways to fill those gaps. Additional opportunities are presented through new cyberinfrastructure, discussed in the next section. These challenges have been noted by others, and some progress toward meeting them is being made. However, the committee believes that more progress on these topics is needed to mainstream these advances into LCMs and respond to user requirements.
Improved Capture and Processing of Remotely Sensed Data
New data sets are required that can provide information on the dynamics, stationarity, and complexity of land and land-related processes. There has been a significant increase in new satellite and airborne sensors, which has resulted in an explosion of data and new analyses, but the development and application of models has not kept pace with developments in data. Large data sets provide new opportunities for land modeling but create various needs for automated processing. Methods that were developed for analysis of small- to medium-sized sets of images or similar packages of related data need to be adapted to handle long time series and larger geographic extents, often at finer spatial resolutions. For example, new image-processing algorithms that use objects rather than pixels as the unit of analysis may provide new types of data opportunities to link satellite-based LULC information with land management information (Pasher and King,
2010; Zhu and Woodcock, 2012), and to develop models that operate on patches or parcels rather than pixels, which are the most common unit of analysis in LCMs. High-spatial-resolution data, such as those provided by QuickBird, can provide fine-scale habitat and plant diversity information (Hall et al., 2012), and active sensors, like LiDAR, provide canopy structural information (Ardila et al., 2012; Walton et al., 2008; Lehrbass and Wang, 2012; Morsdorf et al., 2004; Sohn and Dowman, 2007), both of which create opportunities to develop LCMs that generate land-cover outputs with detail beyond nominal categories. Significant efforts to acquire LiDAR during leaf-off periods are supporting DEM creation and flood management, but more leaf-on LiDAR data are needed for applications related to vegetation canopy structure. Additionally, new and improved image-processing algorithms are providing new information about land use and land-use characteristics, such as land-use intensity (Franke et al., 2012). These algorithms can be deployed to develop data-based LCMs that are more directly sensitive to land use.
Essential to future progress in LCMs is the continuity of satellite-, airborne-, and survey-based observations that build on the existing record of Landsat, as well as national surveys and censuses, in order to estimate and calibrate LCMs. A majority of respondents to the committee’s informal questionnaire mentioned the importance of capturing historical data, such as aerial photos and old land records, as well as maintaining satellite mission continuity. With the Landsat record now spanning more than 40 years and the entire archive available free of charge, new algorithms are being developed that utilize the entire (or most of the) time series. These high-frequency temporal observations provide new types of information on land cover and land use, such as disturbance (Baumann et al., 2012; Stueve et al., 2011; Zhu et al., 2012) and land-use intensity (Maxwell and Sylvester, 2012), which were not available with the less frequent observational data commonly used before the opening of the archive. In addition to creating opportunities for new data inputs to LCMs, the higher temporal frequency of images permits the kind of temporal analysis used to describe LULC temporal dynamics at coarser resolutions (Eastman et al., 2009; deBeurs and Henebry, 2010) and at moderate to fine resolutions. In combination with the growing length of the archive of Landsat images, these new analyses will facilitate a better empirical understanding of spatial and temporal non-stationarities in land-change processes, which can ultimately improve our understanding of key variables and processes that need to be incorporated in LCMs.
Furthermore, the higher frequency of available observations will drive demand for restructured LCM frameworks, especially frameworks that are more data-based (i.e., statistical, machine-learning, and cellular approaches) and that can accommodate more frequent and more recent observations, perhaps through the use of data assimilation approaches (e.g., Rodell et al. 2004). These approaches have not yet been used with LCMs.
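As a minimal illustration of the data assimilation idea, the sketch below applies a scalar Kalman-style update that blends a model-predicted class probability with a newly observed one, weighting each by the inverse of its variance. The probabilities and variances are hypothetical; operational assimilation systems such as the one described by Rodell et al. (2004) are far more elaborate.

```python
def assimilate(prior_p, prior_var, obs_p, obs_var):
    """Scalar Kalman-style update: blend a model-predicted class
    probability with an observed one, weighting each by the inverse
    of its variance. Returns the posterior estimate and variance."""
    gain = prior_var / (prior_var + obs_var)
    post_p = prior_p + gain * (obs_p - prior_p)
    post_var = (1.0 - gain) * prior_var
    return post_p, post_var

# Model predicts 60% urban with high uncertainty; a new classified
# image says 90% with low uncertainty (all values hypothetical)
p, v = assimilate(prior_p=0.6, prior_var=0.04, obs_p=0.9, obs_var=0.01)
print(round(p, 2), round(v, 3))  # → 0.84 0.008
```

The posterior leans toward the more certain observation while its variance shrinks, which is the property that would let an LCM be continually corrected by each new image.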
Historical aerial photo data sets offer the potential to extend existing records
of land cover and its change even further into the recent past, before the availability of satellite imagery (e.g., Sylvester et al. 2013). Object-based analysis and machine learning algorithms are particularly promising technologies for creating land use information from historical photos. Contextual information from object-based approaches promises to improve classification of aerial photography, despite inconsistent spatial and spectral information in these data (Laliberte et al., 2004). As longer time series data become available, e.g., greater than 70 years, the challenges and opportunities of modeling land change through periods of structural economic and technological change become very real. While Chayanovian and Boserupian theories of development provide a starting point for understanding livelihood changes on these time scales, few structural models of how land systems evolve over longer time frames exist, though new models could build upon theoretical and empirical work in the economic and demographic literature (e.g., Galor and Weil 2000; deSherbinin et al. 2008). While longer-term economic and population forecasts are already incorporated as inputs to a variety of LCMs that rely on statistical approaches to allocate the land implications of these changes spatially (e.g., Bierwagen et al. 2010), endogenizing these changes in longer-term structural models would permit a better representation of complex dynamics and feedbacks between land use and livelihoods.
Remotely sensed data are being used in new ways to generate socioeconomic information, such as the use of the nighttime lights product to estimate variables related to energy consumption (Zhao et al., 2012; Kiran Chand et al., 2009; Townsend and Bruce, 2010; De Souza Filho et al., 2004) and economic activity (Chen and Nordhaus, 2011; Henderson et al., 2012). By providing a means to estimate key socioeconomic variables in spatially and temporally explicit ways, these new analyses provide a basis for new approaches to parameterizing LCMs. For example, data on energy use or economic activity could be used to better represent a diversity of livelihoods and land-use strategies, which could translate into better representations of how and where land is likely to change.
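Studies in this literature commonly regress the logarithm of economic output on the logarithm of light intensity, so the slope estimates an elasticity. The sketch below recovers a known elasticity from synthetic data; the values, units, and elasticity are invented purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic cross-section: log GDP generated from log light intensity
# with a known elasticity (all values and units hypothetical)
lights = rng.uniform(1.0, 100.0, size=200)
true_elasticity = 0.7
log_gdp = 2.0 + true_elasticity * np.log(lights) + rng.normal(0.0, 0.1, 200)

# OLS of log(GDP) on log(lights); the slope estimates the elasticity
X = np.column_stack([np.ones_like(lights), np.log(lights)])
beta, *_ = np.linalg.lstsq(X, log_gdp, rcond=None)
print(f"estimated elasticity: {beta[1]:.2f}")
```

In applied work (e.g., Henderson et al., 2012) the regression additionally controls for country and year effects; the point here is only that light intensity can proxy for an unobserved socioeconomic variable at pixel or district scale.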
The growth of small Earth observation sensors is another important development in remote sensing. The constellation of newer and smaller satellite and airborne platforms includes many from private companies and private-public partnerships, such as the Specim hyperspectral airborne sensors (Eagle, Hawk, Owl, and Dual) from Finland, the Itres Compact Airborne Spectrographic Imager family of sensors from Canada, and Proba-1 from the European Space Agency, to name only a few. Hyperspectral sensors provide the opportunity to derive richer biophysical attributes of the land surface that could provide new measurement inputs to LCMs. For example, because these sensors are sensitive to canopy chlorophyll and nitrogen (e.g., Ebbers et al. 2002; Kaye et al. 2005), it may be possible to use them to infer information about variations in land management behaviors, which are hard to measure. Some of these smaller satellites are configured for specific applications in specific regions. For example, the Disaster Monitoring Constellation of small and low-cost Earth observation sensors developed by Surrey Satellite
Technology Limited in the United Kingdom can be used for land monitoring at high spatial resolutions; the constellation includes NigeriaSat-1, NigeriaSat-2, NigeriaSat-X, Beijing-1, and UK-DMC-1, among others. The Advanced SCATterometer onboard the Metop satellite is a follow-on to the European Remote Sensing scatterometers and provides a soil moisture product at coarse spatial resolutions (25 and 50 km) and at nearly daily repeat cycles (Brocca et al., 2010). Soil moisture retrieved in this way could be used as an input to LCMs, for example, to help parameterize a model that includes the decision to irrigate as one of its processes.
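A hedged sketch of how such a soil moisture product might enter an LCM's decision layer: a toy rule that triggers irrigation when moisture falls below a wilting threshold, subject to a spending budget. Every parameter below is an illustrative assumption, not a calibrated value.

```python
def irrigation_decision(soil_moisture, wilting_point=0.12, target=0.25,
                        root_zone_m=0.3, cost_per_mm=1.0, budget=50.0):
    """Toy irrigation rule: if volumetric soil moisture is below the
    wilting point, apply enough water (in mm) to bring the root zone
    back to the target level, capped by a spending budget. All
    parameter values are hypothetical."""
    if soil_moisture >= wilting_point:
        return 0.0
    deficit_mm = (target - soil_moisture) * root_zone_m * 1000.0
    if deficit_mm * cost_per_mm <= budget:
        return deficit_mm
    return budget / cost_per_mm  # spend the whole budget

print(irrigation_decision(0.20))  # → 0.0 (moist enough, no irrigation)
print(irrigation_decision(0.08))  # → 50.0 (deficit exceeds the budget cap)
```

Coupling such a rule to a 25-50 km scatterometer product would of course require downscaling to field or parcel scale; the sketch shows only where the observation plugs into the behavioral process.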
Integration of Heterogeneous Data Sources
An important challenge to making the most of remotely sensed data for use within LCMs is to integrate them with a variety of heterogeneous data sets. Land change information at a variety of spatial and temporal resolutions can be integrated with socioeconomic and biogeophysical data for coupling of LCMs with other types of models, such as models of climate change, ecosystem services and biodiversity, energy use, and urbanization. There is also a need to go beyond LULC in LCMs and incorporate other dimensions of land. As discussed above, new remote sensing sensors and approaches are showing promise in better retrieving land-cover dynamics, land-use variables like intensity, and biophysical variables like plant nutrient content and soil moisture. Data fusion approaches have shown promise for these purposes (Lunetta et al. 1998; Sun et al. 2003; Mutlu et al. 2008). Other variables, including land function, land use density, land tenure, land management, and land value, are difficult to characterize on the basis of remote sensing data alone. Land function includes the provision of goods and services related to the intended land use as well as benefits from aesthetic values, cultural heritage, and preservation of biodiversity. Information on land use density, though it can be estimated from nighttime lights images, might benefit from additional data about residences, buildings, or employment. LCMs increasingly need to represent information and processes about land management decisions, often at high temporal resolutions, such as crop types, irrigation, fertilizers, and urban development patterns. While soil moisture and canopy nutrients can help, data on land management decisions (e.g., permitting new urban development) and policies (e.g., zoning and stormwater incentives) need to be available at different administrative levels (e.g., local, county, and regional land use plans).
Poor availability of spatially explicit ecological data, such as data on crop pollination, timber production, and land-based carbon resources, constrains assessments based on ecosystem service models like InVEST. In all of these cases data from remote sensing are useful but insufficient on their own, and they need to be combined with other available data. Combining and leveraging various data sources to create hybrid data products that draw together remotely sensed, spatial, and social data can create new types of information products.
To support further developments in the use of remotely sensed data to
estimate aspects of the land surface that have greater relevance to assessment of both its human and ecological value, and to make these data available for adaptation within LCMs, ongoing in situ observations and survey programs are needed on all of these topics. Additionally, though data on land value and land ownership cannot be collected through remote sensing, they are often available locally in the developed world but more inconsistently available in the developing world. Unfortunately, no consistent program for compiling the data exists, so the research community also lacks good, reliable access to data on land value and land ownership (i.e., cadastral data). Opportunities for compiling land parcel data have been outlined elsewhere (NRC, 2007), and data on land values, based on transactions, can be collected and compiled (e.g., Zillow and Trulia), but these data remain an expensive component of many LCM projects. Understanding and communicating the limits of data availability and their implications are important throughout the modeling process.
Data on Land-Change Actors
Land change is the cumulative result of the decisions and interactions of a variety of actors—households, firms, landowners, and policy makers at local, regional, and global levels. Current models, as well as their theoretical and empirical bases, are limited to some extent by their use of (a) aggregate data that miss important sources of spatial heterogeneity; (b) cross-sectional data that prohibit causal identification; and (c) a dearth of microdata on the characteristics, preferences, and decision-making processes of households, firms, policy makers, and other agents whose actions determine land change outcomes. These limitations are especially marked for process-based approaches such as structural economic and agent-based models. Better data on these actors and their beliefs, preferences, and behaviors are critical to improving the theoretical underpinnings, structural specifications, predictive ability, and usefulness of LCMs in evaluating the consequences of alternative policies. These data should be spatially explicit and available for multiple points in time so that they can be used to specify dynamic spatial models of land change processes.
Despite the increasing availability of spatial data on land change, data on the individuals whose choices and interactions generate observed land changes are often missing. For example, though parcel-level property tax data have been increasingly used in LCMs, these data omit information about households, for example, income, race, presence of children, education, and other variables that influence household location choices. Researchers have compensated by combining the parcel data with data from the U.S. Census Bureau on household characteristics, but these data are only publicly available at a more aggregated spatial scale (block group or tract) and traditionally were available only every 10 years. Since 2006, the U.S. Census Bureau has published the American Community Survey, which provides data on a subset of household characteristics for
a sample of households in the United States. The data are published annually but represent aggregations of observations over a multiyear period, with the length of the period depending on the spatial scale of the estimates. Although this approach adds temporal dynamics, the temporal averaging it relies on limits its usefulness. Creating data sets that contain spatial data on individual characteristics and behaviors over time will require considerably more effort and resources.
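The cost of temporal averaging can be seen in a small numerical example: a sharp change in an annual series is smeared across the length of the pooling window, much as multiyear small-area estimates smear year-to-year dynamics. The series below is synthetic.

```python
import numpy as np

# Annual values for one tract (synthetic): a sharp jump in year 5
annual = np.array([100, 101, 99, 100, 100, 140, 141, 139, 140, 140], float)

# Five-year moving average, mimicking how small-area multiyear
# estimates pool observations across years
window = np.ones(5) / 5
smoothed = np.convolve(annual, window, mode="valid")
print(smoothed)  # the abrupt change becomes a gradual five-year ramp
```

A model calibrated against the smoothed series would infer a slow transition where the underlying process was abrupt, which matters for any LCM that aims to capture the timing of change.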
In the absence of a systematic and purposeful data-collection effort on land change actors, many ad hoc approaches to generating these data have emerged. Several of these approaches are quite promising. The approaches that have been used to collect data on agent characteristics, decision processes, and behaviors represented in agent-based models include surveys, field and laboratory experiments, participant observation, role-playing games, and inference with statistical methods (Robinson et al., 2007), most of which involve significant expense. An innovative example of the latter is the use of restricted microdata that are available from some government agencies. For example, the U.S. Census Bureau operates 14 secure research data centers located in different parts of the United States that provide opportunities to work with restricted-use microdata on households and firms. Restricted access to confidential data on farmers and farming operations in the United States is available through the U.S. Department of Agriculture (USDA). These data permit new research questions regarding the underlying economic or behavioral process to be studied by providing additional information about individual characteristics and location.
For example, Kirwan (2009) used individual data on farms and farm rental rates to identify the effect of government agricultural subsidies on farm rental rates. The individual-level farm data are critical to identifying the causal effect, which otherwise would be impossible to separate from correlated unobserved variables using more aggregate data. Finally, there are examples of innovative models estimated with microdata from a survey. For example, Conley and Udry (2010) modeled the role of technology spillovers in influencing agricultural production decisions in rural Ghana. These authors collected intricate details about the neighbors with whom pineapple farmers in Ghana communicate and what they share with each other about their production practices. They then used these microdata on social interactions to estimate a microeconomic model of technology spillovers based on social learning. Game-based approaches, participatory mapping, and participant observation support more deductive approaches that avoid the major assumptions about human behavior involved in statistical modeling (e.g., Castella et al., 2005). These approaches are generally applied to smaller areas and may be challenging to scale to larger areas.
Making Systematic Land Use Observations
In addition to consistency and continuity provided by remotely sensed observations, the reliance of LCMs on heterogeneous data, many of which are not
completely observable remotely, means that a healthy LCM enterprise relies on robust and ongoing on-the-ground observations of multiple dimensions of natural and human systems. Because survey programs on land characteristics, like those on water, are divided among multiple agencies and geographies, integrating data for understanding and predicting changes in the land system can be particularly challenging. For example, in the United States, data on forests are collected by the USDA Forest Service through its Forest Inventory and Analysis program, data on farms through the Census of Agriculture, data on demographics through the Census Bureau, and so on. All agencies use different sampling schemes, temporal return intervals, and geographic aggregation units.
The Natural Resources Inventory (NRI), developed and implemented by the Natural Resources Conservation Service, has been the only national-scale, repeated sample of land use, but it was not designed or intended to serve that purpose. Nonetheless, important research on the drivers of land use change has resulted from that program (Lubowski et al., 2006). The loss of continuity in fine-scale NRI land use data is a setback for forecasting land change in the United States at fine scales.
For these reasons, the committee obtained information from the community about the potential need for a national land observatory, or a national survey of land resources. The idea was raised at one public meeting, where it was discussed constructively by participants. We followed this discussion with an informal questionnaire that reached members of the LCM community. Over 100 responses to the questionnaire revealed support and interest in using data from such a survey. Although beyond the scope of this report to outline the design for such a survey, we conclude that a program to collect spatially referenced data with linked records on land patches, land parcels, and land users sampled through a purposive design and maintained through repeated waves over time presents a significant opportunity for the LCM community. Such a program would facilitate greater understanding of land change processes, would allow hypotheses to be tested, and would improve our predictive ability.
A number of the challenges noted above have the potential to find solutions through contemporary advances in cyberinfrastructure. In the following sections, two areas are described in which cyberinfrastructure advances represent potential opportunities for land change modeling.
Crowd Sourcing and Distributed Data Mining
A key data need for better construction, calibration, and validation of structural models is in the area of microdata on agents, especially for process-based LCMs. The ability to collect and analyze very large amounts of data on individual
behaviors, much of which is referenced in time and space, has grown tremendously over the past decade. Examples include point-of-sales data on individual purchases by consumers, location-aware technologies that track individuals in space and time, and Internet activities that reveal social networks. Additionally, computationally and labor-intensive processes are increasingly being conducted by distributed groups, aided by increases in the computational power of computers and the availability of high-speed Internet access. Lazer et al. (2009) viewed this development as an emerging computational social science that is based on researchers’ ability to harness these data. However, as Miller (2010) discusses, the privacy and propriety issues are not trivial and are mostly unresolved.
Additionally, the LCM community could benefit from distributed data collection facilitated by Global Positioning System– and Internet-enabled mobile devices. A number of recent projects have successfully combined data from traditional sources with geospatial and other data that are crowd sourced from a relevant population, and they illustrate how data-collection efforts might be structured to facilitate model parameterization. Citizen-contributed data supported the implementation of Ushahidi in Haiti following the 2010 earthquake, helping to plot at least 4,000 distinct disaster events, with universities and nonprofit agencies playing important roles in disaster response (Zook et al., 2012). Information was provided by volunteers and aggregated for visualization, use, and analysis. Micropayments for microtasks, following the model of Amazon’s Mechanical Turk, have also shown promise as a means for data collection (Kittur et al., 2008), including social survey data. These data could be used as inputs on the heterogeneity of actors in agent-based approaches. Statistical and econometric approaches to parameterizing the behavior of land-use actors could also take advantage of these data, but issues related to uncertainty in these data require further investigation (Flanagin et al. 2008). Given the potentially large volumes of these data and the problems associated with unknown and variable data quality, data mining and machine learning may be the most promising approaches for extracting model inputs from them. Extensible data tools on mobile devices have also been used to enhance the participatory nature of efforts to collect microdata on agents. Google Maps and other cloud-based mapping technologies are already being used in environmental monitoring projects to create geospatial data sets that are coproduced by the public and scientists (Connors et al., 2012; Goodchild, 2007).
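One simple way to handle variable data quality when extracting model inputs from crowd-sourced reports is to weight each contribution by an estimated contributor reliability. The sketch below assumes reliability scores are already known, which is itself a hard estimation problem; the labels and scores are invented.

```python
from collections import defaultdict

def weighted_label(reports):
    """Choose a land-use label for a site from volunteer reports,
    weighting each vote by a contributor reliability score."""
    scores = defaultdict(float)
    for label, reliability in reports:
        scores[label] += reliability
    return max(scores, key=scores.get)

# Three low-reliability 'cropland' votes vs. two trusted 'pasture' votes
reports = [("cropland", 0.3), ("cropland", 0.3), ("cropland", 0.3),
           ("pasture", 0.9), ("pasture", 0.8)]
print(weighted_label(reports))  # → pasture (weight 1.7 vs 0.9)
```

A simple majority vote would have chosen the opposite label here, which is exactly the kind of uncertainty issue Flanagin et al. (2008) flag for further investigation.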
Examples in international agriculture include the Avaaj Otalo study, which used an interactive voice forum for rural farmers (Patel et al., 2010); another study used mobile phones for collecting information at various points in the coffee production process for small farmers (Schwartzman and Parikh, 2007); and the Digital Green project delivered targeted information to marginal farmers through participatory networks (Gandhi et al., 2009). Combining these data-collection approaches with LCMs has the potential to improve both availability of microdata and the degree to which findings from LCM projects make their way to a diversity of participants in the land system.
A second opportunity for which cyberinfrastructure developments show promise is the increasing ability to meet the computational demands of some of the modeling approaches outlined above. Given the increasing data volumes and model interactions that might be expected for some modeling applications based on the opportunities outlined above, developments in processors, data storage, and network bandwidth all offer important improvements. For example, coupling land change and environmental process models at high resolutions provides opportunities to explicitly incorporate information on fine-scale patch or parcel adjacency, connectivity, and shape, which regional- to global-scale models often leave out or attempt to parameterize as subgrid-scale phenomena. Models that incorporate finer-scale spatial interactions over larger spatial domains would provide benefits to regional- and global-scale models. The advent of spatially distributed models in the environmental sciences has both required higher-resolution information to resolve shape, adjacency, or connectivity and generated greater demand for this information, including LULC. Distributed data storage, which can be used to maintain archives of large longitudinal data sets, together with increased network speeds, facilitates these kinds of model developments and couplings. Taking advantage of these opportunities will surely require new approaches to engineering and implementing LCMs. In another example, the integration of optimization approaches into land change modeling to represent agent decision making and to develop optimal land patterns and functions, particularly at finer resolutions and over heterogeneous areas, requires both advanced computational tools and new heuristic approaches to improve computational feasibility (Batty, 2008; Wright and Wang, 2011).
Advances in processing power are increasingly based on deployment of multiple processing cores and increasing numbers of processors. Distributed computing takes advantage of processors that are linked across networks and presents opportunities for distributing modeling and simulation tasks. New architectures like graphics processing units (GPUs) also offer enhanced capacity. Taking advantage of this enhanced computing power requires that models be written to exploit parallel processing, that is, the partitioning of computational tasks among multiple processors running simultaneously. When significant data communication is required between parallel tasks, the advantages of parallel processing can be reduced, and careful design of the parallel algorithm is required. For this reason, some modeling approaches and problems will benefit from these developments in computing more than others. Li et al. (2012) report a 30-fold increase in processing speed for a cellular automaton model running on a GPU versus a traditionally developed model. Tang and Bennett (2011) were also able to achieve between 10 and 40 times the processing speed by running an agent-based model of opinion diffusion on a GPU.
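The cellular automaton case parallelizes well because every cell applies the same local rule independently of the others. The sketch below expresses one synchronous update step with whole-array shifts (here via NumPy), the same data-parallel structure a GPU kernel exploits; the urban-growth rule and its threshold are illustrative, not those of any published model.

```python
import numpy as np

def step(grid, threshold=3):
    """One synchronous CA update: a cell becomes urban (1) when at
    least `threshold` of its 8 neighbors are urban. Neighbor counts
    are computed with whole-array shifts, so all cells are updated
    from the same previous state with no cell-by-cell loop."""
    p = np.pad(grid, 1)
    n = sum(p[1 + dy:p.shape[0] - 1 + dy, 1 + dx:p.shape[1] - 1 + dx]
            for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0))
    return np.where(n >= threshold, 1, grid)

grid = np.zeros((5, 5), dtype=int)
grid[2, 1:4] = 1              # a small urban strip
new = step(grid)
print(new.sum() - grid.sum())  # → 2 cells converted this step
```

Because each output cell depends only on a fixed local neighborhood of the previous state, the update involves no inter-task communication during a step, which is why CA models are among the approaches that gain most from GPU implementations.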
Progress in land change modeling is partially impeded by the continued reinvention of modeling environments, frameworks, and platforms by various research groups. Below are some specific findings regarding research infrastructure that could facilitate solutions to overcome this barrier. Specifically, we identify three kinds of infrastructure investments that would facilitate integration, comparison, and synergy across the community of land change modelers: model infrastructure, data infrastructure, and community governance.
The community infrastructure envisioned for land change modeling might be modeled on existing structures developed within other fields. For example, the atmospheric modeling community has developed a community infrastructure for building, providing data inputs to, comparing, validating, and learning from atmospheric and related models aimed at global change science. This has evolved into the Community Earth System Model (CESM) community, which includes a number of working groups focused on specific aspects of the Earth system (http://www.cesm.ucar.edu/). One working group, focused on the “Community Land Model,” is developing modeling capabilities that focus on “ecological climatology” in order to better link physical, chemical, and biological aspects of the land surface to atmospheric processes (http://www.cgd.ucar.edu/tss/clm/). The effort does not include any of the social and economic processes needed to model land use dynamics, but it represents a potential model for infrastructure development and governance for future community efforts in land change modeling. This includes regular open meetings, a community model development approach, model intercomparison activities, and compilation of data sets and activities for model validation. The heterogeneity of approaches to LCM outlined in this report may require a structure that accommodates a wider range of applications. These issues are explored below.
Model and Software Infrastructure
A model infrastructure would address the need for models, model code, and model platforms that can be used to avoid duplication of effort among various constituents in the land change modeling community. Such an infrastructure should be open source to permit contributions from and availability to participants from throughout the scientific community. Some of this infrastructure exists in various forms already, as existing open-source platforms and models. The challenge for the LCM community is assembling this existing infrastructure and building on it in such a way that it can serve as a platform for (a) further advancing fundamental understanding and representation of land change processes and (b) integration with a wide range of biophysical and socioeconomic models for evaluating the impacts of land change.
Existing open-source models have served the community well and have allowed scientists to include land change dynamics in studies across various fields and applications. For example, SLEUTH is a cellular model that has been used extensively in studies of urbanization (Clarke and Gaydos, 1998; Clarke et al., 1997; Herold et al., 2003) and its effects on urban landscape dynamics (Berling-Wolff and Wu, 2004; Syphard et al., 2005) and watershed impacts (Claggett et al., 2004; Jantz et al., 2010). CLUE, also a cellular model, has been used widely to generate land change scenarios and impact assessments at regional scales (Lesschen et al., 2007; Veldkamp and Fresco, 1996; Verburg et al., 2006; Wassenaar et al., 2007). UrbanSim, a microsimulation model that is similar in character to agent-based models, has been used in a number of cities to develop forecasts of urban development, travel demand, and environmental impacts (Waddell, 2002), including Seattle (Waddell et al., 2007); Paris (de Palma et al., 2007); Detroit; Durham, North Carolina; Honolulu; and Houston, among others. Use of existing models is attractive at least partially because of the time required to build models. Challenges with using existing models include (a) poor understanding on the part of the user of the underlying mechanisms and parameters; (b) relatedly, inappropriate application of a model in situations or at scales for which it is not suited; and (c) difficulty in understanding code structures and details, which can make modifications very time consuming. Problems (a) and (c) are made more acute when models are developed with proprietary processes and codes, because users have a harder time assessing and adapting these models.
To facilitate more expeditious construction of models and greater ease of model modification and integration, a number of open-source modeling environments have been developed that are either intended specifically for land change modeling or are more general modeling environments suitable for land change modeling applications. In the former category, Dinamica EGO provides an environment for graphical construction of scripts that implement cellular models based on a number of primitive operations, referred to as functors (www.csr.ufmg.br/dinamica/). While the framework has general applicability similar to that of the model builders within GIS packages (like ArcGIS and Idrisi), it has been most commonly applied to land change questions (e.g., Soares-Filho et al., 2006, 2010; Thapa and Murayama, 2011). The Open Platform for Urban Simulation (OPUS) was developed by the team that produced UrbanSim as a more general model development environment for building and testing urban models (Waddell et al., 2005; www.urbansim.org/downloads/manual/dev-version/opus-userguide/). OPUS uses Python to access object codes that can be used to build more complex models.
A wide variety of other general modeling environments have been used for building land change models. For agent-based models, the earliest open-source tool was Swarm (www.swarm.org), which required models to be developed in the Objective-C language. Repast (repast.sourceforge.net) offers similar software
functions to Swarm, but the models can be developed in the more common Java language, C++, or Python. Among many other agent-based modeling platforms are NetLogo (http://ccl.northwestern.edu/netlogo/), in which models are developed in its own high-level programming language; MASON (http://cs.gmu.edu/~eclab/projects/mason/), based on Java; and Cormas (http://cormas.cirad.fr/indexeng.htm), based on Smalltalk. Each of these environments provides software tools that can be incorporated into new agent-based models (in the form of programs) that can be used to represent model components, control model function, and evaluate and visualize model output. Because cells can be treated as agents in these model environments, cellular models can also be implemented using these platforms. The Global Trade Analysis Project (GTAP) has served as an important platform for developing a variety of computable general equilibrium models, including those related to land use change (www.gtap.agecon.purdue.edu). While the model platform itself is not open source, the GTAP database that is the core of the project has been developed through open-source institutional arrangements. Econometric models are generally developed with software platforms aimed at statistical analysis. R is an important open-source platform for development of statistical models, including econometric models (http://www.r-project.org/). R provides tools for data calculation, statistical estimation, and visualization that are accessed through the R scripting language.
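Regardless of platform, the core of an agent-based land change model is a decision rule that each agent evaluates against its local context. A minimal sketch (in Python, though the platforms above use their own languages) with entirely hypothetical payoff parameters:

```python
class ParcelAgent:
    """Minimal landowner agent: converts from agriculture to urban use
    when the expected urban return, which rises with the number of
    urban neighbors, exceeds the agricultural return. All payoff
    values are hypothetical, not drawn from any calibrated model."""
    def __init__(self, ag_return=1.0):
        self.use = "ag"
        self.ag_return = ag_return

    def decide(self, n_urban_neighbors, base_return=0.4, spillover=0.25):
        if self.use == "ag":
            urban_return = base_return + spillover * n_urban_neighbors
            if urban_return > self.ag_return:
                self.use = "urban"

agent = ParcelAgent()
agent.decide(n_urban_neighbors=1)  # 0.65 < 1.0, stays agricultural
print(agent.use)                   # → ag
agent.decide(n_urban_neighbors=3)  # 1.15 > 1.0, converts
print(agent.use)                   # → urban
```

Placing many such agents on a lattice and iterating their decisions yields the neighborhood-dependent conversion dynamics that platforms like Repast, NetLogo, and MASON provide scheduling, visualization, and data-collection support for.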
Infrastructure to support future developments in land change modeling will surely need to build on these existing resources, but efforts at coordination toward the needs of land change modeling will be beneficial. Such coordinated efforts should aim toward identifying the various constituent processes of land change and developing software components that represent those constituent processes. Formal descriptions of such components can become an important step toward combining parts of models and developing modules that can be interchanged or interoperated. For example, Parker et al. (2008) described a “conceptual design pattern” for agent-based models of land use change that serves as an example of the kind of general model descriptive framework that can be envisioned and implemented. This conceptual design pattern describes land change processes in six conceptual design considerations that might define modules of any given land change model: information/data, interfaces to other models, demographics, land use decisions, land exchange, and model operation.
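The six conceptual design considerations can be read as a module decomposition. The skeleton below names one interface per module to suggest how interchangeable components might be declared; all class and method names are our illustrative inventions, not part of the published design pattern.

```python
# One illustrative interface per conceptual design consideration in
# Parker et al. (2008); names are hypothetical, not from the paper.
class InformationData:      # information/data
    def load_layers(self): ...

class ExternalInterfaces:   # interfaces to other models
    def exchange(self, state): ...

class Demographics:         # demographics
    def update_population(self, year): ...

class LandUseDecisions:     # land use decisions
    def choose_use(self, agent, parcel): ...

class LandExchange:         # land exchange
    def transact(self, buyer, seller, parcel): ...

class ModelOperation:       # model operation (scheduling, execution)
    def step(self, modules): ...
```

Two models that each implement these interfaces, however differently, could in principle swap a demographics or land-exchange module, which is the kind of interoperability a formal component description is meant to enable.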
Apart from attempts to develop modules that can interact within a common framework, the development of formal model descriptions can help with communication and replication of existing models or model components. Just as standards for descriptions of data have been critical to the advance of data sharing and interoperability, descriptions of models are equally important, but they are less well developed. A first attempt was made at a metadata content standard for computational models (Smith et al., 2001), but this standard has not been further developed. The Overview, Design concepts, and Details (ODD) protocol was proposed (Grimm et al., 2006) for agent-based models and has been widely used within the
community of researchers using these models. However, further development of such standards and protocols within the land change modeling community could help further advances in model development and application.
A data infrastructure would provide access to a common set of data resources that are necessary for running and validating models of land change. The majority of respondents to the committee’s user community questionnaire expressed some level of support for such a common set of data resources. The second section of this chapter outlines data sources that are essential to the land change modeling enterprise, from historical data on land use and land cover change at multiple scales to a variety of demographic, economic, and policy inputs to land change models. The challenge of modeling land changes is exacerbated by the diversity of data requirements and the need for these data to be collected over time. Although a variety of data sets exist to support these needs, further developments in improving spatial and temporal resolutions and better representing changes over time would be facilitated by a formal data infrastructure to support land change modeling.
Existing resources include a variety of national and regional agencies supporting data on land cover change, often provided by space agencies as products from satellite image programs. For example, the National Aeronautics and Space Administration (NASA) supports the "Global Land Cover Facility" as a provider of image data and derived land cover products (http://glcf.umiacs.umd.edu/data/). GlobCover is a product provided as a service of the European Space Agency in conjunction with the Food and Agriculture Organization of the United Nations (http://due.esrin.esa.int/prjs/prjs68.php). Aside from satellite image data that can be used to collect land cover information consistently over regional to global extents, existing data at subnational and local levels are more heterogeneous and, therefore, difficult to compile in comparable formats. Some efforts have been made to do so for land and demographic data sets. For example, global historical land cover data (1700-2000) have been compiled through a number of research projects aimed primarily at supporting global Earth system dynamics models with dynamic land cover information (Hurtt et al., 2006; Klein Goldewijk and Ramankutty, 2004). Furthermore, the Center for International Earth Science Information Network, supported through NASA's Socioeconomic Data and Applications Center, compiles and provides access to a variety of global socioeconomic data that can support land change model development. Compiling comparable data from local-level cadastral, land use, survey, and other data sets is an important challenge for the land change modeling community.
Future infrastructure developments need to further support the compilation, curation, and comparison of heterogeneous data sources for input to, parameterization of, and validation of LCMs. This component of the infrastructure
for land change modeling requires open access to, documentation of, and structured organization of heterogeneous data for land change science. Two existing data infrastructures are worth exploring, and connecting to, because they include data that the land change modeling community might reasonably expect to use. GEON serves as a set of software and data resources that supports data sharing and integration in the Earth science communities and has focused on digital elevation, geophysical, and borehole and well data (www.geongrid.org). Early stages of this network required development of common semantic frameworks for describing and modeling data with heterogeneous semantic and spatial definitions and scales, and the links between them (e.g., Vaccari et al., 2009). The Consortium of Universities for the Advancement of Hydrologic Science has developed a hydrological information system that links together data on the hydrologic environment (his.cuahsi.org) and was developed through a similar data publishing and integration process (e.g., Horsburgh et al., 2009). Similar to both of these projects, a data infrastructure to support land change modeling would need to recognize the different thematic data that are necessary; recognize their heterogeneous semantic, spatial, and temporal referencing; and develop a structured system for access and integration in the form of a global integrated land information system.
Several promising developments could help in building such an integrated system. First, the National Science Foundation recently supported the Global Collaboration Engine, which aims to facilitate integration of data and models working at various scales specifically for the land change modeling community (ecotope.org/projects/globe). The project aims to provide global data that can be used to enhance comparability among diverse case studies, which are a common mode of data analysis within land change science. Second, the TerraPopulus project aims to integrate data on population, land cover, climate, and land use across the globe and over time. The population data include commonly used aggregated microdata on individuals, which are compiled as part of the IPUMS-International project (www.terrapop.org). Finally, the Geoshare project (https://geoshareproject.org/) aims to coordinate global data relevant to economic analysis of global agriculture and land use systems. Each of these projects is still new at the time of this writing, so how well they will support land change modeling remains to be seen.
Community Modeling and Governance
A community modeling and governance infrastructure that supports developments in land change modeling would provide mechanisms for decision making and advancement of modeling capabilities within a broad community and toward specific, achievable goals and capabilities. A community of land change modelers could settle on a series of specific goals and endpoints and work together with input from that broad community to move modeling and data capabilities forward
in ways similar to those outlined in the previous two sections. The majority of respondents to the committee’s informal user community questionnaire either supported or saw value in such community models.
Two existing structures serve as potential models for how such a community structure might work. The first is that used for the Community Earth System Model (CESM), which currently has a subcomponent called the Community Land Model, described above. This model effort is carried out by a group of researchers who seek funding on their own to advance the model within a democratically governed framework, organized by working groups that make changes to the model. All changes must be freely available with open code and documentation, which is the responsibility of the developer. For example, a new working group on a community land use model could decide to further develop the details of the conceptual design pattern outlined by Parker et al. (2008), or some other framework, which could then serve as a basis for development of plug-and-play land change modules used to construct a variety of different land change models linked to a variety of other environmental models. The newly formed Societal Dimensions Working Group within the CESM framework could be an institutional location for the work, or some subset of it.
A second structure is that offered to the community of modelers applying agent-based modeling to understand socioecological systems in the form of the Network for Computational Modeling for SocioEcological Science, which maintains openabm.org as a platform for sharing open models and resources and is working on developing and furthering protocols for model documentation and development. A much looser confederation of modelers, this structure provides a rule-based framework within which modelers can contribute a wide range of models and around which specific outcomes or goals do not need to be agreed upon.
Best Practices in Model Evaluation
There are a variety of practices that can make land change modeling more scientifically rigorous and more useful in application. Some of these practices are established but not always followed, whereas others require more research to test and establish. A number of reviews and standards of best practices for environmental modeling have been produced (e.g., Crout et al., 2008; EPA, 2009). Here we summarize best practices in the evaluation of LCMs in four broad categories: sources of uncertainty, sensitivity analysis, pattern validation, and structural validation.
Sources of Uncertainty
Uncertainty in LCMs can come from a variety of different sources. The data concerning inputs and values of model parameters usually have some level of
error. These data describe the boundary conditions (e.g., initial land cover) and exogenous dynamics (e.g., price fluctuations). Additionally, the model structure itself will have some uncertainty associated with it, including the processes represented in the model, their interactions, and how they are represented mathematically or algorithmically. The uncertainty in these aspects of the model can stem from both incomplete information about their historical states, due to uncertainties or unavailability of data, and variations in their states over some observation period (i.e., nonstationarity in the process). Substantial uncertainty in forecasting future states is often due to nonstationarity in processes. Nonstationarity may exist due to changes in exogenous conditions that cannot be endogenized within the model, and shifts in processes that are poorly understood (e.g., changes in human decision making due to developing cultural attitudes or preferences). Although quantification of model uncertainties provides important evidence about model efficacy, these uncertainties must be placed in the context of an understanding of the effects of nonstationarity in the process on the predictive ability of any given model.
There are as many possible measures of stationarity as there are measures of change. A process might be stationary according to one measurement but not according to another. When LCMs are used to extrapolate historic trends, it is essential to understand whether the process is stationary according to the particular measurements used. The land change process during the historic calibration interval may differ from the process during the more recent interval used for validation, in terms of the structure of the process or the magnitudes of its drivers. If so, an extrapolation model is unlikely to validate well. Thus, it is necessary to understand the stationarity of the process before engaging deeply in empirically based modeling. Many modeling exercises begin by creating a business-as-usual scenario, which extrapolates historic trends. However, if historic trends have been nonstationary, then historic business has not been usual, in which case it makes little sense to construct a business-as-usual scenario.
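One simple stationarity check along these lines is to compare a single measure of change, such as the annual rate of loss, across the calibration and validation intervals. In the sketch below, the areas, dates, and threshold for concern are all hypothetical.

```python
# Sketch: compare one stationarity measure -- the average annual rate of
# forest loss -- between a calibration interval and a validation interval.
# Areas and years are hypothetical.

def annual_rate(area_t0, area_t1, years):
    """Average annual fractional loss between two observations."""
    return (area_t0 - area_t1) / (area_t0 * years)

# Hypothetical forest areas (e.g., km^2) at three observation dates.
forest_1990, forest_2000, forest_2010 = 1000.0, 900.0, 700.0

calib_rate = annual_rate(forest_1990, forest_2000, 10)
valid_rate = annual_rate(forest_2000, forest_2010, 10)

# A large relative shift suggests the process is nonstationary under this
# particular measure, so extrapolating the calibration trend as
# "business as usual" would be questionable.
relative_shift = abs(valid_rate - calib_rate) / calib_rate
print(round(calib_rate, 3), round(valid_rate, 3), round(relative_shift, 2))
```

A process can pass this check under one measure (e.g., total quantity of change) while failing it under another (e.g., spatial allocation of change), which is why the measure must be chosen to match the research question.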
Sensitivity Analysis
Sensitivity analysis is an established procedure whereby the investigator examines the variation in model output due to specified amounts of variation in model inputs, parameter values, or structure. Sensitivity analysis can be useful for evaluating the importance of uncertainty arising from multiple sources and for better understanding the situations in which the modeled system may show important changes in behavior. The sensitivity of a model to one or more parameters can be evaluated by perturbing a parameter's value over a specific range, thus creating a range of outputs. The rate of change of results relative to inputs provides an assessment of sensitivity. Sensitivity will vary both by parameter and by the initial value of the parameter that is perturbed, such that sensitivity
may be a local property. Extension to two or more parameters is accomplished in a similar manner, by simultaneously perturbing multiple parameters, and facilitates evaluation of interaction effects of the parameters (Ligmann-Zielinska and Jankowski, 2008).
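The perturbation procedure can be sketched as a one-at-a-time, central-difference estimate around a baseline parameter value. The toy model, parameter names, and values below are illustrative only, standing in for any LCM output of interest.

```python
# Sketch of one-at-a-time sensitivity analysis around a baseline value.
# The "model" is a hypothetical stand-in for any LCM output of interest
# (here, deforested area after 10 steps as a function of a clearing rate).

def model(clearing_rate, price=1.0):
    return 100 * (1 - (1 - clearing_rate) ** 10) * price

def local_sensitivity(f, baseline, delta=0.01, **fixed):
    """Central-difference estimate of d(output)/d(parameter) at a baseline."""
    lo = f(baseline - delta, **fixed)
    hi = f(baseline + delta, **fixed)
    return (hi - lo) / (2 * delta)

# Sensitivity depends on the baseline value that is perturbed,
# illustrating that it is a local property.
s_low = local_sensitivity(model, 0.02)
s_high = local_sensitivity(model, 0.20)
print(round(s_low, 1), round(s_high, 1))
```

Extending this to two or more parameters means perturbing them jointly over a grid or sample of combinations, which exposes interaction effects that one-at-a-time perturbation cannot.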
A similar approach can be accomplished by perturbing values of one or more data inputs to establish sensitivity of the model to the range of exogenous forcing information, or to initial conditions. Additionally, sensitivity analysis can be applied to model structure, both for cases where separate models will be evaluated and where there are options for different process representations in the same model. Because differences in structural or dynamic characteristics of a model are important elements of sensitivity, comparison of single-map outputs may be inadequate for evaluating model sensitivity, and evaluations may need to be made over the entire course of a model run (Ligmann-Zielinska and Sun, 2010) or in ways that compare across multiple runs of the same model (Brown et al., 2005).
It is important to perform sensitivity analysis in a manner that relates to the particular research question because models can make many minor or self-cancelling errors that are ultimately not important for a model’s particular purpose. For example, in a model whose purpose is to simulate carbon dioxide emissions as a result of deforestation, errors of omission in predicting land change that balance with errors of commission can be ignored as not important in terms of the goal of estimating total carbon emissions. In one study, a comparison of seven different carbon maps indicated that uncertainty in the quantity of carbon is much more important than uncertainty in how the land change model simulates the spatial allocation of deforestation (Gutierrez-Velez and Pontius, 2012).
Model selection sometimes makes a large difference in results, but sometimes model selection is not the most important factor. For example, a comparison of the predictive accuracy of the output maps from two models, cellular automata/Markov versus Geomod, found that the variation in results between models was less than the variation within a model due to parameter selection.
Best-practice modeling should result in models with a level of complexity no greater than what is required for a specific project or application. Models that have too many parameters and assumptions are difficult to calibrate and validate. Sensitivity analysis offers one method to prioritize research and determine the most important parts of the model to develop. If the results of a LCM are insensitive to certain processes or parameters, then the model or efforts to determine parameter values can be simplified. This allows prioritization of effort and resources toward more sensitive processes, parameters, and input data. Thus, sensitivity analysis can facilitate the design of research to simplify the model and to focus effort on the most sensitive parts of a model.
Pattern Validation
Evaluation of model performance often requires comparison of model simulations with observed outcomes. Simulations from LCMs usually produce maps of land use, land cover, or some other land-related variable. A standard approach to evaluating a land change model is to develop the model through calibration with historical data, for example using two or more maps of land cover during the calibration time interval. The calibrated model then simulates change to a later time point for which reference data are available. The map of simulated change is then compared with the map of actual reference change during the validation interval to evaluate the differences based on some set of metrics. This comparison requires three maps: the reference map at the start time of the simulation, the reference map at the end time of the simulation, and the simulation map at the end time of the simulation. This three-map analysis shows how the simulated change compares to the reference change by revealing five components: (1) reference change simulated correctly as change (i.e., hits), (2) reference change simulated incorrectly as persistence (i.e., misses), (3) reference persistence simulated incorrectly as change (i.e., false alarms), (4) reference persistence simulated correctly as persistence (i.e., correct rejections), and (5) reference change simulated incorrectly as change to the wrong gaining category (i.e., wrong hits) (Pontius et al., 2011). The relative values of these five components can be used to compute quantity disagreement and allocation disagreement (Pontius and Millones, 2011).
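The per-pixel tally behind the three-map analysis can be sketched as follows; the category labels and the tiny example maps are hypothetical.

```python
from collections import Counter

# Sketch of the three-map comparison: reference map at the start time
# (ref_t0), reference map at the end time (ref_t1), and simulated map at
# the end time (sim_t1), each given as a flat list of category labels.

def three_map_components(ref_t0, ref_t1, sim_t1):
    counts = Counter()
    for r0, r1, s1 in zip(ref_t0, ref_t1, sim_t1):
        if r0 != r1:                      # reference change
            if s1 == r1:
                counts["hit"] += 1        # change simulated correctly
            elif s1 == r0:
                counts["miss"] += 1       # change simulated as persistence
            else:
                counts["wrong hit"] += 1  # change to the wrong gaining category
        else:                             # reference persistence
            if s1 == r0:
                counts["correct rejection"] += 1
            else:
                counts["false alarm"] += 1
    return counts

# F = forest, A = agriculture, U = urban (hypothetical labels).
ref_t0 = ["F", "F", "F", "F", "A", "A", "U", "U"]
ref_t1 = ["A", "A", "U", "F", "A", "A", "U", "U"]
sim_t1 = ["A", "F", "A", "A", "A", "U", "U", "U"]

counts = three_map_components(ref_t0, ref_t1, sim_t1)
print(dict(counts))
```

Note that a wrong hit can only arise when more than one gaining category exists; in a binary forest/nonforest map the tally reduces to the first four components.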
The three-map comparison and its five components reveal the accuracy of the land change model versus a null model that predicts complete persistence. Where the land change model generates a miss, the null model would also produce a miss. Where the land change model generates a false alarm, the null model would produce a correct rejection. Where the land change model obtains a hit or a wrong hit, the null model would produce a miss. Thus, if the modeler computes the five components of the three-map comparison, the modeler has produced a comparison with a null model. A frequent blunder is to compute a two-map comparison between the reference map at the end time of the simulation and the simulation map at the end time of the simulation. This two-map comparison cannot distinguish between correctly simulated change (i.e., hits) and correctly simulated persistence (i.e., correct rejections).
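In the binary change/persistence case (setting wrong hits aside), quantity and allocation disagreement can be computed directly from the component counts. The sketch below is a hedged rendering of the binary special case of the Pontius and Millones (2011) measures, with illustrative counts.

```python
# Sketch: quantity and allocation disagreement from the component counts of
# a binary change/persistence comparison. The counts below are illustrative.

def quantity_allocation(hits, misses, false_alarms, correct_rejections):
    """Binary-case disagreement between simulated and reference change maps.

    Quantity disagreement: mismatch in the total amount of change simulated.
    Allocation disagreement: mismatch in where that change is placed, given
    the quantities. Total disagreement = quantity + allocation
    = (misses + false alarms) / n.
    """
    n = hits + misses + false_alarms + correct_rejections
    quantity = abs(misses - false_alarms) / n
    allocation = 2 * min(misses, false_alarms) / n
    return quantity, allocation

q, a = quantity_allocation(hits=10, misses=6, false_alarms=4,
                           correct_rejections=80)
print(q, a)
```

Here 6 misses against 4 false alarms yield a quantity disagreement of 0.02 (the model simulated too little change) and an allocation disagreement of 0.08 (some simulated change was placed in the wrong locations), summing to the total disagreement of 0.10.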
After the modeler sees the map of the five components, there are a variety of more detailed ways that the modeler can compare the pattern of simulated change versus the pattern of reference change. There are a plethora of pattern metrics that consider the spatial distribution of the patches in the map. Such metrics can consider the patches’ numbers, sizes, and shapes. The particular research question should dictate whether details concerning the configuration of the patches in the map are important. For example, if the application concerns biodiversity protection, then it is likely to be important to consider whether forest is in one
large patch or several smaller patches. If the goal is to measure the quantity of carbon emissions, then the configuration of the patches is probably less important. It can be tricky to select a metric that is mathematically rigorous, intellectually accessible, intuitively interpretable, and practically useful (see Box 3.1). A necessary best practice is to match the measurement of the model with the purpose of the modeling exercise for the particular application. This is an area that requires more research.
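As one concrete example of a configuration metric, the number of forest patches can be computed as the count of connected components in a binary map. The sketch below uses 4-connectivity and small illustrative grids; a real analysis would need to choose the connectivity rule deliberately, since it affects the count.

```python
from collections import deque

# Sketch: count forest patches as 4-connected components of 1s in a binary
# grid. Grids below are illustrative.

def count_patches(grid):
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    patches = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 1 and not seen[r][c]:
                patches += 1
                queue = deque([(r, c)])   # breadth-first flood fill
                seen[r][c] = True
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and grid[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return patches

# Two landscapes with similar forest amounts but different configurations.
one_patch = [[0, 1, 1, 0],
             [0, 1, 1, 0],
             [0, 1, 1, 0]]
two_patches = [[0, 1, 0, 1],
               [0, 1, 0, 1],
               [0, 1, 0, 1]]
print(count_patches(one_patch), count_patches(two_patches))
```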
The Challenge of Selecting a Pattern Metric
Selecting an appropriate pattern metric that can indicate process is a challenge. Many modelers are interested in measuring the output of maps based on the spatial pattern metrics of the maps, such as number of patches. The figure below contrasts three cases where we compare the land change between two time points. All three cases have one patch of forest at time 1 and demonstrate a process where deforestation occurs on the edge between forest and nonforest. However, this single process generates different patterns due to interaction between various initial configurations and quantities of change. In this example, case A has a different initial configuration than cases B and C, while case C has less deforestation than cases A and B. Cases A and C have one forest patch at time 2, while case B has two forest patches. This illustrates how the number of patches can be sensitive to an interaction between the configuration of the initial landscape and the quantity of change.
Whatever metrics the modeler adopts, it is important to use the metrics to compare the output from a land change model to the output from a corresponding naïve model that is applied to the same study site. A naïve model is one that is based on a simplistic conceptualization of the land change process and that offers a baseline that is easy to understand and implement. For example, a naïve model of deforestation could allocate the simulated deforestation on the edges of the initial forest patches. Then the output from the naïve simulation could be compared to the output from a more complex model. It is important to compare the output from a complex model to the output from a naïve model to measure whether there is any increase in predictive ability in the more complex model. A naïve model might use randomness to allocate change, but researchers frequently already know that the process of change is not random; thus, a random model is likely to produce an extremely low baseline. A naïve model that is based on one simple idea, such as proximity to a single feature, is likely to generate a much more challenging baseline than randomness. For this reason, it can be misleading to use metrics, such as kappa, that compare model output to a random pattern (Pontius and Millones, 2011). The literature sometimes uses the term neutral model to convey the idea of a naïve model that offers a baseline for comparison to a more complex model; however, if neutral models are based on randomness, then such neutral models are likely to produce an unchallenging baseline.
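The edge-allocation baseline described above can be sketched as follows. The grid, the cell ordering, and the quantity of change are all illustrative, and a real application would also need an explicit tie-breaking rule among edge cells.

```python
# Sketch of a naive baseline model: allocate a given quantity of simulated
# deforestation to cells on the forest/nonforest edge, edge cells first.
# 1 = forest, 0 = nonforest; all values are illustrative.

def edge_cells(grid):
    """Forest cells with at least one 4-connected nonforest neighbor."""
    rows, cols = len(grid), len(grid[0])
    edges = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == 1:
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                        edges.append((r, c))
                        break
    return edges

def naive_deforestation(grid, n_cells):
    """Clear up to n_cells forest cells, always taking edge cells first."""
    new = [row[:] for row in grid]
    remaining = n_cells
    while remaining > 0:
        edges = edge_cells(new)
        if not edges:
            break
        for r, c in edges[:remaining]:
            new[r][c] = 0
        remaining -= min(remaining, len(edges))
    return new

grid = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0]]
result = naive_deforestation(grid, 3)
forest_remaining = sum(sum(row) for row in result)
print(forest_remaining)
```

Comparing a complex model's map against this baseline, using the same metrics and the same study site, measures whether the added complexity buys any predictive ability beyond a simple proximity rule.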
If there is no baseline for comparison, then the investigator is frequently tempted to use universal standards for model performance, such as defining good as greater than eighty-five percent agreement between the simulated map and the reference map. Universal standards for model performance are problematic, because they are by definition not specific to any particular research question or study site.
The concepts of equifinality and multifinality also need to be considered when selecting a metric for model assessment, especially when that metric measures only the pattern in the output map (Brown et al., 2006). Equifinality is the situation where two different processes produce the same result. For example, uniform versus highly variable patterns of risk aversion might, in some settings, produce identical patterns of agricultural activity. In this situation, it is possible that the model uses an incorrect process to produce the correct pattern.
In other cases, a process-based model uses the correct process to generate an incorrect pattern. Multifinality is the situation where a single process can generate many different patterns. One possible cause for this phenomenon is path dependency, whereby a few initiating events occur due to a poorly understood process, and those events then trigger numerous other processes. For example, there might be tremendous uncertainty about where a corporation will build a facility, but the facility then generates urban growth wherever it is placed. Thus, a model can correctly simulate the process of growth that follows the initial siting of the facility, even though the placement of the initial facility remains uncertain. In this situation, a process-based model simulates the
correct process, but it might not produce the correct pattern, as measured by a particular pattern metric.
Structural Validation
Models may have predictive accuracy in the sense that they generate predicted land use patterns that correspond closely to the actual land use pattern at some point in time. Models can also have process accuracy, which Brown et al. (2005) define as consistency between real-world processes and the processes by which locations or land use patterns are determined in the model. Devising ways of validating model processes remains challenging, in part because the underlying processes that give rise to observed land use patterns are themselves not fully observable. In addition, because more than one process may generate a qualitatively similar land use pattern, there is no one-to-one mapping between the hypothesized underlying process and the predicted pattern. Finally, interactions and other sources of nonlinearity imply that many processes related to land change may be path dependent, in which case small random or poorly understood shocks in the process may cause large deviations in observable land use pattern outcomes. The implication is that the underlying process cannot be discerned from observed patterns alone; additional information is needed to identify the underlying process (Epstein, 2006).
Process validation may occur at several different levels of modeling. In the simplest case, the focus may be on identification of one or more key structural parameters of the process. Data over time can be extremely useful in addressing this challenge. Panel data techniques commonly used in econometrics permit the researcher to control for unobserved spatial heterogeneity, for example by absorbing time-invariant spatial effects so that a causal effect can be identified. Robustness checks are a common means of validation, for example using an alternative identification strategy or a falsification test designed to detect spurious effects in the data.
Grimm et al. (2005) offer an approach to process validation that makes use of comparisons of observed and predicted patterns, like those outlined in the previous section. However, in their pattern-oriented modeling approach, the emphasis is on identifying multiple dimensions of pattern that may be very different from one another in character. For example, depending on the goals of the model, the patterns produced by an agent-based model that could be compared with data might include maps of land cover, distributions of income, rates of deforestation over time, and numbers of actors engaged in off-farm income generation. The patterns are classified as primary patterns (i.e., those that the model was built to explain) and secondary patterns (i.e., those that the model can generate but that are secondary to its primary purpose). The argument is that the more patterns a model can reproduce, and the more disparate those patterns are in character, the more likely we are to be able to validate the mechanisms by which the model produces those
patterns. This is an indirect approach, but offers promise for structural models (like agent-based models) that can produce various types of outcomes.
More challenging is validating the assumptions that are necessary to specify a model. For example, models of land development or household locational choice rest on maintained assumptions regarding the structure of producers' costs or households' preferences, from which the specific functional form of the model is derived. Validating these assumptions requires collecting additional information, devising strategies to test them, and quantifying the degree of uncertainty surrounding them, given that they are sometimes unverifiable. Kuminoff (2009) provides an example of how the maintained assumptions of functional form, preference distributions, and neighborhood delineation (all used in structural econometric models of household locational choice) can be assessed in terms of their influence on model results. This approach provides a means for quantifying how uncertainty regarding the maintained assumptions of the model affects the model's predictions by clarifying how each assumption influences the model results. Brown et al. (2005) proposed a strategy for quantifying the degree of spatial uncertainty that arises when processes are path dependent, which limits a model's predictive accuracy. They concluded that it is possible to determine an appropriate level of path dependence or stochasticity by comparing results across a wide range of model runs and landscape patterns. More work along these lines is needed to validate process-based models and to evaluate the reliability of a model's predictions, which is particularly important for guiding policy. This need applies equally to structural and reduced-form economic models as well as to agent-based models that rely on a number of maintained assumptions about agent bidding and market-interaction processes.