Evaluating Interventions in History: The Case of International Conflict Resolution
Paul C. Stern and Daniel Druckman
When decision makers at any level organize interventions to prevent, mitigate, or resolve international conflicts, they are attempting to change the course of history.1 They therefore need to learn lessons from history about why past interventions had the results they did—that is, they need to have the interventions evaluated in a relatively systematic way.
Opinions are sometimes quite polarized about whether and how a scientific approach to evaluation can help conflict resolution practitioners. One extreme view, sometimes attributed to social scientists, follows from a simple model of how science gains knowledge and how that knowledge is used. Classical thinking about infectious disease illustrates this model. Each disease is caused by a specific microorganism, and the microbiologist’s task is to identify that organism so that applied scientists can find effective vaccines and treatments. When a disease follows the classic pattern, every case is sufficiently alike that, once it is correctly diagnosed, the prescribed prevention or treatment will be universally effective. Various scientific methods are used to identify infectious agents, develop vaccines and treatments, and evaluate their effectiveness and safety.
Conflict resolution practitioners are like physicians in that they work to prevent or control noxious situations.2 Few of them, however, believe that violent international conflict follows the classical model of infectious disease in which each condition has a single cause and a small number of effective treatments that can be identified and evaluated by scientific analysis and applied independent of the situation.3 Practitioners are typically suspicious or even contemptuous of generalizations put forth as “scientific.” The extreme view sometimes attributed to them is that, because each international conflict situation is unique, scientific approaches that seek general laws cannot provide useful insights. In this view, useful knowledge is highly case specific. It requires detailed understanding of the cultural, political, and historical contexts affecting the parties to a conflict and experiential knowledge about the parties, their motives, and their susceptibility to influence. In this view, useful knowledge can be gained but not from systematic scientific investigation.
Neither of these extreme views is satisfactory—and neither actually describes what competent scholars or practitioners do. International relations scholars do more than apply standard scientific techniques of measurement and analysis when they try to understand the causes of international conflict and its cessation. They know that the phenomena are difficult to categorize and quantify and virtually impossible to manipulate in the style of experimental microbiology. The best a scientifically oriented international relations scholar can hope to do is to apply some of the methods of social science, such as event analysis, comparative case study research, simulation, and modeling, and make inferences carefully and judiciously, understanding the limitations of each method. In making inferences, competent social scientists act a bit like practitioners, taking advantage of detailed knowledge about specific cases and their contexts to temper the conclusions that may seem at first glance to flow from their analyses.
Similarly, skilled practitioners do more than rely on case-specific knowledge to guide their actions. They typically search history for similar situations and are influenced by their judgments of what was effective in those situations. For example, a foreign minister’s expressed desire to avoid “another Munich,” “another Vietnam,” or “another Somalia” is likely to be more than rhetoric used to justify a decision in the face of political opposition. A practitioner who sees striking similarities between a current situation and the situation preceding a well-known policy failure of the past is likely to treat very seriously the notion that the approach that failed before would fail if tried again (e.g., Khong, 1991). Thus, skilled practitioners benefit from acting a bit like social scientists: examining a body of presumably relevant evidence, drawing tentative conclusions from it, and making inferences about the likely effects of the interventions they are contemplating in the new situation. They temper those inferences, of course, with their specific knowledge of the current situation. But to the extent that they believe history holds lessons for them, they are acting like empirical social scientists and ought to find it useful to have thorough and carefully considered analyses available. There are serious dangers, of course, in relying on single historical analogies for
policy guidance (Neustadt and May, 1984). Practitioners can gain more reliable insights from more sophisticated social science approaches, such as the careful analysis and comparison of several relevant cases (George, 1979).
It is possible to move beyond caricatures of social science and diplomatic practice by distinguishing among three types of knowledge, all of them useful to conflict resolution practitioners. One is case-specific knowledge of the current situation facing the parties to a conflict; the historical, cultural, and geopolitical contexts of the conflict; the internal dynamics of the decision-making groups for each party; the political pressures affecting decision makers on each side; and so forth. A second, which George (1993) calls generic knowledge, crosses situations and focuses on particular strategies of intervention. It takes the form of propositions that, under certain kinds of conditions, a particular type of intervention can be expected to yield certain kinds of outcomes. To make such generic knowledge useful in practice requires not only that the propositions be correct but also that the practitioner can accurately classify the situation at hand as to the types of conditions present. Thus, making generic knowledge useful requires case-specific knowledge. George (1993) also discusses “actor-specific behavioral models,” which include general propositions about the behavior of a particular actor, such as the state leader who is the target of an influence attempt. These propositions take the form that under certain kinds of conditions the target actor can be expected to behave in certain ways. When an actor-specific behavioral model is correct, it offers a form of generic knowledge about an actor. As with generic knowledge about intervention strategies, generic knowledge about actors must be combined with case-specific knowledge to be of practical value.
It may be presumed that case-specific knowledge comes only from practical experience and that generic knowledge comes only from systematic research and analysis—that specific knowledge is “practical” and generic knowledge is “theoretical.” We do not accept either of these presumptions. For example, generic studies of actors and strategies can create typologies of situations that are very useful for building case-specific knowledge—the concepts tell observers what to look for in a situation. And generic knowledge can be greatly informed by the introspection of experienced practitioners who have developed useful practical distinctions among situations and working hypotheses about how conditions affect outcomes that can be tested and refined by systematic research. Thus, we are suspicious of theory in this field that does not have a strong basis in practice, and we accept the aphorism attributed to psychologist Kurt Lewin that there is nothing so practical as a good theory.4
Our concern here is with how systematic analysis can help distill the
lessons of history and thus aid the practice of international conflict resolution. Social scientific analysis can make practical contributions in several ways. It can help diplomatic practitioners check their tentative judgments about the lessons of history against the evidence and confirm or refine their judgments accordingly. For instance, it can test inferences from history against a wider range of relevant historical evidence and thus help keep practitioners from making errors because of gaps in their experience or overreliance on single historical analogies. Analytical studies can critically examine the assumptions underlying conventional wisdom about which interventions work under which conditions and may sometimes reveal weaknesses in policy thinking and suggest ways to refine it. By examining historical cases systematically, analytical studies can identify the conditions that have been favorable to the success of a particular strategy in the past and thus help practitioners identify aspects of a new situation that are especially important to consider in making policy choices. They may also identify past situations that practitioners have not considered that may have useful lessons to teach.
Although social science can be useful to conflict resolution practitioners, it does not replace judgment. An analogy to the way medicine uses biological science can help clarify what social science can and cannot offer. Medicine is a practice that has a scientific base. Physicians use biological science in diagnosis to tell what signs, symptoms, and test results are the best indicators of the nature of a patient’s disease. To make an accurate diagnosis, however, a clinician must also rely on case-specific knowledge and clinical judgment. This includes not only the specific patient’s test results but also clinical knowledge about how to interpret evidence (e.g., patients’ reports of symptoms) and judgment about how, in a particular case, to combine evidence from different sources (symptoms, physical examination, lab tests) that may not all point to the same diagnosis.
In treatment, physicians also draw on both scientific knowledge and clinical knowledge and judgment. Scientific research can tell which treatments are generally most effective and identify special conditions under which the usual treatment is contraindicated and an alternative treatment should be tried. But case-specific knowledge is required, among other things, to determine whether special conditions apply and to decide whether the patient will accept the usual treatment or might, because of other medical conditions or personal characteristics, respond better to an alternative treatment.
Social science can aspire to be useful to conflict resolution practitioners in the same ways that biological science is useful to physicians. It can develop and refine taxonomic categories that make it easier to accurately diagnose conflict situations, and it can develop empirically supported
general propositions about the conditions under which and the processes by which particular interventions are likely to ameliorate particular kinds of conflict situations. But practitioners still have to rely on their judgment and experience as well as case-specific information (e.g., field reports) to diagnose situations, interpret ambiguous information, select interventions and combinations of interventions, and choose the right time to act. They must also judge how to deal with constraints on choosing the best-quality policy, such as the need for policy support, the limits on resources for policy analysis, and the impacts of decisions on other policy goals and domestic politics. And they must make choices about how risky a policy to adopt, how to resolve value conflicts embedded in policy choices, and the relative value of expected short- and long-term benefits. George (1993) provides a more detailed discussion of the major types of judgments practitioners must make for which social science can presently offer little assistance.
The medical analogy is imperfect in that the social science of international conflict resolution is not as well developed as the biological science of disease. Because of the nature of international conflict, there are reasons to believe it never will be. The next section identifies and critically discusses the key challenges of taking a social scientific approach to evaluating interventions for international conflict resolution. It identifies the most serious obstacles to achieving a quality of knowledge that meets rigorous scientific standards. The following section suggests ways to make progress in the face of these obstacles. It proposes strategies for developing useful evaluations of conflict resolution techniques even in the face of the impossibility of achieving the highest levels of verification. We conclude that a systematic approach to learning from the experience of conflict resolution based on social scientific techniques and concepts can yield useful generalizations about what works under which conditions and thus make a modest but important contribution to practitioners’ skill. We also identify strategies for developing and validating these generalizations.
CHALLENGES OF EVALUATION
Compared with evaluating the efficacy of a medicine for malaria, it is very difficult to draw firm conclusions about the effectiveness of an intervention to reduce, eliminate, or transform an international conflict. This section identifies the major difficulties that face a social scientist who would like to evaluate such an intervention. Although none of these methodological difficulties of evaluation are unique to international conflict resolution, the scale of the difficulties and their conjunction around this particular kind of social intervention make the evaluation of international conflict resolution efforts different from the evaluation of many other kinds of social interventions.
In the standard model of social science, researchers develop and test hypotheses about relationships among variables, including causal relationships. For conflict resolution interventions the key variables are the types of intervention; the consequences of those interventions, judged in terms of success; and the factors that may ultimately influence these consequences. A researcher must define each of these with sufficient clarity to allow other researchers and practitioners to duplicate the researcher’s procedures or ratings of events and situations. It is difficult to achieve this level of clarity with the phenomena of international conflict for several reasons, as this section shows.
Defining the Intervention
Interventions in international conflicts can be considered analogous to treatments in scientific experiments, but neither practitioners nor researchers are as precise in defining types of interventions as scientific canons prescribe. For social scientific analysis it is critical to define each type of intervention precisely enough to know how to classify each specific case. But the terms that describe international conflict resolution activities are not nearly this precise. A single term often refers to a family of related procedures with varying objectives rather than to a single “treatment.” For example, peacekeeping missions consist of many activities serving many functions in local, regional, and international contexts: peacekeepers may be stationed between combatants as an interposition force following a ceasefire, may defend the victims of international aggression, may monitor elections following a peace agreement, may restore law and order in the absence of government authority, may quell civil disturbances, and may establish safe havens or “no-fly” zones. The definition of peacekeeping has expanded with the increasing number of operations over the past decade (Diehl et al., 1998). Similarly, the term interactive conflict resolution and related terms such as problem-solving workshops and interactive problem solving have referred to a variety of interventions that have some overall similarities but also considerable differences in their operations and objectives (see Fisher, 1997; Saunders, Chapter 7; Rouhana, Chapter 8). Some aim to develop concrete proposals for immediate action by the parties to a conflict, while the immediate goals of others are limited to improving mutual understanding and establishing informal lines of communication. Even the traditional conflict resolution approaches of negotiation and mediation refer to a variety of forms and processes. Negotiation may be formal or informal; it may be bilateral, trilateral, or multilateral, or it may occur as part of conference diplomacy. Mediation may take the form of facilitation, good offices, the use of ombudsmen, or even slip into arbitration; it may be practiced by parties who can and do use material inducements or threats or by mediators who have little power of this sort.
Should activities with such different content be lumped together for evaluation? Perhaps not, because the factors that affect success may not be the same for all of them. The use of a common umbrella term like peacekeeping or mediation suggests some similarity of purpose (various peacekeeping missions aim to improve relations among conflicting groups in a region, and various forms of mediation seek to help the parties to a conflict find a common ground that can lead to lasting agreements). But is such a common purpose a good enough guide for classifying interventions? If not, how should classification be done? The challenge is a serious one.
Umbrella terms are particularly appropriate if there is a useful conceptual model to go with them. For instance, theories of deterrence provide a conceptual model within which it is possible to understand a variety of policies as instances of the same general concept and to offer postulates about which deterrence strategies are likely to work well in which situations. If a single conceptual model can do this for peacekeeping or interactive conflict resolution, it would demonstrate the usefulness of the umbrella terms.
Conflict resolution interventions are generally intended to alter the course of events in a particular direction, usually from violent to nonviolent interactions or, more ambitiously, to transform relationships from hostile and unstable to friendly and enduring (i.e., they may aim for “negative peace,” defined as the reduction of violence, or “positive peace” defined in terms of transforming relationships; Galtung, 1969). The absence of violent conflict is the most obvious observable criterion for success of a conflict resolution technique. But it is not the only possible criterion, and it may not be the best one. Some analysts have recommended measuring success in terms of specific changes in a peace process that indicate progress toward a negotiated settlement or a lasting peace. For example, Stedman (1997; Chapter 5) defines success as the weakening of actors opposed to the peace process vis-à-vis those engaged in it. Such process-based criteria can be assessed independently of the intensity of violence in the short term and may be preferable indicators under some conditions. For instance, some spurts of extremist violence during the
Israeli-Palestinian peace process during the 1990s occurred as a direct consequence of progress in peace talks and had the immediate effect of bringing the negotiating parties closer together.
Some observers, stressing international norms of human rights, self-determination, or democratic participation, suggest that conflict resolution efforts should not be considered successful without improvements in these aspects of the well-being of people affected by the conflict. Sometimes, the violence of civil war has been greatly reduced by the establishment of a repressive and authoritarian regime (e.g., Zaire in the 1960s), but many observers would not consider this outcome a success or an instance of true conflict resolution. This is a good example of the achievement of negative peace without positive peace.
The definition of success may also vary with the standpoint of the judge. The principals to the conflict, various interested third parties, and representatives of international and nongovernmental organizations may all have different criteria of success. Sometimes, what looks like a resolution from a certain external standpoint may look quite different from the inside. U.S. interventions to resolve the conflicts in Guatemala and the Dominican Republic in the 1950s and 1960s may have looked like conflict resolution from Washington, but to many Latin American observers the result was an imposed repression. Also, elites and general populations in a country in conflict may see success differently. A settlement that seems successful to national leaders or to outsiders who claim to see the big picture may seem not to be a resolution at all to members of populations forced to sacrifice as part of the settlement. Historical examples include populations that were moved between Greece and Turkey after World War I and between India and Pakistan after partition; the Bosnian conflict of the mid-1990s is likely also to seem unresolved from the perspective of groups that feel aggrieved by the settlement. If different parties have different definitions of success, which one is an analyst to use? The issue here is that many settlements have winners and losers, and in such cases the winners are likely to consider the settlement more successful than the losers do.
It is also difficult to define success when an intervention has multiple or competing goals. An example was the economic sanctions against South Africa under apartheid. One goal was to reduce intergroup violence; another was to achieve adherence to international norms, such as human rights. The two goals were not entirely compatible. For some participants in the embargo, a period of increased internal violence in South Africa was an acceptable price to pay for changes that would establish human rights and, eventually, majority rule.
To the extent that there is no consensus on what constitutes success, it is difficult to judge whether it has been achieved. A possible solution to the problem of defining success is to define multiple criteria and to judge
the effectiveness of an intervention separately against each criterion. We return to this possibility later.
Setting Reasonable Expectations—How Much, How Soon?
Closely related to the challenge of defining success is that of setting reasonable expectations—deciding how high to set the bar. The challenge here is to be clear about how much change to expect from an intervention. Some interventions are expected to do only part of the job of resolving conflict or preventing violence, so it is unfair to judge them failures simply because the whole job remains incomplete. For example, an economic sanction may be designed to get a party to negotiate. If the negotiation then fails to yield a settlement, it is unreasonable to judge the sanction a failure. Similarly, a single problem-solving workshop with a few members from the opposing sides in a civil war cannot reasonably be expected to end the war by itself, though it may contribute to that result by improving communication. If an intervention is expected only to contribute to conflict resolution and not resolve the conflict by itself, evaluation requires clarity about what it is expected to contribute.
It is also important to set the appropriate time for assessment. How long should it take for an intervention, such as economic sanctions, to work? What looks like failure at one time might later turn into success. On the other hand, a settlement that looks successful in the short run may lead directly to violent conflicts in the future. The classic example is the 1919 Treaty of Versailles, which brought to a close the “war to end all wars” but generated resentment that contributed to World War II. A more recent example may be the consociational governmental arrangements in Lebanon that seemed to be successfully managing conflict into the 1970s but that may have contributed to conflict later on, when the formulas for group representation no longer fit the distribution of the groups in the population.
Thus, evaluation requires setting reasonable expectations as to what a particular intervention should accomplish and over what period of time. Observers may disagree not only about the appropriate definition of success to apply in an evaluation but also on how much change the intervention should reasonably be expected to accomplish and on the time period that should pass before pronouncing success or failure.
Identifying Relevant Characteristics and Contingencies
Scholars and practitioners are well aware that the consequences of any intervention depend not only on the intervention itself but also on the way it is carried out and on external contingencies that may influence its
outcomes. The latter may include unexpected events in other parts of the world, domestic political and economic forces in the countries in conflict or those intervening, personal characteristics of leaders, and so forth. The relevant contingencies may include both preexisting conditions and events that intervene in time between the intervention and its expected effects. Evaluation efforts should seek to determine the effects of an intervention holding such contingencies aside or, better, to specify how the outcomes depend on the conjunction or interaction of the intervention with particular contingencies. But which factors are likely to be important? It is difficult to know this a priori; consequently, specifying the important contingencies is a continuing challenge.
Selecting Cases for Analysis
Generalizations in social science are supposed to apply to some universe of cases. The standard social scientific approach to developing empirical generalizations is to define or enumerate the universe of cases of interest and, if there are too many cases to study them all, to investigate a representative sample of cases—that is, a sample that approximates the entire population in terms of the distributions of the key independent and dependent variables that will eventually become part of the explanation. There are formidable difficulties in applying this scientific approach to the study of international conflict resolution; however, other case selection strategies can help in developing useful generalizations (see below).
Enumerating the Universe of Cases
The appropriate universe of cases is determined in part by the choice of how broadly or narrowly to frame the topic under analysis. This so-called frame of comparison issue (Collier and Mahoney, 1996) is typically discussed in terms of how broadly or narrowly the independent variable (the type of intervention) is defined. This might be called the conceptual framing of the topic. A narrowly conceived topic makes research easier because the universe of cases is smaller, and, because the cases are likely to be more homogeneous in their cause-and-effect relationships, these relationships are easier to discern. However, conclusions drawn about a narrowly defined universe are not intended to apply outside that narrow frame. A more detailed discussion of such tradeoffs appears in Collier and Mahoney (1996). It is sometimes useful to divide a type of intervention into subtypes, to study each separately, and then to compare results to see if the subtypes follow different processes or produce different outcomes.
Topics are also framed historically. In recent years, for example,
researchers have been concerned about whether the end of the Cold War has so changed the international context that pre- and post-1989 conflict resolution efforts should be treated as parts of different universes. As with decisions about conceptual framing, the choice of a historical frame presents a tradeoff in which a narrowly defined research task is easier but yields more limited results. Comparing different time periods can illuminate both the similarities and the differences between them.
Even if an analytical problem is given a clear frame conceptually and historically, it may still be difficult to enumerate the universe of cases. Interventions such as negotiation are so widespread that it is virtually impossible to locate all cases, although this may be less of a problem for official intergovernmental talks. Further complicating this issue is the fact that many negotiations and third-party activities are kept secret, based on the assumption that secrecy may contribute to effectiveness. Recent evidence suggests that secrecy enhances flexibility in negotiation (Druckman and Druckman, 1996).
Moreover, it can be difficult to determine accurately whether an intervention was a serious effort at conflict resolution or only a symbolic action. Often threats, economic sanctions, and the delivery of foreign aid are publicly represented as if the intent is to help resolve conflict in the target country when the main objective is something else, such as to placate public opinion in the country that took the initiative. It may be inappropriate to treat a purely symbolic effort analytically as part of the same universe as a serious intervention, even if symbolic efforts may have an impact on the conflict.
Getting an Appropriate Sample
A serious difficulty in the study of diplomatic activity is that the universe of known cases may be a biased sample of the full universe of cases. For instance, successful third-party mediations tend to be widely publicized, but many failures are kept hidden. If the known cases are biased toward success and lacking in cases that exemplify the routes to failure, a representative sample of the known cases would have the same biases. A similar problem of bias arises if all cases can be identified but the subset with adequate data for analysis forms a biased sample (e.g., if data are systematically lacking on mediations involving authoritarian governments). These possibilities can make it quite difficult to determine whether a sample of conflict resolution interventions is appropriate for drawing inferences about the universe of instances in which a technique was used.
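The distortion that this kind of outcome-dependent visibility can introduce is easy to illustrate with a toy simulation (all numbers here are hypothetical, chosen only for illustration): suppose mediations succeed 40 percent of the time, but successes are far more likely than failures to become publicly known. An analyst working only from the known cases would then substantially overestimate the success rate.

```python
import random

random.seed(42)

TRUE_SUCCESS_RATE = 0.40   # hypothetical: 40% of mediations succeed
P_KNOWN_SUCCESS = 0.90     # successes are usually publicized
P_KNOWN_FAILURE = 0.30     # failures are often kept quiet

# Simulate the full universe of mediation attempts.
population = [random.random() < TRUE_SUCCESS_RATE for _ in range(10_000)]

# The "known universe" is filtered by outcome-dependent visibility.
known = [s for s in population
         if random.random() < (P_KNOWN_SUCCESS if s else P_KNOWN_FAILURE)]

true_rate = sum(population) / len(population)
observed_rate = sum(known) / len(known)

print(f"true success rate:     {true_rate:.2f}")
print(f"observed success rate: {observed_rate:.2f}")
```

With these illustrative visibility rates the observed success rate in the known cases climbs to roughly two-thirds even though the true rate is 40 percent: the bias comes entirely from which cases become visible, not from anything about the interventions themselves.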
Even when one can be fairly certain that all cases of a particular type of intervention are known, there remain serious problems in selecting an appropriate sample of them. One solution, only rarely available, is to
examine the entire universe of cases. For example, Blechman and Wittes (Chapter 3) examine what they say are all cases between 1989 and 1997 in which the United States threatened to use force against another state to advance its international political objectives. Although there can be no objection to this sampling strategy, it does not lead to unequivocal conclusions. Typically, one can examine the universe of cases only when it is quite small—and in such cases the data are typically rich enough to be consistent with more than one explanation for the available cases. Thus, the conclusions are likely to be only tentative, even regarding the cases examined. The conclusions should be considered even more suspect if they are to be generalized to future cases because with a small number of cases—even when this number includes all cases that exist—little is likely to be known about whether the outcomes are contingent on conditions that were common to all past cases but might not hold in the future. In short, sometimes there is not enough historical experience to draw firm conclusions, even by examining all of the cases.
When all cases cannot be examined, sampling theory prescribes random selection from the universe to assure representativeness—at least when large sample sizes can be analyzed. Large-sample analyses have occasionally been carried out on international conflict resolution techniques (e.g., Bercovitch, 1997, on mediation; Druckman, 1997, on negotiation), but typically only a few instances of the use of an international conflict resolution technique are available for study or resources are insufficient to study a large sample. In such situations the statistical theory of randomization warns that randomly selected small samples may not closely approximate the population because the presence or absence of a particular extreme case can have a strong effect on the sample average. For testing simple bivariate hypotheses, sample sizes in the dozens are typically necessary to assure representativeness and meaningful results; examining multivariate relationships involving complex contingencies may require samples in the hundreds.
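The warning about small random samples can be made concrete with a short simulation (a sketch with made-up numbers, not a claim about any real data set): drawing repeated samples of different sizes from a population with a known success rate shows how widely small-sample estimates scatter around the true value.

```python
import random
import statistics

random.seed(0)

# Hypothetical population of 10,000 interventions, exactly half "successful".
population = [i < 5_000 for i in range(10_000)]

def sample_estimates(n, trials=1_000):
    """Estimate the success rate from `trials` independent random samples of size n."""
    return [sum(random.sample(population, n)) / n for _ in range(trials)]

for n in (8, 50, 200):
    estimates = sample_estimates(n)
    print(f"n={n:4d}  spread (stdev) of estimates: {statistics.stdev(estimates):.3f}")
```

The spread of the estimates shrinks roughly with the square root of the sample size, so an estimate from eight randomly chosen cases is about five times noisier than one from 200; a single extreme case in a sample of eight can move the average by more than 12 percentage points, which is the statistical content of the warning above.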
Because of these difficulties of case selection, the strategy of representative sampling often fails to bring the benefits to the study of international relations that it brings, for instance, to survey research. Thus, it is not clear that a representative sample is always the most useful one for building generic knowledge. Because of this and for theoretical reasons, researchers sometimes use purposive sampling, in which cases are selected on theoretical grounds according to a taxonomy that specifies the important types of cases that should be considered and the variables that should be observed. There is a lively debate among methodologists about whether purposive samples have inherent limitations, particularly when selection is based on the values of an outcome variable, such as in a study restricted to “successful” interventions. Discussions of the issue of selection bias can be found in Achen (1986), Geddes (1990), King et al. (1994), and Collier and Mahoney (1996). We return to this issue in a later section.
A serious problem with reasoning from small samples, even if they are randomly selected, is that any factor that is constant in all of the cases studied will not show up as important in the results of the study. For example, the Cold War international regime is a constant in all pre-1989 studies of conflict resolution techniques, so these studies cannot determine whether bipolarity in the international system moderates the effectiveness of the techniques studied. The problem is a general one: it is always appropriate to ask whether new historical conditions invalidate the conclusions of past research.
Researchers should therefore be cautious and self-critical about claims that the cases they examine appropriately represent the intervention type about which they wish to generalize. They should be especially alert to the possibility that their samples may not include variation on particular variables that may prove important to the success of an intervention technique. This may happen because of limitations of the sample or because the universe of known cases has limited variation.
Observing and Measuring Interventions and Outcomes
A characteristic problem with historical data is that the events usually cannot be directly observed or measured as they happen. They must typically be observed indirectly and with hindsight, making it impossible to have the sorts of reliable and dispassionate observations that scientists rightly prize. Thus, interpretation is required to determine which interventions were used and what their outcomes were. Researchers may have access to press accounts, to documentary evidence, and to the recollections of actual participants, but these different sources of evidence have characteristic biases and sometimes tell quite different stories. Moreover, experience suggests that understanding of what happened sometimes changes as new sources of information become available. Analyses of recent historical events are particularly vulnerable to unrecognized biases and gaps in the available data because the normal processes of research and academic debate have not yet revealed them. Measurement problems are especially severe in evaluating what happened in the course of events that were not widely observed, such as closed negotiations or unpublicized mediation efforts, and how events might have been affected by the perceptions of parties in a conflict situation who are not available for interviews or whose recollections are suspect.
Challenges of Inference
Making causal inferences about conflict resolution efforts—the main objective of evaluation—is risky business. The strongest evidence that social science can provide about causation comes from controlled experiments, and history does not lend itself to such experimentation. It is rarely possible to achieve the necessary control, and when it is possible, doing so is usually unethical or politically unacceptable. Consequently, analysts usually rely on various forms of nonexperimental data—events occurring in time without careful manipulation and control. They examine interventions, outcomes, and extrinsic events and attempt to infer causation.
In this approach the main challenge of inference is that, although interventions always precede outcomes, they may not cause them. Events outside the control of the intervening actors may lead a conflict to intensify despite a set of interventions that would otherwise have been effective or to diminish even though the deliberate interventions have had no effect. Such associations between events are called spurious. Because each international conflict situation is unique in some way, it is difficult to draw firm conclusions from historical experience and particularly to make judgments about the causal efficacy of interventions. This section discusses some of the reasons for these difficulties. A later section discusses ways to address them.
Comparing Events with Counterfactuals
To support a conclusion that an intervention had a particular effect requires answering the following question: What would have happened if the intervention had not been tried when it was? It implies a comparison between what actually happened after the intervention and alternative histories in which the intervention was not tried, or was tried earlier or later in the conflict, or a different intervention was tried. This involves comparisons with hypothetical or counterfactual worlds, which history does not provide (Tetlock and Belkin, 1996). As discussed in the next section, social scientists have developed an array of techniques that attempt to find appropriate comparison conditions from the real world as a substitute for controlled experimentation and for the counterfactual worlds that would provide the most convincing answer to the “what if…” question: for example, simulation (Guetzkow and Valadez, 1981), quasi-experimentation (Cook and Campbell, 1979), analysis of coded sets of events data (Bercovitch, 1986), focused case comparisons (Faure, 1994; Putnam, 1993), process tracing (George and Bennett, 1998), and surveys. Each of these analytical strategies is problematic, but each has different limitations. Used together, the social science approaches can
reduce the difficulties posed by the need to make inferences about counterfactuals.
A particularly difficult challenge in making inferences concerns comparisons between what actually occurred after an intervention and what would have happened if the intervention had been considered but not used. This comparison is difficult for two reasons. First, it can be extremely difficult to find cases in which an intervention was considered but not used because there may be no record of the rejected alternative. Second, such cases are unlikely to be fully comparable to ones in which the intervention was used because they will tend systematically to have characteristics that the decision makers believe predispose the technique to failure.
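The second point, that comparison cases differ systematically from intervention cases, can be illustrated with a toy selection model. All quantities here are invented: each case has a latent "tractability," decision makers apply the technique mainly to tractable cases, and tractability alone drives the outcome. The technique has no true effect, yet a naive comparison makes it look powerful.

```python
import random

random.seed(1)

def simulate_case() -> tuple:
    """Toy model of selection into treatment (all numbers invented).

    Tractability alone determines both whether the technique is used and
    whether the conflict is resolved; the technique itself is inert.
    """
    tractability = random.random()
    intervened = random.random() < tractability  # easier cases get the technique
    resolved = random.random() < tractability    # outcome driven by tractability only
    return intervened, resolved

cases = [simulate_case() for _ in range(20000)]
with_i = [r for i, r in cases if i]
without_i = [r for i, r in cases if not i]
rate_with = sum(with_i) / len(with_i)
rate_without = sum(without_i) / len(without_i)

# A naive comparison credits the inert technique with a large effect:
print(f"resolution rate, technique used:     {rate_with:.2f}")
print(f"resolution rate, technique not used: {rate_without:.2f}")
```

The gap between the two rates is produced entirely by selection, which is the inferential trap the text describes: cases where an intervention was rejected are systematically the ones predisposed to failure.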
Assessing the Roles of Extrinsic Events
Evaluation requires determining how much an outcome should be attributed to specific conflict resolution efforts and how much to events independent of those efforts. For example, an outcome may be predetermined by an antecedent condition and would have come about even if the intervention being studied had not been used. Once a peace process gets to a certain point, for instance, it may become inevitable that the parties will seek mediation and that it will be successful. If the conflict is then satisfactorily resolved, the mediation may deserve a small part of the credit but not much. Or the form of a new democracy’s electoral system may be strongly determined by its history. If the system put in place is a historical accident rather than a choice, the consequences, whether peaceful or not, cannot reasonably be attributed only to the structure of the electoral system.
Outcomes may also be affected by events subsequent to a conflict resolution effort that are not part of that effort but affect the outcomes. Consider, for example, the assassination of Israeli Prime Minister Rabin in the midst of his negotiations with the Palestinians. That event probably changed the course of the negotiations—in fact, the assassin intended that it do so, although he may not have achieved the result he desired. But this dramatic turn of events makes it very hard to discern the roles of the various interventions that had been intended to move the negotiation process forward. Would they have yielded the same outcome if the assassination had not happened?
The longer the time between an intervention and its expected effect, the harder it is to evaluate the intervention because there is more time in which intervening events can occur and influence outcomes. Thus, the inference problem is especially serious with techniques that are expected or intended to have delayed effects, such as economic sanctions, efforts to
build civil society in emerging multiethnic democracies, and interactive problem-solving workshops. To evaluate the effects of such delayed-effect techniques, it is important to postulate causal mechanisms that the intervention might set in place—to offer hypotheses or predictions about how the intervention will change the course of the conflict—and to identify indicators that can be used to assess the hypotheses or predictions. Such hypotheses make it possible to observe whether history is following a path postulated to lead to conflict resolution and thus to conduct partial or interim evaluations.5 For example, one way that problem-solving workshops may be effective over the long run is by establishing relationships of trust between potentially influential members of communities in conflict that years later facilitate agreements between the communities, when the participants in the workshops have risen to decision-making positions. It is possible to observe indicators of the operation of this causal mechanism: continued informal communication between workshop alumni from opposing sides after a workshop ends; back-channel communication between these alumni when they later have the opportunity to participate in formal negotiations; an increase in alumni in prominent leadership positions during periods just before breakthroughs in the negotiation process; and so forth. The explanatory strategy of comparing a theoretically predicted course of events with the flow of history has been called process tracing or monitoring (George, 1997).
Assessing Contingent Relationships
Even if extrinsic events do not affect outcomes on their own, an intervention’s outcomes are usually contingent on its context. Thus, practitioners want to know the conditions under which an intervention is likely to succeed. However, social scientific analysis of this issue is challenging. Taking a purely empirical approach may not be fruitful because there are normally a very large number of potentially relevant events going on relative to the number of interventions available for study. Thus, history normally leaves us with multiple, sometimes conflicting, explanations based on different causal variables and each consistent with events. In quantitative methodology this problem is often referred to as an over-specification or overdetermination problem or a shortage of degrees of freedom: there are a large number of variables available to explain the outcomes of interventions compared with the number of historical cases to test the explanations against (Campbell, 1975).
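The degrees-of-freedom problem can be made concrete with a small numerical sketch; the cases, factors, and scores are all invented. With more candidate explanatory variables than cases, substantively different weightings of the factors fit the historical record equally well, so the data alone cannot choose between the explanations.

```python
# Three invented cases (rows) scored on four candidate explanatory factors
# (columns: mediator experience, ripeness, outside pressure, economic aid).
X = [
    [1, 0, 1, 0],
    [0, 1, 1, 1],
    [1, 1, 0, 1],
]
outcomes = [2, 3, 3]  # observed "success" scores for the three cases

def predict(weights, case):
    """Predicted success score under a given weighting of the factors."""
    return sum(w * x for w, x in zip(weights, case))

# Two quite different "explanations" -- one credits all factors equally,
# the other credits ripeness heavily and aid not at all -- yet both
# reproduce the observed outcomes exactly.
explanation_a = [1, 1, 1, 1]
explanation_b = [1, 2, 1, 0]

for weights in (explanation_a, explanation_b):
    assert all(predict(weights, case) == y for case, y in zip(X, outcomes))
```

With only three cases and four factors, the system is underdetermined: a whole family of weightings fits the outcomes perfectly, which is exactly the "shortage of degrees of freedom" the text describes.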
Since it is not possible to create new historical events to solve this problem, other approaches are needed. One way to increase the ratio of cases to variables is to simulate events or processes. The simulation can be construed as an experiment using an analog to the historical process of
interest. By replicating the simulation a number of times, it is possible to evaluate the effects of an intervention on any number of cases. However, inferences from simulations depend on the assumptions that the variables being studied are among the important ones determining the consequences of the intervention and that their effects are not modified in major ways by other variables not included in the simulation. We discuss simulation in more detail in a later section.
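A minimal sketch of this replication logic follows, assuming a toy model in which mediation simply raises a per-round probability of agreement; the probabilities and the assumed mediation effect are invented for illustration.

```python
import random

random.seed(42)

def simulate_conflict(mediated: bool) -> bool:
    """Toy analog of a conflict episode; True means it ends in agreement.

    Assumed model: in each of up to 20 rounds of talks the parties reach
    agreement with a base probability, which mediation is assumed to raise.
    """
    p_agree = 0.08 + (0.05 if mediated else 0.0)
    for _ in range(20):
        if random.random() < p_agree:
            return True
    return False

def success_rate(mediated: bool, replications: int = 5000) -> float:
    """Replicate the simulated episode many times; report the share resolved."""
    return sum(simulate_conflict(mediated) for _ in range(replications)) / replications

print(f"agreement rate with mediation:    {success_rate(True):.2f}")
print(f"agreement rate without mediation: {success_rate(False):.2f}")
```

Replication supplies the many "cases" that history withholds, but, as the text cautions, the resulting inference carries over to real conflicts only if the simulated variables are the ones that actually matter.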
Researchers also develop and test theories about how or under what conditions particular interventions work. Such theories focus attention on a small set of variables and presume that the others make no difference to the effects of an intervention. For example, deterrence theory posits that deterrent threats are more effective when they are made by an actor who has both the capability and the commitment to carry them out and when the recipient perceives the threat as sufficiently credible and potent (George and Smoke, 1974). This set of hypotheses contains few enough variables to allow testing against historical data. Theory can also help by making a series of intermediate predictions—predictions about the process an intervention is expected to set in motion. Such predictions allow each case to support a larger number of hypothesis tests. We discuss this approach, sometimes referred to in terms of “causal mechanisms” and “process tracing,” in more detail below. The challenge for theory is to develop the necessary hypotheses about the contingencies that matter for particular conflict resolution techniques or about the processes they set in motion. One value of the analyses in this book may be to help build such theories and hypotheses.
Accounting for Indirect Effects of Interventions
Peace processes sometimes take a twisting course in which the short-term effects of an intervention in turn alter events in the future, sometimes in the opposite direction. For example, progress in a peace process may lead to subsequent violence by “spoilers”—groups that are not participating in the process or that do not want it to succeed (Stedman, 1997; Chapter 5). The effects of the peace process on such spoilers can turn back on the peace process, sometimes derailing it but at other times increasing the resolve of the participants to reach a settlement. One implication of indirect effects is that there may be several causal paths to the same outcome (sometimes called equifinality or plurality of causes). For example, a program of interactive conflict resolution workshops may help advance the peace process in several distinct ways and might be counted successful if it moves the process along any one of these causal paths (Saunders, Chapter 7). Similarly, there may be several possible outcomes from the same set of initial conditions. These possibilities are important for evaluation efforts to consider, though they can make it extremely difficult to attribute causation.
Accounting for Actors’ Perceptions
Sometimes an intervention is misperceived (e.g., a threat is not believed or an inducement is viewed as a threat). Such possibilities are discussed extensively in the literatures on the effectiveness of deterrent threats (e.g., George and Smoke, 1974; Jervis, 1976; Lebow, 1981) and on crisis decision making (Holsti, 1989; Tetlock, 1998). When misperception can be documented, it is misleading to blame the failure on the intervention technique, although it may be very useful to advise practitioners on how to act so that their intentions are correctly perceived.
The Context of Multiple Interventions
In most conflicts there are numerous interested third parties, and each of them may be doing several things at once to address the conflict. This situation makes it hard to assess the effect of any single intervention because the intervention cannot be meaningfully abstracted from other simultaneous interventions that may or may not be coordinated with it (see Kriesberg, 1996). In fact, diplomats often do several things at once specifically because they believe one intervention will make another more effective. Again, the challenge is to build theories of enough specificity to give guidance on how to examine the evidence. There are normally too many variables and too few cases to draw conclusions based on a purely empirical approach that examines all of the combinations of interventions and other factors that may influence the course of a peace process.
MEETING THE CHALLENGES
The above challenges are serious enough to indicate that social scientists should be modest about how much insight they can offer practitioners. The phenomena of international conflict are too complicated and too resistant to normal scientific methods to make it possible to produce simple lawlike generalizations. Nevertheless, each of the above challenges has faced other social scientific endeavors in the past, and some of the strategies that have been tried in other fields are available in this one. Even where these standard strategies have only limited value, there are ways to address the challenges effectively enough for careful analysis to add something useful to practitioners’ understanding.
We have noted that a lack of generally accepted concepts is a major problem for evaluating international conflict resolution efforts. For instance, concepts like peacekeeping and track two diplomacy do not have the same meaning to all observers, and their meanings may shift over time. Other social scientific fields have faced this sort of problem and have made progress, even while conceptual debates have continued. For example, the study of group cohesion has been characterized by conceptual disagreements since the 1940s. Social psychologists have disagreed about whether cohesion is a single attribute (e.g., morale) or a collection of them (e.g., morale, shared understandings, teamwork, productivity). They have also disagreed about whether it applies only (or primarily) to small groups or also to larger units such as organizations or even nations. The field has progressed because social scientists developed several competing concepts of cohesion with enough clarity and specificity to allow reliable measurements of each and with enough theoretical elaboration to generate testable hypotheses about how the various aspects of cohesion are expected to affect group behavior. With careful measurement of each clearly defined concept and with sets of testable hypotheses, it became possible to continue the conceptual arguments with an empirical referent. That is, it became possible to assess the usefulness of each concept of cohesion for understanding and predicting group performance. Eventually, a widely shared concept of cohesion may be adopted, but in the meantime much is being learned about group processes and performance (for a discussion of these issues, see National Research Council, 1988).
This intellectual history, which has analogs in several other social scientific fields, holds some lessons for the newer field of international conflict resolution with its many loosely defined concepts, such as peacekeeping and mediation. The most general lesson is that for a field to progress it must develop its concepts with sufficient clarity to allow for reliable measurement and must develop theories and hypotheses with enough specificity to allow for empirical assessment. Four types of conceptual advances are important.
First, the terms that define interventions need clear referents if the interventions are to be evaluated meaningfully. An important first step is developing taxonomies. For example, Diehl et al. (1998) have identified a variety of activities that have been called peacekeeping. This kind of taxonomic effort helps highlight important issues such as whether different kinds of missions require different kinds of training for peacekeepers. A continuing discussion of the peacekeeping concept is beginning to clarify the meanings of the term in the post-Cold War setting (e.g., Druckman and Stern, 1997). Some writers have identified a number of
distinct types of nontraditional diplomacy, or “tracks” (Diamond and McDonald, 1991).
Useful taxonomies of interventions are those that embody, at least implicitly, propositions that the variations between taxonomic types are associated with variations in outcomes, either generally or under certain conditions. To the extent that a taxonomy implies testable hypotheses, it takes an important step toward developing theory that can guide further analysis and refinement of the taxonomic categories (see Bennett and George, forthcoming, on taxonomic theory).
Several of the chapters in this volume, particularly those that cover relatively new additions to the repertoire of conflict resolution techniques, deal explicitly with conceptual and taxonomic issues. For example, Reilly and Reynolds (Chapter 11) attempt a classification of electoral systems into four broad categories that they believe are useful for finding the appropriate electoral structure for managing conflicts in different types of divided societies. Stedman (Chapter 5, also 1997) develops a typology of “spoilers” to peace agreements that he believes will help practitioners choose the best strategy for defeating spoilers and advancing peace processes. Ghai (Chapter 12) presents a classification of types of autonomy and decentralization of power, and Hayner (Chapter 9) develops a classification of transitional justice mechanisms and, within that, defines attributes that can be used to classify various truth-seeking efforts.
Second, the undifferentiated concept of “success” must be specified. Because success has so many possible meanings, it is useful to identify the various outcomes that signify success to at least some of the potential evaluators of an intervention. In this way an intervention can be evaluated separately against each outcome. Analytical efforts can focus on whether an intervention has particular outcomes rather than on whether it was a “success.” Of course, evaluating whether a particular outcome has occurred and whether it is an effect of the intervention or of other factors presents challenges of measurement and inference. We discuss these below.
The evaluation of peacekeeping missions provides an illustrative example. An exchange of views among analysts (Druckman and Stern, 1997) has identified a number of meanings of success: accomplishing the mission’s mandate, containing the conflict in the host state or region, advancing acceptance in the target area of larger values such as world peace and justice, and (especially for humanitarian missions) reducing human suffering among the local population. Because practitioners from different countries or organizations, such as the United Nations, nongovernmental organizations, or national military organizations, may have different goals for the intervention or different perspectives on success,
they may disagree on whether a particular intervention was successful. However, with the help of careful analysis, they may come to agree on what its outcomes were. Systematic analysis of what is responsible for each of the outcomes of peacekeeping missions can help inform practitioners’ judgments about whether and how to support particular peacekeeping missions, in light of their objectives.
Third, it is important to define reasonable expectations for an intervention: to select a time horizon for evaluation and—when the time horizon is long—to identify interim indicators of progress. If these choices are to be other than arbitrary, it is necessary to have a theory of the peace process and of how a particular intervention might affect it, or at least a set of working hypotheses about how the intervention might affect the peace process over time. Interactive problem-solving workshops provide a good example. Many of the desired effects, such as an overall decrease in intergroup hostility or the development of reliable channels for nonviolent resolution of grievances, take a long time to become manifest. The process is intended to change relationships and ways of thinking among the participants and, eventually, among the communities in conflict. It is fair for a practitioner to ask that evaluation not be finalized for years or even decades, but it is also fair to ask in return whether there is any way to make an interim evaluation. Those who support slow-acting interventions should be prepared to estimate how long it will take for them to bear fruit and what the course of their progress might be, so that even if it is not possible to reach a final conclusion for many years, it is possible to judge whether the process is moving along as expected.
Fourth, it is important to develop conceptual frameworks that identify contingencies that may shape the outcomes of an intervention or processes by which it may have its effects. This is because the limits of historical data make it virtually impossible for a purely inductive approach to yield useful understanding of how, when, and by what mechanisms an intervention works. Conceptual frameworks link concepts together and form the basis for theory.
Central to meeting the conceptual challenges is the task of developing theory, or at least the elements of theory. Among these elements are taxonomies of interventions, outcomes, and contingencies that matter. Taxonomies are a step prior to the development of theories in that they are necessary for stating hypotheses or theoretical propositions of the form that intervention A will have outcome B under conditions C and D but not E and F. A well-developed theory also includes a sufficiently explicit conception of the process surrounding an intervention so as to embody expectations of what outcomes should be observable after defined amounts of time or at particular well-defined stages of a successful process or of how the intervention can be expected to change the trajectory of a conflict situation. The knowledge gained from exploring such propositions, hypotheses, and expectations about contingent relationships and temporal processes can provide practitioners with a useful diagnostic guide for action. Such propositions and hypotheses are also necessary, although not sufficient, for rigorous evaluation.
Selecting Cases for Analysis
We have already noted that the standard approach in social science to making generalizations about a set of phenomena is to observe a representative sample of those phenomena and presume that what holds for the sample holds also for the population. This strategy works well when the population is easily enumerated and it is possible to choose a large representative sample within which the important variables are well measured for all cases. Unfortunately, it is rare for any of these conditions to be met for studies of international conflict resolution techniques. Even examining the entire universe of cases—a viable strategy when the universe is small—has limitations because the data often allow for multiple explanations, making interpretation inconclusive. For such reasons it is important to develop reliable methods of purposive sampling.
Advocates of purposive sampling in the case analysis research tradition hold that statistically representative samples, which in most instances are impracticable to obtain, are not necessary for making useful inferences from case study data (e.g., Bennett and George, forthcoming). In this view, a sample can be appropriate for making inferences even when there is no way of knowing whether it is representative or, in some cases, when it is known not to be statistically representative. They propose that understanding can be greatly advanced by analyzing samples of cases that cover the expected range of variation on the theoretically important factors or variables or that focus on cases that are critical for resolving important theoretical questions.
There is much to be said for this argument, especially in the early stages of theory development. It is easier to meet this criterion of an appropriate sample than to meet the criterion of representativeness because it is not necessary to enumerate the universe of cases. However, the researcher must specify the variables with respect to which the sample must be appropriate—the potentially important factors affecting the outcomes of an intervention and the important outcome variables. The claim that a sample study contributes to knowledge normally rests on a theoretical presumption that it has examined the range of variations on one or more of the variables that matter and therefore illuminates the contingencies that affect outcomes. It is advantageous for theory development to be
explicit about which variables matter, and it is especially important when generalizations are being offered on the basis of a small sample of cases.
As yet there is no general theory of purposive sampling beyond the admonition to select the sample to suit the research objectives. It often makes sense to include cases generally considered successful and cases generally considered unsuccessful, but beyond that what should the logic of purposive sampling be? Does the region of the world matter? How important is the identity of the intervening actor or that actor’s past relationships to the parties in a conflict? How important are conditions affecting countries adjacent to the location of the conflict, and which conditions are the important ones?
Researchers who use purposive sampling resolve these questions by making judgments based on explicit or implicit theoretical propositions. The results of their studies often lead them to modify these propositions. Thus, the questions get resolved in stages. If researchers are explicit about the theoretical presumptions that provide their rationales for selecting samples, others can question their judgment. Meanwhile, their analyses will produce tentative results. The combination of empirical results and critical debate is likely, over time, to lead to better rationales for case selection, improved theory, and more complete knowledge of the phenomena in question.
We have noted that the usefulness of samples of any kind may be compromised by a lack of sufficient data about possible cases. Sometimes, interventions occur that analysts never learn about because they were secret or unpublicized. Sometimes there is not enough information about a particular intervention to determine whether it was indeed a member of the class being examined. Sometimes more complete data are available for some kinds of cases (e.g., those publicized as successful) than for others. The best way to address such problems is through a combination of awareness of the possibility and openness in analysis. Analysts should look for hidden cases, explain their case selection, and leave it to future reviewers and readers to consider whether addition of other cases to the analysis or more detailed research on some of the cases would have altered the conclusions.
Observing and Measuring Interventions and Outcomes
Although it can be difficult to be sure an intervention is what it seems (to classify the independent variable), the central measurement challenge lies in measuring outcomes (the dependent variables). Changing the focus from “success” to outcomes is a first step, but it is not always easy to agree on whether particular outcomes have occurred or on which outcomes are relevant to success.
Dealing with Incomplete Information
Often, key information about an intervention, its context, and the outcomes is missing, and sometimes researchers are unsure how incomplete the information is. It is sometimes known or reasonably suspected that government agencies have classified files on a matter or that certain key participants are hiding or distorting information in order to defend political positions or personal reputations. It makes sense for analysts, in addition to specifying the sources of their information, to speculate about the kinds of information that may be missing and the kinds of distortion the available information may contain. It is also useful for analysts to make efforts to develop information from sources that vary in their perspective on the conflict, so that each information source can be used as a check on the others.
Achieving Reliable Measurement
One problem is to specify particular outcomes well enough that observers with access to the same information will agree on whether they occurred—to achieve what methodologists call interobserver reliability. Reliability is relatively easy to achieve for certain indicators of successful outcomes, such as a signed agreement or a reduction in the number of deaths in communal violence, but these are rarely the only outcomes of interest. It can be difficult for observers to agree, for example, about whether a peace process is stalled or still moving forward, whether or not the police force in an emerging democracy is making progress in upholding norms of human rights, or whether an absence of violence indicates conflict resolution or conflict suppression.
There are two main strategies for arriving at adequate interobserver reliability. One is to rely on operational definitions—standard procedures that an observer can follow to decide on whether an outcome has occurred. This approach is easy to imagine for an outcome variable like deaths from violence. An observer could consult official records or, in the absence of those, reports coming from the different sides to a conflict and use a prearranged formula for arriving at an estimate when the reports disagree. Not everyone would agree with the result, but the procedures for arriving at it would be explicit, so that criticisms could be focused on those.
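The logic of an operational definition can be made concrete in code. The reconciliation rule below (prefer official records; otherwise average the parties' reports) is a hypothetical example of a prearranged formula, not a recommended one; its only virtue is that it is explicit, so criticism can focus on the rule itself.

```python
def estimate_deaths(official=None, side_a=None, side_b=None):
    """Illustrative operational definition for deaths from violence.

    Hypothetical prearranged rule: prefer official records; otherwise
    take the average of the two sides' reports; otherwise use whichever
    single report exists. Returns None if no information is available.
    """
    if official is not None:
        return official
    reports = [r for r in (side_a, side_b) if r is not None]
    if not reports:
        return None
    return sum(reports) / len(reports)

print(estimate_deaths(official=120))            # 120
print(estimate_deaths(side_a=100, side_b=300))  # 200.0
```

Because every step is stated in advance, two analysts applying the rule to the same reports must reach the same estimate, which is the point of the strategy.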
For many important outcomes, using operational definitions is not feasible because there is nothing approaching agreement on a procedure for measuring the outcome in question. In such instances it is possible to use another strategy, directly comparing the assessments of expert judges. This is what analysts do in an informal way when they interview the important practitioners involved on all sides of a peace process in order to
construct a story of how it developed and to determine which interventions were responsible for the results. This strategy is most successful at demonstrating interobserver reliability when the interviewer can develop a set of specific questions or probes to elicit each informant’s judgment about whether particular outcomes were present at particular times.
An important complication in evaluating outcomes, especially when relying on expert judgment, is that not all observers have access to the same information. Often, important parts of a peace process are private—for instance, deliberations among negotiators on one side of a negotiation are seen only by that side, and the progress of a problem-solving workshop is rarely recorded electronically, so only those present know what happened. Those present at important private moments have relevant information others lack, but they also often have motivations that can distort their memories and their reports, so that others may suspect the accuracy of their accounts. There is no standard procedure for addressing this problem, but it helps to be aware of it, to search for corroboration from participants whose incentives to distort may be different, and to temper conclusions when it appears that much of the available information is biased in one way or another. It may be possible to assemble a group of experts at the same place and time or to confront each of them with other experts’ judgments in an effort to move toward a consensus of expert judgment, as is done in Delphi panels (Frei and Ruloff, 1989).
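A Delphi-style movement toward consensus can be sketched in code. The numbers and the revision rule below are purely illustrative and are not drawn from Frei and Ruloff (1989); the point is only that iterated feedback of the group's central judgment narrows the spread of individual judgments.

```python
def delphi_round(judgments, weight=0.5):
    """One illustrative Delphi round: each expert sees the group median
    and moves part of the way toward it (the weight is hypothetical)."""
    ordered = sorted(judgments)
    n = len(ordered)
    median = (ordered[n // 2] if n % 2 else
              (ordered[n // 2 - 1] + ordered[n // 2]) / 2)
    return [j + weight * (median - j) for j in judgments]

# Experts rate the probability (0 to 1) that a peace process is on track.
ratings = [0.2, 0.5, 0.9]
for _ in range(3):                  # three feedback rounds
    ratings = delphi_round(ratings)
print(max(ratings) - min(ratings))  # spread shrinks toward consensus
```

A real panel would of course exchange reasons as well as numbers; the arithmetic only captures the convergence mechanism.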
There are well-known tradeoffs between reliability and validity. Attempts to maximize agreement between coders in order to enhance reliability (accuracy), as in mechanical coding, may result in a distortion of the concept being assessed, thereby detracting from validity (meaning). For example, it is possible to obtain highly reliable measures of concessions in a bargaining experiment, where concessions can be defined as the numerical difference between an offer made at one time and the offer made the next time. It is much more difficult to reliably quantify a concession made in an international negotiation, as it must be inferred from suggestions, exploratory proposals, and packages that combine several offers. Such coding is typically an interpretive exercise and tends to result in a loss of intercoder agreement, although it may enhance validity. The extent to which it enhances validity depends, however, on the adequacy of the coding categories and on the sampling of appropriate materials. Analysts often invent coding systems to capture the essence of a phenomenon and thus enhance validity. To enhance reliability they tend to adopt standard categories that can be used repeatedly. Standard categories may be applicable across intervention types (e.g., discussions in formal negotiations and problem-solving workshops) but may sacrifice validity. Analysts often observe that the coding categories that seem most useful are not the same from one type of data to another.
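Intercoder agreement itself can be quantified. A standard index is Cohen's kappa, which corrects raw agreement for the agreement expected by chance; the codings below are invented for illustration.

```python
from collections import Counter

def cohens_kappa(coder1, coder2):
    """Cohen's kappa: agreement between two coders, corrected for the
    agreement expected by chance alone. 1.0 is perfect agreement."""
    assert len(coder1) == len(coder2)
    n = len(coder1)
    observed = sum(a == b for a, b in zip(coder1, coder2)) / n
    c1, c2 = Counter(coder1), Counter(coder2)
    expected = sum(c1[k] * c2[k] for k in c1) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical codings of ten negotiation moves as concession (C),
# firm stand (F), or procedural move (P) by two independent coders:
a = ["C", "C", "F", "P", "C", "F", "F", "P", "C", "F"]
b = ["C", "C", "F", "P", "F", "F", "F", "P", "C", "C"]
print(cohens_kappa(a, b))
```

Reporting a chance-corrected index alongside the coding scheme lets readers judge for themselves whether the interpretive categories were applied consistently.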
Developing Indicators of Success
We have noted that evaluation is complicated by the fact that short- and long-term definitions of success may be quite different. This difficulty can be addressed in part by focusing on particular outcomes rather than overall “success.” Evaluations can be separated according to time horizon, with outcomes at different times analyzed separately. As noted, it is important to have short-term indicators of progress even for interventions intended mainly to have long-term effects, partly to provide interim indications of progress and partly to allow for meaningful evaluation even in cases in which intervening events not brought about by the intervention throw the process of conflict resolution off its intended course.
One way to develop interim indicators of progress for interventions that have a long time horizon is to postulate mechanisms believed to lead from the intervention to the desired long-term results and to identify indicators that can be used to tell whether events are moving along the desired track. For instance, practitioners of interactive problem-solving workshops as a way to improve intergroup communication over the long term might postulate that one mechanism leading to long-term effects, such as formal agreements between the parties, is improved communication and trust between workshop participants from opposing sides, together with their advancement over time to more influential positions in their groups. One could assess whether this mechanism is operating by examining various indicators, such as improved communication between the participants immediately after the workshop, continued communication between them a year or two later, the rise of workshop participants to positions of increased influence, and evidence of sensitivity to and accommodation of the opposite group in the policies they influence. Using such indicators of intermediate progress improves on an outcomes-only approach to evaluation by adding a process element to the evaluation and by strengthening the case for a causal link between the intervention and long-term outcomes.
Another important measurement issue is that of setting realistic expectations for interventions. This issue is especially important for evaluating interventions that are intended to contribute to a peace process but are not expected to produce peace by themselves; however, the issue has not received much attention to date. A first step is to raise the issue—to ask practitioners in interviews and to discern from their writings what their expectations have been for the short- and long-term effects of particular interventions. Their own expectations constitute one reasonable test of success. Of course, different practitioners may have different expectations, even for the same intervention. Conducting a dialogue among
reflective practitioners on what expectations are reasonable can lead to more defensible indicators.
Practitioners’ own expectations are an imperfect guide for the analyst, however. One reason is that practitioners may state unrealistically high expectations to get political support for intervening or unrealistically low ones to increase the chances that the intervention will be judged successful. Also, they may have expectations unrelated to the peace process that influence their statements. For instance, undertaking an intervention or claiming success may benefit a practitioner’s career or the standing of the government agency or nongovernmental organization he or she represents. In such cases an intervention may be successful from the practitioner’s viewpoint even if it has no effect on the conflict, and the practitioner’s view of reasonable expectations may differ from the one an analyst would want to adopt.
In the course of the project leading to this book, we have encouraged scholars and practitioners to interact around the questions of interim indicators and reasonable expectations and have asked authors to address the issues explicitly. The chapters that follow help clarify these issues for several conflict resolution techniques. A good example is Hayner’s (Chapter 9) indicators that postconflict truth-seeking efforts are producing reconciliation between past adversaries.
Inferences from Data
Social scientists have developed many techniques for analyzing claims about cause-and-effect relationships, and numerous textbooks classify these techniques and assess the strengths and weaknesses of each. The texts and the typologies they present usually emphasize applications in a particular discipline. For discussing the challenges of inference about the effects of international conflict resolution interventions, it is useful to group the methods into three broad categories: experiments and simulations, multivariate analyses, and case study methods. This section discusses the possibilities and limitations of each.
Experiments and Simulations
The distinguishing feature of experimental methods is that a researcher deliberately manipulates an independent variable—a variable that is hypothesized to have an effect—and, controlling for other variables that might affect the outcome, observes the consequences. All conflict resolution efforts are experiments in the sense that they are deliberate and intended to have an effect. Experimental methodology is devoted to
identifying ways to conduct and collect data on experiments in order to support strong conclusions about cause and effect and rule out alternative hypotheses for explaining the observed outcomes. There is little experimental research on international conflict resolution because actual conflict situations do not permit experimental controls and because, for most types of intervention, conceptual models are not yet sufficiently well developed even to conduct laboratory simulations (negotiation processes are an exception to this generality).
The conditions for drawing strong conclusions from experiments are rarely, if ever, met with international conflict resolution interventions. The main condition is that all variables that might affect the outcome are either explicitly manipulated or adequately controlled. Adequate control may be achieved either by holding a variable constant—a condition researchers may be able to approximate only in the laboratory—or by randomly assigning each situation in which an intervention might be tried to receive either the intervention or some control or comparison condition. Because these conditions are rarely met for studies conducted outside the laboratory, an alternative approach referred to as quasi-experimentation has been developed (Cook and Campbell, 1979). Quasi-experimental research involves using surrogates for experimental manipulation. For example, a quasi-experimental study might compare the consequences of an intervention in one situation to the consequences in another situation that is comparable in important ways. A researcher might compare the consequences of different efforts to mediate the same conflict at different times, thus achieving control over some of the important variables extraneous to the negotiation strategy (Stedman, 1991, used this strategy as part of his study of mediation in Zimbabwe in the 1970s). Quasi-experimental research methodology makes explicit the limitations of each research design and the most important threats to valid inference about causation that arise with each type of research design. It thus helps researchers evaluate the extent to which these threats can be ignored in a particular study, taking into account its features and the pattern of results obtained (see Robson, 1993). 
Classical experiments, in which cases are randomly assigned to intervention conditions, generally do a better job than quasi-experiments in ruling out alternative hypotheses; for quasi-experiments and other research methods, it is important to specify each rival hypothesis and seek evidence to rule it out.
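The randomization step that distinguishes the classical experiment can be expressed in a few lines. The sketch below is illustrative only; as noted, real conflict situations rarely permit assigning disputes to conditions this way.

```python
import random

def randomly_assign(cases, conditions=("intervention", "control"), seed=0):
    """Assign each case to a condition at random, so that unmeasured
    differences between cases cannot systematically favor one condition."""
    rng = random.Random(seed)  # fixed seed only for reproducibility here
    shuffled = cases[:]
    rng.shuffle(shuffled)
    # Deal the shuffled cases round-robin into equal-sized groups.
    return {cond: shuffled[i::len(conditions)]
            for i, cond in enumerate(conditions)}

groups = randomly_assign([f"dispute-{i}" for i in range(8)])
print({cond: len(members) for cond, members in groups.items()})
```

Because chance, not the researcher or the parties, determines which cases receive the intervention, systematic differences in outcomes between the groups can be attributed to the intervention itself.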
Experimental evidence is sometimes used to draw conclusions about international conflict resolution. Researchers subject individuals or small groups to controlled manipulation of variables believed to affect behavior in actual international conflict resolution situations, and use the results to test hypotheses about the effects of those variables. This strategy obtains the chief advantage of experimental method—the ability to rule out rival
hypotheses about causation within the experiment—at the cost of correspondence with real-world conflicts. In methodologists’ language it achieves stronger internal validity (the ability to infer causation) at a cost in external validity (the ability to apply conclusions from the situation studied to other situations).
A laboratory experiment, such as a study of the effects of stress on the accuracy of perception, may have very strong internal validity and potential relevance to real-world conflicts but be highly questionable in terms of its external validity. One approach used to increase external validity is the simulation experiment, in which an attempt is made to preserve the rigorous features of experiments (random assignment, controls) while representing key aspects of the conflict setting of interest. Building in aspects of the actual setting is believed to make the results relevant to that setting. Of course, this is an empirical issue that is best evaluated by comparing findings obtained in simulations with those obtained in the field (for more on these issues see Guetzkow and Valadez, 1981).
Experimental research on conflict resolution has been most useful for identifying particular aspects of complex interventions that are critical sources of variation in outcomes. This progress has come primarily in research on small-group interventions such as negotiation, mediation, and interactive problem-solving workshops. Simulations typically include considerable detail to enhance external validity and a careful experimental design to allow statistical separation of the key variables posited to affect processes and outcomes and to enhance internal validity. For example, studies simulating the conflict between the Greek and Turkish Cypriots have explored hypotheses about the impact of focusing on values in negotiation. It was possible in a simulated setting to compare interventions using Burton’s (1986) idea of confronting value differences in prenegotiation sessions (“facilitation”) with Fisher’s (1964) notion of focusing only on interests (“fractionation”). Facilitation was found to produce more cooperative negotiations than fractionation (Druckman et al., 1988), and further experimentation uncovered the specific factors or mechanisms that accounted for the positive effects of facilitation (Druckman and Broome, 1991).
Experimental research on mediation has provided insights into the ways that mediators’ role definitions, their tactics, and aspects of the negotiating situation influence their effectiveness. Some studies show that when a mediator is seen as having no stake in the outcome and when hostility between parties is high, pressure tactics (leverage) are more effective than rewards in producing concessions (Harris and Carnevale, 1990; Carnevale and Henry, 1989). Other experiments show that mediators are more likely
to be taken seriously if they suggest compromises early in the talks rather than later because the implications for who gives up what are clear and do not favor one party over the other (Conlon et al., 1994). Also, agreements are likely to be more effective if the mediators encourage the parties to generate and test hypotheses about the sources of the conflict and to take ownership of any agreements that result (Kressel et al., 1994).
An illustrative finding on interactive problem solving comes from a simulation by Cross and Rosenthal (1999), who recruited Palestinians and Israelis to participate in a discussion of several issues that divided these groups. Forty dyads were randomly assigned to one of four approaches to organizing the discussion: distributive bargaining, in which participants emphasize group positions and bargain about them; integrative bargaining, in which they identify interests and then seek to expand the alternatives; interactive problem solving, in which they identify needs and engage in joint problem solving; and a control condition in which participants receive no instructions on how to discuss the issues. Participants using the interactive problem-solving approach became less hawkish in their attitudes than those in the other conditions, and those using the integrative bargaining approach, to the researchers’ surprise, became more hawkish than those in all the other conditions. The study examined only attitudes, not negotiating behavior or outcomes.
These examples illustrate how experimentation can be used to investigate the effects of well-defined aspects of conflict resolution interventions on attitudes and behaviors. Experimentation is well suited for clarifying the mechanisms responsible for effects and thus contributing to an explanation of why an intervention works the way it does. The usefulness of the approach is limited because the criteria for conducting strong experiments are too stringent for collecting and evaluating data on some types of conflict resolution interventions (e.g., the design of national electoral systems, peacekeeping missions). Further, it is difficult to simulate the many aspects of international interventions. Nevertheless, experiments can make useful contributions to knowledge as part of a multimethod research strategy (e.g., Hopmann and Walcott, 1977, compared experimental simulations with the coded results of an actual arms control negotiation). Insights from other research approaches can be evaluated in a more precise way with experiments, experimental hypotheses can be studied in field contexts, and the results from experiments can be compared to those obtained from other methods in a search for conclusions that do not depend on the research method. The classical experiment is also a benchmark or a point of comparison in evaluating the results of nonexperimental methods of analysis.
Multivariate Data Analysis Methods
These methods measure aspects of the historical record of past international conflicts and conflict resolution efforts and search for regularities in that record that qualify as generic knowledge. Researchers who use these methods typically examine a number of aspects of each of several interventions of a particular type. Their measurements may be qualitative, such as simple tabulations of whether particular conditions were present or absent, or they may involve numerical measurements (e.g., numbers of people killed) or the development of indicators (e.g., measures of attitudes derived from surveys or analyses of public statements). Techniques of multivariate data analysis are designed to investigate the strengths of associations and sometimes the temporal ordering of events or indicators in order to support some hypotheses and rule out others concerning the causes of these associations and temporal orderings. Although most often used on datasets involving large numbers of separate events, the approach can also be applied to small numbers of events if many observations have been made of each. Thus, even case materials, if properly prepared, can be subjected to multivariate analysis.
A common use of multivariate analysis for research on international conflict resolution begins with the compilation of so-called events data on conflicts and on the efforts that have been made to resolve them. For example, Bercovitch and colleagues (e.g., Bercovitch, 1989, 1986; Bercovitch and Wells, 1993; Bercovitch and Langley, 1993) have assembled a dataset on more than 300 mediation attempts since the end of World War II. Each case is coded in terms of such indicators as type of mediator (e.g., individual, organization), mediator’s resources, mediator’s status, strategies used, types of issues, duration and intensity of the dispute, timing of intervention, complexity of the dispute, and outcome (impasse, partial settlement, full settlement). These researchers used quantitative techniques to analyze the coded data and test hypotheses about relationships between the outcome of the intervention and conditions that may affect the outcome, such as the use of directive or passive mediation strategies, the nature of the issues as ideological or interest based, power imbalances between the parties, and early versus late intervention. They also developed and tested various causal models of the connections among features of the disputes and the mediation outcome (Bercovitch and Langley, 1993). This analytical approach makes it possible to examine the outcomes of mediation as a function not only of the mediation itself but also of prior contextual conditions (features of the dispute)—something that is difficult to examine by simulation.
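The logic of such coded-case analysis can be sketched with a toy dataset. The cases below are invented for illustration and are not drawn from Bercovitch's collection; a real analysis would use hundreds of cases and formal statistical tests rather than raw proportions.

```python
# Each mediation case coded on a few of the indicators named above.
# The data are invented for illustration only.
cases = [
    {"mediator": "individual",   "strategy": "directive", "outcome": "full"},
    {"mediator": "organization", "strategy": "passive",   "outcome": "impasse"},
    {"mediator": "individual",   "strategy": "directive", "outcome": "partial"},
    {"mediator": "organization", "strategy": "directive", "outcome": "impasse"},
    {"mediator": "individual",   "strategy": "passive",   "outcome": "impasse"},
    {"mediator": "organization", "strategy": "directive", "outcome": "full"},
]

def settlement_rate(cases, **filters):
    """Share of matching cases that reached a partial or full settlement."""
    matched = [c for c in cases
               if all(c[k] == v for k, v in filters.items())]
    settled = [c for c in matched if c["outcome"] in ("partial", "full")]
    return len(settled) / len(matched) if matched else None

print(settlement_rate(cases, strategy="directive"))  # 0.75
print(settlement_rate(cases, strategy="passive"))    # 0.0
```

Once cases are reduced to rows of coded indicators, any hypothesis about conditions and outcomes becomes a comparison of rates or a statistical model over those rows, which is both the strength and the limitation of the approach.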
The multivariate approach has also been used to study negotiation. Using primary sources (interviews with delegates) and secondary sources
(published accounts), it is possible to code each of a series of negotiation cases in terms of such characteristics as the issues (large or small), parties (strong or weak), conditions (public or private), processes (bargaining or problem solving), and outcomes (impasse, partial agreement, comprehensive agreement). Researchers have used such data to define distinct types of negotiations (Chasek, 1997) and to organize them along such dimensions as the size of the negotiation (bilateral, trilateral, multilateral) and the complexity of the issues (Druckman, 1997). Such analyses provide an empirical basis for developing typologies based on profiles of negotiation characteristics. They may also enable practitioners to consult the historical record for past cases that may be instructive for present purposes.
Multivariate datasets like these can be compiled for a wide variety of interventions. For instance, they have been used to illuminate the effectiveness of economic sanctions (Hufbauer et al., 1990), the factors conducive to ethnic conflicts (Gurr, 1993), and the relationship of language policy to communal conflict (Laitin, Chapter 13). In all cases their usefulness depends on the relevance of the variables chosen, the validity of the coding, and the level of detail. The chief strength of the approach is that statistical analysis allows researchers to consider numerous possible causal mechanisms using more cases than they can evaluate with the unaided mind. The chief weakness is that much of the richness of each case is lost when the case is reduced to a list of indicators. There is obviously a tradeoff between breadth and depth in case analysis, and multivariate data analysis normally exhibits both the strengths and the weaknesses of breadth.
The above examples use a cross-sectional approach—that is, they analyze multivariate data in which each case is assigned a single value for each characteristic being coded, without regard to time. While the cross-sectional approach can, with sufficient data, probe complex contingent relationships among variables, it is limited for investigating causal mechanisms because the operation of these can only be assessed across time. It is possible to apply statistical techniques of causal modeling to test the consistency of the data with a hypothesis about causal mechanisms (e.g., Bollen, 1989; Stevens, 1996), but the strength of the inferences from these techniques, even with sufficient data, is always limited by the lack of a temporal dimension.
Multivariate analysis can also consider change over time—and examine causal mechanisms more directly—by analyzing data arranged in a time series, in which the same variable is coded at many points in the history of a case to allow the study of temporal processes. In one such time series approach, historical data are used to “postdict” known outcomes. The goal is to develop a conceptually coherent account of a
conflict resolution process that is consistent with the known outcome. For example, in a study of base rights negotiation between Spain and the United States in 1975–1976, Druckman (1986) showed that the agreement resulted from a sequence of identifiable crises and turning points. The approach can also be used to identify a mechanism for the operation of an intervention that is consistent with the observed chain of events. Hopmann and Smith (1978) showed that the outcomes of the 1962–1963 partial nuclear test ban talks resulted from certain actions taken by nations outside the negotiations. This multivariate time series research design is quite useful for identifying how the temporal pattern of interactions between the conflicting parties and interventions by external actors led to the ultimate outcome. Postdiction can be considered something like an experiment in that theory can be tested by comparing the actual outcome with “predictions” made on the basis of the theory and the initial conditions.
Historical data that are coded by time of occurrence can be used to evaluate the impacts of planned interventions and to trace the processes by which these impacts occur. The research approach is referred to in the technical literature as interrupted time series analysis. An example is a study of five mediation efforts in the conflict between Armenia and Azerbaijan between 1990 and 1995. Mooradian and Druckman (1999) demonstrated that each mediation effort had limited effects on the time series of events. However, the historical pattern of continuous violence was altered dramatically after Azerbaijan’s major military offensive in 1993–1994, apparently due to a hurting stalemate suffered by both sides. The pattern of conflictual events before the offensive (October 1992 to March 1993) changed to a cooperative trend following the fighting (May 1994 to September 1994), only to turn again to conflict by October 1994. A long time series of events enables the analyst to ascertain the extent to which each intervention alters the trend and to consider whether the effects are immediate or delayed. Inference is complicated by the fact that the multiple interventions are not independent of each other—for example, the effect of a successful intervention may be due in part to the fact that past efforts failed. In addition, the validity of the time series is sometimes hard to establish.
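The simplest version of an interrupted time series comparison can be sketched as follows. The event scores and the break point below are invented; an actual analysis, such as Mooradian and Druckman's, would also model trends, delayed effects, and autocorrelation.

```python
def interrupted_series_shift(series, break_point):
    """Minimal interrupted-time-series check: compare the mean level of
    a coded event series before and after an intervention. A real
    analysis would also model trends and autocorrelation."""
    before = series[:break_point]
    after = series[break_point:]
    mean = lambda xs: sum(xs) / len(xs)
    return mean(after) - mean(before)

# Invented monthly scores (positive = cooperative, negative = conflictual)
# around a hypothetical mediation effort at month 6:
events = [-3, -2, -4, -3, -2, -3, 1, 2, 1, 3, 2, 2]
print(interrupted_series_shift(events, break_point=6))
```

A positive shift is consistent with the intervention having moved events in a cooperative direction, though, as the text notes, rival explanations such as battlefield developments must still be ruled out.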
Many of the problems of inference involved in time series analyses have been addressed by quantitative methodologists. For example, they have developed techniques for differential weighting of distant and recent events (referred to as exponential smoothing), compensating for the dependence of events on similar events in the recent past (statistical controls for autocorrelation), accounting for possible explanatory factors that are associated with each other (stepwise regression for unique variance explained), and taking account of changes in the estimated subjective
probabilities of events (Bayesian analysis, Markov chain processes; see Frei and Ruloff, 1989, for discussion of these techniques). These techniques have been used in sophisticated forecasting methodologies (e.g., Duncan and Job, 1980) and model-fitting analyses of interactive processes (e.g., Druckman and Harris, 1990; Patchen and Bogumil, 1995, 1997). They have not, however, solved some of the fundamental conceptual and measurement problems of multivariate time series analysis, such as the need to compare events with counterfactuals, the lack of valid measures for some important variables, and the lack of sufficient size or variation in the sample of historical events.
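Two of the techniques just mentioned can be sketched in a few lines, using invented event scores: exponential smoothing, which weights recent events more heavily than distant ones, and a lag-one autocorrelation estimate of the dependence of events on their immediate predecessors.

```python
def exponential_smoothing(series, alpha=0.3):
    """Weight recent events more heavily than distant ones (alpha is an
    illustrative smoothing constant, not a recommended value)."""
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed

def lag1_autocorrelation(series):
    """Correlation of the series with itself shifted by one period,
    i.e., the dependence of each event on the one just before it."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum((series[t] - mean) * (series[t + 1] - mean)
              for t in range(n - 1))
    return cov / var

events = [2, 3, 2, 4, 5, 4, 6, 7, 6, 8]  # invented coded event scores
print(exponential_smoothing(events)[-1])
print(lag1_autocorrelation(events))
```

A substantial positive autocorrelation, as in this invented series, is a warning that successive observations are not independent, which is why time series models apply the statistical controls mentioned above.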
Multivariate analysis techniques, like other methods, have their characteristic strengths and weaknesses. As already noted, their chief strength is that they can simultaneously consider more cases, and more aspects of a single case, than a human analyst can comprehend. This capability is particularly useful for analyzing the effects of contextual factors on conflict resolution efforts because the potential effects are numerous and because useful indicators are available for many contextual variables. In such situations, multivariate analysis can reveal patterns that might otherwise go unnoticed. The chief limitations of the approach are those of the available data and concepts. The methods can only be applied when a sufficient number of historical cases (or a sufficient richness of data on a single case) exist for quantitative comparison—normally, dozens of data points are necessary to test a single bivariate hypothesis, and more are needed to test hypotheses involving a conjunction of several variables. Also, the available data must include reasonably valid measures of the variables that are central to the desired analysis. Sometimes the variables for which valid measures are available for numerous cases are not the ones of the greatest theoretical or practical interest, and sometimes the variables of greatest interest are not well enough conceptualized to allow for valid measurement. In such instances, multivariate analysis has obvious limitations.
Multivariate analysis is particularly useful at the current state of theory development for uncovering patterns that deserve further analysis by other analytical methods. When patterns evident in cross-sectional studies suggest causal hypotheses, it may be useful to explore those hypotheses further by using simulation experiments or detailed analyses of individual cases through time. Similarly, patterns that emerge from time series data are also worth further examination by other methods, particularly intensive examination of case material. A major value of quantitative research approaches is through the discipline they impose on thinking. The measurement efforts that these approaches demand (e.g., specifying operational definitions, developing valid indicators) force researchers to be precise about their concepts
and may thus sharpen analysis and raise the level of debate among scholars and practitioners.
Enhanced Case Study Methods: Structured, Focused Case Comparisons and Process Tracing
The case study is one of the classical methods of political science, and its uses and limitations for making inferences are well known (e.g., Smelser, 1976; Ragin, 1987; Collier, 1993; Collier and Mahoney, 1996; King et al., 1994). Our interest here is in refinements of the traditional case study approach that have been developed over the past two or three decades to increase the rigor of the approach and overcome some of its limitations, particularly the problem of noncomparability across cases and the difficulty of using case material to test hypotheses. Traditional case studies can be useful by demonstrating that a particular case is inconsistent with an existing theory and thus stimulating scholarly research and rethinking by practitioners. However, the contribution of traditional case studies to cumulative knowledge has been limited by noncomparability across cases: there is typically no way to test the conclusions from one case study against evidence from other case studies because they fail to include information needed for the comparison.
The focus here is on two particular refinements to the case study approach: the method of structured, focused case comparisons, and process tracing. Both methods improve on the traditional case study by being more theoretically explicit. By stating in advance which variables are worth examining and which processes are worth tracing (and by implication which are not), these approaches make it possible to focus case-based research and thus to build knowledge cumulatively.
Structured, focused case comparisons differ from the traditional case study approach in that cases are selected and case descriptions developed with particular theory-guided questions or conceptual issues in mind (Lijphart, 1971; George and Smoke, 1974; George, 1979; Collier, 1993; Putnam, 1993; Faure, 1994). The method requires that an analytical protocol be developed before the case studies are conducted that defines the variables of interest and some of the researcher’s key questions about them. This allows the researcher to compare the cases on the central issues of interest. The structured, focused case comparison method cannot, as a rule, be applied to previously completed case studies because they usually lack information demanded by the protocol.
A well-known application of the structured, focused case comparison approach has been to test deterrence theory. Researchers select a set of cases they judge to represent successful and failed deterrence and then examine the historical evidence on each case to answer theoretically relevant questions such as whether the deterrent threat was credible, whether it was clearly communicated, whether the leaders of the target country followed principles of rational decision making, and the like (George and Smoke, 1974; Lebow, 1981). This research has focused especially on deterrence failures, both because a failure that occurred when all of the theoretical conditions for success were in place would call into question the principles of deterrence theory and because of the difficulty of establishing deterrence successes (when a deterrent is successful, the result is often that nothing observable happens). Blechman and Wittes’s paper (Chapter 3) in this volume on the use of threats of force uses a form of structured, focused case comparison.
The structured, focused case comparison method requires a theory or conceptual framework that is sufficiently well specified to generate the list of factors or variables that must be considered in each case. The method is particularly attractive when a testable hypothesis exists along with unambiguous indicators of the relevant variables that are obtainable from available historical information. It also requires that several relevant cases be available. Sometimes useful results can be obtained with fewer than a dozen cases, a contrast to the requirements of the multivariate quantitative approach.
Compared to traditional case study research, structured and focused case comparisons have the advantage of comparability: the same information is collected about each case using the same methods. Because only selected information is needed about each case, it may be possible to do more structured comparisons with a given set of resources than unstructured comparisons, but the method can miss information on aspects of the cases that are not presumed in advance to be important. This is both the advantage and the disadvantage of research methods informed by explicit conceptualization. Structured and focused case comparisons can to some extent overcome this problem by being flexible about the way information is extracted from cases. Flexibility allows insights to be discovered in individual cases that may have been missed in the answers to structured questions. These insights can then be examined by collecting the necessary information on the other cases under study.
Compared to the multivariate analysis approach, structured and focused case comparisons examine fewer variables and fewer cases (the selection being made on theoretical grounds) but provide much more detailed information on the variables they do examine. Because of these differences, the multivariate approach, like traditional case studies, has a comparative advantage for exploratory analysis; in contrast, structured and focused case comparison is comparatively well suited to testing hypotheses from theory and refining the theories it tests. It is sometimes possible to treat a set of case comparisons as a source of events data and to apply multivariate quantitative analysis to the case-based data. For example, Table 3.1 summarizes the results of Blechman and Wittes’s analysis of the uses of threats of force by the United States since the end of the Cold War by coding a number of key variables, usually as present or absent. Such a table of categorical data can be analyzed statistically in the manner of cross-sectional events-data analysis, although this particular data table may be small enough for adequate analysis by inspection.
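To make the logic concrete, the following sketch shows how a small table of present/absent codings from a structured case comparison might be tested statistically. The cases and the "credible threat" variable are invented for illustration and are not drawn from Blechman and Wittes's table; with so few cases, an exact test is more defensible than a large-sample chi-square approximation.

```python
from math import comb

# Hypothetical coded case data in the spirit of a structured, focused
# comparison: each case records whether a condition was present (1) and
# whether the intervention succeeded (1). All values are invented.
cases = [
    {"credible_threat": 1, "success": 1},
    {"credible_threat": 1, "success": 1},
    {"credible_threat": 1, "success": 0},
    {"credible_threat": 0, "success": 0},
    {"credible_threat": 0, "success": 0},
    {"credible_threat": 0, "success": 1},
    {"credible_threat": 1, "success": 1},
    {"credible_threat": 0, "success": 0},
]

# Build the 2x2 table: rows = condition present/absent, columns = success/failure.
a = sum(1 for x in cases if x["credible_threat"] and x["success"])
b = sum(1 for x in cases if x["credible_threat"] and not x["success"])
c_ = sum(1 for x in cases if not x["credible_threat"] and x["success"])
d = sum(1 for x in cases if not x["credible_threat"] and not x["success"])

def fisher_exact_p(a, b, c, d):
    """Two-sided Fisher exact test for a 2x2 table of counts."""
    n = a + b + c + d
    row1, col1 = a + b, a + c
    def hyper(k):
        # Probability of a table with k in the top-left cell, margins fixed.
        return comb(col1, k) * comb(n - col1, row1 - k) / comb(n, row1)
    p_obs = hyper(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # Two-sided p: sum probabilities of all tables at least as extreme.
    return sum(hyper(k) for k in range(lo, hi + 1) if hyper(k) <= p_obs + 1e-12)

print(f"table: [[{a}, {b}], [{c_}, {d}]]")
print(f"two-sided Fisher exact p = {fisher_exact_p(a, b, c_, d):.3f}")
```

With only eight invented cases the association is unsurprisingly far from conventional significance, which illustrates the point in the text: such a table may be small enough that inspection, not statistics, does most of the work.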
Just as structured case comparison is a sort of qualitative analog to cross-sectional events-data research, the process-tracing approach is a sort of qualitative analog to time series data analysis. In the process-tracing approach (Bennett and George, forthcoming), a researcher postulates one or more processes or “causal mechanisms” by which a set of initial events, including a conflict resolution intervention, might lead to a set of outcomes.6 The historical record is then searched for evidence that the postulated processes did or did not occur. The process-tracing approach can allow for multiple tests of the same hypothesis in a single case, thus dampening the criticism that a single case study cannot test a hypothesis because a single test is never statistically convincing. However, repetitions of similar conditions in the course of the same conflict are not independent in a statistical sense. This situation presents a threat to the validity of cause-and-effect generalizations drawn from process-tracing studies, analogous to threats to validity in quantitative time series research. Process tracing could, in principle, be used in a way that allows statistical tests of the relative explanatory power of different theories; however, we are not aware of any such applications. It is important to note that, for drawing inferences about learning and other experience-dependent processes, statistical nonindependence between events is not a problem and in fact is necessary for a case to be informative (Bennett and George, forthcoming).
An important difference between enhanced case study approaches and multivariate data analysis is that the former require an explicit theory or conceptual framework while the latter does not. The enhanced case study approaches are an improvement over traditional case studies precisely because of their greater conceptual explicitness. It is useful to make a similar distinction regarding how multivariate quantitative research is conducted. This approach can be employed as a form of nearly pure empiricism, simply analyzing whatever indicators are available across a set of cases to see what regularities emerge. However, the results will be unsatisfying if the available indicators do not include measures of the important variables affecting outcomes. Thus, multivariate analysis is likely to yield more useful results if concepts are made explicit—if an effort is made at the start to specify the key variables and to develop indicators for them. This form of multivariate analysis is enhanced in much the same way as the enhanced case study methods are.
We are suggesting here that there can be some convergence between case-based and multivariate quantitative approaches. We further believe that progress in understanding depends on such convergence. Both case study and multivariate research approaches to international conflict resolution were initially used in an exploratory mode to examine the available evidence (either case material or quantitative indicators) and to search for empirical regularities. This empiricist strategy has not led to strongly supported generic knowledge, but it has generated hypotheses that can be tested with more carefully focused research, using either case-based or quantitative research modes. It has also led to refinements in understanding, in which bivariate hypotheses about relationships between interventions and outcomes give way to conditional generalizations. These are statements or propositions that specify the conditions (sometimes called moderating variables) under which such relationships are likely to occur. Propositions about these conditions contribute to more nuanced knowledge, as exemplified by contingent theories of conflict resolution (Fisher, 1997).
Because of the limitations of each approach, progress seems most likely if both methods are used. One promising way to do this is to apply quantitative methods to data gathered by case study methods. This has occasionally been done with traditional case study data (e.g., the work of Ember and Ember, 1992, using ethnographic data from the Human Relations Area Files) and with enhanced case study material (e.g., quantitative scaling of data from negotiation cases; Druckman, 1997). The approach can be applied to several of the topics in this book, including the use of threats of force (Chapter 3), electoral system design (Chapter 11), and language policy (Chapter 13). Quantifying case study data may make for more precise comparisons between theories in terms of how well they explain available data and may also show more clearly where the data are inconclusive. Following a similar logic, statistical techniques of time series analysis can be applied to data from process-tracing case studies. It is also possible for quantitative researchers to build on the results of case comparisons by designing large-N studies that focus on the key variables identified in case-based research.
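As a hedged illustration of the last point, the sketch below fits the simplest segmented-regression model used in interrupted time series analysis, y = b0 + b1·t + b2·post, to a monthly conflict indicator, where b2 estimates the level change after an intervention. The series, the intervention date, and the variable names are all invented; a real application would also need to address autocorrelation and the other threats to validity discussed above.

```python
# Minimal interrupted time series sketch: estimate the post-intervention
# level shift in an indicator, controlling for a shared linear trend.

def solve3(A, b):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with pivoting."""
    m = [row[:] + [bi] for row, bi in zip(A, b)]
    for i in range(3):
        p = max(range(i, 3), key=lambda r: abs(m[r][i]))
        m[i], m[p] = m[p], m[i]
        for r in range(3):
            if r != i:
                f = m[r][i] / m[i][i]
                m[r] = [x - f * y for x, y in zip(m[r], m[i])]
    return [m[i][3] / m[i][i] for i in range(3)]

def level_shift(series, t0):
    """OLS estimate of the level change (b2) at intervention time t0."""
    n = len(series)
    # Design matrix: intercept, linear time trend, post-intervention dummy.
    X = [[1.0, float(t), 1.0 if t >= t0 else 0.0] for t in range(n)]
    XtX = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(3)]
           for i in range(3)]
    Xty = [sum(X[k][i] * series[k] for k in range(n)) for i in range(3)]
    b0, b1, b2 = solve3(XtX, Xty)
    return b2

# Hypothetical monthly count of hostile incidents; mediation begins at month 6.
incidents = [9, 10, 9, 11, 10, 10, 6, 5, 6, 4, 5, 4]
print(f"estimated level change after intervention: {level_shift(incidents, 6):.2f}")
```

The same normal-equations machinery extends directly to richer specifications (e.g., a post-intervention change in slope), which is how statistical time series techniques can be brought to bear on data extracted from process-tracing studies.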
Another promising strategy for combining research methods is to use the results of multivariate studies to guide the development of protocols for structured case comparisons and process-tracing studies. Whenever multivariate research identifies a statistical regularity, it generates a hypothesis that could be tested in case-based research. Case study researchers may find the indicators from multivariate research too restrictive for their way of thinking, but they have the option to add depth to these variables in their protocols. We believe that this sort of interplay between methods is much more likely to be productive than a continuation of arguments about which method is superior, such as have frequently appeared in the literature on research methodology in international relations.
The challenges of evaluating efforts at international conflict resolution and our suggestions for how to meet those challenges are summarized in Table 2.1. This work leads to three major conclusions. First, theory development is key to addressing many of the most serious challenges of building knowledge about what works in international conflict resolution. This point has also been emphasized by others (e.g., Lijphart, 1971; Eckstein, 1975; George, 1979; Bennett and George, forthcoming). Second, understanding is most likely to progress through a dialogue between theory and experience, with progress in each leading to refinements in the other. Third, the empirical research enterprise should use a strategy of “triangulation” (Campbell and Fiske, 1959; Cook, 1985) that relies on multiple sources of data and multiple modes of analysis to correct for the characteristic sources of error or bias in each and to help analysis converge on results that can be accepted with reasonable confidence.

TABLE 2.1 Challenges of Evaluation and Strategies for Meeting Them

Challenge: Defining the intervention
Strategies: Develop and improve taxonomy of interventions. Enumerate outcomes of interest.

Challenge: Setting reasonable expectations (how much? how soon?)
Strategy: State hypotheses about process

Challenge: Identifying relevant contingencies
Strategy: Develop hypotheses about contingencies

Challenge: Enumerating the universe of cases
Strategy: Frame the topic conceptually and historically; look for hidden and incomplete cases

Challenge: Getting an appropriate sample
Strategy: Specify bases of sample selection

Observation and Measurement

Challenge: Dealing with incomplete information
Strategy: Develop multiple information sources

Challenge: Achieving reliable measurement
Strategy: Develop operational definitions; seek expert agreement

Challenge: Developing indicators of success
Strategy: Develop interim and long-term indicators

Challenge: Comparing events with counterfactuals
Strategy: Use multiple analytical methods (experimentation, multivariate analysis, case study)

Challenges: Assessing the roles of extrinsic events; assessing contingent relationships; accounting for indirect effects of interventions
Strategy: Develop and test hypotheses about contingencies

Challenge: Accounting for actors’ perceptions
Strategy: Develop and test hypotheses about mechanisms and processes

Challenge: Assessing the context of multiple interventions
The practical concern with how best to develop generic knowledge about what works in international conflict resolution leads to a perhaps surprising conclusion: there is a critical need to develop theory. This conclusion follows from the recognition that improvements in the quality of theory would help meet each of the major challenges of evaluation.
The needed theories would combine three elements. First, taxonomies, which can focus on types within a kind of intervention (e.g., peacekeeping missions), characteristics of interventions (e.g., strategies of coercive diplomacy, strength and speed of the application of economic sanctions, procedures used in problem-solving workshops), external contingencies (types of conditions affecting the link from intervention to outcomes), and types of outcomes (e.g., types of deterrence failures). The most useful taxonomies are presented with enough specificity to allow each phenomenon to be reliably classified. Second, postulates about causal mechanisms and processes that specify the ideal working of one or more types of intervention, including the processes by which intervention changes the course of a conflict and the outcomes that may result. Third, contingent generalizations—propositions or hypotheses that link the outcomes of a particular type of intervention to the characteristics of the intervention and the external contingencies that shape these outcomes. Ideally, theory also specifies the processes by which these characteristics and contingencies have their effects, thus linking contingent generalizations to causal mechanisms. This is a very demanding set of requirements given the current state of knowledge, but it is an appropriate list of objectives for theories.
Theories with all three elements would incorporate most of the suggestions in Table 2.1 and in this chapter for how to meet the challenges of evaluation. They would thus help meet the most fundamental challenge of evaluating conflict resolution interventions, posed by the fact that a tremendous number of events may be consequential for the outcomes of an intervention, especially if the outcomes are delayed in time. Without some guidance from theory about which events to examine and which aspects of those events matter, a researcher or practitioner faces an unsorted mountain of relevant and irrelevant detail and must make the inevitable choices about what is and is not worth considering on the basis of unstated theoretical presumptions. The history of science shows that more progress is made when such presumptions are made explicit so they can be tested and refined through cumulative research.
Well-specified theory also helps meet many of the more specific challenges to developing generic knowledge. It helps meet the major conceptual challenges by providing clear definitions of the types of interventions, the outcomes that might be considered indicators of their success or effectiveness, and the characteristics and external contingencies that may influence the relationships between interventions and outcomes. By explicitly identifying processes and causal mechanisms, a well-specified theory helps analysts focus their attention selectively on those events that follow an intervention that are postulated to have important effects on its eventual outcomes. Well-specified theory helps meet the challenges of case selection by clarifying which cases fall within the universe of any particular type of intervention. By specifying the important variables affecting outcomes, theory provides a rationale for purposive sampling and a basis for judging the appropriateness of samples selected in other ways. It helps solve the problem of a shortage of cases relative to explanatory variables by reducing the number of the latter and by postulating processes that lead to multiple testable hypotheses about each case. Finally, by specifying ideal outcomes and the processes that lead to them, theory can provide clear expectations about the course of a conflict after intervention, thus helping both researchers and practitioners evaluate progress and consider the next steps.
Knowledge about the techniques and concepts of international conflict resolution is not nearly well enough developed to strongly confirm a theory about even one of the techniques. Nevertheless, enough thinking and study have been done to state fairly well specified theories about many of them and subject those theories to focused empirical analysis. Pressing ahead with clear theoretical statements is, we believe, essential to developing generic knowledge about international conflict resolution techniques. Although many of these theoretical statements will be incomplete or wrong at first, they will nevertheless be useful for advancing knowledge. Deterrence theory provides a good example of what can be gained. Fairly well specified statements of deterrence theory have been available for several decades (e.g., Brodie, 1959; Schelling, 1960), allowing a cumulation of focused research using a variety of methods (e.g., George and Smoke, 1974; Lebow, 1981; Jervis et al., 1985; National Research Council, 1989). Although the early theoretical statements can now be said to have been incomplete, they sharpened debate about the historical evidence and encouraged researchers to look more closely at history to test the theory. As a result, practitioners of deterrence now have a more sophisticated understanding than before of which factors to consider in their domestic situation, in the target country, and in the international context before making threats. We believe that theory about other techniques and concepts can usefully follow this model and that several of the contributions to this volume make advances in that direction.
Theoretical statements need not closely approach the ideal presented here in order to be useful. Even partial theoretical statements—partial taxonomies, for example, or limited sets of hypotheses about the ways that certain contingencies affect the outcomes of intervention—help both research and practice by directing attention to particular variables that may be important to the outcomes of conflict resolution interventions. To the extent that a theory is supported by evidence, it has identified variables worth considering. Both analysts and practitioners can economize on time and effort by looking at those variables first and considering what theory says about them. Of course, to the extent that a theory is incomplete or wrong, it may direct attention to the wrong variables. Scholars and practitioners need to take theories only as seriously as the supporting evidence implies. They are not absolute guides to action, only interim statements that summarize and systematize available knowledge and that may have suggestive implications about what is likely to work in new historical situations.
Well-supported theoretical propositions have several uses for practitioners. They help practitioners assess situations by identifying the factors to consider in deciding whether, when, and how to use a particular type of intervention. They suggest scenarios leading from interventions to outcomes, both desired and otherwise, that practitioners may examine for their relevance to the situation at hand. They suggest what must be put into an intervention if it is to achieve a desired outcome and identify external conditions that are likely to lead to undesirable outcomes. Theories thus help practitioners identify policy opportunities and anticipate policy pitfalls. In all these ways, well-supported theories have diagnostic value for practitioners. Even theoretical propositions that are merely plausible can have diagnostic value if used with caution.
A Dialogue of Theory and Experience
Theories such as deterrence theory have advanced practical knowledge primarily because of how they help make sense of historical experience. Advances in formal theory by themselves may have very limited practical value because such theories may make opposing predictions depending on the specifics of a case. Wilson (1989), for example, shows that game theory models predict different outcomes of deterrence attempts depending on the structure of information and the parties’ interaction—variables whose values can be determined only by observation of particular cases. From the researcher’s standpoint, a useful theory is one that, by focusing attention on particular contingencies, causal mechanisms, or distinctions between situations or classes and characteristics of interventions, leads to empirically supported generic statements that distinguish between favorable and unfavorable conditions for particular types of intervention and to explanatory accounts of the processes by which conflicts are resolved. Such theories are also useful to practitioners by focusing their attention on aspects of conflict situations that are likely to be important to their decisions and offering ways to think through the possible consequences of their choices. Experience tests and refines theories, thus making them more useful over time.
Good theory gives practitioners advantages they are not likely to gain from unaided reflection on their experience and past cases. Consider, for example, the development of theory about how the structure of electoral systems may affect the course of communal conflict in multicultural societies. Because each electoral system is unique, it would be hard for a practitioner in such a country to make sense of historical evidence from over 200 other countries without theoretical concepts that make it possible to classify those systems and consider their outcomes. Concepts like vote-seat proportionality, geographic accountability, consociationalism, and centripetalism and the theories in which they are embedded (Lijphart, 1984; Horowitz, 1985; Reilly and Reynolds, Chapter 11) offer useful ways of thinking about how any specific proposed electoral design is likely to shape the ethnic composition of political parties, interparty competition, and the potential for interethnic cooperation, communal violence, and peaceful transfers of power. Of course, there is no definitive theory in this field. Nevertheless, the theories that exist, and even the debate among them, are useful to practitioners by identifying causal mechanisms and historical trajectories from the past that may represent models to emulate or pitfalls to avoid. The theories not only point out good and bad examples but also specify what about them is good or bad.
It is important to be explicit about the ways in which theory can and cannot help practitioners of international conflict resolution. No matter how well a theory is established, it cannot eliminate the need for practitioners to exercise judgment based on their experience and knowledge. It will always be necessary, at a minimum, for practitioners to classify current conflict situations into theoretically meaningful types based on available information and their judgments of the parties, to define their policy objectives, to make tradeoffs between competing objectives and between short- and long-term objectives, to judge how to proceed given the possibility of unforeseen events that might intervene, and to decide whether aspects of the current situation make it so different from past experience as to cast doubt on the applicability of the theory to the particular case. A detailed discussion of practitioners’ judgment can be found in George (1993).
What good theory can do for practitioners is help them think through the decisions they face. Theory provides diagnostic categories for classifying conflict situations, and it advises on which aspects of a conflict situation are diagnostically important. It offers generic knowledge about the conditions that favor the use of particular interventions in particular kinds of situations and about the effects of implementing the interventions in specific ways. It also provides information on how strongly such generalizations are supported by historical and other evidence. By offering accounts of the processes and causal mechanisms that lead from interventions to outcomes, it gives practitioners ways of checking on the progress of their conflict resolution efforts. A theory of causal mechanisms may also help a practitioner think of new approaches to conflict situations designed to influence those mechanisms.
Thus, it makes sense for practitioners to use theories as guides to thinking and action but not as sources of prescriptions for action. A theory that is well supported by evidence may provide a better guide to action than an individual practitioner’s experience; it certainly provides an important supplement to that experience. A theory may also mislead—either because it is in error or because it does not apply to the situation at hand—but a theory built on careful analysis of the relevant cases is less likely to mislead than an implicit theory based only on the limited and perhaps biasing experiences of a few practitioners. Theories can be useful and can be made more useful by careful research. However, they cannot eliminate the need or the responsibility for practitioners to make careful judgments appropriate to particular situations.
It is useful to distinguish between theory development and the evaluation of interventions. Theory is intended to produce knowledge that applies to a number of cases, referred to here as generic knowledge. Theories include propositions that specify contingent relationships among variables or causal processes or mechanisms to explain these relationships. Evaluations are intended to throw light on whether particular interventions were or were not effective and why. Evaluation depends on theory development, which provides indicators or criteria of success and a body of propositions describing the conditions under which success or failure (under those criteria) can be expected and the processes leading to those outcomes. Thus, theory provides the concepts needed to evaluate an intervention and explain the reasons for its outcomes.
Using several distinct research approaches or sources of information in conjunction is a valuable strategy for developing generic knowledge. This strategy is particularly useful for meeting the challenges of measurement and inference. The nature of historical phenomena rules out controlled experimentation with real-life situations, the analytical technique best suited to making strong inferences about causes and effects. Thus, making inferences requires experimentation in simulated conditions and various other methods, each of which has its own advantages and limitations but none of which alone can provide the desired level of certainty about what works and under what conditions.

We conclude that debates between advocates of different research methods (e.g., the quantitative-qualitative debate) are unproductive except in the context of a search for ways in which different methods can complement each other. Because there is no single best way to develop knowledge, the search for generic knowledge about international conflict resolution should adopt an epistemological strategy of “triangulation” (Campbell and Fiske, 1959), sometimes called “critical multiplism” (Cook, 1985, 1993). That is, it should use multiple perspectives, sources of data, constructs, interpretive frameworks, and modes of analysis to address specific questions, on the presumption that approaches relying on certain perspectives, constructs, and so forth can act as partial correctives for the limitations of approaches that rely on different ones. An underlying assumption is that robust findings (those that hold across studies that vary along several dimensions) engender more confidence than replicated findings (a traditional scientific ideal that is not practicable in international relations research outside the laboratory). Thus, when different sources of data or different methods converge on a single answer, one can have increased confidence in the result.
When they do not converge, one can make interpretations that take into account the known biases in each research approach. A continuing critical dialogue among analysts using different perspectives, methods, and data is likely to lead to an understanding that better approximates reality than the results from any single study, method, or data source. For more detailed theoretical discussion of triangulation approaches to understanding, see Cook (1985, 1993).
A Final Word
Practitioners who wish to resolve international conflicts need to learn the lessons of history, but history provides no definitive or comprehensive text. This situation is inevitable in a continually changing international system. A particular challenge in learning lessons from history is the tendency of individuals to assimilate new information to old ways of thinking and the related tendency of organizations to reject information that calls current policies into question. Both these tendencies may lead practitioners to discount or misinterpret new information that does not accord with their preexisting views. Because inferences from history always involve comparisons with unrealized, or counterfactual, worlds, there is plenty of room for reinterpreting available knowledge to fit preconceptions or policy commitments, thus undermining the potential value of new knowledge.7 Nevertheless, careful analysis of historical and other evidence, together with the development of clear diagnostic concepts and empirically tested theories of peace processes, can make a modest but significant contribution to practitioners’ ability to understand and intervene to resolve conflicts. The following chapters are intended as part of that contribution.
We are indebted to Alexander George, Thomas Cook, Ronald Fisher, David Laitin, Dean Pruitt, and Philip Tetlock for helpful comments on drafts of this paper.
A shorter version of this paper appeared in International Studies Review, 2000, 2:33–63.
Achen, C.H. 1986 The Statistical Analysis of Quasi-Experiments. Berkeley: University of California Press.
Bennett, A., and A.L.George Forth- Case Study and Theory Development. Cambridge, Mass.: MIT Press, coming
Bercovitch, J. 1986 International mediation: A study of incidence, strategies, and conditions of successful outcomes. Cooperation and Conflict 21:155–168.
1989 International dispute resolution. In Mediation Research: The Process and Effectiveness of Third Party Intervention, K.Kressel and D.Pruitt, eds. San Francisco: Jossey-Bass.
1997 Mediation in international conflict: An overview of theory, a review of practice. In Peacemaking in International Conflict, I.W.Zartman and J.L.Rasmussen, eds. Washington, D.C.: United States Institute of Peace Press.
Bercovitch, J., and J.Langley 1993 The nature of the dispute and the effectiveness of international mediation. Journal of Conflict Resolution 37:670–691.
Bercovitch, J., and R.Wells 1993 Evaluating mediation strategies: A theoretical and empirical analysis. Peace and Change 18:3–25.
Bollen, K.A. 1989 Structural Equations with Latent Variables. New York: Wiley.
Brodie, B. 1959 Strategy in the Missile Age. Princeton, N.J.: Princeton University Press.
Burton, J.W. 1986 The history of international conflict resolution. In International Conflict Resolution: Theory and Practice, E.E.Azar and J.W.Burton, eds. Boulder, Colo.: Lynne Rienner.
Campbell, D.T. 1975 Degrees of freedom and the case study. Comparative Political Studies 8:178–193.
Campbell, D.T., and D.Fiske 1959 Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin 56:81–105.
Carnevale, P.J., and R.A.Henry 1989 Determinants of mediator behavior: A test of the strategic choice model. Journal of Applied Social Psychology 19:481–498.
Chasek, P. 1997 A comparative analysis of multilateral environmental negotiations. Group Decision and Negotiation 6:437–461.
Collier, D. 1993 The comparative method. In Political Science: The State of the Discipline II, A.W. Finifter, ed. Washington, D.C.: American Political Science Association.
Collier, D., and J.Mahoney 1996 Insights and pitfalls: Selection bias in qualitative research. World Politics 49(1):56–91.
Conlon, D.E., P.Carnevale, and W.H.Ross 1994 The influence of third party power and suggestions on negotiation: The surface value of a compromise. Journal of Applied Social Psychology 24(12):1084–1113.
Cook, T.D. 1985 Post-positivist critical multiplism. Pp. 21–62 in Social Science and Social Policy, R.L. Shotland and M.M.Mark, eds. Beverly Hills, Calif.: Sage.
1993 A quasi-sampling theory of the generalization of causal relationships. In New Directions for Program Evaluation: Understanding Causes and Generalizing About Them, vol. 57, L.Sechrest and A.G.Scott, eds. San Francisco: Jossey-Bass.
Cook, T.D., and D.T.Campbell 1979 Quasi-Experimentation: Designs and Analysis Issues for Social Research in Field Settings. Boston: Houghton Mifflin.
Cross, S., and R.Rosenthal 1999 Three models of conflict resolution: Effects on intergroup expectations and attitudes. Journal of Social Issues 55(3):561–580.
Dessler, D. 1991 Beyond correlations: Toward a causal theory of war. International Studies Quarterly 35:337–355.
Diamond, L., and J.W.McDonald 1991 Multi-track Diplomacy: A Systems Guide and Analysis. Grinnell: Iowa Peace Institute.
Diehl, P.F., D.Druckman, and J.Wall 1998 International peacekeeping and conflict resolution: A taxonomic analysis with implications. Journal of Conflict Resolution 42:33–55.
Druckman, D. 1986 Stages, turning points, and crises: Negotiating military base rights, Spain and the United States. Journal of Conflict Resolution 30:327–360.
1995 Situational levers of position change: Further explorations. Annals of the American Academy of Political and Social Science 542:61–80.
1997 Dimensions of international negotiations: Structures, processes, and outcomes. Group Decision and Negotiation 6:395–420.
Druckman, D., and B.J.Broome 1991 Value differences and conflict resolution: Liking or familiarity? Journal of Conflict Resolution 35:571–593.
Druckman, D., and J.N.Druckman 1996 Visibility and negotiating flexibility. Journal of Social Psychology 136:117–120.
Druckman, D., and R.Harris 1990 Alternative models of responsiveness in international negotiation. Journal of Conflict Resolution 34:235–251.
Druckman, D., and P.C.Stern 1997 Evaluating peacekeeping missions. Mershon International Studies Review 41:151–165.
Druckman, D., B.J.Broome, and S.H.Korper 1988 Value differences and conflict resolution: Facilitation or delinking? Journal of Conflict Resolution 32:489–510.
Duncan, G.T., and B.Job 1980 Probability Forecasting in International Affairs. Final report to the Defense Advanced Research Projects Agency.
Eckstein, H. 1975 Case studies and theory in political science. Pp. 79–138 in Handbook of Political Science, vol. 7, F.Greenstein and N.Polsby, eds. Boston: Addison-Wesley.
Elster, J. 1983 Explaining Technical Change: A Case Study in the Philosophy of Science. Cambridge: Cambridge University Press.
Ember, C.R., and M.Ember 1992 Resource unpredictability, mistrust, and war: A cross-cultural study. Journal of Conflict Resolution 36(2):242–262.
Faure, A.M. 1994 Some methodological problems in comparative politics. Journal of Theoretical Politics 6:307–322.
Fisher, R. 1964 Fractionating conflict. In International Conflict and Behavioral Science: The Craigville Papers, R.Fisher, ed. New York: Basic Books.
Fisher, R.J. 1997 Interactive Conflict Resolution. Syracuse, N.Y.: Syracuse University Press.
Frei, D., and D.Ruloff 1989 Handbook of Foreign Policy Analysis. Boston: Martinus Nijhoff.
Galtung, J. 1969 Violence, peace, and peace research. Journal of Peace Research 6:167–191.
Geddes, B. 1990 How the cases you choose affect the answers you get. In Political Analysis, vol. 2, J.A.Stimson, ed. Ann Arbor: University of Michigan Press.
George, A.L. 1979 Case studies and theory development: The method of structured, focused comparison. In Diplomacy: New Approaches in History, Theory, and Policy, P.G.Lauren, ed. New York: Free Press.
1993 Bridging the Gap: Theory and Practice in Foreign Policy. Washington, D.C.: United States Institute of Peace Press.
1997 The role of the congruence method for case study research. Paper presented at the convention of the International Studies Association, Toronto, March.
George, A.L., and A.Bennett 1998 Process tracing with notes on causal mechanisms and historical explanation. Paper presented at the Diplomatic History and International Relations Theory Conference, Arizona State University, January.
George, A.L., and R.Smoke 1974 Deterrence in American Foreign Policy: Theory and Practice. New York: Columbia University Press.
Guetzkow, H., and J.J.Valadez 1981 Simulated International Processes: Theories and Research in Global Modeling. Beverly Hills, Calif.: Sage.
Gurr, T.R. 1993 Minorities at Risk: A Global View of Ethnopolitical Conflict. Washington, D.C.: United States Institute of Peace Press.
Harris, K.I., and P.J.Carnevale 1990 Chilling and hastening: The influence of third party power and interests in negotiation. Organizational Behavior and Human Decision Processes 47:138–160.
Holsti, O.R. 1989 Crisis decision making. Pp. 8–84 in Behavior, Society, and Nuclear War, vol. 1, P.E. Tetlock, J.L.Husbands, R.Jervis, P.C.Stern, and C.Tilly, eds. New York: Oxford University Press.
Hopmann, P.T., and T.C.Smith 1978 An application of a Richardson process model: Soviet-American interactions in the test-ban negotiations, 1962–63. In The Negotiation Process, I.W.Zartman, ed. Beverly Hills, Calif.: Sage.
Hopmann, P.T., and C.Walcott 1977 The impact of external stresses and tensions on negotiations. Pp. 301–323 in Negotiations: Social Psychological Perspectives, D.Druckman, ed. Beverly Hills, Calif.: Sage.
Horowitz, D.L. 1985 Ethnic Groups in Conflict. Berkeley: University of California Press.
Hufbauer, G.C., J.J.Schott, and K.A.Elliott 1990 Economic Sanctions Reconsidered, 2nd ed. Washington, D.C.: Institute for International Economics.
Jervis, R. 1976 Perception and Misperception in International Politics. Princeton, N.J.: Princeton University Press.
Jervis, R., R.N.Lebow, and J.G.Stein 1985 Psychology and Deterrence. Baltimore: Johns Hopkins University Press.
Khong, Y-F. 1991 The lessons of Korea and the Vietnam decisions of 1965. Pp. 302–349 in Learning in U.S. and Soviet Foreign Policy, G.W.Breslauer and P.E.Tetlock, eds. Boulder, Colo.: Westview.
King, G., R.O.Keohane, and S.Verba 1994 Designing Social Inquiry: Scientific Inference in Qualitative Research. Princeton, N.J.: Princeton University Press.
Kressel, K., E.A.Frontera, and S.Forlenza 1994 The settlement orientation vs. the problem-solving style in custody mediation. Journal of Social Issues 50:67–84.
Kriesberg, L. 1996 Coordinating intermediary peace efforts. Negotiation Journal 12:341–352.
Lebow, R.N. 1981 Between Peace and War: The Nature of International Crisis. Baltimore: Johns Hopkins University Press.
Lijphart, A. 1971 Comparative politics and comparative method. American Political Science Review 65(3):682–693.
1984 Democracies: Patterns of Majoritarian and Consensus Government in Twenty-One Countries. New Haven, Conn.: Yale University Press.
Little, D. 1991 Varieties of Social Explanation: An Introduction to the Philosophy of Social Science. Boulder, Colo.: Westview Press.
Mooradian, M., and D.Druckman 1999 Hurting stalemate or mediation? The conflict over Nagorno-Karabakh, 1990–95. Journal of Peace Research 36(6):709–727.
National Research Council 1988 Enhancing Human Performance: Issues, Theories, and Techniques. Committee on Techniques for the Enhancement of Human Performance, D.Druckman and J.A. Swets, eds. Washington, D.C.: National Academy Press.
1989 Perspectives on Deterrence. Committee on Contributions of the Behavioral and Social Sciences to the Prevention of Nuclear War, P.C.Stern, R.Axelrod, R.Jervis, and R.Radner, eds. Washington, D.C.: National Academy Press.
Neustadt, R.E., and E.R.May 1984 Thinking in Time: The Uses of History for Decision Makers. New York: Free Press.
Patchen, M., and D.D.Bogumil 1995 Testing alternative models of reciprocity against interaction during the Cold War. Conflict Management and Peace Science 4:163–195.
1997 Comparative reciprocity during the Cold War. Peace and Conflict 3:37–58.
Putnam, R.D. 1993 Making Democracy Work: Civic Traditions in Modern Italy. Princeton, N.J.: Princeton University Press.
Ragin, C. 1987 The Comparative Method: Moving Beyond Qualitative and Quantitative Strategies. Berkeley: University of California Press.
Robson, C. 1993 Real-World Research. Oxford: Blackwell.
Schelling, T.C. 1960 The Strategy of Conflict. Cambridge, Mass.: Harvard University Press.
Smelser, N.J. 1976 Comparative Methods in the Social Sciences. Englewood Cliffs, N.J.: Prentice-Hall.
Stedman, S.J. 1991 Peacemaking in Civil War: International Mediation in Zimbabwe, 1974–1980. Boulder, Colo.: Lynne Rienner.
1997 Spoiler problems in peace processes. International Security 22(2):5–53.
Stevens, J. 1996 Applied Multivariate Statistics for the Social Sciences, 3rd ed. Hillsdale, N.J.: Lawrence Erlbaum.
Tetlock, P.E. 1998 Social psychology and world politics. Pp. 868–912 in Handbook of Social Psychology, D.Gilbert, S.Fiske, and G.Lindzey, eds. New York: McGraw-Hill.
Tetlock, P.E., and A.Belkin, eds. 1996 Counterfactual Thought Experiments in World Politics. Princeton, N.J.: Princeton University Press.
Wilson, R. 1989 Deterrence in oligopolistic competition. Pp. 157–190 in Perspectives on Deterrence, Committee on Contributions of Behavioral and Social Science to the Prevention of Nuclear War, P.C.Stern, R.Axelrod, R.Jervis, and R.Radner, eds. Washington, D.C.: National Academy Press.