Proceedings of a Workshop on Statistics on Networks

Dynamic Network Analysis in Counterterrorism Research
Kathleen Carley, Carnegie Mellon University

I am going to talk about some applied work that we have been doing, some of the issues that have arisen, and some of the challenges that the work poses both for network analysis in general and for statistical approaches to network analysis. To set the context, let me present two real-world examples that we have come across and had to deal with using our models. The first one occurred over a year ago when we happened to be out in Hawaii. A group of individuals associated with the Philippine terrorist group Jemaah Islamiyah (JI) had just been arrested in a move that foiled a major bomb plot. The question posed to us was what was likely to happen to JI following that arrest. When a question like this comes to the analyst it often takes several months to deal with; however, we were told they needed the answer yesterday. So, one of the issues is, "How can we answer such questions rapidly and accurately?" The other motivating example comes from analyzing the effects of an administrative change in a nation state. On the surface, the impact of a change in leadership seems like a very different question. We're thinking of situations where, as a region of the world becomes more progressive, it becomes less antagonistic to the United States. Looking at the political elite in those countries to see how they are related to each other and to the military can reveal strategies by which small, subtle changes in those groups can alter the lines of disagreement, possibly moving the groups more into alignment with U.S. interests. In contemplating such changes, we might ask questions such as: What are the main lines of disagreement? Can we influence them? Would changing them likely alter the country's attitude toward the United States?
An issue here is, "How can we explore the impact of changes in a network of people on beliefs and attitudes?" On the surface the problems of JI and of state change are very different. They certainly bring to the forefront a lot of different kinds of data. However, there are analysts who face both of these issues using the same kind of data, and in both cases the data has strong network characteristics. That is, in both cases the data includes who is connected to whom, who has what resources, skills, or beliefs, and who is doing what. In both cases, the question arises of what would happen if a particular node were removed, e.g., when a member of JI is arrested or a political elite is removed from power. These and other questions are often addressed: How vulnerable is the overall system? What groups or individuals stand out? Who are the key actors, key groups, key ideas, and key things? How can we influence them? What are the
important connections, and so on? What is the health of the organization, and how has it been changing over time? Can we infer where there is missing data, which helps to focus intelligence gathering? How different are groups? And so on. There is a whole slew of questions like these that need to be addressed, some very theoretical while others are of near-term, pragmatic interest, such as "What is the immediate impact of a particular course of action?" These are the kinds of questions that need to be addressed using network-inspired tools. The tools that I will talk about today are the first tiny steps on the long road to helping people address these kinds of questions and meeting this very real and practical need. From a technical perspective, what we want to do is provide a system of evaluation for looking at change in multi-mode, multi-plex, dynamic networks. Maybe they are terror networks today, maybe they are drug networks tomorrow, but there is a whole set of these kinds of networks, and we want to analyze them under conditions of uncertainty. Finally, we want to place predictions in a risk context: that is, given sparse and erroneous data, we want to estimate the probabilities of events, the likelihood of other inferences, and the confidence intervals around these estimates. From a user's point of view it is imperative that we provide the tools and the ability to think about analysis and policy issues from an end-to-end perspective. We all know that network tools are extremely data greedy, so we need to embed them in a larger context of bringing in the data automatically, analyzing it automatically, and using different kinds of prediction capabilities to make forecasts, basically freeing up human time for real live analysis and interpretation. At CMU we have developed a few tools, as shown in Figure 1, but these are just examples of the many tools that are out there.
For every tool I will mention there are dozens more that serve a similar purpose. Overall, our tool chain serves to bring in a set of raw text, like newspaper reports, and, using various entity extraction and language technologies, identifies various networks. These are networks of people to people, people to ideas, people to events, et cetera. We then take those networks and analyze them. We are using a tool called ORA[1] for doing that, which lets us do things like identify key actors, groups, and so on. Once we have done that, the tools give us some courses of action that we might want to analyze. We take those, put them into a simulation framework, and evolve the system forward in time. All of this sits on top of databases, etc. Basically, the set of tools helps you build a network so you may find points of influence, and then helps you assess strategic intervention. The important thing from a technology standpoint is that all of these things have to be built by lots of people, they have to be

[1] ORA is a statistical toolkit for meta-matrices that identifies vulnerabilities, key actors (including emergent leaders), and network characteristics of groups, teams, and organizations.
made interoperable, and we have to start thinking about things that we don't normally think about as network analysts.

FIGURE 1 Integrated Tool Chain for Dynamic Network Analysis

From a network assessment perspective I am only going to concentrate on the middle block of Figure 1, which deals with network analysis. Basically, we ask four fundamental types of questions. One is, who are the key actors? We want to do this from a multi-criteria perspective, not just worrying about who is key in terms of the social network but also other kinds of power, such as that stemming from access to money, resources, and so on. Second, we want to know about the emergent groups. What are the groups, and who are their key members and leaders? Third, we want to know how we can influence someone or some group. And fourth, we want to characterize the network. Knowing what kind of network it is, is critical because the intervention options depend on the type of network. Our tool for network analysis, ORA, is a dynamic network analysis (DNA) tool for
locating patterns and identifying vulnerabilities in networks. It lets you run a whole series of statistical toolkits on the networks and pull out a whole set of measures. It then organizes these into various reports, such as reports for intelligence, management, risks, etc. Importantly, ORA lets you utilize multiple kinds of data, not just who-talks-to-whom data but also who has access to what resources, who has been involved in what events, who has been seen at what location. It uses these different kinds of multi-mode, multi-plex data to help predict, think about, and infer actions from one network to another.

FIGURE 2 Illustrative Meta-Matrix for Dynamic Network Analysis

In other words, ORA uses something we call the meta-matrix approach to evaluate connections among multiple entities at varying strengths, as illustrated in Figure 2. Traditional social network analysis tends to focus on just a single entity class, such as people to people. In contrast, DNA uses networks connecting not just people to people, but people to knowledge or resources, attacks or events, and organizations, and there are other entity classes as well. This approach is multi-mode and multi-plex, and thus more powerful than a single-mode technique. It takes us beyond traditional social network analysis by enabling the simultaneous analysis of relations among many types of entities. Before telling you about some of the results from using these tools, I want to highlight some of the major issues we have come across, basically as caveats as to why it is difficult to get useful insights about dynamic networks if one relies only on simple social network techniques.
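As a concrete sketch, the meta-matrix idea can be represented as a set of labeled adjacency matrices, one per pair of entity classes. The entity names and ties below are invented for illustration; this is not ORA's internal data structure.

```python
# A minimal sketch of the meta-matrix idea: one labeled adjacency
# matrix per pair of entity classes. All names and ties are invented.
import numpy as np

agents    = ["a1", "a2", "a3"]
resources = ["money", "explosives"]
tasks     = ["recruit", "attack"]

meta_matrix = {
    # who talks to whom (the classic single-mode social network)
    ("agent", "agent"): np.array([[0, 1, 0],
                                  [1, 0, 1],
                                  [0, 1, 0]]),
    # who has access to which resource
    ("agent", "resource"): np.array([[1, 0],
                                     [1, 1],
                                     [0, 1]]),
    # who is assigned to which task
    ("agent", "task"): np.array([[1, 0],
                                 [0, 1],
                                 [0, 1]]),
}

def networks_involving(entity_class, mm):
    """List the (row, col) networks that touch a given entity class."""
    return [key for key in mm if entity_class in key]

print(networks_involving("resource", meta_matrix))  # -> [('agent', 'resource')]
```

Each matrix on its own is an ordinary one-mode or two-mode network; it is holding them together over shared node sets that makes cross-network inference possible.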
The first caveat is that you really need multi-mode, multi-plex data to reveal relations. This is certainly true when people are trying to hide, when groups operate covertly. In covert cases, we often have to infer network relations from other kinds of data, such as co-presence at a large number of events. Second, we need to collect data from multiple sources, implying different collection methods and biases, and so we need protocols that help us triangulate from those multiple sources to infer the actual relations. Finally, we have also found the need to consider networks over time; specifically, looking at data from multiple time periods reveals which relations remain constant and which change. As an example of the first of these challenges, I point to an analysis we conducted about one particular Middle Eastern country. We identified a social network for the political elite based on open-source data. This observed network, Figure 3, suggests that the society is not strongly connected. (Note the lack of connectivity and the isolated subgraphs in Figure 3; the labels are not important here.)

FIGURE 3 Political Elite Network Mideast

But this impression is wrong. When we dug deeper and examined the knowledge and resources that these various people had access to, we in fact found a lot of connectivity. The nature of the society is such that members who have shared a resource have probably actually met, and so we were able to infer many other social connections that we wouldn't have been able to infer without resource access data. The resulting network, shown in Figure 4, is a very connected group. This
structure reveals that the society has a dual-core structure, with two competing cores. One core tends to be more reformist than the other. It is important to note that multi-mode data gives us a very different picture of the society and the connections among the elite than we would have inferred from just the social network in the open-source data.

FIGURE 4 Political Elite with Connections Through Knowledge and Resources—MidEast

The next example is about collection methods. In order to better understand the insurgency in Iraq, two sets of data were collected. But the data was collected by two separate subgroups that refused to talk to each other, and they collected the data in very different ways. The first group collected data on incidents (left in Figure 5); for each incident they generated a report that said who was involved in that incident and what was going on. Analyzing the incident data results in a network that is basically a set of isolated individuals or groups, because few of the people involved in the individual incidents had the same name. Looking at just this data, we might conclude that the insurgency is really a bunch of disconnected groups copying one another's methods. In other words, we might conclude that fighting it would be like fighting fires, and very difficult to counter or stop in the long run. Even when you code for location it doesn't help; that is, the conclusion still appears the same.
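The inference step in the Mideast example, linking elites who share access to a resource, is essentially a one-mode projection of a two-mode (person-by-resource) network. A minimal sketch, with an invented access matrix rather than the actual data:

```python
import numpy as np

# Rows are political elites, columns are resources; 1 means access.
# The access pattern is invented for illustration.
access = np.array([
    [1, 0, 0],   # elite A holds resource r1
    [1, 1, 0],   # elite B holds r1 and r2
    [0, 1, 0],   # elite C holds r2
    [0, 0, 1],   # elite D holds r3
])

# One-mode projection: infer a tie between two elites whenever they
# share access to at least one resource (they have "probably met").
shared = access @ access.T               # counts of shared resources
inferred = (shared > 0).astype(int)
np.fill_diagonal(inferred, 0)            # no self-ties

print(inferred)
```

Here A, C, and D look unrelated in any direct who-talks-to-whom data, but the projection ties A to B and B to C, pulling most of the apparent isolates into one connected component, which is exactly the effect seen between Figures 3 and 4.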
The second group that collected data focused more on the resistance movement, and they included information on who attended resistance movement meetings. Based on this attendance data we get the picture of the resistance network shown on the right in Figure 5. It's a completely different structure from the one based on the incident data, showing more of a core-periphery structure. Based on just this data, we might conclude that there is a central controlling unit that was calling the shots. In this case, fighting the resistance could be successful if we could identify and alter that core. Now, many of the same people appear in both data sets. If we combine them we find that the little networks that were identified through the incident data are basically on the fringe of the core of the resistance. This would further alter the operational conclusion. The point of this example is that when dealing with covert groups and working to set operational direction, it is generally useful to collect data in multiple ways and then combine them before drawing any operational conclusions.

FIGURE 5 Social Network Based on Incidents and Resistance

What about the case where there is data from multiple time periods? For Iraq, we had the opportunity to analyze newspaper data prior to the election. In a very quick proof-of-concept exercise, we collected data on both the incidents and the resistance movement. We had a set of people, and we tracked them over a one-month period, from November 19 to December 25. At that time, the prevailing view was that the Iraqi insurgents were a bunch of disconnected groups that didn't have anything to do with each other. When we actually started tracking those people over time using newspaper reports we found that there was a core group of about 50 people who kept showing up repeatedly. As you tracked them over time you saw that they did have very strong connections with each other and they did have a very strong group structure.
In fact, we were able to actually assess who the leaders were once we started capturing data over time. Thus, the third issue that is critical if network techniques are to be used to meet applied needs is that
data must be captured and assessed over multiple time periods. Once we got all that data we asked questions like: Who do we target? Where are the vulnerabilities in these kinds of networks? Who are the leaders? Who stands out? Using traditional social network measures, based on single-mode data of just who talks to whom, we would typically reveal node attributes such as the centralities: who is the most connected, or who is on the most paths. In the 9-11 hijacker network constructed by Valdis Krebs, four individuals stand out—Alhazmi, Hanjour, Atta, and Al-Shehhi. If you can actually track the full network and pull out the centralities, you have a good understanding of different ways of affecting the group, because then you can affect the flow of communications, the transmission of illness, and so on. However, as was certainly the case with the hijacker network, often you only see part of the network. As such, these centralities can be misleading. Moreover, from an intervention perspective, you are sometimes more concerned with issues of power, control, or expertise. As such, you might be better served by using exclusivity measures, like who has exclusive access to particular resources, and so forth. Thus, another issue is that, for many applied concerns, you need to move beyond single-mode, single-plex data, and you need new metrics for identifying core vulnerabilities. In building the dynamic network tools we have tried to address the issues described above, and others. We assess not only the social network but also go on to ask what resources the members of the network have access to, and how that access relates to their position in the social network.
With event and task data we go still further, and ask who has been at what events, who is doing what tasks, and so on, and use that information to infer missing social linkages and identify different social roles. It is still possible to calculate things like centralities, but now they are computed on multi-mode data and as such begin to reveal individuals' roles. Individuals can stand out on a number of dimensions, not just by virtue of their communication. For example, one metric we have developed measures who is likely to be the emergent leader, using a multi-mode, multi-plex metric called cognitive demand. Additional roles can be defined in terms of workload, subsequent event attendance, and exclusive access to resources. All the things that we want to talk about with respect to individuals and why they are important in networks can begin to be pulled out when we place networks in this kind of dynamic multi-mode, multi-plex context. This dynamic approach affords a much richer understanding of how we might impact the underlying network. The person who is connected to a lot of other people but doesn't have special expertise or high cognitive demand is probably a good target for going to and getting information. However, if I want to break the system, I might want to go in and pull out
the individuals who have exclusive access to particular resources, so I want to use an exclusivity metric. If I want to impact not just the formal but the informal leadership, then I would want to use cognitive demand. Once I've identified a possible individual, I might want to ask how I can influence that person. To answer that I would ask who they are connected to, what groups they are in, what they know, what resources they control, what events they have been at, and so on. Figure 6 shows the sphere of influence for one actor, showing their focus—their ego network, basically—but in this multi-mode, multi-plex space. In addition, as part of the sphere of influence analysis, we also calculate how the actor fares relative to everyone else on a number of metrics. And we assess who the closest others to them in that network are; i.e., who are most like them structurally. Now, Figure 6 implies that Khamenei is close to Khatami in this data set. On the one hand, this is funny because they hold completely opposite political views. On the other hand, it makes sense, because their political positions ensured that they would be connected to the same others.

FIGURE 6 Sphere of Influence Analysis
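An exclusivity-style measure of the kind described above can be sketched as follows. This is a simplified stand-in, not ORA's actual formula, and the actors and resources are invented: count, for each actor, the resources that only that actor holds.

```python
import numpy as np

actors = ["broker", "ally", "specialist"]

# Actor-by-resource access matrix (invented). The broker and the ally
# both hold r1 and r2; only the specialist holds r3, so only the
# specialist has exclusive access to anything.
access = np.array([
    [1, 1, 0],   # broker
    [1, 1, 0],   # ally
    [0, 0, 1],   # specialist
])

holders_per_resource = access.sum(axis=0)       # how many actors hold each resource
exclusive_resource = holders_per_resource == 1  # held by exactly one actor
exclusivity = access[:, exclusive_resource].sum(axis=1)

print(dict(zip(actors, exclusivity.tolist())))  # -> {'broker': 0, 'ally': 0, 'specialist': 1}
```

A degree-style measure would not distinguish the broker from the ally; the exclusivity measure instead flags the specialist, whose removal takes a resource out of the network entirely.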
We next want to ask how such a network can be broken into groups. In Mark Newman's paper, he introduced some grouping algorithms. Grouping algorithms are extremely important in this area. We use grouping algorithms in a dynamic sense to look at the evolution of the structure of a network. Figure 7 shows the structure of al Qaeda in 2000 as based on open-source data. Figure 8 shows some of the groups that were extracted from that data using one of Newman's algorithms. The block model reveals a structure with a bunch of separate, tightly connected groups along the diagonal, and then a few individuals who serve as liaisons connecting multiple groups. The upper red horizontal rectangle in Figure 8 is the row of group connections for Zawahiri, while the lower red horizontal one is for bin Laden. They cross-connect between groups. This is what is now referred to as the classic cellular structure.

FIGURE 7 Al Qaeda 2000—Open Source Information
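Grouping of this kind can be sketched with a modularity-based community algorithm in the spirit of the Newman algorithms mentioned above. The toy edge list below is invented, not the al Qaeda data.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Two tight cells plus one liaison who cross-connects them (invented).
G = nx.Graph()
G.add_edges_from([
    ("a", "b"), ("b", "c"), ("a", "c"),      # cell 1
    ("x", "y"), ("y", "z"), ("x", "z"),      # cell 2
    ("liaison", "a"), ("liaison", "x"),      # cross-connecting figure
])

# Greedy modularity maximization (Clauset-Newman-Moore agglomeration)
groups = greedy_modularity_communities(G)
for group in groups:
    print(sorted(group))
```

Run over snapshots from several years, the same call lets you track how the extracted cells, and the liaisons between them, change over time.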
FIGURE 8 Al Qaeda 2000—Cellular Structure

Using those techniques, we can now look at change in a network over time. By simply running these grouping algorithms on the network at multiple points in time, and tracking which critical individuals move between groups, we begin to see the extent to which these networks are evolving and changing their basic structure. I am going to show you some networks and talk about this in the context of some work we have been doing on al Qaeda, using data collected from open sources on what it looked like in 2000, 2001, 2002, and 2003. In Figure 9, the meta-network of al Qaeda circa 2000 (left) and 2003 (right) is shown, with different symbols for people, resources, locations, etc. The red circles are people. All the other symbols represent known locations where operatives were, resources and knowledge at their disposal, roles that people were playing, and critical events. The point that I want to make about this is that there is a lot more information here than just who was in contact with whom. We know a lot more about al Qaeda. This means that, if we are going to be smart about this, we have to use that other information to be able to think about how to evolve these networks and how they are changing and to
infer what the social network looks like.

FIGURE 9 Meta-Matrix for Al Qaeda—2000 and 2003

Has al Qaeda remained the same over time? In short, no; in fact, it has changed quite a lot. Although you can't tell it from Figure 10, if we were to actually zoom in on the graph for 2003, the red circles, which are the people, actually form two completely separate, broken-apart sub-networks. Whereas it was one completely connected group back in 2000—except for the isolates—in 2003 we have got two big separate cores with no connections between them. They are only connected through having been at the same locations, which may be dead letter drops, or who knows. Let's examine this data in more detail. So far, I am just showing you two visual images and asserting that these pictures are different. Behind this are a lot of statistics for looking at how the networks change. For example, in Figure 10 we look at the movement of al Qaeda over these four years on a number of dimensions. Over time, the density of the overall network has gone down. Basically, this is an attrition effect, and it has gone down not just in terms of who is talking or communicating with whom, but also in access to resources, access to knowledge, and involvement in various tasks. At the same time, the network had become increasingly adaptive as a group from 2001 to 2002, but it had suffered so much attrition by 2003 that it kind of stabilized in a new organizational form. In terms of other factors, we know that, over time, the communication structure of the group has changed, such that the average shortest path among actors, for example, has actually increased. So, on average, communication should
take longer. The communication congruence, which is a mapping of who needs to communicate with whom to do the kinds of tasks that they are supposedly doing, has actually improved. This suggests that a very different organizational structure evolved between 2000 and 2003: a leaner, more tightly organized one.

FIGURE 10 Change in Al Qaeda Over Time

What about performance? Learning, recruitment, and attrition lead to changes in the network, which in turn impact performance. Some of these changes (e.g., learning) are natural, and others (e.g., attrition) are the result of strategic interventions. We have a whole series of tools for looking at dynamic change in networks, for looking at the effects of both natural evolution and interventions. Basically these tools—we have used ORA to estimate immediate impact using
comparative statics and DyNet to estimate near-term impact using multi-agent simulation—let you look at change in networks immediately or in the near term, and look at change hypothetically or for a specific network. You can use these tools to ask questions about network evolution. Fundamental principles from cognitive psychology, sociology, and anthropology about why people interact (the tendency of people to interact with those who are similar, the tendency of people to go to others for expertise, and so on) form the basis for the evolutionary mechanisms in DyNet, a multi-agent technique for evolving networks. DyNet can be used to address both natural and strategic change. For example, we took the al Qaeda network shown in Figure 7 and evolved it over time, naturally. The result was that the overall network, in the absence of interventions such as arresting key members, oscillated back and forth between a two-core and a single-core structure. The structure at 300 steps is similar to that at 100 steps, with a totally connected structure intermediate at 200 and 400 steps. Without intervention, it just oscillates between these two forms over time, according to these basic fundamental principles. We also used DyNet to examine the effects of hypothetical changes to stylized networks composed by blending actual network data with network structure extracted from more qualitative assessments. For example, Figure 11 shows the expected change over time for al Qaeda, which is in blue, and Hamas, which is in red, based on a combination of real and stylized data drawn from qualitative assessments. On the left, we see the state if we just left them alone, in which al Qaeda outperforms Hamas and so on, and on the right what would happen if the top leader were removed from each. We did this analysis a while back, when Yassin was the Hamas leader, and, in fact, Hamas's performance did improve once Yassin was removed.
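The flavor of such similarity-driven evolution can be sketched as follows. This is a cartoon of homophily-based re-tying, not DyNet's actual mechanism, and the agents and their attribute profiles are invented.

```python
# Cartoon of homophily-driven evolution: each agent re-ties to its
# k most similar others, and we count the cores that emerge.
profiles = {            # binary knowledge/belief vectors per agent (invented)
    "a": (1, 1, 0, 0), "b": (1, 1, 0, 0), "c": (1, 0, 0, 0),
    "d": (0, 0, 1, 1), "e": (0, 0, 1, 1), "f": (0, 0, 0, 1),
}

def similarity(p, q):
    """Relative similarity: fraction of matching attributes."""
    return sum(x == y for x, y in zip(p, q)) / len(p)

def evolve_once(k=2):
    """Each agent keeps ties only to its k most similar others."""
    ties = set()
    for agent, prof in profiles.items():
        ranked = sorted((o for o in profiles if o != agent),
                        key=lambda o: similarity(prof, profiles[o]),
                        reverse=True)
        for other in ranked[:k]:
            ties.add(frozenset((agent, other)))
    return ties

def component_count(nodes, ties):
    """Number of connected components under the given ties."""
    adj = {n: set() for n in nodes}
    for tie in ties:
        u, v = tuple(tie)
        adj[u].add(v)
        adj[v].add(u)
    seen, count = set(), 0
    for n in nodes:
        if n in seen:
            continue
        count += 1
        stack = [n]
        while stack:
            cur = stack.pop()
            if cur in seen:
                continue
            seen.add(cur)
            stack.extend(adj[cur] - seen)
    return count

ties = evolve_once()
print(len(ties), component_count(profiles, ties))  # -> 6 2
```

Even this one-step cartoon sorts the agents into two similarity-based cores; DyNet layers expertise seeking and task assignment on top of mechanisms like this and iterates over many steps.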
FIGURE 11 Impact of Removal of Top Formal Leader
Finally, we can use DyNet to do course-of-action analysis and ask, "How do these networks evolve when there are specific strategic interventions?" We might take a network, for example the al Qaeda 2000 network from Figure 7, and isolate the individuals who stand out on some of the various measures. For example, the individuals who were highest in degree centrality or betweenness or cognitive demand might be removed, e.g., bin Laden or Baasyir. We could, alternatively, remove a whole group of people, such as the top 25 in cognitive demand. Each of these what-if scenarios represents a course of action. Running this series of virtual experiments results in comparative impact statistics like those in Figure 12. In this example, any of the actions considered would lead the system to perform worse than it does now. Caveat: these results were generated using a multi-agent simulation on a network extracted from open sources. The value of this analysis is not to make a point prediction of what actually happens to these groups, nor to predict specific reductions, such as that pulling out all the high-cognitive-demand people will result in 40-percent lower performance. Rather, its value is to show relative impact; e.g., that the impact would be stronger were you to remove all those people versus removing just bin Laden, and that any intervention is more crippling than none.

FIGURE 12 Relative Impact of Different Courses of Action
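A bare-bones version of such a virtual removal experiment can be sketched as follows. The network is an invented toy, not the al Qaeda data, and fragmentation of the largest component stands in for the richer simulated performance measures discussed above.

```python
from collections import deque

# Invented toy network: a hub tied to five others, plus a few side ties.
edges = [("hub", x) for x in "abcde"] + [("a", "b"), ("c", "d"), ("e", "f")]

def adjacency(edge_list):
    """Build an undirected adjacency map from an edge list."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def largest_component(adj):
    """Size of the largest connected component (breadth-first search)."""
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        size, queue = 0, deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            size += 1
            for nxt in adj[node] - seen:
                seen.add(nxt)
                queue.append(nxt)
        best = max(best, size)
    return best

def remove(edge_list, target):
    """Course of action: delete one actor and all their ties."""
    return [(u, v) for u, v in edge_list if target not in (u, v)]

baseline = largest_component(adjacency(edges))
for target in ("hub", "f"):
    after = largest_component(adjacency(remove(edges, target)))
    print(target, baseline, "->", after)
```

Removing the hub shatters the toy network while removing a fringe member barely matters; comparing courses of action against each other, rather than reading any one number as a point prediction, is the same logic in spirit as Figure 12.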
You can also use such an analysis to go further and ask, if we were to remove particular individuals, then—since people are learning and changing anyway—who is going to start interacting? That is, where should we see the impact of change? In fact, we might find many people starting to interact in the simulation. An example is shown in Figure 13.

FIGURE 13 Emergence of New Relations After Removal of bin Laden

If I were the analyst, I would look at these results in Figure 13 and ask, "What does it mean that the simulation is predicting that Salah and Sulaeman will start interacting with a probability of one?" Well, it is actually .999999999. Even so, is the probability really that high? What it means is that there is a good chance that these two individuals are already interacting, and additional information-gathering efforts should be directed to confirming this, if it is critical. The point here is that suggestions for information gathering and about missing data are some of the side benefits of doing predictive analysis of network change. In summary, from an applied perspective there is a need to move beyond simple network analysis to look at dynamic multi-mode, multi-plex approaches. Doing so will require us to place network statistical analysis as a component within a larger tool chain moving from network extraction, to analysis, to simulation, and to employ more text mining, data mining, and machine learning techniques. The tools I have shown you are a step in this direction, and their use has raised a number of key issues that the network community needs to deal with. At this point, I will open it up to questions.
QUESTIONS AND ANSWERS

DR. HOFF: This is a very sophisticated model that you have for the behavior of the networks under changes. Even when we look at observational studies of the impact of, say, beta carotene on people's health, we get that wrong. So, you need to do a clinical trial to figure out what the causal effect is. Is there anything you can do, or what sort of methods do you have, for checking—you talk about what will happen if we modify X. Is there any way of checking that or diagnosing that?

DR. CARLEY: This is going to be kind of a long answer. There were three parts to that question. The first thing is that, in the model, there are three fundamental mechanisms for why individuals change their level of interaction. One is based on the notion of relative similarity, or homophily-based interaction: I interact with you because it is comfortable; you have stuff in common with me. Another is expertise: you have something I need to know. And the third is task based: we are working on the same task, so we start interacting. Those three basic principles all came out of different branches of science, and there is lots of evidence underlying all three of them. So, we began by taking things that had been empirically validated and put them in the network context. Then we took this basic model and applied it in a variety of contexts. For example, we applied it to NASA's Team X out in California and to other groups at NASA. We applied it to three different companies that don't like to be talked about. In all cases we used it to predict changes in interaction, and in general we got between a .8 and .9 accuracy in predicting changes in who started to interact with whom. It was less good at predicting who stopped interacting. So, we know that there is a weakness there.
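A toy blend of those three mechanisms might look like the following. The weights, the agents, and the set-overlap formulas are invented for illustration; this is not DyNet's actual model.

```python
def interaction_propensity(a, b, w_sim=0.5, w_exp=0.3, w_task=0.2):
    """Illustrative propensity of agent a to initiate interaction with b.

    Blends the three mechanisms named in the answer above:
    homophily, expertise seeking, and shared tasks. The weights
    and formulas are invented, not DyNet's actual model.
    """
    # homophily: fraction of shared knowledge (Jaccard overlap)
    sim = len(a["knowledge"] & b["knowledge"]) / len(a["knowledge"] | b["knowledge"])
    # expertise seeking: fraction of b's knowledge that a lacks
    exp = len(b["knowledge"] - a["knowledge"]) / len(b["knowledge"])
    # shared tasks: Jaccard overlap of task assignments
    task = len(a["tasks"] & b["tasks"]) / len(a["tasks"] | b["tasks"])
    return w_sim * sim + w_exp * exp + w_task * task

alice = {"knowledge": {"k1", "k2"}, "tasks": {"t1"}}
bob   = {"knowledge": {"k2", "k3"}, "tasks": {"t1", "t2"}}

print(round(interaction_propensity(alice, bob), 3))  # -> 0.417
```

Note that homophily and expertise seeking pull in opposite directions: identical agents maximize the first term but zero out the second, which is one reason simulated networks driven by such rules keep rewiring rather than freezing.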
In terms of the covert networks themselves, we actually used this model to try to predict who some of the emergent leaders were in some of the groups over time. I can tell you that, in the case of Hamas, we correctly predicted the next leader after Yassin using this model. Does it need more validation? Of course, but that is the beginning of an answer. The other part of the answer, though, is that one of the ways that things like this are validated in the field, and one of the things that people like about models like this in the field, is that they help analysts think systematically about data. Thinking systematically is really important. The second thing is that, even if the model gets it wrong, it suggests places to look to gather data, and, when you get that data back in, you can modify, upgrade, and adapt things based on that new data, which is what I think we are in the process of doing.
DR. GROSS: When you said you used a variety of statistical tools, other than calculating measures, did you use any other statistical tools, such as cross-validation or bootstrapping? I don't know; I am just naming techniques at random.

DR. CARLEY: The tool itself basically has a lot of measure calculations. It also has an optimizer for fitting a desired network form to an actual network form. It also has a morphing procedure for morphing from one form to another, based on facial-morphing technology, and it has some clustering and other techniques for grouping. In terms of the simulation models, we have tried different types of validation techniques.

DR. BANKS: I have a quick question; it is sort of technical. You were using a Hamming metric on the two networks as a measure of adaptability, and I am not quite sure what the intent of adaptability is there.

DR. CARLEY: Basically, one of the ways in which people think of organizations as being adaptive is that they change who is doing what, who has access to what resources, and who is working together. Rapid changes in those are often considered a key leading indicator of adaptivity. That doesn't mean improvement in performance; it just means the organization is adaptive, in the organizational sense of being able to change. So, what we are looking at is a Hamming metric between what the connections were in this multi-mode state, thinking of the whole thing as an organization, at time one versus time two, and that becomes our measure.

DR. MOODY: Could you speak about what you mean by performance? You report reductions in performance, but I could imagine numerous dimensions of performance.

DR. CARLEY: Right now, in the system, there are two things that the people using the system treat as performance metrics.
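The Hamming-based adaptivity measure described here can be sketched concretely. This is a minimal illustration under my own assumptions (binary matrices, a simple cell-count normalization); the actual tool's implementation is not given in the transcript.

```python
# Hedged sketch of the Hamming metric between two network snapshots:
# count the cells that differ between the time-one and time-two
# adjacency (or multi-mode incidence) matrices. Rapid change, i.e. a
# large distance, is read as a leading indicator of adaptivity.

def hamming_distance(net_t1, net_t2):
    """Cell-wise differences between two equal-sized 0/1 matrices."""
    diff = 0
    for row1, row2 in zip(net_t1, net_t2):
        diff += sum(1 for a, b in zip(row1, row2) if a != b)
    return diff

def adaptivity(net_t1, net_t2):
    """Normalize by the number of cells, giving a 0..1 change score."""
    cells = len(net_t1) * len(net_t1[0])
    return hamming_distance(net_t1, net_t2) / cells

# Who-works-with-whom at two time points (toy data).
t1 = [[0, 1, 0],
      [1, 0, 1],
      [0, 1, 0]]
t2 = [[0, 1, 1],
      [1, 0, 0],
      [1, 0, 0]]
print(hamming_distance(t1, t2))  # 4 cells flipped
print(adaptivity(t1, t2))        # 4/9, roughly 0.44
```

For a true multi-mode state (people by tasks, people by resources, and so on), the same count would be taken over the stacked matrices rather than a single adjacency matrix.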
One is basically a measure of information diffusion, based on shortest paths, with the idea that systems tend to perform better if their members are closer together. A second measure of performance is what we call our performance-as-accuracy metric, a simulation technique that estimates, for any given organizational form, its likelihood of correctly classifying information on a trivial classification choice task. It is not a specific measure of how good you are at recruiting or at X, Y, and Z; it is a generic measure of whether a given structural topology is good at classification choice tasks. We use that because there is a lot of evidence from organization science that, in fact, a huge portion of the tasks any group does are classification choice tasks. So, this seems to get at it, and we have some validation in the sense that we have looked at 69 different organizations under crisis and non-crisis conditions, and you can correctly recover their ensemble performance differences using this kind of generic measure. It is a good indicator.
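The shortest-path diffusion measure can be sketched as follows. The exact formula is my assumption (mean inverse geodesic distance over node pairs, a standard choice); the transcript only says the measure is shortest-path based and rewards tightly connected systems.

```python
# Sketch of a shortest-path-based diffusion score: the mean of 1/d(i, j)
# over ordered node pairs, so tightly connected networks score near 1
# and stretched-out or fragmented ones score lower.
from collections import deque

def bfs_distances(adj, start):
    """Hop counts from start to every reachable node in an adjacency dict."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

def diffusion_score(adj):
    """Average inverse geodesic distance; unreachable pairs contribute 0."""
    nodes = list(adj)
    total = 0.0
    for i in nodes:
        dist = bfs_distances(adj, i)
        for j in nodes:
            if j != i:
                total += 1.0 / dist[j] if j in dist else 0.0
    return total / (len(nodes) * (len(nodes) - 1))

# A 4-node line graph versus a fully connected 4-node clique.
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
clique = {i: [j for j in range(4) if j != i] for i in range(4)}
print(diffusion_score(clique))  # 1.0: everyone is one hop apart
print(diffusion_score(line))    # about 0.72: information must travel farther
```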
DR. HANDCOCK: Leading into the next session, could you make some comments on the reliability and quality issues and, in particular, about missing sample data?

DR. CARLEY: Well, the first thing I will say is that these metrics do not yet come with confidence intervals showing how confident we are that a metric is what it is, given the high levels of missing data. That is an unsolved problem. The second thing is that we have been doing a series of analyses, as I know several other people here have been doing, looking at networks where we know what the true answer is; we then sample from them and re-estimate the measures to see how robust they are. Steve Borgatti and I have just finished some work in that area which suggests that, for a lot of these measures, with 10 or 20 percent error, as long as you are not trying to say "this person is number one" but instead "this person is in the top 10 percent," you are going to be about right some 80 to 90 percent of the time. So, the measures have that kind of fidelity. Is it worse or better with particular types of network structures? We are not sure yet. Some of our preliminary data suggest that for some types of network structures, like corporate free networks, you can in fact do better than that.
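The robustness experiment described here (sample from a network whose true answer is known, re-estimate, and check top-decile recovery) can be sketched in miniature. Everything below is illustrative: the real study covered many measures and sampling schemes, while this toy uses only degree, random entry-flips as the error model, and a made-up hub-and-spoke network.

```python
# Hedged sketch of the missing-data robustness check: perturb a known
# "true" network with a given error rate, re-rank nodes by degree, and
# measure how often the truly top-decile nodes are still reported on top.
import random

def degree_ranks(adj_matrix):
    """Node indices sorted by degree (row sum), highest first."""
    degrees = [sum(row) for row in adj_matrix]
    return sorted(range(len(degrees)), key=lambda i: -degrees[i])

def perturb(adj_matrix, error_rate, rng):
    """Flip each off-diagonal cell with probability error_rate."""
    n = len(adj_matrix)
    noisy = [row[:] for row in adj_matrix]
    for i in range(n):
        for j in range(n):
            if i != j and rng.random() < error_rate:
                noisy[i][j] = 1 - noisy[i][j]
    return noisy

def top_decile_recovery(adj_matrix, error_rate, trials, seed=0):
    """Mean fraction of the true top-decile nodes recovered per trial."""
    rng = random.Random(seed)
    n = len(adj_matrix)
    k = max(1, n // 10)
    truth = set(degree_ranks(adj_matrix)[:k])
    hits = 0.0
    for _ in range(trials):
        observed = set(degree_ranks(perturb(adj_matrix, error_rate, rng))[:k])
        hits += len(truth & observed) / k
    return hits / trials

# Toy "true" network: 20 nodes, with nodes 0 and 1 tied to everyone,
# so the top decile (two nodes) is clearly separated from the rest.
n = 20
true_net = [[0] * n for _ in range(n)]
true_net[0][1] = true_net[1][0] = 1
for i in range(2, n):
    true_net[0][i] = true_net[i][0] = 1
    true_net[1][i] = true_net[i][1] = 1
rate = top_decile_recovery(true_net, error_rate=0.15, trials=200)
print(rate)  # the hubs' large degree lead survives 15 percent error
```

The 80-90 percent figure in the transcript came from real networks and a range of measures; with a separation this extreme the toy recovers the top decile essentially every time, which is the qualitative point being made.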