Mixing Patterns and Community Structure in Networks

Mark Newman, University of Michigan and Santa Fe Institute


DR. NEWMAN: I am going to be talking mostly about social networks because that is mostly what I know about. I hope that some of the ideas I talk about will be applicable to the other kinds of networks that others of you work on.

First, I want to mention some of my collaborators, one of whom, Aaron Clauset, is in the audience. The others are Michelle Girvan, Cris Moore, and Juyong Park.

FIGURE 1

The story I am going to tell you starts with an old idea from network analysis. We believe that many of the networks we study divide up naturally into groups.

Suppose I am talking about a social network. I will assume that you know what a social network is. It is a network where the nodes are, for example, people, and they are connected together by some social interaction: they are friends, they have business acquaintanceships, or they have been in close physical proximity recently. Choose the definition that is relevant to your particular problem. So, you have a network of vertices and edges as shown in Figure 1, and we are going to assume that it divides up naturally into communities, groups, clusters, or whatever you would like to call them.

First of all, one thing that we might be interested in doing is finding those groups, if they exist in the first place, and if they do exist, determining what kind of groups they are. For example, you could be interested in a social network and be looking for groups of people. You could be interested in the World Wide Web and be looking for groups of related Web pages.



You could be looking at a metabolic network and be looking for functional clusters of nodes. There are a variety of situations in which you might be interested in this.

Finding groups in networks is an old problem. It is something that computer scientists, in particular, have looked at for many decades. But there is a difference between what we want to do with social networks, for example, and what computer scientists have traditionally done. First of all, in the computer science problems, traditionally when you are looking for the groups in networks, there are some additional constraints that we don't have, such as knowing beforehand how many groups you want. A standard computer science problem might be that you have a bunch of tasks and you want to divide them up over many processors on a computer, and you know how many processors you have; you know how many groups you want to divide the tasks into. Very often you might want the groups to be of roughly equal sizes. That is a typical constraint in the kinds of algorithms that computer scientists look at because, for example, you want to load balance between many processors in a computer. In the problems we are looking at, that is often not true.

What is more, there is a fundamental philosophical difference compared to the traditional approach of partitioning a computer network, in that we are assuming here that the social networks we are looking at naturally divide into these groups somehow. In the kinds of problems you want to solve in computer science, often you are just given the network and you want to find whatever the best division of it is into groups, such that you have a bunch of these tightly knit groups, with lots of connections within the groups and only a few connections between the groups, and you do the best you can with whatever network you are given. We are taking a slightly different point of view here, and assuming from the outset that, for some good sociological reason, there is some good division (for example, that your social network naturally divides up into these groups), and we would like to find the natural divisions of the network. Those natural divisions might involve dividing it into some small groups and some large groups, and you don't know how many groups there will be beforehand, and so forth.

There are old algorithms for doing this from computer science, like the one shown in Figure 2, which you might be familiar with: the so-called Kernighan-Lin algorithm from the 1970s. In this algorithm, if you want to divide a network into two parts, you can start by dividing it any way you like, essentially at random, and then you repeatedly swap pairs of vertices between the two parts in an effort to reduce the number of edges that run between the left-hand group and the right-hand group in Figure 2. You continue doing that until you can't improve things any more, and then you stop. This is typical of the kinds of things that traditional algorithms do. It works very well, but it has some constraints.

First of all, you need to know the sizes of the groups you want beforehand, because when you swap two vertices, you don't change the size of the groups. Each group gets one vertex and loses one vertex.

FIGURE 2

Furthermore, the Kernighan-Lin algorithm only bisects. It starts off with two groups, and it ends up with two groups. If you want more than two groups, then you would have to do this repeatedly: bisect, divide into two, and then divide the two into two, and so forth. However, there is no guarantee that the best division into three groups is given by finding first the best division into two and then dividing one or the other of those two groups.
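
To make the swapping procedure concrete, here is a minimal Python sketch using the networkx library, which ships a Kernighan-Lin style bisection routine; the small example graph is invented for illustration.

    import networkx as nx
    from networkx.algorithms.community import kernighan_lin_bisection

    # A small made-up graph: two tight clusters joined by a single bridge edge.
    G = nx.Graph([(0, 1), (0, 2), (1, 2),      # cluster A
                  (3, 4), (3, 5), (4, 5),      # cluster B
                  (2, 3)])                     # bridge between the clusters

    # Start from a random equal-sized split, then repeatedly swap pairs of
    # vertices to reduce the number of edges crossing the cut.
    part_a, part_b = kernighan_lin_bisection(G, seed=0)
    print(sorted(part_a), sorted(part_b))

Note that, as the talk points out, the routine always returns exactly two groups, and swapping keeps their sizes fixed.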

Something that I like better is shown in Figure 3: spectral partitioning, a rather elegant method based on matrix-algebra-type ideas. This is also something that has been worked on since the 1970s. In this approach, we take the so-called graph Laplacian, a matrix representing the network in which the degrees of the vertices are down the diagonal. The degree of a vertex is how many connections it has. There is a -1 off the diagonal in each position corresponding to an edge in the network; a -1 in the position for row 2 and column 1, for instance, means that there is an edge between vertex 2 and vertex 1. This is a symmetric matrix for an undirected network, and it has an eigenvector of all 1s with an eigenvalue of 0.

FIGURE 3

If you have a block diagonal matrix, like the one shown in Figure 4, one that divides completely into two separate communities with no connections between them, then it has two eigenvectors with eigenvalue 0: one with 1s in the positions corresponding to the first block and one with 1s in the positions corresponding to the second block. And, of course, any linear combination of those two is also an eigenvector with eigenvalue 0.

FIGURE 4

If you have a network that is approximately block diagonal, which is precisely the sort of thing we are talking about (in other words, it has two communities, two separate groups, but they are not quite separate; there are a few connections between them, so we have a few off-diagonal entries), then, as shown in Figure 5, you still have one eigenvector with eigenvalue 0, and the second one is slightly perturbed and has a slightly positive eigenvalue. You can prove that it is always positive. By looking at the properties of that second eigenvector you can deduce what those two blocks must have been. It is actually very simple: you just look at the signs of the elements of that eigenvector and assign vertices to the two groups according to their sign. This gives a very nice and principled way of bisecting a network into two groups. In this case, you don't have to specify what the sizes of the two groups are. You just find, if your matrix really is approximately block diagonal with two blocks, what those blocks are. But this algorithm only bisects, and it is a rather slow algorithm, since it involves finding the leading eigenvectors of the matrix. Typically for large networks you do this by the Lanczos method, because it is a sparse matrix, but it is still a pretty slow thing to do.

FIGURE 5
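
As a rough illustration of the sign-of-the-eigenvector idea, here is a small Python sketch that builds the graph Laplacian L = D - A, takes the eigenvector belonging to the second-smallest eigenvalue (often called the Fiedler vector), and splits the vertices by sign. The example graph is invented.

    import numpy as np
    import networkx as nx

    # Two loose clusters joined by a couple of edges (made-up example).
    G = nx.Graph([(0, 1), (0, 2), (1, 2),
                  (3, 4), (3, 5), (4, 5),
                  (2, 3), (1, 4)])

    A = nx.to_numpy_array(G)            # adjacency matrix
    L = np.diag(A.sum(axis=1)) - A      # Laplacian: degrees on the diagonal, -1 per edge

    # Eigenvectors of a symmetric matrix, returned in order of increasing eigenvalue.
    eigenvalues, eigenvectors = np.linalg.eigh(L)
    second = eigenvectors[:, 1]         # eigenvector for the second-smallest eigenvalue

    nodes = list(G.nodes())
    group_1 = [v for v, x in zip(nodes, second) if x >= 0]
    group_2 = [v for v, x in zip(nodes, second) if x < 0]
    print(group_1, group_2)

For a large sparse network you would instead feed the sparse Laplacian to a Lanczos-style eigensolver such as scipy.sparse.linalg.eigsh, as the talk notes, rather than doing a dense eigendecomposition.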

Looking particularly at social networks, but potentially for other networks as well, we have been thinking about other methods of solving this problem for the kinds of networks we are looking at. Many people have worked on this in the last decade or so. I am going to tell you about some work that we have done, just because I was involved in it and I like it, but of course it is certainly not the only thing that has been done.

Figure 6 shows one method we looked at; this is work that I did with Michelle Girvan, who is at the Santa Fe Institute. Suppose you have a network like the one in Figure 1, and it divides into groups where there are many edges within the groups and only a few edges between the groups. The idea is going to be: suppose I live within one of the dotted circles and I want to drive to a node within one of the others. I would then go along one of the edges within my circle to get to the edge that connects the groups, which is like a bridge. In fact, if I lived anywhere within my dotted circle and I wanted to drive anywhere within another circle, I am going to end up going over that bridge. If you live in San Francisco and you want to get to the East Bay, you go over the Bay Bridge; that is the way you have to go. That means there is a lot of traffic on the Bay Bridge, because everybody goes that way. These bridges between communities are going to carry a lot of traffic if we consider, in some simulation kind of a way, traffic moving around from every part of the network to every other part of the network. A simple concept that captures this idea of betweenness was first formulated by Lin Freeman, who is also here somewhere.

So, the idea here is that we find the shortest path between every vertex in the network and every other vertex in the network, and then we count how many of those shortest paths go along each edge. This is a primitive measure of how much traffic will be flowing along each edge of the network. Because some of these edges are the few edges that lie between the communities, those ones are going to get a lot of traffic. We identify the edges between the communities as those with the highest value of this betweenness measure, and then we remove them from the network. Once you remove them, you are left with the three clumps, and these are your communities.

FIGURE 6

It is a little bit more complicated than that in practice. The principal problem is that if you have two edges between two communities, like the two at the top in Figure 1, then there is no guarantee that they will both get a high value of betweenness. It is certainly the case that the sum of the number of paths that go along one and the number of paths that go along the other should be high but, under certain circumstances, it is entirely possible that one of them could be high and one of them could be low.

If you simply removed the edges with the highest betweenness, you wouldn't remove all the ones that connect groups, only some of them. In order to get around that, you have to find the edge with the highest betweenness, remove it, and recalculate this betweenness measure, this traffic-flow measure. If I remove one edge and recalculate, any traffic that previously had to go along that edge will now go along the other bridge instead. So, even if that bridge didn't have a high count before, it does have a high count now. It is crucial that you do this recalculation at each step of the algorithm, and that slows things down a bit. That is the primary disadvantage of this method: it takes a moderately long time to do the calculation. We did come up with a nice way of speeding up the calculation. Naively you would expect this to be an O(n^4) calculation, but some of the calculations can be reused, and you can get it down to O(n^3). That is still prohibitively slow for very large networks. You couldn't do this on the World Wide Web graph or something like that, but it can typically be done on the sizes of social networks we are looking at. It works up to tens of thousands of vertices.
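
Here is a minimal Python sketch of the procedure just described: repeatedly compute the edge betweenness, remove the edge with the highest value, recalculate, and stop once the network has broken into the desired number of pieces. It uses the edge betweenness routine in networkx; the toy graph and the stopping rule (three components) are invented for illustration.

    import networkx as nx

    def betweenness_communities(G, n_communities=3):
        """Split G by repeatedly removing the highest-betweenness edge."""
        H = G.copy()
        while nx.number_connected_components(H) < n_communities and H.number_of_edges() > 0:
            # Recalculate the betweenness after every removal -- the step
            # the talk stresses as crucial.
            betweenness = nx.edge_betweenness_centrality(H)
            bridge = max(betweenness, key=betweenness.get)
            H.remove_edge(*bridge)
        return list(nx.connected_components(H))

    # Toy network: three clumps joined by single bridge edges (made up).
    G = nx.Graph([(0, 1), (0, 2), (1, 2),
                  (3, 4), (3, 5), (4, 5),
                  (6, 7), (6, 8), (7, 8),
                  (2, 3), (5, 6)])
    print(betweenness_communities(G))

networkx also provides a ready-made girvan_newman generator in networkx.algorithms.community that performs essentially this loop.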

This works quite well, and I will give you some examples. Some of these are whimsical examples, but there is a reason for that. Figure 7 shows a famous network from the social science literature. At the top is a network from a study done in the 1970s of a karate club at a U.S. university. The vertices represent students in this club who were practicing their karate, and the connections between them represent friendships, as determined by direct observation of the members of the club by a sociologist who was working with them. This is an interesting example. The sociologist, Wayne Zachary, studied this club for two years, and fortuitously it turned out that during the course of those two years a dispute arose between two factions within the club; eventually the club split in two, and one half went off and formed their own club. The theory is that you should be able to spot that coming split in this network, even though the network was constructed before the split occurred. In fact, you can do that. The white circles and the gray squares represent the factions as they were after the split. All we did was to take this network and feed it through our algorithm. The output of the algorithm is a so-called dendrogram, which is a tree, as shown in (b) in Figure 7. It doesn't matter too much what it indicates; the main thing to notice is that it splits the network into two groups, and the coding of the circles and the squares is the same as in the network in (a). You can see that it has very neatly found the group of squares and the group of circles. There is only one error, vertex number three, which is the one right in the middle, on the boundary between the two factions. Maybe it is understandable why it got that one wrong.

FIGURE 7

This is a network that was derived from observations made before the split, and yet clearly the network contains information about what is going to happen in the future when the thing does split up. In a way, it is predicting what is going to happen, although it is predicting after we already knew what the answer was. Obviously, we wouldn't have put this example up here if it didn't work.
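
Zachary's karate club network happens to be bundled with networkx, so this example is easy to reproduce in outline. A rough sketch (the exact assignment of the one or two boundary vertices can differ from the published figure, as the talk notes for vertex three):

    import networkx as nx
    from networkx.algorithms.community import girvan_newman

    G = nx.karate_club_graph()              # Zachary's karate club, 34 members

    # First split produced by repeatedly removing the highest-betweenness edge.
    group_a, group_b = next(girvan_newman(G))

    # Compare with the faction each member actually joined after the club broke up,
    # which networkx stores as a node attribute called "club".
    for group in (group_a, group_b):
        factions = {G.nodes[v]["club"] for v in group}
        print(sorted(group), "->", factions)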

Figure 8 gives another example. I am at the University of Michigan, and so I need to have an example about football. This is a picture, one of these dendrograms, one of these trees, that shows what happens when you feed the schedule of games for Division I-A college football into our algorithm. As you might know, college football is played in conferences, which are groups of schools that play together on a regular basis. If you feed the schedule in, it picks out the conferences very nicely. They are coded by the different shapes and colors in this figure, and the groups that the algorithm finds, as you can see, pretty much correspond to the conferences.

In a way, this is a silly example, but we look at examples like this for a good reason, which is that they are cases where we think we know what the answers should be. You want to have applications to real-world networks to show that this really is working, but when you are developing algorithms you also want to apply them to something where you can tell, after you did it, whether it is actually giving a sensible answer. This algorithm is always going to give some answer. If you plug in some network and it spits out an answer and you can't tell whether it is sensible, then it didn't really get you anywhere. Many of the examples we look at when we are developing the algorithm are ones where we think we know what it ought to be doing before we start.

FIGURE 8

Figure 9 gives one more example. This one is a food web, a non-social-network example: a food web of marine organisms in the Chesapeake Bay. The interesting thing about what happens when we feed this one into the algorithm is that it appears to split the food web into the pelagic organisms, the ones that live in the upper layers of the water, and the benthic ones, the ones that live in the mud down at the bottom. It appears to be indicating that this ecosystem is divided into two weakly interacting systems with not very much going on between them, so the method certainly does have applications other than social networks.

FIGURE 9

One of the differences between what we do and what computer scientists do is that we want to know whether the solution we got to a problem is actually a good one. We don't want to just get the best solution, whatever it is; we also want to know if the solution we got is no good, in other words, if there wasn't really any community structure in the network in the first place. To do that, we use a measure that we call modularity.

The idea with modularity is basically to measure what fraction of the edges in our network fall within the groups that we find. What we would like is that most of the edges are within groups, and only a few of them between groups. That on its own, however, is not a good measure of what we want, because then you could put all the vertices together in a single group, and all the edges would be within that group, and you would get 100 percent modularity, but that is obviously a trivial result; it is not a good division of the network. What we use instead is the fraction of edges that fall within the groups minus the expected fraction of edges that would fall within those groups. I can write down exactly how we calculate that if you like, but you can see the basic idea by examining that trivial partition. If I put all the vertices in one group together, then all the edges are within that one group, but it is also expected that all the edges will be in that one group. Thus, you get 100 percent minus 100 percent, and the measure is 0 rather than a high score. So we think of modularity as something that is high if the number of edges within groups is much greater than we would expect on the basis of chance.
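
In the notation usually used for this measure, if e_ii is the fraction of edges that fall within group i and a_i is the fraction of edge ends attached to vertices in group i, the modularity is Q = sum over i of (e_ii - a_i^2). A minimal Python check of the idea, using the modularity function in networkx and the same made-up three-clump graph as above:

    import networkx as nx
    from networkx.algorithms.community import modularity

    G = nx.Graph([(0, 1), (0, 2), (1, 2),
                  (3, 4), (3, 5), (4, 5),
                  (6, 7), (6, 8), (7, 8),
                  (2, 3), (5, 6)])

    good_split = [{0, 1, 2}, {3, 4, 5}, {6, 7, 8}]   # the natural clumps
    trivial_split = [set(G.nodes())]                  # everything in one group

    print(modularity(G, good_split))     # clearly positive: many within-group edges
    print(modularity(G, trivial_split))  # exactly 0: observed minus expected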

What we can do is take one of these dendrograms, which represents the order in which the network gets split into smaller and smaller pieces as we remove the edges, and plot this modularity against it, as shown in Figure 10. What we typically find is that the modularity has a peak somewhere and then it falls off, and this peak indicates where the best split is in this modularity sense, the best split in terms of getting most of the edges within groups. That is shown by the red line in Figure 10. The value at the peak indicates how good a split it is. This is a measure that goes strictly from 0 to 1. It is about 50 percent modularity in Figure 10, which is pretty good. We usually find that there is something significant going on if it is anywhere above about 30 percent, and the highest values we have seen in real networks are about 80 percent.

FIGURE 10

Figure 11 shows what happens if you apply this method to the karate club network. It actually finds two different things. It has a peak for the split into two groups that I talked about before, the one we know about. It has another peak for a split into five groups, and it looks like that second peak is actually a little bit higher. Maybe there is some other smaller-scale structure going on in the network that we didn't know about before.
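
The peak-finding step can be sketched in a few lines of Python: generate the successive splits from the betweenness-based algorithm, score each one with modularity, and keep the best. This reuses the networkx routines from the earlier sketches; whether it reproduces the exact five-group peak mentioned above depends on implementation details, so treat it as an outline.

    import networkx as nx
    from networkx.algorithms.community import girvan_newman, modularity

    G = nx.karate_club_graph()

    best_q, best_split = -1.0, None
    # girvan_newman yields successively finer divisions: 2 groups, 3 groups, ...
    for split in girvan_newman(G):
        q = modularity(G, split)
        if q > best_q:
            best_q, best_split = q, split

    print(len(best_split), "groups at the peak, modularity", round(best_q, 3))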

FIGURE 11

I simulated some data using some rank k. Does anybody want to hazard a guess as to what the rank of my mean matrix was? It probably wasn't zero. I have a vote for four. That seems to be popular. Why might it be four? If you have played around with these things, you know that you are looking for gaps in the magnitude from one singular value to the next. You know it is definitely more than zero. In Figure 11 you can see a big gap between 1 and 2, so it is at least 1, but then there is this big gap between 3 and 4, and then there is another little gap between 4 and 5, and then it seems to be going down monotonically. So 4 is a really good guess; the actual answer is 5, as shown in Figure 12. This goes to show you how good our eye is at doing these things.
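
The gap-hunting exercise is easy to reproduce. Here is a minimal Python sketch along those lines, with the matrix size, rank, and noise level all made up: generate a data matrix equal to a rank-5 mean matrix plus Gaussian noise, then look at where the singular values of the data drop off.

    import numpy as np

    rng = np.random.default_rng(0)
    n, true_rank = 50, 5

    # Low-rank mean matrix plus independent Gaussian noise (sizes invented).
    M = rng.normal(size=(n, true_rank)) @ rng.normal(size=(true_rank, n))
    Y = M + rng.normal(size=(n, n))

    singular_values = np.linalg.svd(Y, compute_uv=False)
    print(np.round(singular_values[:8], 1))   # eyeball the gaps between consecutive values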

FIGURE 12

Figure 13 shows the actual singular values that I used to generate the data matrix. Four is a fair guess, because the fifth non-zero singular value is smaller than the others. I used the Bayesian machinery: you can get a posterior distribution for the mean matrix, and then you can look at the posterior mean of all the singular values. If you add up all the blue, it is measuring the total amount of variation that the model picked up in the data. Here is a posterior distribution on the rank of the matrix. This suggests that Bayesian methods are good, because you had your intuition about what the rank was, and the Bayesian estimate matched up with that to a great degree. The posterior distribution says there is a very good chance that there are four or more dimensions to this mean matrix. The two highest-probability ranks are four and five, as you might expect.

FIGURE 13

One other thing about doing Bayesian inference, if you do the singular value decomposition (here is my little plug for Bayesian analysis): suppose I told you the rank was 5. What is your least-squares estimate of the mean matrix, given that you know the rank is 5? Well, the least-squares estimate is just the first five singular values and singular vectors, so you would be taking the top of the green bars for the first five singular values there. If you take the true matrix that was used to generate the data, take all those entries, and plot them against the least-squares estimate, you get the green pattern shown on the right side of Figure 14.
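
To get a feel for that comparison without the figure, here is a hedged Python sketch: the rank-5 least-squares estimate is the truncated singular value decomposition of the data matrix, and comparing it entry by entry with the truth shows the kind of behavior described in the next paragraph. The sizes and noise level are made up and will not match the numbers quoted in the talk.

    import numpy as np

    rng = np.random.default_rng(1)
    n, rank = 50, 5
    M = rng.normal(size=(n, rank)) @ rng.normal(size=(rank, n))   # true mean matrix
    Y = M + rng.normal(size=(n, n))                                # observed data

    # Least-squares estimate of a rank-5 mean: keep the top 5 singular triples of Y.
    U, d, Vt = np.linalg.svd(Y)
    M_ls = U[:, :rank] @ np.diag(d[:rank]) @ Vt[:rank, :]

    print("range of the truth:      ", np.round([M.min(), M.max()], 1))
    print("range of the LS estimate:", np.round([M_ls.min(), M_ls.max()], 1))
    print("mean squared error:      ", round(float(np.mean((M_ls - M) ** 2)), 2))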

FIGURE 14

If you take the Bayesian estimate you get something with a much lower mean square. What is happening here is that the singular value decomposition is taking our data matrix Y and looking at the projections onto these subspaces. The data matrix Y is equal to the mean matrix plus error, so the singular value decomposition is projecting all the error onto the subspace as well. It is worth pointing out that the range of the truth is -3 to 3, while the range of the singular value decomposition estimate is essentially -4 to 4. It is not atypical in these types of problems for a least-squares estimate to overestimate the variability in the matrix.

The reason why I went through all that work is that I wanted to embed the concept of the singular value decomposition model into more complicated models for social networks. I wanted to incorporate this into very general models for all sorts of relational data: not just Gaussian data, not just binary data, but data of all sorts. The way I do that is with a generalized linear model. Figure 15 shows the general model for relational data. We have a sociomatrix Y. All of this generalizes to the case where Y is an n x m matrix; it doesn't have to be a square matrix.

FIGURE 15

We have a set of explanatory variables X, and our model relating X to Y is as follows. We have what we call a predictor, θ, and it is equal to a regression part plus a latent structure part. This latent structure part is just the model that we talked about, the singular value decomposition model. The generalized linear model says not that our observed data are equal to Xβ + Z, but that some function of the mean is equal to Xβ + Z. Some function of the mean of y_ij is equal to the regression part, a latent structure part, and lack of fit. In this context, you can fit binary data, where you model the log odds of the observations as being equal to the structure. If you have count data you might model things on the log scale and, if it is continuous, you might model it on the additive scale, using the identity link.

I should point out that a lot of the data that people have been talking about is skewed data. I wouldn't call it high-variability data, but I would call it skewed data. You have not just the first and second moments there, you have the third and fourth moments, and those are really the things that represent that skewed structure. That can also be incorporated here, although with more difficulty. You have to decide on what scale you are making the measurements. Actually, you could take continuous data and use a log link to model skewed data.
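
Written out, the model just described takes roughly the following form. The notation here is mine rather than copied from the slides, with u_i and v_j denoting the rows of U and V in the SVD-style latent part Z = UDV'.

    g\bigl(\mathrm{E}[y_{i,j}]\bigr) \;=\; \theta_{i,j}
        \;=\; x_{i,j}^{\top}\beta \;+\; z_{i,j},
    \qquad
    Z \;=\; U D V^{\top}, \qquad z_{i,j} \;=\; u_i^{\top} D\, v_j,

where g is the link function: the logit (log odds) for binary data, the log for count data, and the identity for continuous data.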

An example of what I am going to show in the next two slides is some international conflict data. The model for how these data really are generated is debatable, but it is a very nice data set because we can see the structure. Mark pointed this out in the last talk: it is good to work on problems where you can tell whether the answer makes sense or not. I have some colleagues in political science who are interested in modeling conflicts between nations. The data are from Mike Ward and Xun Cao, and what they measure is the number of militarized disputes initiated by country i with target j over a 10-year period. This includes actual militarized disputes and conflicts, as well as threats and a bunch of other things that they have recorded. They want to relate these to a bunch of covariates, so they have a bunch of theories about what causes nations to have conflicts with others and what things might ameliorate conflicts. There is the concept of democratic peace, that democratic nations don't have conflicts with each other. There is the idea that trade inhibits conflicts, and so on. They would like to evaluate these things and look at the structure that is left unexplained by these covariates. Figure 16 provides an overview.

FIGURE 16

We fit this into our model. We say that we are going to model the mean of each observation on the log scale, with this regression part plus this latent structure. I was very happy to see the posterior distribution on the number of dimensions of the latent structure, as shown in Figure 17. Why was I happy? Because it is easy to plot a 2-dimensional latent structure. You can plot a 3-dimensional latent structure, but this k could have been anywhere between 0 and perhaps 130.

If the dimension of the latent structure had been 73, I wouldn't have been able to give you a very good picture at the end of what is going on, or you could only have looked at the first few dimensions of the latent structure. I should also point out that if you play with these data enough, you know that there is no way the mean structure has dimension less than two. There are lots of countries that are very active in conflict, and a lot of countries that are attacked a lot; you know that you are going to have row and column effects, and that is a 2-dimensional structure. Therefore, you know the dimension has got to be at least two.
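
That last claim, that additive row and column effects by themselves give a rank-2 mean structure, is easy to check numerically. A small Python sketch with made-up effect values:

    import numpy as np

    rng = np.random.default_rng(2)
    row_effects = rng.normal(size=30)    # e.g., how conflict-prone each country is (made up)
    col_effects = rng.normal(size=30)    # e.g., how often each country is targeted (made up)

    # Mean matrix built purely from additive row and column effects.
    theta = row_effects[:, None] + col_effects[None, :]
    print(np.linalg.matrix_rank(theta))  # 2: one dimension for each set of effects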

FIGURE 17

The right half of Figure 17 shows some confidence intervals on the regression parameters. There are some obvious things to point out. As distance increases, the amount of conflict decreases; people have conflicts with countries that are near them. Population also plays a big role: the populations of countries that are in conflicts tend to be higher than average.

FIGURE 18

Figure 18 shows the latent structure, and I will finish by explaining this plot. Recall the model: each country has a 2-dimensional vector of characteristics as an attacker and a 2-dimensional vector of characteristics as a target, so each one of these countries has a vector as an aggressor. What I have done here is plot the direction of each country's attacker vector in red; that is the outer circle. The size of each plotting character is the magnitude of the country's vector in that direction. The inner circle shows the direction of each country's 2-dimensional latent vector as a target. Then I have drawn green lines indicating the strength, the number of conflicts, between the countries. So, you see somewhat obvious things, because we know the names of the rows and the names of the columns. Those are the main structures there.

Here is where I see if I have made myself clear. If there weren't any latent structure, can anybody guess what this plot would look like, or what would be different about this plot? Where would the green lines be? If there were no 2-dimensional latent structure, there would be no relationship between the actual links, the attacks between the countries, and their positions in this circle. All you would see is a bunch of green lines scattered all over the place, so you can see how these positions really do pick up the structure that is there.

To summarize, my point is that the singular value decomposition is a well-known tool in multivariate analysis that has been used for a long time, and there is a huge literature based on it, in the additive case, in the psychometric literature and in lots of other fields.

It is something that is well understood, although apparently the problem of identifying the dimension of the mean matrix has been left unsolved, at least in a Bayesian context, and actually it is also unsolved in a non-Bayesian context. Using this framework, the idea is that we can take this natural singular value decomposition model that we are familiar with and incorporate it into different types of data structures, along with regression effects and the other things that we use when we are doing statistical modeling. That is the main point of the talk.

QUESTIONS AND ANSWERS

QUESTION: One quick question. Could you just say a couple of words on what you are thinking about here with the time-series extensions, the dynamic extensions, because that is something that is natural to the topics we are going to be covering.

DR. HOFF: I have always known that I should spend more time on the dynamic aspect of networks, but every time I gave a talk on latent variable models, everybody always asked, what is the dimension? So, I had to spend time doing the dimension first. Now that that is done, there are a number of ways that you can do dynamic network inference for these things, or incorporate these ideas. One way is to have a time-series model for the latent characteristics, as a Markov model or something like that, and actually David Banks and Eric Vance are working on stuff like that. Again, I have some other ideas about how to set these things up in a general autoregressive structure. Their ideas are kind of interesting; they have a behavioral slant on things. They are using ideas from social networks like: I am at this location in the social space, and I form a tie with this person, and I should move at the next time step in this other direction. There are dynamical models; people are starting to work on these things with these ideas.

DR. BLEI: Your latent variables seem to model the connections between rows and columns that happen outside of the covariates. I can imagine, say with countries, two countries that don't get along, where the covariates explain that, and their latent positions would still end up far apart. Is there some way to deal with that?

DR. HOFF: In one sense, you could use this sort of approach with a slight modification to actually let the latent variables model the variation that is unexplained by those covariates. In a sense, that might be the goal.

DR. BLEI: You could actually model those jointly with the connection.

DR. HOFF: Yes, depending on what your goal is, you could do it in a variety of different ways. For example, you might want to leave that variable out of it. I mean, if you know what the variable is, you might leave it out of the model, fit the latent characteristics, and then plot, say, the latent characteristics versus that variable. It depends on what you are trying to get out of it. One issue with including things in a model is that you are saying what the scale is, or saying it is additive, and in a lot of cases it is not; it is more like multiplicative. There can be an advantage to leaving it out and then plotting it versus the latent characteristics later. You could do what I just said, or, the way I have the model set up, you can constrain the latent characteristics to be orthogonal to things, and you could actually restrict them to be orthogonal to the design structure of the regression and say, yes, this is the left-over variation. It depends on what issue you are trying to address at the moment.

DR. BLEI: One last question. A lot of these networks, like the Internet, are very sparsely connected, and I imagine that if you tried to fit a model that was just overrun with zeroes, say a logistic regression, you would end up just fitting the zeroes and you might not get anything interesting out of it. Have you had that issue?

DR. HOFF: Yes and no. Actually, the networks that I have been modeling, these conflict networks that my political science colleagues fit, are extremely sparse, so we do run into that. One thing one of my colleagues likes to show to political science crowds is that he fits this logistic regression without any latent variables and then makes a prediction of where the conflicts are. Well, your best bet is to just predict no conflict, and that is it; you just predict all zeroes. So, the R-squared is very bad for these types of models. The benefit of doing something like this is that, yes, you do predict mostly zeroes, but you can identify the actors in the system that are active, and they get big latent vectors, and then you can see what is going on there. So, you still predict mostly zeroes for most of the countries, but then you actually do pick up the things that are not sparse, you pick up the activity, and the fit is a lot better.

DR. BANKS: Do you have any thoughts on how to robustify this kind of analysis?

DR. HOFF: Against what?

DR. BANKS: In this business you can wind up with one or two atypical actors that don't quite follow the model that you specified for everybody else but, because of the interactivity, they influence so many other things that it gets really hard.

DR. HOFF: This type of approach can actually pick those things up pretty well, because it is fitting characteristics for each node. If it is true that there is a node that is an outlier, it will have characteristics that are estimated as being way out there; they are very large and very active, without influencing the rest of the network too much.

Actually, that sort of outlying structure fits well with this type of model. What is harder is when the outlier is not a single very active node but a couple of outlying pairs; then you might get into trouble. Because of the nature of the model and what it fits, fitting these parameters for each node, you are very robust against outlying nodes.

REFERENCES

Hoff, P.D., A.E. Raftery, and M.S. Handcock. 2002. "Latent Space Approaches to Social Network Analysis." Journal of the American Statistical Association 97.

Hoff, P.D. 2005. "Bilinear Mixed-Effects Models for Dyadic Data." Journal of the American Statistical Association 100.

Nowicki, K., and T. Snijders. 2001. "Estimation and Prediction for Stochastic Blockstructures." Journal of the American Statistical Association 96.