**Mixing Patterns and Community Structure in Networks**

**Mark Newman, University of Michigan and Santa Fe Institute**

**DR. NEWMAN:** I am going to be talking mostly about social networks because that is mostly what I know about. I hope that some of the ideas I talk about will be applicable to some of the other kinds of networks that the rest of you work on.

First, I want to mention some of my collaborators, one of whom, Aaron Clauset, is in the audience. The others are Michelle Girvan, Cris Moore, and Juyong Park.

FIGURE 1

The story I am going to tell you starts with an old idea from network analysis. We believe that many of the networks we study divide up naturally into groups.

Suppose I am talking about a social network. I will assume that you know what a social network is. It is a network where the nodes are, for example, people, and they are connected together by some social interaction: they are friends, they have a business acquaintanceship, or they have been in close physical proximity recently. Choose the definition that is relevant to your particular problem. So, you have a network of vertices and edges as shown in Figure 1, and we are going to assume that it divides up naturally into communities, groups, clusters, or whatever you would like to call them. First of all, one thing that we might be interested in doing is finding those groups, if they exist in the first place, and if they do exist, determining what kind of groups they are. For example, you could be interested in a social network and be looking at groups of people in a network. You could be interested in the World Wide Web, and be looking for groups of related Web pages. You could be looking at a metabolic
network and be looking for functional clusters of nodes. There are a variety of situations in
which you might be interested in this.
Finding groups in networks is an old problem. It is something that computer scientists, in
particular, have looked at for many decades. But there is a difference between what we want to
do with social networks, for example, and what computer scientists have traditionally done. First
of all, in the computer science problems, traditionally when you are looking for the groups in
networks, there are some additional constraints that we don’t have, such as you know how many
groups you want beforehand. A standard computer science problem might be that you have a
bunch of tasks and you want to divide them up over many processors on a computer, and you
know how many processors you have. You know how many groups you want to divide them
into. Very often you might want the groups to be of roughly equal sizes. That is a typical
constraint in the kinds of algorithms that computer scientists look at because, for example, you
want to load balance between many processors in a computer. In the problems we are looking at,
that is not often true.
What is more, there is a fundamental philosophical difference compared to the traditional
approach of partitioning a computer network, in that we are assuming here that the social
networks we are looking at naturally divide into these groups somehow. In the kinds of problems
you want to solve in computer science, often you are just given the network and you want to find
whatever the best division of it is into groups, such that you have a bunch of these tightly knit
groups, and there are lots of connections within the groups, and only a few connections between
the groups, and you do the best you can with whatever network you are given. We are taking a
slightly different point of view here, and assuming from the outset that there is some good
division for some good sociological reason, for example, that your social network naturally
divides up into these groups, and we would like to find the natural divisions of the network. And
those natural divisions might involve dividing it into some small groups and some large groups,
and you don’t know how many groups there will be beforehand, and so forth.
There are old algorithms for doing this from computer science, like the one shown in
Figure 2, which you might be familiar with, the so-called Kernighan-Lin algorithm from the
1970s. In this algorithm, if you want to divide a network into two parts, you can start by dividing
it any way you like, essentially at random, and then you repeatedly swap pairs of vertices between
the two parts in an effort to reduce the number of edges that run between the left-hand group and
the right-hand group in Figure 2. You continue doing that until you can’t improve things any
more, and then you stop. This is typical of the kinds of things that traditional algorithms do. It
works very well but it has some constraints. First of all, you need to know the sizes of the groups
you want beforehand, because when you swap two vertices, you don’t change the size of the
groups. Each group gets one vertex and loses one vertex.
FIGURE 2
Furthermore, the Kernighan-Lin algorithm only bisects. It starts off with two groups, and
it ends up with two groups. If you want more than two groups, then you would have to do this
repeatedly, repeatedly bisect, divide into two, and then divide the two into two, and so forth.
However, there is no guarantee that the best division into three groups is given by finding first the
best division into two and then dividing one or the other of those two groups.
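As a rough illustration of the kind of swap-based procedure being described, here is a minimal Python sketch assuming the networkx library. It is not the Kernighan-Lin algorithm as originally published (which makes tentative swap passes and can accept temporarily worse states); it is a simpler greedy variant, and the helper names `cut_size` and `greedy_bisection` are made up for illustration.

```python
import random
import networkx as nx


def cut_size(G, group_a):
    """Number of edges running between group_a and the rest of the graph."""
    group_a = set(group_a)
    return sum(1 for u, v in G.edges() if (u in group_a) != (v in group_a))


def greedy_bisection(G, seed=0):
    """Start from a random equal split, then greedily swap vertex pairs
    between the two halves whenever a swap reduces the cut size."""
    rng = random.Random(seed)
    nodes = list(G.nodes())
    rng.shuffle(nodes)
    half = len(nodes) // 2
    a, b = set(nodes[:half]), set(nodes[half:])   # random equal-sized split
    improved = True
    while improved:
        improved = False
        current = cut_size(G, a)
        best_gain, best_pair = 0, None
        for u in a:
            for v in b:
                trial = (a - {u}) | {v}           # tentatively swap u and v
                gain = current - cut_size(G, trial)
                if gain > best_gain:
                    best_gain, best_pair = gain, (u, v)
        if best_pair is not None:
            u, v = best_pair
            a.remove(u); a.add(v)
            b.remove(v); b.add(u)
            improved = True
    return a, b


# Two planted groups of 16 vertices, dense inside and sparse between.
G = nx.planted_partition_graph(2, 16, 0.6, 0.05, seed=1)
left, right = greedy_bisection(G)
print(sorted(left))
print("edges across the cut:", cut_size(G, left))
```

Note that, exactly as described above, the two groups keep their initial sizes: every swap gives each side one vertex and takes one away.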
Something that I like better is shown in Figure 3, spectral partitioning, a rather elegant
method that is based on matrix-algebra-type ideas. This is also something that has been worked
on since the 1970s. In this approach, we take the so-called graph Laplacian, a matrix representing
the network in which the degrees of the vertices are down the diagonal. The degree of a vertex is
how many connections it has. There is a -1 off the diagonal in each position corresponding to an
edge in the network. Minus 1 here means that there is an edge between vertex 2 and vertex 1.
This is a symmetric matrix in an undirected network, and it has an eigenvector of all 1s with an
eigenvalue of 0.
FIGURE 3
If you have a block diagonal matrix, like the one shown in Figure 4, one that divides
completely into two separate communities with no connections between them, then it has two
eigenvectors with eigenvalue 0, one with all the 1s corresponding to the first block and one with
all the 1s corresponding to the second block. And, of course, any linear combination of those two
is also an eigenvector with eigenvalue 0.
FIGURE 4
If you have a network that is approximately block diagonal, that is precisely the sort of
thing we are talking about—in other words, it has two communities, two separate groups, but they
are not quite separate, there are a few connections between them, so we have a few off diagonals
here and here. Then, as shown in Figure 5, you still have one eigenvector with eigenvalue 0, and
the second one is slightly perturbed and has a slightly positive eigenvalue. You can prove that it
is always positive. By looking at the properties of that second eigenvector you can deduce what
those two blocks must have been. It is actually very simple. You just look at the signs of the
elements of those eigenvectors and assign vertices to the two groups according to their sign. This
gives a very nice and principled way of bisecting a network into two groups. In this case, you
don’t have to specify what the size of the two groups is. You just find—if your matrix really is
approximately block diagonal with two blocks—what those blocks are. But this algorithm only
bisects, and it is a rather slow algorithm, since it involves finding the eigenvectors with the
smallest eigenvalues of the matrix. Typically for large networks you do this by the Lanczos method, because it is a sparse
matrix, but it is still a pretty slow thing to do.
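A minimal sketch of the spectral idea just described, assuming the numpy and networkx libraries (the function name `spectral_bisection` is illustrative): build the graph Laplacian, take the eigenvector belonging to the second-smallest eigenvalue, and split the vertices by the signs of its elements.

```python
import numpy as np
import networkx as nx


def spectral_bisection(G):
    """Split G in two using the signs of the eigenvector belonging to the
    second-smallest eigenvalue of the graph Laplacian (degrees on the
    diagonal, -1 for each edge)."""
    nodes = list(G.nodes())
    L = nx.laplacian_matrix(G, nodelist=nodes).toarray().astype(float)
    eigvals, eigvecs = np.linalg.eigh(L)   # symmetric, eigenvalues ascending
    second = eigvecs[:, 1]                 # the "slightly perturbed" vector
    group1 = {n for n, x in zip(nodes, second) if x >= 0}
    return group1, set(nodes) - group1


G = nx.planted_partition_graph(2, 16, 0.6, 0.05, seed=1)
a, b = spectral_bisection(G)
print(len(a), len(b))
# For large sparse networks one would replace the dense eigh call with a
# Lanczos-style sparse eigensolver such as scipy.sparse.linalg.eigsh.
```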
FIGURE 5
Looking particularly at social networks, but potentially for other networks, we have been
thinking about other methods of solving this problem for the kinds of networks we are looking at.
Many people have worked on this in the last decade or so. I am going to tell you about some stuff
that we have done, just because I was involved in it and I like it, but of course it is certainly not
the only thing that has been done.
Figure 6 shows one method we looked at; this is work that I did with Michelle Girvan,
who is at the Santa Fe Institute. Suppose you have a network like the one in Figure 1, and it
divides into groups where there are many edges within the groups and only a few edges between
the groups. The idea is going to be, suppose I live within one of the dotted circles and I want to
drive to a node within one of the others. I would then go along one of the edges within my circle
to get to the edge that connects the groups, which is like a bridge. In fact, if I lived anywhere
within my dotted circle and I wanted to drive anywhere within another circle, I am going to end
up going over that bridge. If you live in San Francisco and you want to get to the East Bay, you
would go over the Bay Bridge, and that is the way you have to go. That means there is a lot of
traffic on the Bay Bridge because everybody goes that way. These bridges between communities
are things that are going to have a lot of traffic if we consider, in some simulation kind of a way,
traffic moving around from every part of the network to every other part of the network. A
simple concept that captures this idea of betweenness was first formulated by Lin Freeman, who
is also here somewhere.
So, the idea here is that we find the shortest path between every vertex in the network and
every other vertex in the network, and then we count how many of those shortest paths go along
each edge in the network. This is like a primitive measure of how much traffic will be flowing
along each edge of the network. Then, because some of these edges are the few edges that lie
between the communities, those ones are going to get a lot of traffic. We identify the edges
between the communities as those with the highest value of this betweenness measure, and then
we remove them from the network. Once you remove them and those edges are gone, you
are left with the three clumps, and these are your communities.
FIGURE 6
It is a little bit more complicated than that in practice. The principal problem is if you
have two edges between two communities, like the two at the top in Figure 1, then there is no
guarantee that they will both get a high value of betweenness. It is certainly the case that the sum
of the number of paths that go along one and the number of paths that go along the other should
be high but, under certain circumstances, it is entirely possible that one of them could be high and
one of them could be low. If you remove the edges with the high betweenness, you wouldn’t
remove all the ones that connect groups, only some of them. In order to get around that, you have
to find the edge with the highest betweenness, remove it, and recalculate this betweenness
measure—this traffic flow measure. If I remove one edge and recalculate, any traffic that
previously had to go along that edge will now go along the other bridge instead. So, even if it
didn’t have a high count before, it does have a high count now. It is crucial that you do this
recalculation step at each step of the algorithm, and that slows things down a bit. That is the
primary disadvantage of this method. In fact it takes a moderately long time to do the calculation.
We did come up with a nice way of speeding up the calculation. Naively you would
expect this to be an O(n⁴) calculation. But some of the calculations can be reused, and you can
get it down to an O(n³) calculation, but that is still prohibitively slow for very large networks.
You couldn’t do this on the World Wide Web graph or something like that, but it could typically
be done on the sizes of social networks we are looking at. It works up to tens of thousands of
vertices.
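The procedure described here, removing the highest-betweenness edge and recalculating at every step, now ships with networkx. The short sketch below (assuming a recent networkx version and a synthetic planted-partition test graph) shows both the raw edge-betweenness scores and the built-in repeated-removal routine.

```python
import networkx as nx
from networkx.algorithms.community import girvan_newman

# Two planted groups joined by only a few "bridge" edges.
G = nx.planted_partition_graph(2, 16, 0.6, 0.02, seed=2)

# Shortest-path betweenness for every edge; the between-group bridges carry
# all of the between-group traffic, so they get the highest scores.
bt = nx.edge_betweenness_centrality(G)
print(sorted(bt, key=bt.get, reverse=True)[:3])

# girvan_newman() removes the highest-betweenness edge, recalculates the
# betweenness, and repeats, yielding successively finer divisions.
first_split = next(girvan_newman(G))
print([len(c) for c in first_split])
```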
This works quite well and I will give you some examples. Some of these are whimsical
examples, but there is a reason for that. Figure 7 shows a famous network from the social science
literature. At the top is a network from a study done in the 1970s of a karate club at a U.S.
university. The vertices represent students in this club who were practicing their karate, and
connections between them represent friendships as determined by direct observation of the
members of the club by a sociologist who was working with them. This is an interesting
example. The sociologist, Wayne Zachary, studied this club for two years and fortuitously it
turned out during the course of these two years that a dispute arose between two factions within
the club, and eventually the club split in two and one half went off and formed their own club.
The theory is that you should be able to spot that coming split in this network, even
though this network here was constructed before the split occurred. In fact, you can do that. The
white circles and the gray squares represent the factions as they were after the split. All we did
was to take this network and feed it through our algorithm. The output of the algorithm is a so-
called dendrogram, which is a tree as shown in (b) in Figure 7. It doesn’t matter too much what it
indicates. The main thing to notice is that it splits the network into two groups, and the coding of
the circles and the squares is the same as in the network in (a). You can see that it has very neatly
found the group of the squares and the group of circles. There is only one error over there, vertex
number three, which is the one right in the middle there, on the boundary between the two
factions. Maybe it is understandable why it got that one wrong.
FIGURE 7
This is a network that was derived from observations made before the split, and yet
clearly the network has in it information about what is going to happen in the future when the
thing does split up. In a way, it is predicting what is going to happen, although it is predicting
after we knew what the answer already was. Obviously, we wouldn’t have put this example up
here if it didn’t work.
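This particular test case is easy to reproduce today: Zachary's karate club network ships with networkx, including the faction each member actually joined after the split, so a few lines (a sketch assuming a recent networkx; the comparison logic is my own) are enough to check a two-way split against the real outcome.

```python
import networkx as nx
from networkx.algorithms.community import girvan_newman

G = nx.karate_club_graph()              # Zachary's karate club, 34 members
found = next(girvan_newman(G))          # first division into two communities

# networkx stores the faction each member actually joined after the split.
actual = {n for n, d in G.nodes(data=True) if d["club"] == "Mr. Hi"}
closest = max(found, key=lambda c: len(c & actual))
print("disagreements with the real split:", sorted(actual ^ closest))
```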
Figure 8 gives another example. I am at the University of Michigan, and so I need to
have an example about football. This is a picture—one of these dendrograms, one of these
trees—that shows what happens when you feed the schedule of games for Division I-A college
football into our algorithm. As you might know, college football is played in conferences, which
are groups of schools that play together on a regular basis. If you feed the schedule in, it picks
out the conferences very nicely. They are coded by the different shapes and colors in this
figure. The groups that the algorithm finds, as you can see, pretty much correspond to the
conferences. In a way, this is a silly example but we look at examples like this for a good reason,
which is they are cases where we think we know what the answers should be. You want to have
applications to real world networks to show that this really is working, but you also want, when
you are developing algorithms, to apply it to something where you can tell, after you did it,
whether it is actually giving a sensible answer. This algorithm is always going to give some
answer. If you plug in some network and it spits out an answer and you can’t tell whether it is
sensible, then it didn’t really get you anywhere. Many of the examples we look at when we are
developing the algorithm are ones where we think we know what it ought to be doing before we
start.
FIGURE 8
Figure 9 gives one more example. In this case this is a food web, a non-social network
example, a food web of marine organisms in the Chesapeake Bay. The interesting thing about
what happens when we feed this one into the algorithm is that it appears to split the food web into
the pelagic organisms, the ones that live in the upper layers of the water, and the benthic ones, the
ones that live in the mud down at the bottom. It appears to be indicating that this ecosystem is
divided into two weakly interacting systems with not very much going on between them, so it
certainly does have applications other than the social networks.
FIGURE 9
One of the differences between what we do and what computer scientists do is that we want to know
whether the solution that we got to a problem is actually a good one. We don’t want just to get
the best solution, whatever it is; we also want to know if the solution we got is no good; in other
words, if there wasn’t really any community structure there in the network in the first place. To
do that, we use a measure that we call modularity. The idea with modularity is basically to
measure what fraction of the edges in our network fall within the groups that we find. What we
would like is that most of the edges are within groups, and only a few of them between groups.
However, if you only use that, it is not a good measure of what we want, because then you could
put all the vertices in a single group together, and all the edges would be within that group, and
you would get this 100 percent modularity, but that is obviously a trivial result, it is not a good
division of the network. What we use instead is the fraction of edges that fall within the groups
minus the expected fraction of edges that would fall within those groups.
I can write down exactly how we calculate that if you like, but the basic idea you can see
by examining that trivial partition. If I put all the vertices in one group together, then all the
edges would be within that one group, but it is also expected that all the edges will be in that one
group. Thus, you get 100 percent minus 100 percent, so the measure defined in the previous
paragraph is 0 rather than a high score. So we think of modularity as something which is high if
the number of edges within groups is much greater than we would expect on the basis of chance.
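He offers to write the calculation down; for completeness, the standard definition from the published Girvan-Newman work, which matches the description above, is

$$ Q \;=\; \sum_i \left( e_{ii} - a_i^{2} \right), $$

where \(e_{ij}\) is the fraction of all edges that connect vertices in group \(i\) to vertices in group \(j\), and \(a_i = \sum_j e_{ij}\) is the fraction of edge ends attached to vertices in group \(i\). Putting every vertex in a single group gives \(e_{11} = a_1 = 1\) and hence \(Q = 0\), which is exactly the behavior of the trivial partition described above.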
What we can do is take one of these dendrograms, which represents the order in which the
network gets split into smaller and smaller pieces as we remove the edges, and plot this
modularity against it, as shown in Figure 10. What we typically find is that the modularity has a
peak somewhere and then it falls off, and this peak indicates where the best split would be in this
modularity sense, the best split in terms of getting most of the edges within groups. That is
shown by the red line in Figure 10. The value there at the peak indicates how good a split it is.
This is a measure that goes strictly from 0 to 1. It is about 50 percent modularity in Figure 10,
which is pretty good. We usually find that there is something significant going on if it is
anywhere above about 30 percent, and the highest values we have seen in real networks are about
80 percent.
FIGURE 10
Figure 11 shows what happens if you apply this method to the karate club network. It
actually finds two different things. It has a peak for the split into two groups that I talked about
before, the one we know about. It has another peak for a split into five groups, and it looks like
that second peak is actually a little bit higher. Maybe there is some other smaller-scale structure
going on there in the networks that we didn’t know about before.
[The intervening pages of the transcript are not reproduced here. The text below picks up near the end of a later talk, by Dr. Hoff, on latent variable models for relational data.]
FIGURE 11
I simulated some data using some rank k. Does anybody want to hazard a guess as to
what the rank of my mean matrix was? It probably wasn’t zero. I have a vote for four. That
seems to be popular. Why might it be four? If you have played around with these things you
know that you are looking for gaps in the magnitude from one singular value to the next. You
know it is definitely more than zero. In Figure 11 you can see a big gap between 1 and 2, so it is
at least 1, but then there is this big gap between 3 and 4, and then there is another little gap
between 4 and 5, and then it seems to be monotonically going down. So 4 is a really good guess;
the actual answer is 5, as shown in Figure 12. This goes to show you how good our eye is at doing
these things.
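As a small numerical illustration of the kind of exercise being described (a sketch with made-up dimensions and parameters, not the speaker's actual simulation): generate a low-rank mean matrix plus Gaussian noise and look at where the gaps between successive singular values of the data fall.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, true_rank = 30, 25, 5

# Mean matrix of low rank, plus Gaussian noise: Y = M + E.
M = rng.normal(size=(n, true_rank)) @ rng.normal(size=(true_rank, m))
Y = M + rng.normal(size=(n, m))

# Singular values of the data; look for a gap to guess the rank of M.
s = np.linalg.svd(Y, compute_uv=False)
print(np.round(s[:8], 1))
```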
FIGURE 12
Figure 13 shows the actual singular values that I used to generate the data matrix. Four is
a fair guess because the fifth non-zero singular value is smaller than the others. I used the
Bayesian machinery. You can get a posterior distribution for the mean matrix, and then you can
look at the posterior mean of all the singular values. If you add up all the blue, it is measuring the
total amount of variation that the model picked up in the data. Here is a posterior distribution on
the rank of a matrix. This should say that Bayesian methods are good, because you had your
intuition about what the rank was, and the Bayesian estimate matched up to that to a great degree.
The posterior distribution says there is a very good chance that there are four or more dimensions
to this mean matrix. The two highest probability ranks are four and five, as you might expect.
FIGURE 13
One other thing about doing Bayesian inference, if you do the singular value
decomposition—here is my little plug for Bayesian analysis—suppose I told you the rank was 5,
what is your least-squares estimate of the mean matrix, given that you know the rank is 5? Well,
the least-squares estimate is actually the first five singular values and singular vectors, so you
would just be taking the top of the green bars for the first singular values there. If you take the
true matrix that was used to generate the data, take all those entries, and plot them against the
least-squares estimate, you get the green pattern shown on the right side of Figure 14.
FIGURE 14
If you take the Bayesian estimate you get something with a much lower mean squared error.
What is happening here is that the singular value decomposition is taking our data matrix Y and
looking at the projections onto these subspaces. The data matrix Y is equal to the mean matrix plus
error, so the singular value decomposition is projecting all the error onto the subspace as well. It
is probably worth pointing out that the range of the truth is –3 to 3, while the range of the
singular value decomposition estimate is essentially –4 to 4. It is not atypical of these types of
problems that a least-squares estimate overestimates the variability in the matrix.
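Here is a sketch of the least-squares estimate being described, continuing the toy simulation above (assuming numpy; by the Eckart-Young result, the best rank-5 least-squares approximation keeps the first five singular values and vectors of the data matrix).

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(30, 5)) @ rng.normal(size=(5, 25))   # true mean, rank 5
Y = M + rng.normal(size=M.shape)                          # data = mean + error

# Least-squares estimate of M given rank 5: truncate the SVD of the data.
U, s, Vt = np.linalg.svd(Y, full_matrices=False)
M_ls = U[:, :5] @ np.diag(s[:5]) @ Vt[:5, :]

# The projection keeps the part of the noise lying in that subspace too, so
# the estimate is typically more spread out than the truth.
print(f"truth range:    [{M.min():.1f}, {M.max():.1f}]")
print(f"estimate range: [{M_ls.min():.1f}, {M_ls.max():.1f}]")
```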
The reason I went through all that work is that I wanted to embed the concept
of the singular value decomposition model into more complicated models for social networks. I
wanted to incorporate this into very general models for all sorts of relational data, not just
Gaussian data, not just binary data, but data of all sorts. The way I do that is with a generalized
linear model. Figure 15 shows the general model for relational data. We have a sociomatrix Y.
All of this generalizes to the case where Y is an n × m matrix; it doesn’t have to be a square
matrix.
FIGURE 15
We have a set of explanatory variables X, and our model, relating X to Y, is as follows:
we have what we call a predictor, θ, and it is equal to a regression part plus a latent structure part.
This latent structure part is just this model that we talked about, the singular value decomposition
model. The generalized linear model says not that our observed data are equal to Xβ + Z, but that some
function of the mean is equal to Xβ + Z. Some function of the mean of y_ij is equal to the
regression part, a latent structure part, and lack of fit. In this context, you can fit binary data
where you model the log odds of the observations as being equal to the structure. If you have
count data you might model things on the log scale and, if it is continuous, you might model it on
the additive scale, using the identity link.
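In symbols, and hedging on the exact parameterization (the cited Hoff 2005 paper gives the precise form), the model being described is roughly

$$ g\!\left(\mathrm{E}[y_{ij}]\right) \;=\; \theta_{ij} \;=\; x_{ij}^{\top}\beta \;+\; u_i^{\top} D\, v_j \;+\; \varepsilon_{ij}, $$

where \(x_{ij}^{\top}\beta\) is the regression part, \(u_i^{\top} D v_j\) is the SVD-style latent structure (a row vector \(u_i\), a column vector \(v_j\), and a diagonal matrix \(D\) of singular-value weights), \(\varepsilon_{ij}\) is the lack-of-fit term, and \(g\) is the logit link for binary data, the log link for counts, or the identity link for continuous data.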
I should point out that a lot of data that people have been talking about is skewed data. I
wouldn’t call it high-variability data, but I would call it skewed data. You have not just the first
and second moments there, you have the third and fourth moments, and those are really the things
that can actually represent that skewed structure. That can also be incorporated in here, although
with more difficulty. You have to decide on what scale you are making the measurements.
Actually, you could do continuous data and use a log link to model skewed data.
An example of what I am going to show in the next two slides is some international
conflict data. The model for how this really is generated is debatable, but it is a very nice data set
because we can see the structure. Mark pointed this out in the last talk. It is good to work on
problems where you can tell if the answer makes sense or not. I have some colleagues in political
science who are interested in modeling conflicts between nations. The data is from Mike Ward
and Xun Cao, and what they measure is the number of militarized disputes initiated by i with
target j over a 10-year period. This involves actual militarized disputes and conflicts as well as
threats and a bunch of other things that they have recorded. They want to relate these to a bunch
of covariates, so they have a bunch of theories about what causes nations to have conflicts with
others and what things might ameliorate conflicts. There is this concept of democratic peace that
democratic nations don’t have conflicts with each other. There is this idea that trade inhibits
conflicts and so on. They like to evaluate these things and look at the structure that is left
unexplained by these covariates. Figure 16 provides an overview.
FIGURE 16
We fit this into our model. We say that we are going to model the mean of each
observation on the log scale with this regression part plus this latent structure. I was very happy
to see the posterior distribution on the number of dimensions of the latent structure as shown in
Figure 17. Why was I happy? It’s because it is easy to plot 2-dimensional latent structures. You
can plot 3-dimensional latent structure, but this k could have been anywhere from 0 to perhaps
130. If the dimension of the latent structure was 73, I wouldn’t be able to give you a very good
picture at the end of what is going on, or you could look at the first few dimensions of the latent
structure. I should also point out that if you play with these data enough, you know that there is
no way that the mean structure has dimension less than two. There are lots of countries that are
very active in conflict, and a lot of countries that are attacked a lot; you know that you are going
to have row and column effects, and that is a 2-dimensional structure. Therefore, you know the
dimension has got to be at least two.
FIGURE 17
The right half of Figure 17 shows some confidence intervals on the regression
parameters. There are some obvious things to point out. As distance increases, the amount of
conflict decreases. People have conflicts with countries that are near them, and population plays a
big role. Populations of countries that are in conflicts tend to be higher than average.
FIGURE 18
Figure 18 shows the latent structure. I will finish by explaining this plot. Recall the
model. The model gives each country a 2-dimensional vector of characteristics as an attacker
and a 2-dimensional vector of characteristics as a target, so each one of these countries has a
vector as an aggressor. What I have done here is plot the direction of each country’s vector in red
and that is this outer circle here. The magnitude of their plotting character is the magnitude of
their vector in that direction. The inner circle is the direction of each country’s 2-dimensional
latent vector as a target. Then I have drawn these green lines indicating the strength of the
number of conflicts between the countries. So, you see somewhat obvious things because we
know the names of the rows and the names of the columns. Those are the main structures there.
Here is where I see if I made myself clear. If there wasn’t any latent structure, can
anybody guess what this plot would look like, or what would be different about this plot? Where
would the green line be? If there were no two-dimensional latent structure, there would be no
relationship between the actual links, or attacks between the countries, and their position in this
circle. All you would see is a bunch of green lines scattered all over the place like that, so you can
see how these positions really do pick up the structure that is there.
To summarize, my point is that singular value decomposition is a well-known tool in
multivariate analysis that has been used for a long time, and there is a huge literature based on it,
especially in the additive case, in the psychometric literature and in lots of other fields. It is something that is
well understood, although apparently the problem of identifying the dimension of the mean
matrix has been left unsolved, at least in a Bayesian context, and actually in a non-Bayesian
context it is also unsolved.
Using this framework, the idea is that we can use this natural singular value
decomposition model that we are familiar with, and incorporate it into different types of data
structures, along with regression effects and other things that we use when we are doing statistical
modeling. That is the main point of the talk.
QUESTIONS AND ANSWERS
QUESTION: One quick question. Could you just say a couple of words on what you are
thinking about here with the time series extensions, the dynamic extensions, because that is
something that is sort of natural to the topics we are going to be covering.
DR. HOFF: I have always known that I should spend more time in the dynamic aspect
of networks, but every time I gave a talk on latent variable models, everybody always asked, what
is the dimension? So, I had to spend time doing the dimension first.
Now that that is done, there are a number of ways that you can do dynamic network
inference for these things, or incorporate these ideas. One way is to have a time-series model for
the latent characteristics, as a Markov model or something like that, and actually David Banks
and Eric Vance are working on stuff like that. Again, I have some other ideas about how to set
these things up in a general autoregressive structure. Their ideas are kind of interesting, they have
a behavioral slant on things. They are using ideas from social networks like, I am at this location
in the social space, and I form a tie with this person, and I should move at the next time step, in
this other direction. There are these sorts of dynamical models; people are starting to work on these things
with these ideas.
DR. BLEI: Your latent variables seem to model the connections between rows and
columns that happen outside of the covariates. I can imagine, say, with countries, that for two
countries that don’t get along, even when the covariates explain that, their latent positions would be
far apart. Is there some way to deal with that?
DR. HOFF: In one sense, you could use this sort of approach with a slight modification
to actually let the latent variables model the variation that is unexplained by those covariates. In a
sense, that might be the goal.
DR. BLEI: You could actually model those jointly with the connection.
DR. HOFF: Yes, depending on what your goal is, you could do it in a variety of
different ways. For example, you might want to leave that variable out of it. I mean, if you know
what the variable is, you might leave the variable out of the model, fit the latent characteristics,
and then plot, say, the latent characteristics versus that variable. It depends on what you are trying
to get out. One issue with including things in a model is, you are saying what the scale is, or
saying it is additive. In a lot of cases it is not. It is like multiplicative. There can be an advantage
to leaving it out and then plotting it versus the latent characteristics later. You could do what I
just said, try to—the way I have the model set up, you can constrain the latent characteristics to
be orthogonal to things, and you could actually restrict them to be orthogonal to sort of the design
structure of the regression and say, yes, it is the left-over variation. It depends on what issue you
are trying to address at the moment.
DR. BLEI: One last question. A lot of these networks like the Internet are very sparsely
connected, and I imagine that if you tried to fit a model to data that was just overrun with zeroes, say a
logistic regression, you would end up just fitting the zeroes and you might not get anything
interesting out of it. Have you had that issue?
DR. HOFF: Yes and no. Actually, the networks that I have been modeling, these
conflict networks that my political science colleagues fit, are extremely sparse. So, actually we
do that. One thing he likes to show to political science crowds is that he fits this logistic
regression without any sort of latent variables, and then you make a prediction on where the
conflicts are. Well, your best bet is to just predict no conflict and that is it, and you just predict
all zeroes. So, the R² is very bad for these types of models. The benefit from doing something like
this is, yes, you do predict mostly zeroes, but you can identify sort of those actors in the system
that are active, and they get sort of big, latent vectors, and then you can sort of see what is going
on there. So, you still predict mostly zeroes for most of the countries, but then you actually do
pick up these things that are not sparse, you pick up the activity, and the fit is a lot better.
DR. BANKS: Do you have any thoughts on how to robustify this kind of analysis?
DR. HOFF: Against what?
DR. BANKS: In this business you can wind up with one or two atypical actors that don’t
quite follow the model that you specified for everybody else but, because of the interactivity, they
influence so many other things that it gets really hard.
DR. HOFF: This type of approach actually can pick those things up pretty well, because
it is fitting characteristics for each node. If it is true that there is a node that is sort of an outlier, it
will have characteristics that are estimated as being way out there and they are very large and are
very active, without influencing the rest of the network too much. Actually, that sort of outlying
stuff fits well with this type of structure. What is harder is when it is not a node that is
very active but a couple of pairs that are outliers; then you might get into trouble. Because of the
nature of the model and where it fits, it is fitting these parameters for each node. You are very
robust against outlying nodes.
REFERENCES
Handcock, Mark S., P.D. Hoff, and A.E. Raftery. 2002. “Latent Space Approaches to Social Network Analysis.” Journal of the American Statistical Association 97.
Hoff, P.D. 2005. “Bilinear Mixed-Effects Models for Dyadic Data.” Journal of the American Statistical Association 100.
Hoff, P.D., K. Nowicki, and T. Snijders. 2001. “Estimation and Prediction for Stochastic Blockstructures.” Journal of the American Statistical Association 96.