**Stability and Degeneracy of Network Models**

**Mark S. Handcock, University of Washington**

**DR. HANDCOCK:** There has been a lot of discussion related to the ideas I’ll be talking about today. This is not a new topic at all, but I think it’s worthwhile looking at how these particular issues relate to the specific problem of models for social networks. My main idea is to see how very old issues and very old concerns apply, or play themselves out, in the context of social network models.

This is joint work and I’m going to go through some things here, but if you really want to find it, please look at these working papers on the Web at the Center for Statistics and the Social Sciences (CSSS) at the University of Washington (www.css.washington.edu). Much of this work was with Martina Morris and with the University of Washington’s Network Working Group, which includes Carter Butts, Jim Moody, and other folks that Martina mentioned yesterday. I particularly want to point out Dave Hunter of Pennsylvania State University, who is in the audience. Much of this work is done jointly with him.

I’ll review some topics that have been gone through before. I won’t belabor them now because they have basically already been covered. As I look around this room, the study of networks draws in a very diverse set of folks, and we have many varied objectives and a multitude of frameworks and languages used to express things. I don’t think I’ll spend much time on this. After one and a half days I think this is fairly clear.

One thing that is important to recognize, at least for the topics that I’ll be looking at, is that there are many deep and relevant theories here, and a deep statistical literature grouped into several communities. For instance, the social networks community has done a massive amount of work, as you saw last night in the after-dinner talk. Two key references would be Frank (1972) and Wasserman and Faust (1994). The statistical networks community has also worked on this topic for a long period of time, and some key references are Snijders (1997), Frank and Strauss (1986), and Hoff, Raftery, and Handcock (2002). In particular, it’s worthwhile to look at the work of David Strauss, which I think threads through much of what’s going on and is a very important technical contribution. Other contributions, which I think haven’t been discussed much at this workshop, are from the spatial statistics community. There is a lot of very important work there and it’s very closely allied; if you are looking to work in this area, I think it’s very important to read that literature closely. Good pointers into it include Besag (1974) and Cressie (1993). Another important literature, which I’ll mention later in this talk, is
exponential family theory. It was worked out a long time ago, and the best place to start is the seminal book by Barndorff-Nielsen (1978). The last relevant literature is that of the graphical modeling community. They work on related ideas that are actually closer than the commonality in terms used would suggest. A good pointer is Lauritzen and Spiegelhalter (1988).
Networks are very complex things, and how we choose to model them will reflect what we have tried to represent well and what we are essentially not trying to represent well. That choice is going to be driven largely by the objectives we have in mind. This is an obvious point, a point that runs through much of scientific modeling, but I have found that it’s very important for the networks area because of the complexity of the underlying models and the complexity of the phenomena we are interested in.
What we are probably interested in first is the nature of the relationships themselves: questions such as how the behavior of individuals depends on their location in a social network, or how the qualities of the individuals influence the social structure. Then we might be interested in how network structure influences processes, dynamic or otherwise, that develop over a network. The classic examples include the spread of infection, the diffusion of innovations, or the spread of computer viruses, all of which are affected by the network structure.
Lastly, and I think this is of primary importance and why many people actually study networks, even though they will look at it only after understanding the relationships and the network structure: we are interested in the effect of interventions. If we change the network structure and/or the processes that develop over that network, how will the network structure play itself out, and how will the process itself actually be changed? If you make changes to the network, almost certainly the consequences of those changes will not be obvious.
Another important point is that our objectives define our perspectives. There is a
difference between a so-called network-specific viewpoint and a population process. By
network-specific I mean we look at a given network. It might be sampled or have missing data in
it, but our scientific objective is that particular network. That is in contrast with a population
viewpoint, where we view the data we have, whether it’s complete or sampled, as a realization
from some underlying social phenomena typically represented through a stochastic process, and
we wish to understand the properties of that stochastic process. In that case, the observed
network is conceptualized as a realization of a stochastic process.
I think a lot of the different approaches in the literature look very different because they have different perspectives and objectives in mind. For example, repeating a point
made last night by Steve Borgatti, one can compare a social-relations perspective with a nodal
attribute compositional perspective. The former brings out interdependence/endogeneity and
structure and positional characteristics, while the latter focuses on atomistic essentialism and
reductionism. I won’t say more about these things because I think they have been adequately
covered.
Figure 1 defines a very basic model for a social network.
FIGURE 1
We can think of Y, the g-by-g matrix of relationships among the actors, as the sociomatrix. In a very simple sense, we want to write down a stochastic model for the joint distribution of Y, which is a large multivariate distribution, often discrete, though it could also be continuous. All we want is a relatively simple “model” for the dependence structure of this multivariate random variable; that’s how a statistician would view this problem, and that view dominates the development.
FIGURE 2
Martina Morris showed Figure 2 yesterday, but I’ll show it again just to bring it forward. Let 𝒴 be the sample space of all possible graphs, for example graphs recording just the presence or absence of ties, and we write down an exponential family model of the form shown. The denominator is a normalizing constant, defined as the sum of the numerators over all graphs in that sample space.
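To make this form concrete, here is a minimal sketch (my own illustration, using an edge-plus-triangle statistic vector that I chose for simplicity) that enumerates every graph on a handful of nodes and computes the denominator exactly as the sum of the numerators:

```python
import itertools
import math

def graph_stats(edges, n):
    """Sufficient statistics of a graph: (number of edges, number of triangles)."""
    eset = set(edges)
    triangles = sum(
        1 for a, b, c in itertools.combinations(range(n), 3)
        if (a, b) in eset and (a, c) in eset and (b, c) in eset
    )
    return (len(edges), triangles)

def ergm_probs(theta, n):
    """Exact probabilities of every graph on n nodes under the exponential
    family model P(Y = y) = exp(theta . s(y)) / Z."""
    dyads = list(itertools.combinations(range(n), 2))
    weights = {}
    for bits in itertools.product((0, 1), repeat=len(dyads)):
        edges = tuple(d for d, b in zip(dyads, bits) if b)
        ne, nt = graph_stats(edges, n)
        weights[edges] = math.exp(theta[0] * ne + theta[1] * nt)
    z = sum(weights.values())            # the denominator: sum of the numerators
    return {g: w / z for g, w in weights.items()}

probs = ergm_probs(theta=(-0.5, 0.2), n=4)   # 2^6 = 64 possible graphs on 4 nodes
```

Brute-force enumeration like this is only feasible for a few nodes, which is exactly why the MCMC approximations discussed later are needed.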
One thing that I think is worthwhile to look at is a way of getting a general interpretation of the model through its local specification. This is shown in Figure 3. We define Yc as the rest of the network, that is, all the network elements excluding the ij one. This is a standard formula, but I think it’s quite instructive. What it gives is the probability of a tie between i and j given the rest of the network, divided by the probability of a non-tie between i and j with the rest of the network held fixed: the conditional odds of the tie. The basic idea is that we hold the graph fixed and think about taking a particular tie: what happens when we change it from a 0 to a 1? Looking at the odds of the tie conditional on the rest gives us a direct interpretation of these forms; it can also be interpreted in a relative risk form. How probable a tie between a particular pair of actors is depends on the surrounding ties. By specifying the local properties, as long as it is done in this way, you get the global joint distribution; the local specification gives you the global one. Again, this is relatively straightforward from an algebraic standpoint. It’s just helping us with the interpretation.
FIGURE 3
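The conditional odds can be computed directly from the difference in the statistics when the tie is toggled, often called the change statistic. A small sketch (same hypothetical edge-plus-triangle statistics as above, my choice for illustration):

```python
import itertools
import math

def stats(adj, n):
    """(edge count, triangle count) for a symmetric 0/1 adjacency dict
    keyed by ordered pairs (i, j) with i < j."""
    pairs = list(itertools.combinations(range(n), 2))
    edges = sum(adj[p] for p in pairs)
    triangles = sum(
        adj[(a, b)] * adj[(a, c)] * adj[(b, c)]
        for a, b, c in itertools.combinations(range(n), 3)
    )
    return (edges, triangles)

def conditional_logodds(adj, n, i, j, theta):
    """Log-odds of the tie (i, j) given the rest of the graph held fixed:
    theta . (s(y with tie on) - s(y with tie off))."""
    on = dict(adj); on[(i, j)] = 1
    off = dict(adj); off[(i, j)] = 0
    s_on, s_off = stats(on, n), stats(off, n)
    return sum(t * (a - b) for t, a, b in zip(theta, s_on, s_off))

# A 4-node graph with ties 0-1, 0-2, 1-2 (a triangle) and an isolated node 3.
adj = {p: 0 for p in itertools.combinations(range(4), 2)}
for tie in [(0, 1), (0, 2), (1, 2)]:
    adj[tie] = 1

theta = (-0.5, 0.2)
lo = conditional_logodds(adj, 4, 0, 1, theta)  # toggling 0-1 changes 1 edge, 1 triangle
p_tie = 1 / (1 + math.exp(-lo))                # conditional probability of the 0-1 tie
```

Toggling the 0-1 tie here changes the edge count by 1 and the triangle count by 1, so the log-odds is theta[0] + theta[1], exactly the local specification described above.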
I’ll make a brief comment on models for the actor degree distribution, which is shown in Figure 4. We can write a particular model of this form where the statistics are just the proportions of actors with exactly k relationships, i.e., the proportions of nodes with degree k. Basically, we write just a linear predictor on these forms; this is the so-called degree-only model. In essence it says that if we condition on the degrees, all of the graphs with that degree structure are equally likely; in that sense it is random mixing given the degree sequence. This has a very long history in the social networks community, and the two references shown in Figure 4, plus many others, are good places to start. This area is receiving an enormous amount of attention. The reason I’m pointing this out is that there has been a lot of prior work on these models, considered from both technical and theoretical perspectives, and I often think that science is not helped when that prior work is forgotten. The other thing you can do here is further parameterize the degree distribution, which essentially places nonlinear parametric constraints on the α parameters. As most statisticians know, this moves it technically from a straight linear exponential family to a curved exponential family. Dave Hunter and I did some work on models of that kind.
FIGURE 4
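The “random mixing given the degree sequence” idea can be made operational by degree-preserving rewiring. The sketch below is my own illustration, not code from the talk: it repeatedly swaps pairs of edges, which holds every node’s degree fixed while randomizing everything else.

```python
import random

def rewire(edges, n_swaps=2000, seed=0):
    """Randomize a simple undirected graph while holding every node's degree
    fixed: repeatedly replace edges (a-b, c-d) with (a-d, c-b) whenever the
    swap creates neither a self-loop nor a duplicate edge."""
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if len({a, b, c, d}) < 4:
            continue  # swap would create a self-loop
        if frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue  # swap would create a duplicate edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edges[i], edges[j] = (a, d), (c, b)
    return edges

star_plus_path = [(0, 1), (0, 2), (0, 3), (3, 4)]
rewired = rewire(star_plus_path)   # same degrees, randomized structure
```

Run long enough, such swap chains sample approximately uniformly from the graphs with the given degree sequence, which is exactly the reference distribution of the degree-only model.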
We could then add in (Martina did this yesterday, and I’ll show it again briefly) covariates that enter in this linear fashion: attributes of the nodes, attributes of the dyads. We can also add a clustering component to a degree form in this way. This is shown in Figure 5.
FIGURE 5
I’ll give you just a couple of illustrations of how this model might apply in particular instances. Say you wanted to represent a village-level structure with 50 actors, a clustering coefficient of 15 percent, and a Yule degree distribution with a scaling exponent of 3. The Yule is the classic preferential attachment power-law model, and we can just see how that looks. Figure 6 shows two different networks generated from that model.
FIGURE 6
The network on the left side of Figure 6 has zero clustering, and on the right we see what happens when the mean clustering coefficient is pushed up to 15 percent with the same degree distribution. The basic notion is that these models give you a way of incorporating known clustering coefficients into the model while holding the degree distribution fixed. Just to reiterate a point made earlier: the degree distribution is clearly not all of what is going on.
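A degree sequence like the one in the village illustration can be generated with a simple stand-in for the Yule distribution. This sketch uses a truncated discrete power law with exponent 3, which is my simplification, not the exact Yule form:

```python
import random

def powerlaw_degrees(n, alpha=3.0, kmax=50, seed=1):
    """Draw n degrees with P(k) proportional to k**(-alpha) for k = 1..kmax,
    then patch parity so the total degree is even (a necessary condition for
    the sequence to be realizable as a graph)."""
    rng = random.Random(seed)
    ks = list(range(1, kmax + 1))
    weights = [k ** (-alpha) for k in ks]
    degrees = rng.choices(ks, weights=weights, k=n)
    if sum(degrees) % 2 == 1:
        degrees[0] += 1
    return degrees

village = powerlaw_degrees(50)   # 50 actors, scaling exponent 3
```

Feeding such a sequence into a degree-preserving sampler gives the zero-clustering baseline; the clustered variants in Figures 6 and 7 additionally constrain the clustering coefficient.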
FIGURE 7
Figure 7 shows the same thing with 1,000 nodes, so it’s a pretty big village. You can see that to get the clustering going with a given degree distribution, the model is forcing a lot of geodesics into this form. Of course, with 1,000 nodes it’s pretty hard to see.
Figure 8 shows bipartite graphs; the same model form works for bipartite graphs. The network in the upper left is a heterosexual Yule with no correlation, and the one in the upper right is a heterosexual Yule with strong correlation (a triangle percentage of 60 percent, versus the 3 percent that is the default under random mixing given degree). There is also a modestly correlated one as well as one with negative correlation. I don’t think I’m going to say too much more about these.
FIGURE 8
The main topic I would like to talk about today is a canonical problem: how can we tell if a model class is useful? If you write down a particular form of the kind in Figure 2, we can introduce statistics based on principled local specifications or local configurations, or we can just write down statistics that we believe a priori, from a scientific perspective, would explain a lot of the variation in the actual graph. But the natural question arises because these statistics will tend to be highly correlated with each other, highly dependent, so it’s not really clear for any given model exactly what the qualities of that model will actually be. As Martina Morris showed yesterday, the natural idea of starting from something very simple can sometimes lead to models that aren’t very good.
The properties we saw really were the properties of the model we wrote down; the implication that simple models have simple properties is simply not true. This has been known in statistical mechanics for a very long time, but it’s always a little bit of a surprise to see it occur in applied or very empirically based models. So the basic question is: is a model class itself able to represent a range of realistic networks? It’s not the only question, but it is one that can be asked here, and it is what I refer to as the issue of model degeneracy (Handcock, 2003). The idea is that some model classes will only represent a small range of graphs as the parameters are varied. In circumstances where we would like a fairly general model to cover a large range of graphs and graph types, that may not be a desirable property of a model.
The second issue is: what are the properties of different methods of estimation, such as maximum likelihood estimation, pseudo-likelihood, or a Bayesian framework? I’ll make some comments on the computational issues and on when certain estimators do or do not exist (e.g., see Snijders, 2002, and Handcock, 2002).
The last issue in assessing whether a model class is useful is whether we can assess the goodness of fit of models. For example, given a graph and a model, how well does the model actually fit the graph? I’ll make some remarks about measuring this. I don’t think I’ll say a lot about it, but the application Martina went through yesterday was a very interesting stylized view of how that would be done. Some background on this topic may be found in Besag (2000) and Hunter, Goodreau, and Handcock (2005).
Figure 9 illustrates some points on model degeneracy. Degeneracy is a property of a random graph model; it has nothing to do with data per se, only with the model itself. We call a model near degenerate if it places almost all of its probability mass on a small number of actual graphs. An example would be the empty graph, as shown in Figure 10: if we know that a model produces the empty graph with probability close to 1, that’s probably not good, and likewise for the full graph or some mixture of the two. What matters, in some sense, is whether the mass falls on the subsets of possible graphs that are relevant to the particular scientific application. If we are interested in heterosexual graphs, and the model places all of its mass on heterosexual graphs, which is a large subset, that’s a good thing. This idea is clearly application-specific.
FIGURE 9
FIGURE 10
FIGURE 24
I’ll make a brief comment on MCMC, for those who haven’t seen a lot of Monte Carlo; I find it a simple way of thinking about likelihood estimation. The idea, shown in Figure 24, is that we want to estimate the partition function, or normalizing constant, which is essentially a population mean over the set of all possible graphs. Being statisticians, if we have a very large population whose mean we want to measure, what we do is draw a sample from that population, calculate the sample mean, and use it in place of the population mean, which is one of the simplest ideas in statistics. In essence, we draw a sample of the possible graphs, compute the corresponding sample mean, and use this to approximate the partition function. The question is how to get a sample of graphs, and MCMC is a natural way. This has been developed elsewhere, and I won’t say too much more about it. There is a result that an MCMC likelihood approximation used in this way converges given sufficient iterations, which I won’t belabor here.
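The sample-mean idea can be sketched directly for a model with a single edge-count statistic. This is a toy of my own choosing: it uses uniform draws of graphs, whereas practical fitting uses importance-weighted MCMC draws, but the replace-the-population-mean-with-a-sample-mean logic is the same.

```python
import math
import random

def exact_log_z(theta, n):
    """For the edge-count-only model the partition function factorizes over
    the m dyads: Z = (1 + exp(theta))**m."""
    m = n * (n - 1) // 2
    return m * math.log(1.0 + math.exp(theta))

def sampled_log_z(theta, n, draws=20000, seed=0):
    """Estimate Z = |Y| * E_uniform[exp(theta * edges(Y))] by replacing the
    population mean over all graphs with a sample mean over uniform draws."""
    rng = random.Random(seed)
    m = n * (n - 1) // 2
    total = 0.0
    for _ in range(draws):
        edges = sum(rng.random() < 0.5 for _ in range(m))  # uniform random graph
        total += math.exp(theta * edges)
    return m * math.log(2.0) + math.log(total / draws)

est = sampled_log_z(-0.3, n=5)
truth = exact_log_z(-0.3, n=5)
```

For this model the exact answer is available, so the Monte Carlo error can be checked directly; for realistic models with dependence terms, the sample mean is the only practical route.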
I always find interesting the relationship between a near degenerate model and MCMC estimation. Figure 25 addresses this idea, which goes all the way back to very nice work by Charlie Geyer. Basically, the idea (there is some mathematics behind it) is that if your model is near degenerate, then your MCMC won’t mix very well. This will be familiar to most folks who have ever tried it, but just to give you some sense in practice of how this works, I’ll show you again the two-star model in Figure 26.
FIGURE 25
FIGURE 26
For example, suppose for the two-star model you choose mean-value parameters of 9 edges and about 40 two-stars, and you run your MCMC sampler. Figure 27 shows what you get. Many people have probably run into this if they have ever tried it.
FIGURE 27
This is the trace plot of the number of edges. I’m running the MCMC sampler for about 100,000 iterations of this simple 7-node process, and these are the trace-plotted edge counts as we draw from that Markov chain. It stays up around 19, dropping down to 15 or 17, then suddenly jumps down to 3 or 2 or 0. It stays down there for maybe 20,000 iterations, jumps back up, and jumps back down. What we are actually seeing, if you look at the marginal distribution of these draws, is a profoundly polarized distribution, with most of the draws at very low values and some of the draws quite high, and of course in such a way that the mean value is 9, which is exactly what we designed the model to do. This is another view of the example that Martina gave yesterday of the two-star model. Note that the sampler is doing great: it is giving us samples from the model we asked for. But now that we have looked at this, we would probably say we don’t want this model. Bad mixing of an MCMC is therefore highly related to these degeneracy properties.
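A minimal tie-toggling (Gibbs) sampler for an edge-plus-two-star model produces exactly this kind of edge-count trace. The parameter values below are my own natural-parameter choices for illustration, not the mean-value specification from the talk; whether and how often the chain switches between sparse and dense regimes depends on the parameters and run length.

```python
import math
import random
from itertools import combinations

def twostar_trace(n=7, th_edge=-3.0, th_star=0.6, iters=100_000, seed=0):
    """Tie-toggling sampler for P(y) proportional to
    exp(th_edge * edges(y) + th_star * twostars(y)), where
    twostars(y) = sum over nodes of C(degree, 2).
    Returns the edge count after each iteration (the MCMC trace)."""
    rng = random.Random(seed)
    dyads = list(combinations(range(n), 2))
    deg = [0] * n
    ties = set()
    trace = []
    for _ in range(iters):
        i, j = dyads[rng.randrange(len(dyads))]
        present = (i, j) in ties
        # degrees as they would be with the tie (i, j) absent:
        di = deg[i] - present
        dj = deg[j] - present
        # change statistics for toggling the tie on: +1 edge, +(di + dj) two-stars
        logit = th_edge + th_star * (di + dj)
        if rng.random() < 1.0 / (1.0 + math.exp(-logit)):
            if not present:
                ties.add((i, j)); deg[i] += 1; deg[j] += 1
        elif present:
            ties.remove((i, j)); deg[i] -= 1; deg[j] -= 1
        trace.append(len(ties))
    return trace

trace = twostar_trace()   # plot this to see the mixing behavior
```

Plotting `trace` is the diagnostic shown in Figure 27: a polarized marginal distribution of edge counts signals a near degenerate model, not a broken sampler.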
FIGURE 28
To come back very briefly: estimation in the mean-value parameterization is essentially just as hard as in the natural parameterization, although the corresponding point estimate itself is trivial; it’s just the observed statistics. Of course, to find the properties of that point estimator (and it’s no good unless you know its properties), you essentially need the same MCMC methods as for the natural parameterization. The natural parameterization is much easier to work with computationally, in terms of writing code, because of its slightly simpler forms: you don’t need to solve part of the inverse problem.
To finish, I’ll say a little bit about network sampling because it has come up a bunch of
times during this workshop. I’ll give you some idea of the classical statistical treatment of it
going back to Frank in 1972 and other work since. We think of our design mechanism as that
part of the observation process that is under the control of the researcher. So, if we are thinking
about doing network sampling, the design mechanism is how the researchers design the sampling
process. Examples would be surveys using egocentric, snowball, or other link-tracing sampling.
There is also the out-of-design mechanism, the unintentional non-observation of network information: the mechanisms by which information is missed, such as failure to report links, incomplete measurement of links, and attrition from longitudinal surveys. Note that for any sampling mechanism we need to deal with both of these components.
It is sometimes convenient to cluster design mechanisms into conventional, adaptive, and
convenience designs. So-called conventional designs do not use the information collected during
a survey to direct the subsequent sampling of individuals.
For example, we’ll sample everyone and do a network census, or we might do egocentric
designs and randomly choose a number of individuals then look at the actual ties of only those
individuals. That is, we don’t use anything that we collect during a survey to look at subsequent
sampling. This might sound like an odd way to do it, but let me describe adaptive designs for contrast. In adaptive designs, we actually use the information collected to direct the subsequent sampling; most network sampling is actually of this adaptive form. The idea is to collect ties, then follow the links of the initial group, using the fact that we have observed the links of those individuals to essentially do contact tracing and move outward. Classic examples are snowball sampling and other link-tracing designs. Many of the designs used in computer science and physics fall under this form, where you use information gathered during your survey to direct the subsequent sampling. There are also convenience designs, where you might do something very intelligent but not very statistical: you just recruit people who are close by and convenient to sample.
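The adaptive idea can be sketched in a few lines. This toy snowball sampler (hypothetical data; my own illustration of the design, not survey software) shows how the observed ties direct each subsequent wave:

```python
def snowball(adjacency, seeds, waves=2):
    """Adaptive link-tracing design: start from the seed respondents and,
    for a fixed number of waves, sample everyone tied to the people sampled
    so far. The collected tie data direct the subsequent sampling."""
    sampled = set(seeds)
    frontier = set(seeds)
    for _ in range(waves):
        frontier = {
            alter
            for ego in frontier
            for alter in adjacency.get(ego, ())
        } - sampled
        sampled |= frontier
    return sampled

# A toy contact network: a path 0-1-2-3-4 plus an isolate 5.
net = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3], 5: []}
wave2 = snowball(net, seeds=[0], waves=2)   # reaches nodes 0, 1, 2
```

A conventional egocentric design, by contrast, would fix the set of egos in advance and never let the observed ties change who gets sampled next.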
In Handcock (2003) I examined likelihood-based inference for adaptive designs: basically, how to fit the models that I described earlier, and that Martina Morris described, when you have a conventional design and, probably more importantly, when you have adaptively sampled data due to link tracing. It turns out that you can fit these models, and there is a computationally feasible MCMC algorithm to do it, which I think is pretty helpful in practice. This is being implemented in the statnet package for R.
As an example of the use of likelihood inference, I’ll briefly mention the Colorado Springs “Project 90” study, which is familiar to many people in this room and which involved quite a broad sample. This study dealt with a population of prostitutes, pimps, and drug dealers in Colorado Springs; I will only focus on 1991. I’m looking at the heterosexual sexual network within the group of people who responded to a survey. Essentially what they did was a form of link tracing that could be referred to as bi-link tracing, where you are recruited into the survey if two other respondents nominated you. It wasn’t enough just to be nominated by one individual; you had to be nominated by two.
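The bi-link rule is easy to state as code. This sketch uses hypothetical nomination data and is my paraphrase of the rule as described, not the actual Project 90 protocol:

```python
def bilink_recruits(nominations, respondents):
    """Bi-link tracing: a non-respondent enters the sample only if
    nominated by at least two current respondents."""
    counts = {}
    for respondent in respondents:
        for named in nominations.get(respondent, ()):
            if named not in respondents:
                counts[named] = counts.get(named, 0) + 1
    return {person for person, c in counts.items() if c >= 2}

# "x" is nominated by two respondents; "y" and "z" by only one each.
noms = {"a": ["x", "y"], "b": ["x"], "c": ["z"]}
new = bilink_recruits(noms, respondents={"a", "b", "c"})
```

Such a rule makes the inclusion probabilities depend on the network itself, which is exactly why the likelihood framework for adaptive designs is needed.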
FIGURE 29
Figure 29 presents an example of that network. You see it has a relatively large component in the center. Many of the ties are just monogamous pairs, split off on their own. You then have some small components around the side, and the core is around 300 folks; I think there are some isolates also. You get the sense that there is a large part that is connected, but not massively, so individuals float around. Note this is a visualization, so there are some artifacts; for instance, the core looks more central than it actually is. There is a lot of missing data here due to the sampling processes.
You are only seeing the part of the network that is observed. I won’t go through the boring detail, but if you run the program, Figure 30 shows the sort of result you will get. You put in all these covariates; we know a lot of information about the individuals: their age, their race, whether they are a prostitute or a pimp. What we did was look at different types of tie formation based on age, race, and occupation. We also looked at whether there was homophily based on race and homophily based on age, plus endogenous parameters that measure the dependency involved. I won’t say too much about this particular model, but you can get the natural parameter estimates. I also measure the standard errors induced by the Markov chain Monte Carlo sampling. In terms of goodness of fit, because this is an exponential model, we can compute the deviance, so we have the classic deviance statistics
for this process. Note that we do not have classical asymptotic approximations to their
distributions!
FIGURE 30
As we can see, attribute mixing does explain some of the variation, although if you look at the p-values there is not a massive amount of difference based on attribute mixing. When we look at the dependence terms, what we actually see is a fairly large oversupply of people with a degree of exactly 1. That parameter is quite large and positive, indicating there are a lot of people in monogamous relationships. There is also fairly substantial homophily based on race involved in this process as well.
In my two remaining seconds I will say we can use those model parameters to generate
processes that look a lot like Colorado Springs, and Figure 31 shows two of them. They don’t
have the larger component of the graph we saw in Figure 29, because this model doesn’t naturally
have that form, but if you go through other realizations you can get very similar looking forms.
FIGURE 31
In conclusion, I’ll reiterate that large and deep literatures exist that are often ignored;
simple models are being used to capture clustering and other structural properties, and the
inclusion of attributes (e.g., actor attributes, dyad attributes) is very important.
QUESTIONS AND ANSWERS
DR. KOLACZYK: Eric Kolaczyk, Boston University.
Mark, here is a quick question for you. My understanding is that the nature of this degeneracy result essentially relies on Barndorff-Nielsen’s result about the interior of what you were calling C, and whether you are on the boundary or not. That condition is always present in the sort of standard theory you would see, but it always sort of floats by; most of us don’t end up really needing to worry about it. Or if you end up working with discrete models, binomial-type models or something, then you have to be aware of it a bit. But it is almost never as extreme as this. So that’s something I’m missing here: what is the intuition behind this? Is it the mapping involved? Why is it so extreme here, whereas for the most part, in most models we would consider across a distribution of statistics, roughly speaking, people don’t have to worry about that seemingly innocuous theoretical result?
DR. HANDCOCK: That’s a good question. The underlying theory is the same, but there are actually two forms. One form is computational degeneracy: for a given observed graph, if you try to calculate the MLE, the MLE may or may not exist, and that is the standard Barndorff-Nielsen form. I agree that it is rarely seen for very complex models; this is the first case I’ve seen beyond my undergraduate statistics classes.
The second is the same theory, but a completely separate idea is used for the model
degeneracy, which has no data in sight. It’s just that you’ve got a model. You are looking at it.
You’re saying what does this model do? You are treating it as a toy so you start playing with it
and seeing how it behaves. The same underlying geometry is important, but it’s not related to those results about the existence of an MLE. The basic idea is that if we change the view from the natural parameter space, where interpretation of model properties is actually quite complex, to the mean value parameter space, which is expressed in terms of the statistics that we chose as the foundation of our model, and hence about which we should have good interpretation, then we get a much better lens on how the model is working.
The last part of your question: the nonlinear mapping is exactly the key here, and the “stealth bomber” plot is an example that makes that clear. I should also point out, which I didn’t earlier, that Mark Newman has a nice paper where he looks at the mean-field approximation to exactly this two-star model, and I think he has been able to produce a similar plot for the same model.
DR. HOFF: Peter Hoff, University of Washington.
Before I begin, Mark, I just want to remind you that you were the one who taught me to be critical of these types of models. When we have a univariate parameter, we have replications, and we make model-based inference, we have some way of evaluating whether or not a model is good or adequate. Or even if we are interested in the mean of a population, we have replicates; we can rely on a sampling distribution that depends only on the first and second moments of the actual population, and nothing else. So in a way we are not concerned with model misspecification, or we have ways of dealing with it. In these situations, the way you formulate the model, you have one observation. So my question is this: a lot of your work shows that model selection is extremely critical, and the choice of the statistics you put in, which is the choice of the model, is extremely critical. I’m curious whether you could comment on how we can start thinking about getting replications to address this issue, and whether there is any way to address it without replications, because you are putting up a table of p-values and confidence intervals there, and that is clearly going to rely on what other statistics you choose to put in the model.
DR. HANDCOCK: There are a number of different issues here, and I think these are
very valid, very important points. Issue number one is the single replication: that’s clearly
true. We’ve got a single graph, and what is essentially pseudo-replication, which we are using
to compute the standard errors and other such quantities, is in essence induced by the
structure of the model. Writing down the set of statistics that defines the model implicitly
gives us a set of configurations over which we can define the replication. So, in essence, the
standard errors are very dependent upon the specification of the actual model. Let me give a
very simple example. If we just have a model with edges in it, then all the edges are
independent of each other and we have essentially got replication over edges. If we put a
model with two-stars in it, then we look at all the possible configurations that are two-stars,
and those configurations give us a replication over which to look at the properties of the
two-stars involved. But note that this is very dependent upon the form of the model, so that’s
a partial answer to the question.
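The edges-only case in this example can be spelled out. Under the homogeneous Bernoulli (edges-only) model every dyad is an independent replicate, so the standard error of the estimated tie probability comes straight from classical binomial theory. A minimal sketch, with hypothetical counts of my own choosing:

```python
import math

def edge_model_inference(n_nodes, n_edges):
    """MLE and standard error of the tie probability under the homogeneous
    edges-only (Bernoulli) model, where every dyad is an independent replicate."""
    m = n_nodes * (n_nodes - 1) // 2   # number of dyads = effective replicates
    p_hat = n_edges / m                # MLE of the tie probability
    se = math.sqrt(p_hat * (1 - p_hat) / m)
    return p_hat, se

# Hypothetical network: 10 actors, 9 observed ties among the 45 dyads.
p_hat, se = edge_model_inference(10, 9)
print(f"p_hat = {p_hat:.3f}, SE = {se:.3f}")
```

Once dependent terms such as two-stars enter the model, this clean replication argument no longer applies directly, which is exactly the dependence on model form described above.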
I think the other answers are important as well. How do we specify the model, given
that its form matters so much? A natural answer, which is incomplete, is that if you had
replications of these graphs, the classic purely independent forms would apply and we could
work with them; that would be one solution. The other answer, I think, is that it’s worth
stepping back and asking ourselves whether pure independence is really required. I’ll just
remind us that every spatial process has exactly the same issues. All of spatial statistics deals
with spatial processes that are dependent across their full domain: spatial lattice processes
and continuous field processes all have dependence, and you only have a single realization.
So this is more a property of dependent models in general than of social network models or
this particular model class.
Coming back to Peter’s last point, model specification is extremely difficult here,
because you are using the model specification to also find misspecification in that model. I
strongly emphasize the use of the goodness-of-fit methods that Martina described. The reason
I think they are very helpful is that if you take a statistic which is not in the model, not in
your specification, and you see how well the model can actually reconstruct that somewhat
separate, although typically dependent, statistic, that gives you a way of seeing how the
model represents properties you have not explicitly placed in it. The other approach, of a
similar ilk, is the standard statistical one of exact testing based on this model, which you can
use to look for forms of model misspecification. I think Peter has raised a good point here. I
have been very critical of these models in the past, for the reasons I have given, and yet I am
actually using them. I’ll leave it at that.
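The goodness-of-fit idea described here, checking a statistic that was deliberately left out of the specification, can be sketched as a simple Monte Carlo tail test. The example below is my own illustration using a Bernoulli null model rather than a full ERGM: it simulates graphs from a fitted density-only model and asks how often they show at least as many triangles as the observed graph.

```python
import random

def count_triangles(adj):
    """Number of triangles in an undirected graph given as an adjacency matrix."""
    n = len(adj)
    return sum(1
               for i in range(n)
               for j in range(i + 1, n)
               for k in range(j + 1, n)
               if adj[i][j] and adj[j][k] and adj[i][k])

def simulate_bernoulli(n, p, rng):
    """Draw a graph in which each dyad is an independent Bernoulli(p) tie."""
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            adj[i][j] = adj[j][i] = int(rng.random() < p)
    return adj

def triangle_gof_pvalue(observed_triangles, n, p, reps=1000, seed=0):
    """Monte Carlo tail probability of seeing at least as many triangles as
    observed, under the fitted density-only (Bernoulli) null model.  Triangles
    are a statistic *not* in the model, so this probes misspecification."""
    rng = random.Random(seed)
    hits = sum(count_triangles(simulate_bernoulli(n, p, rng)) >= observed_triangles
               for _ in range(reps))
    return hits / reps
```

A strongly clustered graph, say 30 triangles among 10 nodes at fitted density 0.2 where the null expects roughly one triangle, yields a tiny p-value and flags the omitted clustering, which is exactly the diagnostic logic of the goodness-of-fit comparison described above.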
REFERENCES
Barndorff-Nielsen, Ole. 1978. Comments on Paper by B. Efron and D.V. Hinkley. Biometrika
65(3):483.
Besag, J.E. 1974. Spatial interaction and the statistical analysis of lattice systems (with
Discussion). Journal of the Royal Statistical Society, Series B 36:192-236.
Cressie, N.A.C. 1993. Statistics for Spatial Data, revised ed. New York: John Wiley and
Sons.
Doreian, P., and Frans Stokman. 1997. Evolution of Social Networks. Amsterdam, The
Netherlands: Overseas Publishers Association.
Frank, Ove, and D. Strauss. 1986. Markov Graphs. Journal of the American Statistical
Association 81(395):832-842.
Holland, P.W., and S. Leinhardt. 1981. An Exponential Family of Probability Distributions for
Directed Graphs. Journal of the American Statistical Association 76(373):33-50.
Lauritzen, S.L., and D.J. Spiegelhalter. 1988. Local computations with probabilities on graphical
structures and their application to expert systems (with discussion). Journal of the Royal
Statistical Society, Series B 50:157-224.
Leinhardt, Samuel. 1977. Social Networks: A Developing Paradigm. Burlington, Maine:
Academic Press.
Morris, Martina. 2004. Network Epidemiology: A Handbook for Survey Design and Data
Collection. London: Oxford University Press.
Newman, M.E.J. 2003. The structure and function of complex networks. SIAM Review 45(2):167-256.
Wasserman, Stanley, and Katherine Faust. 1994. Social Network Analysis: Methods and
Applications. Cambridge, U.K.: Cambridge University Press.