Stability and Degeneracy of Network Models

Mark S. Handcock, University of Washington


DR. HANDCOCK: There has been a lot of discussion related to the ideas I’ll be talking about today. This is not a new topic at all, but I think it’s worthwhile looking at how these issues relate to the particular problem of models and social networks. My main idea is to see how very old issues and very old concerns apply, or play themselves out, in the context of social network models.

This is joint work and I’m going to go through some things here, but if you really want to find it, please look at these working papers on the Web at the Center for Statistics and the Social Sciences (CSSS) at the University of Washington (www.css.washington.edu). Much of this work was with Martina Morris and with the University of Washington’s Network Working Group, which includes Carter Butts, Jim Moody, and other folks that Martina mentioned yesterday. I particularly want to point out Dave Hunter of Pennsylvania State University, who is in the audience. Much of this work is done jointly with him.

I’ll review some topics that have been gone through before. I won’t belabor them now because they have basically already been covered. As I look around this room, the study of networks draws in a very diverse set of folks, and we have many varied objectives and a multitude of frameworks and languages used to express things. I don’t think I’ll spend much time on this. After one and a half days I think this is fairly clear.

One thing that is important to recognize, at least for the topics that I’ll be looking at, is there are many deep and relevant theories here, and a deep statistical literature grouped into several communities. For instance, the social networks community has done a massive amount of work as you saw last night in the after-dinner talk. Two key references would be Frank (1972) and Wasserman and Faust (1994). The statistical networks community has also worked on this topic for a long period of time, and some key references are Snijders (1997), Frank and Strauss (1986), and Hoff, Raftery, and Handcock (2002). In particular, it’s worthwhile to look at the work of David Strauss, which I think threads through much of what’s going on and is very important from a technical contribution. Other contributions which I think haven’t been discussed much at this workshop are from the spatial statistics community. There is a lot of very important work there and it’s very closely allied. If you are looking to work in this area I think it’s very important to read this stuff very closely. Good pointers into that literature include Besag (1974) and Cressie (1993). Another important literature, which I’ll mention later in this talk, is



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.
exponential family theory. It was worked out a long time ago, and the best place to start is the seminal book by Barndorff-Nielsen (1978). The last relevant literature is that coming from the graphical modeling community. They work on related ideas that are actually closer than it would seem from just judging the commonality in the terms used. A good pointer is Lauritzen and Spiegelhalter (1988).

Networks are very complex things, and how we choose to model them will reflect what we have tried to represent well and what we are essentially not trying to represent well. That choice is going to be driven largely by the objectives we have in mind. This is an obvious point, one that runs through much of scientific modeling, but I have found that it’s very important for the networks area because of the complexity of the underlying models and the complexity of the phenomena we are interested in.

What we are probably interested in first is the nature of the relationships themselves: questions such as how the behavior of individuals depends on their location in a social network, or how the qualities of the individuals influence the social structure. Then we might be interested in how network structure influences processes that develop over a network, dynamic or otherwise. The classic examples include the spread of infection, the diffusion of innovations, and the spread of computer viruses, all of which are affected by the network structure. Lastly (and I think this is of primary importance and is why many people actually study networks, even though they will look at it only after understanding the relationships and the network structure), we are interested in the effect of interventions. If we change the network structure and/or the processes that develop over that network, how will the network structure play itself out, and how will the process itself actually be changed?
If you make changes to the network, almost certainly the consequences of those changes will not be obvious.

Another important point is that our objectives define our perspectives. There is a difference between a so-called network-specific viewpoint and a population-process viewpoint. By network-specific I mean we look at a given network. It might be sampled or have missing data in it, but our scientific objective is that particular network. That is in contrast with a population viewpoint, where we view the data we have, whether complete or sampled, as a realization from some underlying social phenomenon, typically represented through a stochastic process, and we wish to understand the properties of that stochastic process. In that case, the observed network is conceptualized as a realization of a stochastic process.

I think a lot of the different approaches in the literature look very different because they have these different perspectives and objectives in mind. For example, repeating a point made last night by Steve Borgatti, one can compare a social-relations perspective with a nodal attribute compositional perspective. The former brings out interdependence/endogeneity and structural and positional characteristics, while the latter focuses on atomistic essentialism and reductionism. I won’t say more about these things because I think they have been adequately covered.

Figure 1 defines a very basic model for a social network.

FIGURE 1

Hence, we can think of Y, the g-by-g matrix of relationships among the actors, as the sociomatrix. In a very simple sense, we want to write down a stochastic model for the joint distribution of Y, which is a large multivariate distribution, often discrete, though it could also be continuous. All we want is a relatively simple “model” for the dependence structure of this multivariate random variable; that’s how a statistician would view this problem, and that view dominates the development.

FIGURE 2

Martina Morris showed Figure 2 yesterday, but I’ll show it again just to bring it forward. Let Y be the sample space of all possible graphs (for example, just presence or absence of ties), and we write down an exponential family model of the form shown. The denominator is a constant, defined as the sum of the numerators over the space of possible graphs.

One thing that I think is worthwhile to look at is a way of getting a general interpretation of the model through local specification. This is shown in Figure 3. We define Yc as the graph excluding the ij component of the network, that is, all the network elements except the ij one. This is a standard formula, but I think it’s quite instructive. It is the probability of a tie between i and j given the rest of the network, divided by the probability of a non-tie between i and j with the rest of the network held fixed. The basic idea is that we hold the rest of the graph fixed and think about a particular tie: what happens when we change it from 0 to 1? Looking at the odds of the tie conditional on the rest gives us a direct interpretation of these forms. It can also be interpreted in terms of a relative risk. Either way, we get an interpretation in terms of the local specification: how the probability of a tie between a particular pair of actors is influenced by the surrounding ties. By specifying the local properties, as long as it is done in this way, you get the global joint distribution, going from the local specification to a global one. Again, this is relatively straightforward from an algebraic standpoint. It’s just helping us with the interpretation.

FIGURE 3

I’ll make a brief comment on models for the actor degree distribution, which is shown in Figure 4. We can write a particular model of this form where the statistics are just the proportions of actors with exactly k relationships, i.e., the proportion of the nodes with degree k. Basically, we write just a linear predictor on these forms; this is the so-called degree-only model. In essence, what it says is that we condition on the degrees, and all of the graphs with that degree structure are equally likely; in that sense it’s random mixing given a degree sequence. This has a very long history in the social networks community, and the two references shown in Figure 4, plus many others, are good places to start. This is receiving an enormous amount of attention. The reason I’m pointing this out is that there has been a lot of work on these models, and they are being considered from both a technical and a theoretical perspective. I often think that science is not helped a lot when that prior work is forgotten.

The other thing you can do here is further parameterize the degree distribution, which essentially places nonlinear parametric constraints on the α parameters here. As most statisticians know, this moves it technically from a straight linear exponential family to a curved exponential family. Dave Hunter and I did some work on models of that kind.
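The local specification described with Figure 3 can be sketched in a few lines of Python. For an exponential family model with statistics g(y) and natural parameters θ, the conditional log-odds of a tie is θ multiplied by the change in g(y) when that tie is switched from 0 to 1 with the rest of the graph held fixed. The choice of statistics (edges and two-stars), the small example graph, and the parameter values below are illustrative assumptions, not values taken from the talk.

```python
from itertools import combinations
from math import exp

def edges(adj):
    """Number of ties in an undirected graph given as a symmetric 0/1 matrix."""
    n = len(adj)
    return sum(adj[i][j] for i, j in combinations(range(n), 2))

def two_stars(adj):
    """Number of two-stars: pairs of ties that share an actor."""
    degrees = [sum(row) for row in adj]
    return sum(d * (d - 1) // 2 for d in degrees)

def with_tie(adj, i, j, value):
    """Copy of adj with the (i, j) tie set to value; the rest is unchanged."""
    new = [row[:] for row in adj]
    new[i][j] = new[j][i] = value
    return new

def conditional_tie_probability(adj, i, j, theta, stats):
    """P(Y_ij = 1 | rest of the graph), computed from the change statistics."""
    y_plus = with_tie(adj, i, j, 1)
    y_minus = with_tie(adj, i, j, 0)
    log_odds = sum(t * (g(y_plus) - g(y_minus)) for t, g in zip(theta, stats))
    return 1.0 / (1.0 + exp(-log_odds))

# Four actors with ties 0-1 and 0-2 (hypothetical example graph).
adj = [
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
theta = (-1.0, 0.5)          # assumed (edges, two-stars) parameters
stats = (edges, two_stars)
p_12 = conditional_tie_probability(adj, 1, 2, theta, stats)
p_23 = conditional_tie_probability(adj, 2, 3, theta, stats)
```

In this sketch a tie between actors 1 and 2 would create two new two-stars, which offsets the edge penalty, whereas a tie between 2 and 3 would create only one, so its conditional probability is lower. This is the sense in which the surrounding ties influence a particular dyad.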

FIGURE 4

We could then add in covariates that occur in this linear fashion (Martina did this yesterday, and I’ll show it again briefly): attributes of the nodes and attributes of the dyads. We can add an additional clustering component to a degree form in this way. This is shown in Figure 5.

FIGURE 5

I’ll give you just a couple of illustrations of how this model might apply in particular instances. Say you wanted to represent a village-level structure with 50 actors, a clustering coefficient of 15 percent, and a degree distribution that is Yule with a scaling exponent of 3. The Yule is the classic preferential attachment power law model, and we can see how that looks. Figure 6 shows two different networks generated from that model.

FIGURE 6

The network on the left side of Figure 6 is one with zero clustering, and on the right we see what happens when the mean clustering coefficient is pushed up to 15 percent with the same degree distribution. The basic notion is that these models give you a way of incorporating known clustering coefficients in the model while holding the degree distribution fixed. This reiterates points made earlier: the degree distribution is clearly not all of what is going on.

FIGURE 7

Figure 7 shows the same thing with 1,000 nodes, so it’s a pretty big village. You can see that, to get the clustering with a given degree distribution, it’s forcing a lot of geodesics in this form. Of course, with 1,000 nodes it’s pretty hard to see.

Figure 8 is a bipartite graph; the same model form works for bipartite graphs. The network in the upper left is a heterosexual Yule with no correlation, and the one in the upper right is a heterosexual Yule with a strong correlation, a triangle percent of 60 percent versus 3, which is the default for random mixing given degree. There is a modest one here as well as one with negative correlation. I don’t think I’m going to say too much more about these.

FIGURE 8

The main topic I would like to talk about today is a canonical problem: how can we tell if a model class is useful? That is, if you write down a particular form of the kind in Figure 2, we can introduce statistics based on principles of underlying local specifications or local configurations, or we can just write down statistics that we believe, a priori and from a scientific perspective, would explain a lot of the variation in the actual graph. But because these statistics will tend to be highly correlated with each other, highly dependent, it’s not really clear for any given model exactly what the qualities of that model would actually be. As Martina Morris showed yesterday, the natural idea of starting from something very simple can sometimes lead to models that aren’t very good. It is true that the properties we saw were the properties of the model we wrote down. It’s essentially saying that any implication that simple models have simple properties is not true. This has been known in statistical mechanics for a very long time. It’s always a little bit of a surprise to see it occur in applied models or very empirically based models.

So, the basic question here is: is a model class itself able to represent a range of realistic networks? It’s not the only question, but it is one question that could be asked here, and this is what I refer to as the issue of model degeneracy (Handcock, 2003). The idea is that some model classes will only represent a small range of graphs as the parameters are varied. In circumstances where we would like a fairly general model to cover a large range of graphs and graph types, that might not be a desirable property of a model.

The second issue is: what are the properties of different methods of estimation, such as maximum likelihood estimation, pseudo-likelihood, or some Bayesian framework? I’ll make some comments on the computational issues and on where certain estimators do or don’t exist (e.g., see Snijders, 2002, and Handcock, 2002).

The last issue in assessing whether a model class is useful is whether we can assess the goodness of fit of models. For example, if we have a graph and we have a model, how well does the model actually fit the graph? I’ll make some notes here about measuring this. I don’t think I’ll say a lot about it, but I think the application Martina went through yesterday was very interesting in terms of a stylized view of how that would be done. Some background on this topic may be found in Besag (2000) and Hunter, Goodreau, and Handcock (2005).

Figure 9 illustrates some points on model degeneracy. It is a property of a random graph model; it has nothing to do with data per se, only with the model itself. We call a model near degenerate if it places almost all its probability mass on a small number of graphs. An example of this would be the empty graph, as shown in Figure 10. If we know that a model produces the empty graph with probability close to 1, that’s probably not good; the same goes for the full graph or some mixture of them. What matters is which subsets of the set of possible graphs are regarded as interesting for the particular scientific application. If we are interested in heterosexual graphs, and the model chosen places all of its mass on heterosexual graphs, which is a large subset, that’s a good thing. This idea is clearly application-specific.
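Near degeneracy can be checked directly on a toy example by enumerating every graph. The sketch below does this for a two-star model on 5 labelled nodes (1,024 graphs) and reports how much probability sits on the empty and full graphs. The node count and the parameter values are assumptions chosen to make the effect visible; they are not the settings used in the talk.

```python
from itertools import combinations, product
from math import exp

def graph_statistics(n):
    """Yield (edges, two_stars) for every undirected graph on n labelled nodes."""
    dyads = list(combinations(range(n), 2))
    for ties in product((0, 1), repeat=len(dyads)):
        degree = [0] * n
        for (i, j), t in zip(dyads, ties):
            degree[i] += t
            degree[j] += t
        yield sum(ties), sum(d * (d - 1) // 2 for d in degree)

def edge_count_distribution(n, theta_edges, theta_two_stars):
    """Exact distribution of the number of edges under the two-star model."""
    n_dyads = n * (n - 1) // 2
    weights = [0.0] * (n_dyads + 1)
    for e, s in graph_statistics(n):
        weights[e] += exp(theta_edges * e + theta_two_stars * s)
    total = sum(weights)  # the normalizing constant
    return [w / total for w in weights]

# Assumed parameters: a strong edge penalty balanced by a two-star reward.
probs = edge_count_distribution(5, -6.0, 2.0)
mass_on_extremes = probs[0] + probs[-1]   # empty graph plus full graph

# Baseline for comparison: all 1,024 graphs equally likely.
uniform = edge_count_distribution(5, 0.0, 0.0)
uniform_mass_on_extremes = uniform[0] + uniform[-1]
```

Only one graph has zero edges and only one has all ten, so `probs[0]` and `probs[-1]` are the probabilities of the empty and full graphs. Under these assumed parameters nearly all of the mass lands on those two graphs, whereas under the uniform baseline they carry 2/1024 of it.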

FIGURE 9

FIGURE 10

FIGURE 24

I’ll make a brief comment on MCMC. For those who haven’t seen a lot of Monte Carlo, this is the idea; I find it a simple way of thinking about likelihood estimation. The idea, shown in Figure 24, is that you want to estimate the partition function, or normalizing constant, which is essentially a population mean over the set of all possible graphs. Being statisticians, if we have a very large population and want to measure its mean, what we would do is draw a sample from that population. We would then calculate the sample mean and use that to replace the population mean, which is one of the simplest ideas in statistics. In essence, what we do is draw a sample of the possible graphs, compute the corresponding sample mean, and use this to approximate the partition function. The question is how do we get a sample of graphs? MCMC is a natural way. This has been developed, and I won’t say too much more about it. There is a result that an MCMC likelihood used in that way actually converges with sufficient iterations, which I won’t belabor here.

I always find the relationship between a near degenerate model and MCMC estimation interesting. Figure 25 addresses this idea, which goes all the way back to very nice work by Charlie Geyer. Basically, the idea (there is some mathematics behind it) is that if your model is near degenerate, then your MCMC won’t mix very well. This might be a surprise to most folks who have ever tried this, but just to give you some sense in practice of how this works, I’ll show you again this two-star model in Figure 26.
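The sample-mean idea for the normalizing constant can be sketched as follows. To keep the example short it draws graphs independently and uniformly at random rather than by MCMC, which is a deliberate simplification: uniform draws work only for tiny graphs and mild parameters, and a practical implementation would sample by MCMC as described in the talk. The function names, the 5-node size, and the parameter values are assumptions for illustration.

```python
import random
from itertools import combinations
from math import exp

def sample_uniform_graph_stats(n, rng):
    """Draw a graph uniformly at random; return its (edges, two_stars)."""
    degree = [0] * n
    n_edges = 0
    for i, j in combinations(range(n), 2):
        if rng.random() < 0.5:
            degree[i] += 1
            degree[j] += 1
            n_edges += 1
    return n_edges, sum(d * (d - 1) // 2 for d in degree)

def estimate_normalizing_constant(n, theta, n_samples, seed=0):
    """Sample-mean estimate of c(theta) = sum over graphs of exp(theta . g(y))."""
    rng = random.Random(seed)
    n_graphs = 2 ** (n * (n - 1) // 2)   # size of the population of graphs
    total = 0.0
    for _ in range(n_samples):
        e, s = sample_uniform_graph_stats(n, rng)
        total += exp(theta[0] * e + theta[1] * s)
    # Population sum = population size times population mean.
    return n_graphs * total / n_samples

# Check against a case with a closed form: with no two-star term the
# dyads are independent, so c(theta) = (1 + exp(theta_edges)) ** n_dyads.
estimate = estimate_normalizing_constant(5, (0.3, 0.0), 20000)
exact = (1 + exp(0.3)) ** 10
```

The closed form is available only for the edges-only model; it is used here purely as a check that the sample mean approximates the population quantity.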

FIGURE 25

FIGURE 26

For example, suppose for the two-star model you want to choose a mean value with 9 edges and about 40 two-stars, and you run your MCMC sampler. Figure 27 shows what you get. Many people have probably run into this if they have ever tried it.

FIGURE 27

This is the trace plot of the edges. I’m running the MCMC sampler for about 100,000 iterations of this simple 7-node process, and this is the trace plot of the number of edges as we draw from that Markov chain. It stays up here around 19, drops down to 15 or 17, and suddenly jumps down to 3 or 2 or 0. It stays down there for maybe 20,000 iterations, jumps back up, and jumps back down. So, what we are actually seeing, if you look at the marginal distribution of these draws, is a profoundly polarized distribution, with most of the draws at very low values and some of the draws quite high, and of course in such a way that the mean value is 9, which is exactly what we designed the model to do. This is another view of the example that Martina gave yesterday of the two-star model. Note that this sampler is doing great. It is giving us samples back from the model we asked it for, but now that we have looked at this we would probably say we don’t want this model. Bad mixing of an MCMC sampler is therefore highly related to these degeneracy properties.
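A trace of this kind can be produced with a very small sampler. The sketch below is a plain Metropolis algorithm that proposes toggling one randomly chosen dyad per iteration and records the number of edges. It is an assumed, minimal sampler, not the one used for Figure 27, and the natural parameter values in the example call are hypothetical: the talk specifies the model through its mean values (9 edges, about 40 two-stars), and the corresponding natural parameters are not given.

```python
import random
from math import exp

def two_star_trace(n, theta_edges, theta_two_stars, n_iter, seed=0):
    """Metropolis sampler toggling one dyad per step; returns the edge-count trace."""
    rng = random.Random(seed)
    adj = [[0] * n for _ in range(n)]
    degree = [0] * n
    n_edges = 0
    trace = []
    for _ in range(n_iter):
        i, j = rng.sample(range(n), 2)   # each dyad is equally likely
        if adj[i][j]:
            # Removing the tie loses one edge and (d_i - 1) + (d_j - 1) two-stars.
            d_edges, d_stars = -1, -(degree[i] + degree[j] - 2)
        else:
            # Adding the tie gains one edge and d_i + d_j two-stars.
            d_edges, d_stars = 1, degree[i] + degree[j]
        log_ratio = theta_edges * d_edges + theta_two_stars * d_stars
        if log_ratio >= 0 or rng.random() < exp(log_ratio):
            adj[i][j] = adj[j][i] = 1 - adj[i][j]
            degree[i] += d_edges
            degree[j] += d_edges
            n_edges += d_edges
        trace.append(n_edges)
    return trace

# Hypothetical natural parameters for a 7-node two-star model.
trace = two_star_trace(7, -2.0, 0.4, 100_000)

# Sanity check on a model with a known answer: with both parameters zero
# every graph is equally likely, so the mean edge count is 21 / 2.
flat = two_star_trace(7, 0.0, 0.0, 20_000, seed=1)
flat_mean = sum(flat[2000:]) / len(flat[2000:])
```

Plotting `trace` against the iteration number gives the kind of trace plot described above. Whether it shows the long stays at high and low edge counts depends on the parameters chosen, so the values here would need to be tuned to reproduce the behavior in Figure 27.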

FIGURE 28

To come back very briefly, estimation in the mean value parameterization is essentially just as hard as it is in the natural parameterization, although the corresponding point estimate itself is trivial: it’s just the observed statistics. Of course, to find the properties of that point estimator (and it’s no good unless you know what its properties are), you essentially need the same MCMC methods as you need for the natural parameterization. It is much easier to do the natural parameterization from a computational perspective, in terms of writing code, because of slightly simpler forms. You don’t need to solve some part of the inverse problem.

To finish, I’ll say a little bit about network sampling, because it has come up a number of times during this workshop. I’ll give you some idea of the classical statistical treatment of it, going back to Frank in 1972 and other work since. We think of the design mechanism as that part of the observation process that is under the control of the researcher. So, if we are thinking about doing network sampling, the design mechanism is how the researchers design the sampling process. Examples would be surveys using egocentric, snowball, or other link-tracing sampling. There is also the out-of-design mechanism, which is the unintentional non-observation of network information, meaning the mechanisms by which information is missed, such as the failure to report links, incomplete measurement of links, and attrition from longitudinal surveys. Note that for any sampling mechanism we need to deal with both of these components.

It is sometimes convenient to cluster design mechanisms into conventional, adaptive, and convenience designs. So-called conventional designs do not use the information collected during a survey to direct the subsequent sampling of individuals.
For example, we might sample everyone and do a network census, or we might do an egocentric design and randomly choose a number of individuals, then look at the ties of only those individuals. That is, we don’t use anything that we collect during a survey to direct subsequent sampling. This might sound like an odd way to do it, but let me describe adaptive designs for contrast. In adaptive designs, we actually use the information collected to direct the subsequent sampling; most network sampling is actually of this adaptive form. The idea is to collect the ties of an initial group. We then follow the links of that group and use the fact that we have observed the links of those individuals to essentially do contact tracing and move out from there. Examples are classic snowball sampling and link-tracing random walk designs. Many of the designs used in computer science and physics fall under this form, where you are using information gathered during your survey to direct the subsequent sampling. There are also convenience designs, where you might use something very intelligent but not very statistical. There we are using convenient information; you just nab people who are close by and convenient to sample.

In Handcock (2003) I examined likelihood-based inference for adaptive designs: basically, how to fit the models that I described earlier, and that Martina Morris described, when you have a conventional design and, probably more importantly, when you have adaptive data. You actually have massively sampled data due to link tracing. It turns out that you can fit these models in this way, and there is a computationally feasible MCMC algorithm to do it, which I think is pretty helpful in practice. This is being implemented in a statnet package for R.

As an example of the use of likelihood inference, I’ll briefly mention the Colorado Springs “Project 90,” which is familiar to many people in this room and which involved quite a broad sample. This study dealt with a population of prostitutes, pimps, and drug dealers in Colorado Springs; I will focus only on 1991. I’m looking at a heterosexual sexual network within the group of people who responded to a survey. Essentially what they did was a form of link tracing, which could be referred to as bi-link tracing, where you are recruited into the survey if two other respondents nominated you. It wasn’t enough just to be nominated by one individual; you had to be nominated by two.
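The difference between an adaptive link-tracing design and the nomination rule just described can be sketched with a small function. It grows a sample in waves from a set of seeds, using the nominations already observed to decide whom to recruit next. The function name, the data layout, and the toy nomination data are hypothetical; this is not the Project 90 protocol, only an illustration of recruiting on one nomination versus two.

```python
def link_trace_sample(nominations, seeds, n_waves, min_nominations=1):
    """Adaptive link-tracing design.

    nominations maps each respondent to the set of people they name.
    A person enters the sample once at least min_nominations sampled
    respondents have named them.
    """
    sampled = set(seeds)
    for _ in range(n_waves):
        counts = {}
        for respondent in sampled:
            for named in nominations.get(respondent, ()):
                if named not in sampled:
                    counts[named] = counts.get(named, 0) + 1
        recruits = {p for p, c in counts.items() if c >= min_nominations}
        if not recruits:
            break   # nobody new qualifies, so the tracing stops
        sampled |= recruits
    return sampled

# Toy nomination data (hypothetical).
nominations = {
    "a": {"b", "c"},
    "b": {"c", "d"},
    "c": {"e"},
    "d": set(),
    "e": set(),
}
snowball = link_trace_sample(nominations, {"a", "b"}, n_waves=2)
two_nominations = link_trace_sample(
    nominations, {"a", "b"}, n_waves=2, min_nominations=2
)
```

With one nomination required, the sample spreads to everyone reachable within two waves; with two required, only "c", who is named by both seeds, is recruited. A conventional design, by contrast, would fix the list of respondents in advance and never consult `nominations` when choosing whom to sample.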

FIGURE 29

Figure 29 presents an example of that network. You see it has a relatively large component in the center. Many of the ties tend to be just monogamy here, just split off. You then have some small components around the side, and this core is around 300 folks. I think there are some isolates also. You get the sense that there is a large part that is connected, but not massively, so individuals float around. Note that this is a visualization, so there are some artifacts. For instance, it looks more central than it actually is. There is a lot of missing data here due to the sampling process. You are only seeing the part of it that is observed.

I won’t go through the boring detail, but if you run the program here, Figure 30 shows the sort of result you will get. You put in all these covariates. We know a lot of information about the individuals: their age, their race, whether they are a prostitute or a pimp. What we did was look at different types of tie formation based on age, race, and occupation. We also looked at whether there was homophily based on race and homophily based on age, and then at these endogenous parameters that measure the dependency involved. I won’t say too much about this particular model, but you can get the natural parameter estimates. I also measure the standard errors induced by the Markov chain Monte Carlo sampling. In terms of goodness of fit, because this is an exponential family model, we can compute the deviance here, so we have the classic deviance statistics for this process. Note that we do not have classical asymptotic approximations to their distributions!

FIGURE 30

As we can see, attribute mixing does explain some of the structure, although if you look at the p-values here, there is not a massive amount of difference based on attribute mixing. When we look at these dependence terms, what we actually see is that there is a fairly large oversupply of people with a degree of exactly 1. This parameter is quite large and positive, indicating that there are a lot of people in monogamous relationships. There is also fairly substantial homophily based on race involved in this process.

In my two remaining seconds I will say that we can use those model parameters to generate networks that look a lot like Colorado Springs, and Figure 31 shows two of them. They don’t have the large component of the graph we saw in Figure 29, because this model doesn’t naturally have that form, but if you go through other realizations you can get very similar looking forms.

FIGURE 31

In conclusion, I’ll reiterate that large and deep literatures exist that are often ignored; simple models are being used to capture clustering and other structural properties; and the inclusion of attributes (e.g., actor attributes, dyad attributes) is very important.

QUESTIONS AND ANSWERS

DR. KOLACZYK: Eric Kolaczyk, Boston University. Mark, here is a quick question for you. My understanding is that the nature of this degeneracy result essentially relies on Barndorff-Nielsen’s statement about the interior of what you were calling C, and whether you are on the boundary or not. That is always filtering in the sort of standard theory that you would see, but it always sort of floats by. Most of us don’t end up really needing to worry about it. Or if you are going to end up working with discrete models and binomial-type models or something, then you have to be aware of it a bit. But it almost never is as extreme as this. So, that’s something I’m missing here. What is the intuition behind this? Is it the mapping involved? Why is it so extreme here, whereas in most models that we would consider across a distribution of statistics, if you want to put it roughly, people don’t have to worry about that seemingly innocuous theoretical result?

DR. HANDCOCK: That’s a good question. The underlying theory is the same, but there are actually two forms. One form is the computational degeneracy: that is, for a given observed graph, if you try to calculate the MLE, the MLE may or may not exist, and that is the standard Barndorff-Nielsen form. I agree, though, that it is rarely seen for very complex models, and this is the first case I’ve seen beyond my undergraduate statistics classes. The second, model degeneracy, uses the same theory but is a completely separate idea, with no data in sight. It’s just that you’ve got a model. You are looking at it. You’re asking, what does this model do? You are treating it as a toy, so you start playing with it and seeing how it behaves. The same underlying geometry is important, but it’s not related to those results about existence of an MLE. The basic idea is that if we change the view from the natural parameter space, where interpretation of model properties is actually quite complex, to the mean value parameter space, which is expressed in terms of the statistics (which we chose to be the foundation of our model, and hence for which we should have good interpretation), then it gives us a much better lens to see how the model is working. On the last part of your question, the nonlinear mapping is exactly the key here, and the stealth bomber plot is also an example that makes that clear. I should also point out, which I didn’t earlier, that Mark Newman has a nice paper where he looks at the mean-field approximation to exactly this two-star model, and I think he has been able to produce a similar plot for the same model.

DR. HOFF: Peter Hoff, University of Washington. Before I begin, Mark, I just want to remind you that you were the one who taught me that I should be critical of these types of models. So, when we have a univariate parameter, and we have replications, and we make model-based inference, we have some way of evaluating whether or not a model is good or adequate. Or even if we are interested in the mean of a population, we have replicates. We can rely on a stochastic distribution that depends only on the first and second moments of the actual population, and nothing else. So in a way we are not concerned with model misspecification, or we have ways of dealing with it. In these situations, the way you formulate the model, you have one observation. A lot of your work shows that model selection is extremely critical, and the choice of the statistics you put in, which is the choice of the model, is extremely critical. So, I’m just curious whether you could comment on the utility of, or how we can start thinking about, getting replications to address this issue, and whether there is any way to address this issue without replications, because you are putting up a table of p-values and confidence intervals there. That is clearly going to rely on what other statistics you choose to put in the model.

DR. HANDCOCK: There are a number of different issues here. First, I think these are very valid points, very important points. Issue number one is single replication: well, that’s clearly true. We’ve got a single graph, and what is essentially pseudo-replication, which we are using to compute the standard errors and other things like that, is in essence induced by the structure of the model. If I write down the given set of statistics that define that model, that implicitly gives us a set of configurations over which we can define the replication. So, in essence, the standard errors are very dependent upon the specification of the actual model.

Let me give a very simple example. If we just have a model with edges in it, then clearly all the edges are independent of each other, and we have essentially got replication over edges. If we fit a model with two-stars in it, then you look at all the possible configurations which are two-stars, and the configurations which aren’t of that form give us a replication over which to look at the properties of the two-stars involved. But note that this is very dependent upon the form, so that’s a partial answer to the question.

I think the other answers are important as well. How do we specify this model if it really matters what the form actually is? A natural answer to that, which is incomplete, is that if you had replications of these graphs, the classic, purely independent forms would apply, and then we could work with them, and that would be one solution. The other answer, I think, is that it’s worth stepping back and asking ourselves whether pure independence is required. I’ll just remind us that every spatial process has exactly the same issues. All of spatial statistics deals with spatial processes which are dependent across their full domain. Spatial lattice processes and continuous field processes all have dependence, and you only have a single realization. So, this is more a property of dependent models than of social network models or this particular model class.

Coming back to Peter’s last point, model specification is extremely difficult here, because you are using the model specification to also find misspecification in that model. I strongly emphasize the use of the goodness-of-fit methods that Martina described. The reason I think this is very helpful is that if you have a statistic which is not in the model, not in your specification, you can see how well the model actually reconstructs that somewhat separate, although obviously typically dependent, statistic. That gives you a way of seeing how well it represents properties you have not explicitly placed in your model. The other thing is the standard statistical approach of exact testing based on this model, which is of a similar ilk: you can do exact testing to look for forms of model misspecification. I think Peter has raised a good point here. I have been very critical of these models in the past for the reasons I have said, and I’m actually using them. I’ll leave it at that.
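
The change of view described in the first answer, from the natural parameter space to the mean value parameter space, can be sketched numerically. What follows is a minimal Python illustration and not the speaker's own code: it assumes a model with only edge and two-star statistics on 5 labeled nodes, which is small enough that all 1,024 graphs can be enumerated exactly. The function names and the parameter values are hypothetical choices made for this example.

```python
from itertools import combinations, product
from math import comb, exp


def graph_statistics(n):
    """Return (edges, two_stars) for every undirected graph on n labeled nodes."""
    dyads = list(combinations(range(n), 2))
    stats = []
    for present in product((0, 1), repeat=len(dyads)):
        degree = [0] * n
        for (i, j), y in zip(dyads, present):
            degree[i] += y
            degree[j] += y
        edges = sum(present)
        # A two-star is a pair of edges sharing a node: sum of C(degree, 2).
        two_stars = sum(comb(d, 2) for d in degree)
        stats.append((edges, two_stars))
    return stats


def mean_value_parameters(theta_edges, theta_two_stars, stats):
    """Map natural parameters to the expected (edges, two_stars) by enumeration."""
    weights = [exp(theta_edges * e + theta_two_stars * s) for e, s in stats]
    total = sum(weights)  # normalizing constant of the exponential family
    mean_edges = sum(w * e for w, (e, _) in zip(weights, stats)) / total
    mean_two_stars = sum(w * s for w, (_, s) in zip(weights, stats)) / total
    return mean_edges, mean_two_stars


# Assumption: 5 nodes, so 10 dyads and 2**10 = 1024 graphs.
STATS = graph_statistics(5)

if __name__ == "__main__":
    # Hold the edge parameter fixed and vary the two-star parameter.
    for theta2 in (-0.5, 0.0, 0.25, 0.5, 1.0):
        mu_edges, mu_two_stars = mean_value_parameters(-1.0, theta2, STATS)
        print(f"theta2={theta2:+.2f}  E[edges]={mu_edges:.3f}  "
              f"E[two-stars]={mu_two_stars:.3f}")
```

With the two-star parameter set to zero the model reduces to independent edges, so the expected edge count is 10 times the logistic function of the edge parameter. As the two-star parameter increases, the expected statistics are pulled toward those of the complete graph (10 edges, 30 two-stars), which gives a small-scale view of the nonlinear mapping referred to in the discussion.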

REFERENCES

Barndorff-Nielsen, Ole. 1978. Comments on Paper by B. Efron and D.V. Hinkley. Biometrika 65(3):483.

Besag, J.E. 1974. Spatial interaction and the statistical analysis of lattice systems (with discussion). Journal of the Royal Statistical Society, Series B 36:192-236.

Cressie, N. A two-dimensional random walk in the presence of a partially reflecting barrier. Journal of Applied Probability 11:199-205.

Doreian, P., and Frans Stokman. 1997. Evolution of Social Networks. Amsterdam, The Netherlands: Overseas Publishers Association.

Frank, Ove, and D. Strauss. 1986. Markov Graphs. Journal of the American Statistical Association 81(395):832-842.

Holland, P.W., and S. Leinhardt. 1981. An Exponential Family of Probability Distributions for Directed Graphs. Journal of the American Statistical Association 76(373):33-50.

Lauritzen, S.L., and D.J. Spiegelhalter. 1988. Local computations with probabilities on graphical structures and their application to expert systems (with discussion). Journal of the Royal Statistical Society, Series B 50:157-224.

Leinhardt, Samuel. 1977. Social Networks: A Developing Paradigm. Burlington, Maine: Academic Press.

Morris, Martina. 2004. Network Epidemiology: A Handbook for Survey Design and Data Collection. London: Oxford University Press.

Newman, M.E.J. 2003. The structure and function of complex networks. SIAM Review 167-256.

Wasserman, Stanley, and Katherine Faust. 1994. Social Network Analysis: Methods and Applications. Cambridge, U.K.: Cambridge University Press.