Proceedings of a Workshop on Statistics on Networks (CD-ROM) (2007)

Suggested Citation: "Stability and Degeneracy of Network Models--Mark S. Handcock, University of Washington." National Research Council. 2007. Proceedings of a Workshop on Statistics on Networks (CD-ROM). Washington, DC: The National Academies Press. doi: 10.17226/12083.

Stability and Degeneracy of Network Models

Mark S. Handcock, University of Washington

DR. HANDCOCK: There has been a lot of discussion related to the ideas I'll be talking about today. This is not a new topic at all, but I think it's worthwhile looking at how these particular issues relate to the particular problem of models for social networks. My main idea is to see how very old issues and very old concerns play themselves out in the context of social network models. This is joint work, and I'm going to go through some things here, but if you really want to find it, please look at the working papers on the Web at the Center for Statistics and the Social Sciences (CSSS) at the University of Washington (www.csss.washington.edu). Much of this work was with Martina Morris and with the University of Washington's Network Working Group, which includes Carter Butts, Jim Moody, and other folks that Martina mentioned yesterday. I particularly want to point out Dave Hunter of Pennsylvania State University, who is in the audience; much of this work is done jointly with him.

I'll review some topics that have been gone through before. I won't belabor them now because they have basically already been covered. As I look around this room, the study of networks draws in a very diverse set of folks, and we have many varied objectives and a multitude of frameworks and languages used to express things. I don't think I'll spend much time on this; after one and a half days I think it is fairly clear.

One thing that is important to recognize, at least for the topics that I'll be looking at, is that there are many deep and relevant theories here, and a deep statistical literature grouped into several communities. For instance, the social networks community has done a massive amount of work, as you saw last night in the after-dinner talk; two key references are Frank (1972) and Wasserman and Faust (1994). The statistical networks community has also worked on this topic for a long period of time, and some key references are Snijders (1997), Frank and Strauss (1986), and Hoff, Raftery, and Handcock (2002). In particular, it's worthwhile to look at the work of David Strauss, which threads through much of what's going on and is very important as a technical contribution. Other contributions, which haven't been discussed much at this workshop, come from the spatial statistics community. There is a lot of very important work there and it is very closely allied; if you are looking to work in this area it's very important to read that material closely. Good pointers into that literature include Besag (1974) and Cressie (1993). Another important literature, which I'll mention later in this talk, is exponential family theory.

It was worked out a long time ago, and the best place to start is the seminal book by Barndorff-Nielsen (1978). The last relevant literature comes from the graphical modeling community. They work on related ideas that are actually closer than one would judge just from the commonality of the terms used; a good pointer is Lauritzen and Spiegelhalter (1988).

Networks are very complex things, and how we choose to model them will reflect what we are trying to represent well and what we are essentially not trying to represent well. That choice is driven largely by the objectives we have in mind. This is an obvious point, one that runs through much of scientific modeling, but I have found that it is very important for the networks area because of the complexity of the underlying models and the complexity of the phenomena we are interested in. What we are probably interested in first is the nature of the relationships themselves: questions such as how the behavior of individuals depends on their location in a social network, or how the qualities of the individuals influence the social structure. Then we might be interested in how network structure influences processes that develop over a network, dynamic or otherwise; the classic examples include the spread of infection, the diffusion of innovations, and the spread of computer viruses, all of which are affected by the network structure. Lastly, and I think this is of primary importance and why many people actually study networks, even though they will look at it only after understanding the relationships and the network structure, we are interested in the effect of interventions. If we change the network structure and/or the processes that develop over that network, how will the network structure play itself out, and how will the process itself actually change? If you make changes to the network, almost certainly the consequences of those changes will not be obvious.

Another important point is that our objectives define our perspectives. There is a difference between a so-called network-specific viewpoint and a population-process viewpoint. By network-specific I mean we look at a given network; it might be sampled or have missing data in it, but our scientific objective is that particular network. That is in contrast with a population viewpoint, where we view the data we have, whether complete or sampled, as a realization of some underlying social phenomenon typically represented through a stochastic process, and we wish to understand the properties of that stochastic process. In that case, the observed network is conceptualized as a realization of a stochastic process. I think a lot of the different approaches in the literature look very different because they have these different perspectives and objectives in mind. For example, repeating a point made last night by Steve Borgatti, one can compare a social-relations perspective with a nodal-attribute compositional perspective. The former brings out interdependence/endogeneity, structure, and positional characteristics, while the latter focuses on atomistic essentialism and reductionism. I won't say more about these things because I think they have been adequately covered.

Figure 1 defines a very basic model for a social network.

FIGURE 1

We can think of Y, the g-by-g matrix of relationships among the actors, as the sociomatrix. In a very simple sense, we want to write down a stochastic model for the joint distribution of Y, which is a large multivariate distribution, often discrete, though it could also be continuous. All we want is a relatively simple "model" for the dependence structure of this multivariate random variable; that is how a statistician would view this problem, and that view dominates the development.
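A minimal sketch in R of this starting point, assuming the network package from the statnet suite (which the talk mentions later); the random sociomatrix here is purely illustrative, not data from the talk.

    # A sociomatrix Y wrapped as a network object, using the 'network'
    # package from the statnet suite.
    library(network)

    g <- 7                                    # number of actors
    set.seed(1)
    Y <- matrix(rbinom(g * g, 1, 0.3), g, g)  # a random sociomatrix, for illustration
    Y[lower.tri(Y)] <- t(Y)[lower.tri(Y)]     # symmetrize: undirected ties
    diag(Y) <- 0                              # no self-ties

    net <- network(Y, directed = FALSE)       # the network object used below
    summary(net)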

FIGURE 2

Martina Morris showed Figure 2 yesterday, but I'll show it again just to bring it forward. Let Y be the sample space of all possible graphs, for example coded as the presence or absence of ties, and we write down an exponential family model of the form shown, P_θ(Y = y) = exp{θ · Z(y)} / c(θ); the denominator c(θ) is a constant defined as the sum of the numerators over the space of graphs that are possible.

One thing that I think is worthwhile to look at is a way of getting a general interpretation of the model through its local specification. This is shown in Figure 3. We define Y^c_ij as the graph excluding the ij component of the network: all the network elements, excluding the ij one. The formula is standard, but I think it's quite instructive. It gives the probability of a tie between i and j given the rest of the network, divided by the probability of a non-tie between i and j with the rest of the network held fixed. The basic idea is that we hold the graph fixed and think about a particular tie: what happens when we change it from a 0 to a 1? Looking at the odds of the tie conditional on the rest gives us a direct interpretation of these forms; it can also be interpreted in a relative-risk form. The probability of a tie between a particular pair of actors is influenced by the surrounding ties. By specifying the local properties, as long as they are specified in this way, you get the global joint distribution: the local specification yields the global one. Again, this is relatively straightforward from an algebraic standpoint; it is just helping us with the interpretation.
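To make the local interpretation concrete, here is a base R sketch computing the conditional log-odds of a tie from "change statistics" for an edge/two-star model; the theta values are illustrative, not estimates from any data in the talk.

    # Conditional log-odds of a tie in an ERGM with edge and two-star
    # statistics, computed from change statistics.  Base R only.
    n_edges  <- function(Y) sum(Y) / 2
    n_2stars <- function(Y) sum(choose(rowSums(Y), 2))

    cond_logodds <- function(Y, i, j, theta) {
      Y1 <- Y; Y1[i, j] <- Y1[j, i] <- 1      # graph with the (i,j) tie present
      Y0 <- Y; Y0[i, j] <- Y0[j, i] <- 0      # graph with the (i,j) tie absent
      delta <- c(n_edges(Y1) - n_edges(Y0),   # change in each statistic when
                 n_2stars(Y1) - n_2stars(Y0)) # the tie is toggled 0 -> 1
      sum(theta * delta)                      # log P(tie | rest) / P(no tie | rest)
    }

    # With theta2 > 0, a tie attaching to a higher-degree node has higher odds.
    Y <- matrix(0, 5, 5); Y[1, 2] <- Y[2, 1] <- Y[1, 3] <- Y[3, 1] <- 1
    cond_logodds(Y, 1, 4, theta = c(-1, 0.5))  # attaches to node 1 (degree 2)
    cond_logodds(Y, 4, 5, theta = c(-1, 0.5))  # between two isolates: lower odds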

FIGURE 3

I'll make a brief comment on models for the actor degree distribution, which are shown in Figure 4. We can write a model of this form where the statistics are just the proportions of actors with exactly k relationships, i.e., the proportion of the nodes with degree k, and we place a linear predictor on these statistics; this is the so-called degree-only model. In essence it says that, if we condition on the degrees, all of the graphs with that degree structure are equally likely; in that sense it is random mixing given a degree sequence. This has a very long history in the social networks community, and the two references shown in Figure 4, plus many others, are good places to start. This model class is receiving an enormous amount of work. The reason I point this out is that there has been a lot of work on these models, considered from both a technical and a theoretical perspective, and I often think that science is not helped when that prior work is forgotten.

The other thing you can do here is further parameterize the degree distribution, which essentially places nonlinear parametric constraints on the α parameters. As most statisticians know, this moves the model from a straight linear exponential family to a curved exponential family. Dave Hunter and I have done some work on models of that kind.
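A short sketch of the degree-only model's sufficient statistics, assuming the ergm package from the statnet suite and the net object built in the earlier sketch.

    # The sufficient statistics of the degree-only model are just the
    # counts of nodes at each degree.
    library(ergm)
    summary(net ~ degree(0:6))   # number of nodes with degree 0, 1, ..., 6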

FIGURE 4

We can then add in, as Martina did yesterday, covariates that enter in this linear fashion: attributes of the nodes and attributes of the dyads. We can also add an additional clustering component to a degree form in this way. This is shown in Figure 5.

FIGURE 5

I'll give you just a couple of illustrations of how this model might apply in particular instances. Say you wanted to represent a village-level structure with 50 actors, a mean clustering coefficient of 15 percent, and a Yule degree distribution with a scaling exponent of 3; the Yule is the classic preferential-attachment power-law model, and we can just see how that looks. Figure 6 shows two different networks generated from that model.

FIGURE 6

The network on the left side of Figure 6 is one with zero clustering, and on the right we see what happens when the mean clustering coefficient is pushed up to 15 percent with the same degree distribution. The basic notion is that these models give you a way of incorporating known clustering coefficients into the model while holding the degree distribution fixed. Just to reiterate a point made earlier: the degree distribution is clearly not all of what is going on.
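For reference, a base R sketch of one common definition of the clustering coefficient (three times the triangle count over the two-star count); the talk does not pin down which variant was used for the figures, so take this as one reasonable choice.

    # Global clustering coefficient of an undirected graph from its
    # adjacency matrix, base R only.
    clustering_coef <- function(Y) {
      tri   <- sum(diag(Y %*% Y %*% Y)) / 6   # number of closed triangles
      stars <- sum(choose(rowSums(Y), 2))     # number of two-stars (connected triples)
      if (stars == 0) return(0)
      3 * tri / stars                         # fraction of triples that close
    }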

FIGURE 7

Figure 7 shows the same thing with 1,000 nodes, so it's a pretty big village. You can see that to get the clustering with a given degree distribution, the model is forcing a lot of geodesics of this form; of course, with 1,000 nodes it's pretty hard to see. Figure 8 shows bipartite graphs; the same model form works for bipartite graphs. The network in the upper left is a heterosexual Yule with no correlation, and the one in the upper right is a heterosexual Yule with strong positive correlation: a triangle percentage of 60 percent, versus the 3 percent that is the default for random mixing given degree. There is also a modestly correlated one, as well as one with negative correlation. I don't think I'll say too much more about these.

FIGURE 8

The main topic I would like to talk about today is essentially a canonical problem: how can we tell if a model class is useful? If you write down a particular form of the kind in Figure 2, we can introduce statistics based on principles of underlying local specifications or local configurations, or we can just write down statistics that we believe, a priori and on scientific grounds, will explain a lot of the variation in the actual graph. But because these statistics will tend to be highly correlated with each other, highly dependent, it is not really clear for any given model exactly what the properties of that model will actually be. As Martina Morris showed yesterday, the natural idea of starting from something very simple can sometimes lead to models that aren't very good. It is true that the properties we saw were the properties of the model we wrote down; the point is that any implication that simple models have simple properties is not true. This has been known in statistical mechanics for a very long period of time; it is always a little bit of a surprise to see it occur in applied, very empirically based models.

So, the basic question here is: is a model class itself able to represent a range of realistic networks? It is not the only question, but it is one question that can be asked, and it is what I refer to as the issue of model degeneracy (Handcock, 2003). The idea is that some model classes will only represent a small range of graphs as the parameters are varied.

In circumstances where we would like a fairly general model to cover a large range of graphs and graph types, that might not be a desirable property of a model.

The second issue is: what are the properties of different methods of estimation, such as maximum likelihood estimation, pseudo-likelihood, or a Bayesian framework? I'll make some comments on the computational issues and on cases where certain estimators do or do not exist (e.g., see Snijders, 2002, and Handcock, 2002). The last issue in assessing whether a model class is useful is whether we can assess the goodness of fit of models: we have a graph and we have a model, so how well does the model actually fit the graph? I'll say a little about measuring this, though not a lot; I think the application Martina went through yesterday was very interesting as a stylized view of how that would be done. Some background on this topic may be found in Besag (2000) and Hunter, Goodreau, and Handcock (2005).

Figure 9 illustrates some points on model degeneracy. Degeneracy is a property of a random graph model; it has nothing to do with data per se, only with the model itself. We call a model near degenerate if it places almost all of its probability mass on a small number of actual graphs. An example would be the empty graph, as shown in Figure 10: if we know that a model produces the empty graph with probability close to 1, that is probably not good; likewise for the full graph, or some mixture of the two. In some sense, it is fine for the mass to concentrate on a subset of the set of possible graphs if that subset is the one regarded as interesting for the particular scientific application: if we are interested in heterosexual graphs, and the model chosen places all of its mass on heterosexual graphs, which is a large subset, that's a good thing. This idea is clearly application-specific.

FIGURE 9

FIGURE 10

Let me illustrate this further with Figure 11, which shows in some ways a toy model, but in some ways an interesting model because of its connections to other fields. The horizontal axis relates to the number of edges in the graph, so it's a measure of graph density. The other axis counts the so-called two-stars, which I interpret either as a path between j and k going through a third node i, or as a clustering of edges: how often do two edges end up having a node in common?

FIGURE 11

The reason I like the second interpretation is that this model has analogues in other areas. One analogue is the Ising model for lattice-based processes; it also has an analogue in the Strauss model for spatial point processes, at the very least. Basically, it has an overall density parameter and a single-parameter dependence measure. The clustering is of a certain simple kind, of course. If the parameter θ2 is positive, the model places more probability on graphs with clustered edges; if it is negative, it places more probability on "regular" graphs, graphs whose edges essentially do not end up sharing nodes. What we actually see, although this looks relatively simple and relatively easy to work with, is that the model is near degenerate for most positive values of θ2.

This is particularly galling to social networkers: as was pointed out yesterday, many social processes show a positive clustering of ties, and as a result one would expect most social processes to have a positive parameter. This would seem like a good model to start with, but in actual fact it is not a very good model at all. I'll show why this is true, focusing only on the plot shown in Figure 12.

FIGURE 12

Figure 12 is just a plot of the 2-dimensional parameter space of this model. If the parameters are in the dark region, then the model produces empty graphs: the density is so low and the clustering is so low that you get empty graphs. You essentially have one area of the parameter space that is placing essentially all of the mass on a very small subset of graphs. It is only in a small area of the parameter space, around 0, which corresponds to purely random graphs with equally likely ties, that you are able to get non-degenerate forms. For most of the region where θ2 is positive, the entire upper part of the plot, you have mostly degenerate models; it is only in the small area that you can actually get non-degenerate ones.
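The concentration of mass is easy to verify by brute force on a small graph. The sketch below enumerates all 2^10 graphs on 5 nodes (the talk's figures use 7 nodes, but 5 keeps the enumeration instant) under an illustrative θ; the stats matrix built here is reused by later sketches.

    # Brute-force enumeration of the edge/two-star ERGM on 5 nodes, base R.
    n <- 5
    pairs <- t(combn(n, 2))                   # the 10 dyads
    ndyad <- nrow(pairs)

    stats <- t(sapply(0:(2^ndyad - 1), function(code) {
      y <- as.integer(intToBits(code))[1:ndyad]   # one 0/1 tie vector per graph
      Y <- matrix(0, n, n)
      Y[pairs] <- y; Y <- Y + t(Y)
      c(edges = sum(y), twostars = sum(choose(rowSums(Y), 2)))
    }))

    mass <- function(theta) {
      w <- exp(stats %*% theta)               # unnormalized probabilities
      p <- as.vector(w / sum(w))
      c(P_empty = p[1],                       # code 0 is the empty graph
        P_full  = p[2^ndyad],                 # the complete graph is last
        top5    = sum(sort(p, decreasing = TRUE)[1:5]))
    }
    mass(c(-2, 1))   # illustrative theta2 > 0: mass piles up on extreme graphs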

FIGURE 13

Now I'll say a little bit about the geometry of exponential random graph models, as explained in Figure 13. For those of you who are very interested in this and are familiar with these models, the papers on my Web site make this much clearer, so I'll go through quite a stylized form. The central idea is this: rather than thinking about the natural parameterization of the exponential family, we can think about the classical mean value parameterization of that family, defined by the mapping μ(η) = E_η[Z(Y)], the expectation of the graph statistics Z under the model with natural parameter η. This mapping is injective. It is also strictly increasing in the sense that (η1 − η2) · (μ(η1) − μ(η2)) > 0 for distinct parameters, so a positive change in the natural parameter always goes with a positive change in the mean value along that direction; this gives us some sense of the relationship between the natural and the mean value parameters. It is shown by Barndorff-Nielsen and others that this is just an alternative parameterization of the actual model. Why is that interesting? I think because this alternative parameterization has a lot going for it; not necessarily to replace the natural parameterization, but in a lot of scientific applications it is much more natural to look at.

FIGURE 14

Let's go back to the two-star model, as shown in Figure 14. Let E be the number of edges and S be the number of two-stars. Then μ1, the mean value parameter corresponding to the number of edges, is just the expected number of edges; simply dividing by the number of dyads, it is the probability that two actors are linked. As an interpretation I find this very natural: the probability that two actors are linked is a very natural parameter of this model. Similarly, when you divide μ2 by the number of potential two-stars, it is the probability that a given actor is tied to two randomly chosen other actors: if we choose two other actors at random, what is the probability that both are tied to a given third?

In Figure 15 we have a new parameterization in which the mean values are on scales that people usually think in when working with social networks, and I think the parameters have a lot of value for that reason. The natural parameters of exponential families make the math really easy, but unfortunately their interpretation in terms of traditional log-odds can be very tortured at times because of the auto-logistic form. The auto-dependence means you are looking at one part of the graph conditional on holding other parts of the graph fixed, parts that are closely dependent on it. And it is very, very tortured.
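Reusing the stats matrix from the enumeration sketch above, the mean value map and the two probabilities just described can be computed directly; the θ values are again illustrative.

    # Mean value parameters of the 5-node two-star model by enumeration.
    mean_value <- function(theta) {
      w  <- exp(stats %*% theta)
      p  <- as.vector(w / sum(w))
      mu <- colSums(stats * p)                # mu(theta) = E_theta[Z(Y)]
      c(mu,
        tie_prob     = unname(mu[1]) / ndyad, # P(two actors are linked)
        twostar_prob = unname(mu[2]) / (n * choose(n - 1, 2)))
    }
    mean_value(c(-1, 0))    # theta2 = 0: a purely random (Bernoulli) graph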

FIGURE 15

I'll briefly show some things here and then move on. There are other parameterizations you can look at, such as that shown in Figure 16, which is just the mean value space. The natural parameter space is all of R2. The mean value space is finite: the number of edges runs from 0 to 21 in my simple example with 7 nodes, and the number of two-stars from 0 to 105. The green area in Figure 16 is the interior of these points.

FIGURE 16

There are alternative mixed parameterizations, which I find very useful and which have a lot of very good statistical properties; Figure 17 explains them. I want to say how this is relevant to the degeneracy issue, so we'll define it explicitly now: a model is near degenerate if μ(η) is close to the boundary of the convex hull C of the attainable statistics. Let degen(Y) = {y ∈ Y : Z(y) ∈ bd C} be the set of graphs whose statistics lie on the boundary of the convex hull. Based on the geometry of the mean value parameterization, if the expected sufficient statistics are close to the boundary of the hull, the model will place much of its probability mass on graphs in degen(Y). If Figure 18 shows our model space, then say you are close to the boundary of it, which I arbitrarily draw as the red rim around the edge. I am essentially going to claim that the models in this red-rimmed boundary of the parameter space, the boundary of the corresponding convex hull, are the models which are degenerate, and models in the center are somewhat less degenerate; it's a graded thing. Based on the geometry of the mean value parameterization, which gives the expected statistics of the actual model, things are going to look very bad if you are right next to the boundary. This can be quantified in very many ways via mathematical theorems, which I won't dwell on here.

FIGURE 17

FIGURE 18

I will ask a natural question: what happens when you map this red region around the edge of the mean value space, what I'm calling the degenerate region, back to the natural parameter space, and what happens when you map the green? Figure 19 gives the idea, and I think it explains a lot of why people have had a lot of trouble modeling in this class.

FIGURE 19

What I have done in Figure 19 is map the green region on the previous slide back to the natural parameter space, which is all of R2. What we see is that the green region is mapped to this very jagged form, and that small red boundary region is mapped to every other place. In essence, this is saying that if your parameter is in the region outside the green region, it is essentially going to give you mean values on the edge of the mean value space, and hence very odd-looking forms. The only reasonable values are the values in this small green area. What has gone on, in a nutshell, is that people change the parameter values in the natural parameter space, and they are doing fine; then they move ever so slightly and step off the edge into the other area, which gives very bizarre-looking models. The geometry of this, which looks like some sort of "stealth bomber," gives us some sense that as you move around in a nice, smooth way in this space, you can fall off an edge, and then you are in trouble. To make matters worse, most of the interesting models are in the positive-dependence area, so they are out on this peninsula of models. In a nutshell, what people are doing is changing parameter values along this peninsula; they make a small change and tip off the edge into a very degenerate model form.

These problems are plain in this example, but the phenomenon follows through more generally because of the nonlinear nature of the mapping, which leads to rapid changes in model behavior from small changes in the natural parameter space. This statement can be quantified in a number of ways, as shown in Figure 20.

FIGURE 20

I'll briefly speak about inference for social network models. Since we have a probability model, it is natural to think of likelihood-based inference, as shown in Figure 21.

FIGURE 21

I will also say a little about the existence and non-existence of maximum likelihood estimators. The classical result, due to Barndorff-Nielsen (1978), gives necessary and sufficient conditions for the MLE to exist. It is a classic result, but it has enormous impact here: if the observed graph statistics are in the interior of the convex hull of the discrete support points for the sufficient statistics, the MLE exists and is unique, and it can be found by directly solving equations or by direct optimization.

FIGURE 22

Figures 22 and 23 tell us exactly what to do in terms of maximum likelihood. On the other hand, if the observed statistics are on the boundary of the convex hull, the MLE doesn't exist, so if we use a method of optimization to find it we are in deep trouble. This caused a lot of problems in the past, when people didn't realize that was what was actually going on.

FIGURE 23

I could belabor this and say a lot about corresponding results for pseudo-likelihood, but reading the paper is probably the easiest thing.
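In two dimensions this condition can be screened with base R's chull(), again reusing the enumerated stats matrix from the earlier sketch. Checking only against hull vertices is a simplification (a point can also lie on a hull edge between vertices), so treat this as a rough screen, not a full test of MLE existence.

    # Rough screen: is an observed statistic a vertex of the convex hull of
    # the attainable statistics?  If so, the MLE does not exist.
    hull_vertices <- unique(stats[chull(stats), , drop = FALSE])

    on_hull_vertex <- function(z) {
      any(apply(hull_vertices, 1, function(v) all(v == z)))
    }
    on_hull_vertex(c(0, 0))    # empty graph: on the hull, so no MLE
    on_hull_vertex(c(4, 4))    # a 4-cycle: strictly inside, so the MLE exists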

FIGURE 24

I'll make a brief comment on MCMC. For those who haven't seen a lot of Monte Carlo, this is the idea; I find it a simple way of thinking about likelihood estimation. The idea, shown in Figure 24, is that you want to estimate the partition function, or normalizing constant, which is essentially a population mean over the set of all possible graphs. Being statisticians, if we have a very large population whose mean we want to measure, we draw a sample from that population, calculate the sample mean, and use that in place of the population mean, which is one of the simplest ideas in statistics. In essence, we draw a sample of the possible graphs, compute the corresponding sample mean, and use this to approximate the partition function. The question is how we get a sample of graphs, and MCMC is the natural way. This has been well developed and I won't say too much more about it; there is a result showing that MCMC likelihood estimation used in this way converges given sufficient iterations, which I won't belabor here.

I always find the relationship between near degenerate models and MCMC estimation interesting. Figure 25 addresses this idea, which goes all the way back to very nice work by Charlie Geyer. Basically, the idea, with some mathematics behind it, is that if your model is near degenerate then your MCMC won't mix very well. This will not come as a surprise to most folks who have ever tried this, but just to give you some sense in practice of how this works, I'll show you again the two-star model in Figure 26.

FIGURE 25

FIGURE 26

For example, suppose for the two-star model you choose a mean value with 9 edges and about 40 two-stars, and you run your MCMC sampler. Figure 27 shows what you get; many people have probably run into this if they have ever tried it.

FIGURE 27

This is the trace plot of the edges. I am running the MCMC sampler for about 100,000 iterations of this simple 7-node process, and these are the plotted numbers of edges in successive draws from the Markov chain. It stays up around 19, dropping down to 15 or 17, then suddenly jumps down to 3 or 2 or 0; it stays down there for maybe 20,000 iterations, jumps back up, and jumps back down. So what we are actually seeing, if you look at the marginal distribution of these draws, is a profoundly polarized distribution, with most of the draws at very low values and some of the draws quite high, and of course in such a way that the mean value is 9, which is exactly what we designed the model to do. This is another view of the example that Martina gave yesterday of the two-star model. Note that this sampler is doing great: it is giving us samples back from the model we asked it for. But now that we have looked at them, we would probably say we don't want this model; the bad mixing of an MCMC is thus highly related to these degeneracy properties.
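The polarized trace is easy to reproduce with a hand-rolled sampler. This base R sketch runs a tie-by-tie Gibbs update for the edge/two-star model on 7 nodes; the θ values are illustrative, not the settings behind Figure 27.

    # Minimal tie-toggling Gibbs sampler for the edge/two-star model.
    set.seed(2)
    n <- 7
    Y <- matrix(0, n, n)
    theta <- c(-3, 0.55)                 # (edges, two-stars): positive dependence
    trace_edges <- numeric(1e5)

    for (t in seq_along(trace_edges)) {
      i <- sample(n, 1); j <- sample(setdiff(1:n, i), 1)
      others <- setdiff(1:n, c(i, j))
      # change statistics for toggling the (i,j) tie from 0 to 1:
      delta <- c(1, sum(Y[i, others]) + sum(Y[j, others]))
      p1 <- plogis(sum(theta * delta))   # P(Y_ij = 1 | rest of the graph)
      Y[i, j] <- Y[j, i] <- rbinom(1, 1, p1)
      trace_edges[t] <- sum(Y) / 2
    }
    plot(trace_edges, type = "l")        # near degenerate settings give the
                                         # bimodal, sticky trace described above

For sufficiently positive values of θ2 the trace alternates between a near-empty and a near-complete regime, which is exactly the polarization the talk describes.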

FIGURE 28

To come back very briefly: estimating the mean value parameterization is essentially just as hard as estimating the natural parameterization, although the corresponding point estimate itself is trivial, being just the observed statistics. Of course, to find the properties of that point estimator, and it is no good unless you know what its properties are, you essentially need the same MCMC methods as you need for the natural parameterization. The natural parameterization is somewhat easier from a computational perspective, in terms of writing code, because of slightly simpler forms: you don't need to solve part of the inverse problem.

To finish, I'll say a little bit about network sampling, because it has come up a number of times during this workshop. I'll give you some idea of the classical statistical treatment of it, going back to Frank (1972) and other work since. We think of the design mechanism as that part of the observation process that is under the control of the researcher: if we are doing network sampling, the design mechanism is how the researchers design the sampling process. Examples are surveys using egocentric, snowball, or other link-tracing sampling. There is also the out-of-design mechanism, the unintentional non-observation of network information, meaning the mechanisms by which information is missed, such as the failure to report links, incomplete measurement of links, and attrition from longitudinal surveys. Note that for any sampling mechanism we need to deal with both of these components.

It is sometimes convenient to cluster design mechanisms into conventional, adaptive, and convenience designs. So-called conventional designs do not use the information collected during a survey to direct the subsequent sampling of individuals. For example, we might sample everyone and take a network census, or we might use an egocentric design: randomly choose a number of individuals and then look at the ties of only those individuals. That is, we don't use anything that we collect during the survey to direct the subsequent sampling.

This might sound like an odd way to do it, but let me describe adaptive designs for contrast. In adaptive designs we actually use the collected information to direct the subsequent sampling, and most network sampling is actually of this adaptive form. The idea is to collect ties, follow the links of the initial group, and use the fact that we have observed the links of those individuals to essentially do contact tracing and move outward in this fashion. Examples are classic snowball sampling and other link-tracing designs. Many of the designs used in computer science and physics fall under this form, where you use information taken during your survey to direct the subsequent sampling. There are also convenience designs, where you might do something very intelligent but not very statistical: you use convenient information and just recruit people who are close by and convenient to sample.

In Handcock (2003) I examined likelihood-based inference for adaptive designs: basically, how to fit the models I described earlier, and that Martina Morris described, when you have a conventional design and, probably more importantly, when you have adaptively sampled data, data that are heavily subsampled due to link tracing. It turns out that you can fit these models in this way, and there is a computationally feasible MCMC algorithm to actually do it, which I think is pretty helpful in practice. This is being implemented in the statnet package for R.

As an example of the use of likelihood inference, I'll briefly mention the Colorado Springs "Project 90" study, which is familiar to many people in this room and which involved quite a broad sample. This study dealt with a population of prostitutes, pimps, and drug dealers in Colorado Springs; I will focus only on 1991. I'm looking at the heterosexual sexual network within the group of people who responded to the survey. Essentially what they did was a form of link tracing that could be referred to as bi-link tracing: you were recruited into the survey if two other respondents nominated you. It wasn't enough just to be nominated by one individual; you had to be nominated by two.

FIGURE 29

Figure 29 presents that network. You see it has a relatively large component in the center. Many of the ties are just monogamous pairs, split off on their own. You then have some small components around the side, and this core is around 300 folks; I think there are some isolates also. You get the sense that there is a large part that is connected, but not massively, so individuals float around. Note this is a visualization, so there are some artifacts; for instance, the core looks more central than it actually is. There is a lot of missing data here due to the sampling processes: you are seeing only the part that was actually observed. I won't go through the boring detail, but if you run the program, Figure 30 shows the sort of result you will get. You put in all these covariates; we know a lot of information about the individuals, such as their age, their race, and whether they are a prostitute or a pimp. What we did was look at different types of tie formation based on age, race, and occupation. We also looked at whether there was homophily based on race, homophily based on age, and then at endogenous parameters that measure the dependency involved. I won't say too much about this particular model, but you can get the natural parameter estimates, and I also report the standard errors induced by the Markov chain Monte Carlo sampling. In terms of goodness of fit, because this is an exponential model we can compute the deviance, so we have the classic deviance statistics for this process. Note, though, that we do not have classical asymptotic approximations to their distributions!

FIGURE 30

As we can see, attribute mixing does explain something, although if you look at the p-values there is not a massive amount of difference based on attribute mixing. When we look at the dependence terms, what we actually see is that there is a fairly large oversupply of people with a degree of exactly 1: this parameter is quite large and positive, indicating that there are a lot of people in monogamous relationships. There is also fairly substantial homophily based on race involved in this process.

In my two remaining seconds I will say that we can use those model parameters to generate networks that look a lot like Colorado Springs, and Figure 31 shows two of them. They don't have the large component of the graph we saw in Figure 29, because this model doesn't naturally produce that form, but if you go through other realizations you can get very similar-looking forms.
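A hedged sketch of the flavor of such a fit, using the ergm package from the statnet suite: the network and its race attribute are hypothetical stand-ins (the Project 90 data are not reproduced here), and the terms simply echo the ones described, homophily on race and an oversupply of degree-1 actors.

    # Fitting and simulating from an ERGM with attribute and dependence
    # terms; 'net' is the network object from the first sketch.
    library(ergm)
    net %v% "race" <- sample(c("A", "B"), network.size(net), replace = TRUE)

    fit <- ergm(net ~ edges
                    + nodematch("race")   # homophily on race
                    + degree(1))          # degree-1 (monogamous) actors
    summary(fit)                          # natural parameter estimates, MCMC s.e.
    sims <- simulate(fit, nsim = 2)       # model-generated networks, as in Figure 31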

FIGURE 31

In conclusion, I'll reiterate that large and deep literatures exist and are often ignored; that simple models are being used to capture clustering and other structural properties; and that the inclusion of attributes (e.g., actor attributes, dyad attributes) is very important.

QUESTIONS AND ANSWERS

DR. KOLACZYK: Eric Kolaczyk, Boston University. Mark, here is a quick question for you. My understanding is that the nature of this degeneracy result essentially relies on Barndorff-Nielsen's statement about the interior of what you were calling C, and whether you are on the boundary or not. That condition is always present in the standard theory you would see, but it always sort of floats by; most of us don't end up really needing to worry about it. If you work with discrete models, binomial-type models or something like that, then you have to be aware of it a bit, but it is almost never as extreme as this. So, that's something I'm missing here. What is the intuition behind this? Is it the mapping involved? Why is it so extreme here, whereas for the most part, in most models that we would consider, across a range of statistics, people don't have to worry about that seemingly innocuous theoretical result?

DR. HANDCOCK: That's a good question. The underlying theory is the same, but there are actually two forms. One form is computational degeneracy: for a given observed graph, if you try to calculate the MLE, the MLE may or may not exist, and that is the standard Barndorff-Nielsen form. I agree that it is rarely seen for very complex models, and this is the first case I've seen beyond my undergraduate statistics classes. The second form rests on the same theory but is a completely separate idea, used for model degeneracy, where there are no data in sight. You've just got a model; you are looking at it and asking what the model does. You treat it as a toy, start playing with it, and see how it behaves. The same underlying geometry is important, but it is not related to those results about the existence of an MLE. The basic idea is that if we change the view from the natural parameter space, where the interpretation of model properties is actually quite complex, to the mean value parameter space, which is expressed in terms of the statistics, the statistics we chose as the foundation of our model and about which we should therefore have good interpretation, then we get a much better lens for seeing how the model is working. As to the last part of your question, the nonlinear mapping is exactly the key here, and the "stealth bomber" is an example plot that makes that clear. I should also point out, which I didn't earlier, that Mark Newman has a nice paper where he looks at the mean-field approximation to exactly this two-star model, and I think he has been able to produce a similar plot for the same model.

DR. HOFF: Peter Hoff, University of Washington. Before I begin, Mark, I just want to remind you that you were the one who taught me that I should be critical of these types of models. When we estimate a univariate parameter and we have replications, and we make model-based inference, we have some way of evaluating whether or not a model is good or adequate. Or even if we are interested only in the mean of a population, we have replicates: we can rely on a sampling distribution that depends only on the first and second moments of the actual population, and nothing else. So in a way we are not concerned with model misspecification, or we have ways of dealing with it. In these situations, though, the way you formulate the model, you have one observation. So my question is this: a lot of your work shows that model selection is extremely critical, and the choice of the statistics you put in, which is the choice of the model, is extremely critical. I'm curious whether you could comment on the utility of getting replications to address this issue, or how we can start thinking about that, and whether there is any way to address this issue without replications, because you are putting up a table of p-values and confidence intervals there, and those are clearly going to depend on what other statistics you choose to put in the model.

DR. HANDCOCK: There are a number of different issues here, and I think those are very valid, very important points. Issue number one is the single replication: that is clearly true. We have a single graph, and what is essentially pseudo-replication, which we use to compute the standard errors and other things like that, is in essence induced by the structure of the model. If I write down the given set of statistics that define the model, that implicitly gives us a set of configurations over which we can define the replication. So, in essence, the standard errors are very dependent upon the specification of the actual model. Let me give a very simple example. If we have a model with only edges in it, then clearly all the edges are independent of each other, and we have essentially got replication over edges. If we put two-stars in the model, then we look at all the possible configurations which are two-stars, and the configurations of that form give us replication over which to assess the properties of the two-star term. But note that this is very dependent upon the model form; that is a partial answer to the question.

I think the other answers are important as well. How do we specify the model if its form really matters? A natural answer, which is incomplete, is that if you had replications of these graphs, the classic, purely independent forms would apply, and then we could work with them; that would be one solution. The other answer, I think, is that it is worth stepping back and asking ourselves whether pure independence is required. I'll just remind us that every spatial process has exactly the same issues. All of spatial statistics deals with processes that are dependent across their full domain: spatial lattice processes and continuous field processes all have dependence, and you only have a single realization. So this is more a property of dependent models in general than of social network models in particular.

Coming back to Peter's last point, model specification is extremely difficult here, because you are using the model specification to also find misspecification in that model. I strongly emphasize the use of the goodness-of-fit methods that Martina described. The reason I think they are very helpful is that if you have a statistic which is not in your specification, and you see how well the model can reconstruct that separate, though typically dependent, statistic, that gives you a way of seeing how well the model represents properties you have not explicitly placed in it. The other approach, of a similar ilk, is the standard statistical approach of exact testing based on the model, which you can use to look for model misspecification. I think Peter has raised a good point here. I have been very critical of these models in the past, for the reasons I have said, and I am actually using them. I'll leave it at that.

REFERENCES

Barndorff-Nielsen, O.E. 1978. Information and Exponential Families in Statistical Theory. Chichester, U.K.: Wiley.

Besag, J.E. 1974. Spatial interaction and the statistical analysis of lattice systems (with discussion). Journal of the Royal Statistical Society, Series B 36:192-236.

Cressie, N. 1993. Statistics for Spatial Data, revised edition. New York: Wiley.

Doreian, P., and F. Stokman. 1997. Evolution of Social Networks. Amsterdam, The Netherlands: Overseas Publishers Association.

Frank, O., and D. Strauss. 1986. Markov graphs. Journal of the American Statistical Association 81(395):832-842.

Holland, P.W., and S. Leinhardt. 1981. An exponential family of probability distributions for directed graphs. Journal of the American Statistical Association 76(373):33-50.

Lauritzen, S.L., and D.J. Spiegelhalter. 1988. Local computations with probabilities on graphical structures and their application to expert systems (with discussion). Journal of the Royal Statistical Society, Series B 50:157-224.

Leinhardt, S. 1977. Social Networks: A Developing Paradigm. Burlington, Maine: Academic Press.

Morris, M. 2004. Network Epidemiology: A Handbook for Survey Design and Data Collection. London: Oxford University Press.

Newman, M.E.J. 2003. The structure and function of complex networks. SIAM Review 45(2):167-256.

Wasserman, S., and K. Faust. 1994. Social Network Analysis: Methods and Applications. Cambridge, U.K.: Cambridge University Press.
