Network Data and Models--Martina Morris, University of Washington
Pages 226-253

From page 226...
... I am closer to the biologists since I put the people up front. We have a large group of people working on network sampling and network modeling -- Steve Goodreau, Mark Handcock and myself at the University of Washington; Dave Hunter and Jim Moody, who are both here; Rich Rothenberg, who is an M.D., Ph.D.; Tom Snijders, whom some of you know, is a network statistician; Philippa Pattison and Garry Robins from Melbourne have also done a lot of work on networks over the years; and then a gaggle of grad students who come from lots of different disciplines as well.
From page 227...
... There are clearly properties of the transmission system that you need to think about, which include biological properties: heterogeneity, that is, the distribution of attributes of the nodes (the persons), but also infectivity and susceptibility in the host population. There are multiple time scales to consider; time scales are something that we haven't talked about much here, but I think they are very important in this context.
From page 228...
... In fact, you can generate connectivity in very low-degree networks as well, as shown in Figure 2. Your simple square dance circle is a completely connected population where everybody only has two partners, so it is important to recognize that there are lots of other ways that you can generate connectivity, and that even if one person, for example, did act as a hub, and you figured that out somehow and you removed them, you would still have a connected population.
From page 229...
... It is interesting that in popular language we have both of those ideas already ensconced in a little aphorism. So, thinking about partnership dynamics and the timing and sequence and why that would matter, one of the things that we have started looking at in the field of networks and HIV is the role that concurrent partnerships play.
From page 230...
... You can see in Figure 4, in the long yellow horizontal line labeled "1", we have got somebody who has a partnership all the way across the time interval, and then a series of concurrent partnerships overlapping with it, including a little one-night stand labeled "5" that makes three of them concurrent at some point in time. Now, this is the same number of partners as in the upper graph, so it is not about multiple partners, although you do need to have multiple partners to have concurrent partners.
From page 231...
... It is an exponential random graph model. It basically takes the probability of observing a network or a graph, a set of relationships, as a function of something that looks a little bit like an exponentiated version of a linear model, divided by a normalizing constant that sums over all possible graphs of that size.
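In the usual notation (a sketch of the form being described, with g(y) the vector of network statistics, theta the parameter vector, and Y-script the set of all possible graphs on the node set):

    P(Y = y) = \frac{\exp\{\theta^\top g(y)\}}{c(\theta)},
    \qquad
    c(\theta) = \sum_{z \in \mathcal{Y}} \exp\{\theta^\top g(z)\}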
From page 232...
... Almost anything you can think of in terms of a configuration of dyads can be represented as a network statistic. Then you have the parameter θ, and your choice there is whether you want to impose homogeneity constraints, and I believe Peter Hoff talked about this a little bit in his talk this morning.
From page 233...
... Referring to Figure 6, the vector here can range from a really minimal model with a single term, such as the number of edges, which gives the simple Bernoulli model for a random graph. But the vector can also be a saturated model, with one term for every dyad, which is a large number of terms.
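At the minimal end, for instance, with the single statistic g(y) = E(y), the number of edges, and one homogeneous parameter theta, the model factors into independent dyads, each tie a Bernoulli draw with log-odds theta (a standard reduction, sketched here rather than quoted from the talk):

    P(Y = y) \propto \exp\{\theta\, E(y)\}
    \;\Longrightarrow\;
    P(Y_{ij} = 1) = \frac{e^{\theta}}{1 + e^{\theta}}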
From page 234...
... The parametric forms are usually used to parsimoniously represent configuration distributions, such as degree distributions, shared partner distributions, and things like that.
From page 235...
... that we are going to be trying to maximize. The normalizing constant c makes direct maximum likelihood estimation of that theta vector impossible because, even with 50 nodes, the number of possible graphs is astronomical.
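To put a number on that: each of the C(n, 2) dyads is either a tie or not, so the constant sums over 2^C(n, 2) graphs:

    n = 50 \;\Rightarrow\; 2^{\binom{50}{2}} = 2^{1225} \approx 10^{368}

No direct enumeration is possible at that scale, which is why simulation-based (MCMC) methods are used instead.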
From page 236...
... We are going to set the density to be about 4 percent, which is about 50 edges for a Bernoulli graph. The expected clustering, if this were a simple random graph, would then just be 3.8 percent, but let's give it some more clustering.
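As a quick illustration of that baseline (a sketch, not the speaker's code; it assumes a 50-node graph, which is what 4 percent density and about 50 edges imply, and uses the networkx library for convenience):

    import networkx as nx

    # Bernoulli (Erdos-Renyi) graph: 50 nodes, each dyad a tie with probability 0.04.
    n, p = 50, 0.04
    G = nx.gnp_random_graph(n, p, seed=1)

    print("edges:", G.number_of_edges())      # about 0.04 * C(50, 2) = 49
    print("density:", nx.density(G))
    print("clustering:", nx.transitivity(G))  # should hover near p itself

For a Bernoulli graph the expected clustering is essentially the tie probability, which is why it comes out near the density.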
From page 237...
... One thing we can say now pretty clearly is that statistics for endogenous processes -- so, like this friend-of-a-friend stuff, the clustering parameter -- need to be handled differently, because they induce very large feedback effects. A little bit of that and all of a sudden the whole graph goes loopy.
From page 238...
... It tends to create reasonably nice, smooth parametric distributions for shared partner statistics. In the time I have left I want to give you a sense of how these models work in practice, and how friendly they look when you get them to actually estimate things well.
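The statistic being described is presumably the geometrically weighted edgewise shared partner (GWESP) term; in its standard form (a sketch from the literature, with EP_k(y) the number of edges whose endpoints have exactly k shared partners and tau >= 0 a decay parameter):

    w(y; \tau) = e^{\tau} \sum_{k=1}^{n-2} \left\{ 1 - (1 - e^{-\tau})^{k} \right\} EP_k(y)

Each additional shared partner on an edge adds a geometrically declining increment, which is what smooths the distribution and tames the feedback that a raw triangle count induces.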
From page 239...
... We are going to try four models: edges alone (this is our simple random graph); edges plus attributes, which says the only thing going on in this model is that people are choosing others like themselves; edges plus this weighted edgewise shared partner statistic, which is transitivity only, so just the friend-of-a-friend process; and then both of these things together.
From page 240...
... We can then simulate graphs with those properties, those particular coefficients, and those statistics, drawn from a probability distribution. We are going to compare the graph statistics from the simulated data to the graph statistics from our observed data, but the graph statistics that we are going to use are not actually statistics that were in the model.
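That simulate-and-compare logic is easy to sketch (a hedged illustration, not the actual pipeline; networkx's built-in karate club graph stands in for real data, and a Bernoulli fit stands in for the full model):

    import networkx as nx

    observed = nx.karate_club_graph()   # stand-in for the observed network
    n = observed.number_of_nodes()
    p_hat = nx.density(observed)        # "fitted" tie probability for a Bernoulli model

    # Draw graphs from the fitted model's distribution.
    sims = [nx.gnp_random_graph(n, p_hat, seed=s) for s in range(100)]

    # Compare a statistic that was NOT in the model: the triangle count.
    def triangle_count(G):
        return sum(nx.triangles(G).values()) // 3

    print("observed triangles:", triangle_count(observed))
    print("mean simulated triangles:",
          sum(triangle_count(G) for G in sims) / len(sims))

If the observed statistic sits far outside the simulated distribution, the model is missing that feature of the data.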
From page 241...
... Students were only allowed to nominate their five best male and five best female friends. So, it is truncated at 10.
From page 242...
... Figure 16 shows the edgewise shared-partner statistic: for every edge in the graph, how many shared partners the two endpoints have.
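That statistic is straightforward to compute directly (a small sketch using networkx, with its built-in karate club graph as a stand-in network):

    import networkx as nx
    from collections import Counter

    def esp_distribution(G):
        # For each edge (u, v), count the partners u and v share;
        # tally the number of edges with exactly k shared partners.
        return Counter(
            sum(1 for _ in nx.common_neighbors(G, u, v)) for u, v in G.edges()
        )

    G = nx.karate_club_graph()
    print(sorted(esp_distribution(G).items()))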
From page 243...
... Finally, Figure 18 shows the minimum geodesic distance between all pairs, with a certain fraction of them being unreachable. Figure 19 shows how the Bernoulli model does, and it doesn't do very well.
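The geodesic panel can be tabulated the same way (a sketch under the same assumptions, tallying shortest-path lengths over all pairs and counting disconnected pairs as unreachable):

    import networkx as nx
    from collections import Counter

    def geodesic_distribution(G):
        lengths = dict(nx.all_pairs_shortest_path_length(G))
        nodes = list(G)
        dist = Counter()
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                d = lengths[u].get(v)   # None if v is unreachable from u
                dist[d if d is not None else "unreachable"] += 1
        return dist

    G = nx.gnp_random_graph(50, 0.04, seed=1)
    print(geodesic_distribution(G))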
From page 244...
... The first column shows the degree distribution; even a Bernoulli model is pretty good. Adding attributes doesn't get you much, but adding the shared partners, you get it exactly right on, and the same is true when you add both the shared partner and the attributes.
From page 245...
... They are actually getting the structure from the eyeball perspective pretty well.
From page 246...
... There are 50 schools from which we can draw information. There are actually more, but 50 that have good information.
From page 247...
... The other thing you can do is to compare the parameter estimates across the models, which is really nice. In Figure 24, we look at 59 schools, using the attribute-only model.
From page 248...
... The race homophily usually falls, but it actually sometimes rises: once you account for transitivity, you sometimes find that the race effects are even stronger than you would have expected with just the homophily model. This is shown in Figure 26.
From page 250...
... For the cross-sectional snapshots, this is a direct result of the fitting procedure. For dynamic stationary networks, it is based on a modified MCMC algorithm, and for dynamic evolving networks you need model terms for how that evolution proceeds.
From page 251...
... What I am doing is proposing certain rules about how people choose friends. So, I choose friends because I tend to like people the same age as me, the same race as me, the same socioeconomic status.
From page 252...
... What made this work was the edgewise shared partner term. When we had originally tried using a local clustering term -- either the clustering coefficient or the number of triangles with just a straight theta on it -- those were degenerate models.
From page 253...
... They do a lot of heavy lifting in these models and they actually explain, I think, a fair amount. I would call it model degeneracy only in this case because you get an estimate and you might not even realize it was wrong.

