Skip to main content

Currently Skimming:

Visualization and Variation: Tracking Complex Networks Across Time and Space--Jon Kleinberg, Cornell University
Pages 396-424

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.
Select key terms on the right to highlight them within pages of the chapter.


From page 396...
... I'm a computer scientist, so I'm coming at this from a computer science perspective. In particular, the kinds of networks that I'm interested in are the large information networks that are built up on top of the applications on the Internet, starting with things like the World Wide Web and spilling over into networks that affect the way we communicate through things like email and instant messaging, through the way we express interest in hobbies through online communities like Live Journal and so forth, to the way we do electronic commerce and so forth, as indicated in Figure 1.
From page 397...
... In reality it's not so much that they both exist but that they are really intertwined. You can't disentangle the technological networks from the social networks anymore.
From page 398...
... I think, especially for the statistics aspect of this workshop, this is going to be something important to think about. So, if I could caricature the traditional research obstacle in studying complex networks, it would be that you'd like your networks to be large-scale, realistic, and completely mapped, but in actuality you can only achieve any two of those.
From page 399...
... It was this, for example, that caused three of us at Cornell -- Paul Ginsparg, the author of the arXiv, Johannes Gehrke, and myself -- to choose this as a data set for the annual KDD Cup Competition, basically saying here is a very rich system in which you can do data analysis in all sorts of dimensions, looking at real social data. There are also things that we would like to know about the people in this network that we don't, but for looking at very-large-scale data, so far this is the best we can do.
From page 400...
... We'll see some rather surprising ways in which it has shown up in hyperlinks on the Web, email communication, and friendships on these online communities, and also how it's been used as a design principle in things like peer-to-peer systems. And finally, some recent work I've been doing with Faloutsos and Yuri Leskovec looking at how well these networks evolve over time, and some things that actually completely challenge the conventional wisdom we had had on this.
From page 401...
... There are a few things going on with this experiment, and I don't have time to unravel them all, but one is clearly the structural issue. Why is it that our social networks in which our friends all seem to know each other, which as a number of the talks have said before looks fairly clustered, why should it have these short chains?
From page 402...
... To take one of the flavors of their model, we can imagine that in a very simple version we have a grid with nodes, and these might represent the people in the social network, knowing something about their geographic coordinates. We then add random links, even as few as one per node; an example is shown in red in Figure 4.
From page 403...
... The funny thing here is that, although one could have asked this question immediately upon seeing the Milgram experiment, the Watts-Strogatz model, although not intended for this purpose, is really ideal as a kind of minimal model you would need if you wanted to ask this question. If you want to ask this question in an interesting way, what are the minimal ingredients you need?
From page 404...
... We know people in the Milgram experiment also used occupational information and other things. I'll try folding some of these things in, but I want to start with an intentionally very simple model, as shown in Figure 6.
From page 405...
... So when I choose links with probability according to an inverse square distribution, I can actually achieve this gold standard of polynomial log squared n. As I move away from r = 2 in either direction, things get worse rapidly, at least in an asymptotic sense.
From page 406...
... Therefore, every log n steps or so you expect to halve your distance. The distance can only be halved log n times, and so the delivery time is proportional to log n times log n.
From page 407...
... Let's think about these exponentially layered distance scales around the nodes. All nodes that are within distance 1-2, 24, 4-8, 8-16 and all these powers of 2.
From page 408...
... To look at how links are embedded in the structure, he took the open directory which classifies Web pages hierarchically (like on science/computer science/algorithms, biology/neuroscience, or art/music/Italian opera)
From page 409...
... Not only did he partially validate this density of links decaying with height in the tree, but it also became the basis for some focused Web crawling experiments. Again, an example of the turnaround of seemingly simple models rapidly becoming design principles in this case for something like focused Web crawling.
From page 410...
... So, 1 over group size to the first power is 1 over d2 which gives us back the inverse square law. This is a way to see all those results under one common rubric, as in searchable networks can be achieved if you link with probably 1 over the size of the smallest group containing them.
From page 411...
... But as noted in Figure 11, once Napster went away we were plunged from the world of the postal system or of directory assistance into the world of the Milgram experiment: if you want to find something, there is no central directory, you have to ask your network neighbors, who in turn ask their network neighbors, and maybe you find your MP3 file and maybe you don't. In fact, a lot of the early debates about things about Gnutella versus Freenet versus a lot of these early systems look very similar to different ways you might have run that experiment.
From page 412...
... For a lot of these peer-to-peer systems, it became remarkably easy to generate searchable networks explicitly by embedding them in some kind of Cartesian space, and some of these use these small world models quite explicitly. That's all I'm going to say about this as a design principle here.
From page 413...
... There was a nice experiment done in the past year by David Liben-Nowell, my former student at Cornell, who is now at Carleton College, and a bunch of colleagues of mine: Robby Kumar, Jasmine Novak, and Andrew Tomkins, most of who are now at Yahoo. They looked for how links span actual geography.
From page 414...
... So, rather than generalize things into group structures, they generalized it into rank-based friendships and proved an analog of the inverse square results, as shown in Figure 14.
From page 415...
... Well, because if distance d would be ranked d2, there are d2 people close to them. This gives me the inverse square law in that special case.
From page 416...
... You would like to ask this question every month or every week, and every week that means you have to do an all-pairs shortest-path problem on a 100,000-node network. This is computationally somewhat demanding, so we were actually able to take advantage of some nice stuff that one of my co-authors wrote for approximate all-pairs shortestpaths in a large graph, some stuff Faloutsos worked on to track this effective diameter in several networks over time.
From page 417...
... Given that was the original plan, we found these plots a bit surprising, because slowly growing functions as you know go up, and these were all going down. The point was, despite the fact that the network was growing -- and some of these networks were growing at an exponential rate over time -- the effective diameter, and this is basically a smooth version of the 90 percentile, which I can tell you more about offline, was going down as the network grew.
From page 418...
... That's what the dotted line in the left-hand graph represents, and you get this initial massive swoop down as the network catches up with links that essentially were already there. That effect quickly goes away, and we still see a steady decline in diameter long after that effect appears to be negligible.
From page 419...
... The common sense take on what the small world property means is that even as the world is growing, there are forces out there that are causing us to become more closely connected; in other words a decreasing diameter. Intuitively, maybe this has been consistent with the common sense intuition behind the small world property all along.
From page 420...
... So intuitively there is this tempting connection between social networks and, for example, homeland security applications in which you discover statements like my next door neighbor once sublet an apartment from a person who took a class with the brother of a 9/11 hijacker. You read in the news that social network analysis is going to cause us to discover things 420
From page 421...
... I would say that we are really lacking, and I think this is interesting direction for the statistics aspect of this workshop, some way of talking about when short paths in the network are meaningful, and when they simply suffer from what you could call the small world problem, that you see short paths everywhere. Another area where this blends into economics and game theory -- this is some work I have been doing with Prabhakar Raghavan, who is the head of research at Yahoo, where they are very interested in social media -- is the notion that a lot of these onlines like LiveJournal and Orchid and Friendster and Linked In, and you can name any number of these, they attracted a lot of excitement a year ago based on the idea that they would become big marketplaces for information and services.
From page 422...
... So, here unfortunately, I'm not going to draw so much on social science studies since I haven't looked that much at the organizational theory of literature. But simply suggesting that obviously groups split and merge based on increased or decreased traffic between that, and that's a sense in which the e-mail traffic in the long-range sense is going to feed back into the organizational structure.
From page 423...
... 2005. "Geographic Routing in Social Networks." Proc.
From page 424...
... 2002. "Identity and Search in Social Networks." Science 296:1302-1305.


This material may be derived from roughly machine-read images, and so is provided only to facilitate research.
More information on Chapter Skim is available.