
Proceedings of a Workshop on Statistics on Networks (CD-ROM) (2007)

Chapter: Visualization and Variation: Tracking Complex Networks Across Time and Space--Jon Kleinberg, Cornell University

Suggested Citation:"Visualization and Variation: Tracking Complex Networks Across Time and Space--Jon Kleinberg, Cornell University." National Research Council. 2007. Proceedings of a Workshop on Statistics on Networks (CD-ROM). Washington, DC: The National Academies Press. doi: 10.17226/12083.


Visualization and Variation: Tracking Complex Networks Across Time and Space

Jon Kleinberg, Cornell University

DR. KLEINBERG: Thanks very much to the organizing committee for inviting me to come speak. This has been a very interesting workshop for me, especially seeing how the techniques across all these different areas make use of common principles. Obviously there is a lot we are going to have to think about afterwards. I'm a computer scientist, so I'm coming at this from a computer science perspective. In particular, the kinds of networks I'm interested in are the large information networks built up on top of the applications on the Internet, starting with things like the World Wide Web and spilling over into the networks that shape the way we communicate through email and instant messaging, the way we express interest in hobbies through online communities like LiveJournal, and the way we do electronic commerce, as indicated in Figure 1.

FIGURE 1

There is this whole "speciation" event, as some people call it, on the Internet, where we see all these new forms of media springing up very suddenly, and we can look at the emergence not just of the technological structure but also of the social structure. This notion that there are interesting social structures going on in these online applications is something that has interested me for quite a while, starting back with some work I did on the World Wide Web, where the question was how to use the link structure of the Web, and the fact that these are documents authored by people, to help improve search. This is something that combines the social and the technological aspects very clearly, although I haven't always been so good at conveying this.

I have been at workshops of this flavor for some time. A few years ago I gave a talk at a very early workshop that brought together some computer scientists and sociologists. I talked about links on the Web and kept casually referring to the Web as a social network without really trying to unpack what I was saying. This was at the Santa Fe Institute, and during a break some of us were standing out on the patio looking at this beautiful view of Santa Fe when a very eminent sociologist came up next to me, smoking a cigar, and asked, "Did you see that talk by the guy from Cornell? That was terrible. He thought he was talking about social networks, and all he was talking about was how Google worked." After explaining that I was that guy from Cornell, we actually had a reasonable conversation under the circumstances. That was, as I view it, one of the early steps in learning about some of the cultural gaps that exist between these different areas. I think that, more than anything I have done, or social scientists have done, or workshops like this have accomplished, it is simply the unbelievable profusion of these online media that has made it abundantly clear that all of these technological things we are seeing are, in the end, about content authored by people and people communicating with each other. In reality it's not so much that they both exist, but that they are really intertwined. You can't disentangle the technological networks from the social networks anymore.
If I were to ask what the networks are on which email communication, instant messaging, blogging, or any of these other things take place: sure, your packets are traveling over the Internet, but if you are trying to configure a new instant messaging system you have to know a lot about the kinds of people who are going to be using it. You have to know their diurnal rhythms, and whether it's going to be used by teenagers or by grown-ups, and so forth. All of this affects the way in which we configure the technology, so in a real sense these things are inextricable at this point. A particular thing I wanted to focus on here is the way in which these social and technological networks are embedded in space, embedded in time, and embedded in surrounding organizational structures. One point I wanted to get at is that we shouldn't just view the networks in isolation, but should think about them as being physically embedded, just as concretely as we view networks in neuroanatomy as being physically embedded. I'm going to be talking about network models, as have most of the other talks at this workshop, reflecting our interest in the structural properties of networks, but also, particularly for me, in the processes unfolding on networks, the algorithmic processes that unfold on networks, and in trying to find models that relate these to each other. In the end, hopefully, even very simple toy models (and in this talk I'll try to go through a progression of toy models, up through more and more general ones) don't just help us form a vocabulary of patterns, although that is certainly a very useful thing; in the computer science context, it's often very surprising to me how quickly those turn into design principles. In the online world, the ability to create new online media like peer-to-peer file sharing gives you a certain kind of flexibility, such that things that had seemed like artificial toy models can very rapidly become principles by which you actually architect a system. That is one of the interesting things about this domain as well.

I also wanted to mention one thing that has been lurking beneath the surface of some of the talks, and that is the social science component: the notion of working with model systems. I think, especially for the statistics aspect of this workshop, this is going to be something important to think about. If I could caricature the traditional research obstacle in studying complex networks, it would be that you'd like your networks to be large-scale, realistic, and completely mapped, but in actuality you can only achieve any two of those. For obvious reasons it is very hard to achieve all three at once.

FIGURE 2

So, just as Mark Handcock stole a slide from Martina Morris, I'll steal a slide from Mark.

Figure 2 shows his Zachary karate club network, and the only thing I want to point out here is that this is clearly a real network of real people, and it's completely mapped. We know a lot about it, but the nature of how data like that are collected inherently limits the size. In his after-dinner talk last night, Steve Borgatti mentioned that a trend over the last few years has been to see networks go from sizes in the hundreds to sizes in the tens of thousands or hundreds of thousands. In the process of doing that, we have to ask what we are losing. We are obviously gaining a lot. Many of these systems are what I would call online model systems. When I say model systems, I don't mean that there is anything fake or synthetic about them. All of the people involved in the activities listed in Figure 2 are real people engaged in social interaction. It's simply that we don't know as much about them as we might have known about smaller, more self-contained systems such as the karate club network, and it's hard to draw their boundaries. I'm thinking about things like the links among Web pages or weblogs, the communities of bloggers and how they interact; social networking services like LiveJournal or Orkut or any of the others, which are actually trying to create a social network on top of their members; email and instant messaging within organizations, which define network interaction. For online e-commerce sites, there has been some interesting research simply on the behavior of people at these sites, because of course there is the interaction of product reviews, forum discussions, replies, and posts, and then the download behavior of these people. We can then pick (this is a line of work that Mark really stimulated) something like the arXiv, which is a model system of, in this case, high-energy physicists, all publishing papers, co-authoring, citing each other, and so forth.
It's a very rich domain in which we can try studying things at a very large scale. It was this, for example, that led three of us at Cornell—Paul Ginsparg, the creator of the arXiv, Johannes Gehrke, and myself—to choose it as the data set for the annual KDD Cup competition, basically saying: here is a very rich system in which you can do data analysis along all sorts of dimensions, looking at real social data. There are also things we would like to know about the people in this network that we don't, but for looking at very-large-scale data, so far this is the best we can do. Figure 3 gives an overview of my talk. The hope is to show how all these themes can be traced out through one particular topic, small-world networks and decentralized search; I could easily have picked several other topics.

FIGURE 3

We'll start with an initial model, the small-world model of Watts and Strogatz, which many of you know, and then we'll move from that, which is really a structural model looking at the structure of the network, to a model about processes on top of the network. In particular, the problem of searchability of networks is really about a process that operates on the network. We'll start with a very simple model of that, and then I'll try to pull back to create more general models. In particular, by abstracting a general pattern that's going on here, we'll start looking at both data and design principles. Then we'll try to find that pattern in large-scale network data. We'll see some rather surprising ways in which it has shown up in hyperlinks on the Web, in email communication, and in friendships in these online communities, and also how it's been used as a design principle in things like peer-to-peer systems. Finally, I'll mention some recent work I've been doing with Christos Faloutsos and Jure Leskovec looking at how these networks evolve over time, and some things that actually completely challenge the conventional wisdom we had on this. Necessarily, since my time is short, I will go through all of these rather briefly.

FIGURE 4

To introduce small-world networks, rather than reach back to the Watts-Strogatz model, I wanted to reach much further back, to the original work in social psychology that stimulated a lot of this: Stanley Milgram's small-world experiments. Both models are summarized in Figure 4. To briefly recap Milgram's experiment for those of you who haven't seen it before, the idea was to trace out short paths in a social network, in particular the network of acquaintanceships in the United States. He picked a target person, who lived in a suburb of Boston, whom his subjects were supposed to reach, and he picked a whole bunch of starting people, mainly drawn from out in the Midwest. Each starting person received a letter that had to be forwarded to this target; rather than mail it directly to the target, they had to use what they were told about the target and mail it to one of their friends, with the goal of reaching the target in as few steps as possible. Those were the rules of the experiment. You get a letter, you're trying to reach a stockbroker in Sharon, Massachusetts, and you think about the things you know about the target and about your friends, and you figure out whom you should pass it to. The famous outcome of this was that the median number of steps in the completed chains was six, a number that has entered into the phrase "six degrees of separation."

There are a few things going on with this experiment, and I don't have time to unravel them all, but one is clearly the structural issue. Why should our social networks, in which our friends all seem to know each other and which, as a number of the talks have noted, look fairly clustered, have these short chains? The highly influential Nature paper of Watts and Strogatz in 1998 can really be viewed as a proposal for a kind of network that we ought to be studying. More than any particular result, it really said: let's think about networks as superpositions, the notion of a random network embedded in some reference frame, in this case maybe a spatial one. To take one of the flavors of their model, imagine a very simple version in which we have a grid of nodes, and these might represent the people in the social network, knowing something about their geographic coordinates. We then add random links, even as few as one per node; an example is shown in red in Figure 4. These random links are enough to make the diameter of the network very small, even while locally it looks very clustered: adding a very small number of random links, the network still looks relatively grid-like, yet its diameter has become very small. We can actually prove things like that using results that have been known for some time in the area of random graphs, in particular the results of Bollobás and Chung (1988), who studied almost exactly this network. So this was the idea: think of these networks as a superposition or, for our purposes, a random network embedded in some reference frame, be it spatial, organizational, or what have you.

When I first came across the Watts-Strogatz work—which wasn't hard, because they were also at Cornell at the time—the thing that really struck me about the original experiment was not just the structure. It was not just about the fact that there were these short chains; it was about a process. As we teach our undergraduates in computer science classes, if you want to find the shortest path in a network, you shouldn't have run this experiment. You should have given a letter to someone and said: send this to all your friends, and tell them to send it to all their friends, and eventually you will see what the shortest chain to the target is.
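Both points can be sketched in a few lines: flooding ("send this to all your friends") is just breadth-first search, and a single uniform random link per node collapses the diameter of an otherwise wide grid. This is a minimal illustration of my own, not code from the talk; the grid size and the one-shortcut-per-node choice are arbitrary:

```python
import random
from collections import deque

def grid_with_shortcuts(n, shortcuts_per_node=1, seed=0):
    """Build an n x n grid graph, then superimpose `shortcuts_per_node`
    uniformly random long-range links per node (Watts-Strogatz style)."""
    rng = random.Random(seed)
    nodes = [(i, j) for i in range(n) for j in range(n)]
    adj = {v: set() for v in nodes}
    for i, j in nodes:
        for u in ((i + 1, j), (i, j + 1)):        # local grid edges
            if u in adj:
                adj[(i, j)].add(u)
                adj[u].add((i, j))
    for v in nodes:                                # random shortcuts
        for _ in range(shortcuts_per_node):
            w = rng.choice(nodes)
            if w != v:
                adj[v].add(w)
                adj[w].add(v)
    return adj

def bfs_ecc(adj, src):
    """Longest shortest-path distance from src, found by flooding (BFS):
    exactly the send-it-to-all-your-friends experiment."""
    dist = {src: 0}
    q = deque([src])
    while q:
        v = q.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                q.append(u)
    return max(dist.values())

grid_only = grid_with_shortcuts(20, shortcuts_per_node=0)
with_links = grid_with_shortcuts(20, shortcuts_per_node=1)
print(bfs_ecc(grid_only, (0, 0)))   # 38: the pure 20 x 20 grid is "wide"
print(bfs_ecc(with_links, (0, 0)))  # far smaller once shortcuts are added
```

On the pure 20 x 20 grid the farthest node from a corner is 38 hops away; with one random link per node the same eccentricity drops sharply, even though locally the graph still looks like a grid.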
You can think of the obvious reasons why Milgram wasn't able to do that experiment; therefore, he was forced to embark on this much more interesting one, where you actually have to tunnel your way toward the target, making your best guess about the next hop. Essentially, what he was doing was decentralized routing: decentralized routing in a completely non-technological context, on the social network of people. That's the first thing I would like to try modeling here. So, we would like to model a decentralized algorithm looking for a path through a network. Figure 5 describes a very basic model: imagine there is a current message holder who knows the message, the grid structure, their own coordinates on the grid, and the coordinates of the target on the grid. They don't know all the random links, but they do know their own random links. This simply says that when you get the letter, you know where you live, you know the target is in Sharon, Massachusetts, and you know you have friends in New Jersey and Montana and Singapore and so forth, and you want to figure out how best to direct the letter. That's the idea, and that's why it is decentralized.

FIGURE 5

The gold standard here, which is going to pop up throughout a lot of the results, is the difference between things that grow logarithmically in n versus things that grow linearly in n or, more generally, things that grow like polynomials in log n versus polynomials in n. The network has n × n nodes, but we want to be exponentially better: we want delivery time to be something like log n, or log n squared. That would somehow tell us that the world is small; we can get there in a short number of steps. The funny thing is that, although one could have asked this question immediately upon seeing the Milgram experiment, the Watts-Strogatz model, though not intended for this purpose, is really ideal as the kind of minimal model you would need to ask it. If you want to ask this question in an interesting way, what are the minimal ingredients you need? Well, you need a network that is partially known and partially unknown. A completely known network of course leads to shortest-path algorithms, but that's a very different subject; finding your way in a known network is basically the MapQuest problem. A completely unknown network is equally different: then we have nothing to go on. The known part should have high diameter, or else it would be too easy; the full network should have low diameter, so there is a solution to find. And we want to explore the relation between the known and the unknown parts: how does the random network relate to the way in which it's embedded in the spatial or organizational structure, or whatever it may be?

If you are worried that this model is too simple: obviously, in a million ways it's too simple. To the extent that the grid in Figure 4 is a proxy for anything, maybe it's a proxy for geographic information, and we know people in the Milgram experiment also used occupational information and other things. I'll try folding some of these things in, but I want to start with an intentionally very simple model, as shown in Figure 6.

FIGURE 6

The Watts-Strogatz model is a bit too simple. The first theorem you can prove here is a negative result, an impossibility result: the network has n × n nodes, and there really is a path of length about log n, so there was a solution to find, but provably, with local information, you can't find it. Any decentralized algorithm, however hard it computes, simply because of its lack of information, is going to need at least about n^(2/3) steps to find its way; 2/3 is actually the right exponent here, and you can also achieve it. But that is exponentially larger than what we want. So, that's what happens when you have a simple toy model: you have this striking phenomenon, you try analyzing it, and sometimes there is not enough in there. Let's try generalizing the model just a little bit, just minimally, to try to do something. We can add one more knob to turn, which I'll call a clustering exponent r. I'll still add a random long-range link out of each node, as you see in Figure 6, but it's going to be tuned by this parameter r, which basically says: if I'm node v, I don't choose my long-range link uniformly at random; rather, I take the spatial structure into account and choose w as the other end with probability proportional to (lattice distance from v to w)^(-r). Think about the effect r has: when r = 0, that's the uniform distribution, and I'm back to the Watts-Strogatz model. As I crank r way up to some large number, I go from the network on the left in Figure 6 to the one on the right, where long links are very unlikely; the network goes back to having a large diameter, and my long-range links are not really helping me that much. So I'm in this funny spectrum, where I know that when r = 0 the network is too confusing: there were short paths, but I couldn't find them. And when r is very big, there are no short paths to find. The question is what happens in the middle region as we trace it out; at least we have a one-dimensional family we can study. The answer is that something kind of funny happens. There actually is an optimal operating point for this network, as shown in Figure 7, and that's r = 2. When I choose links with probability according to an inverse-square distribution, I can actually achieve the gold standard: delivery time polynomial in log n, in fact about log squared n. As I move away from r = 2 in either direction, things get worse rapidly, at least in an asymptotic sense: delivery time becomes n to some power, where the power depends on how far r is from 2. Two was somehow the right operating point for this network.

FIGURE 7
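The trade-off in r can be seen in a small simulation of the greedy rule: always forward the message to whichever contact, local or long-range, is closest to the target in lattice distance. This is a sketch of my own, not the exact model from the talk; in particular, each node's long-range contact is resampled when the message arrives rather than fixed in advance (equivalent for greedy routing, which never revisits a node), and the grid size and trial count are arbitrary:

```python
import random

def lattice_dist(a, b):
    """Manhattan distance on the grid."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def long_range_link(v, nodes, r, rng):
    """Sample one long-range contact for v with probability
    proportional to lattice_dist(v, w) ** (-r)."""
    weights = [0.0 if w == v else lattice_dist(v, w) ** (-r) for w in nodes]
    return rng.choices(nodes, weights=weights)[0]

def greedy_delivery_time(n, r, trials=30, seed=0):
    """Average number of greedy-routing steps from a random source to a
    random target on an n x n grid with clustering exponent r."""
    rng = random.Random(seed)
    nodes = [(i, j) for i in range(n) for j in range(n)]
    total = 0
    for _ in range(trials):
        s, t = rng.sample(nodes, 2)
        v, steps = s, 0
        while v != t:
            # candidates: the grid neighbors plus one long-range contact
            cands = [u for u in ((v[0] + 1, v[1]), (v[0] - 1, v[1]),
                                 (v[0], v[1] + 1), (v[0], v[1] - 1))
                     if 0 <= u[0] < n and 0 <= u[1] < n]
            cands.append(long_range_link(v, nodes, r, rng))
            # greedy rule: forward to whichever contact is closest to t
            v = min(cands, key=lambda u: lattice_dist(u, t))
            steps += 1
        total += steps
    return total / trials

for r in (0.0, 2.0, 4.0):
    print(f"r = {r}: about {greedy_delivery_time(30, r):.1f} steps on average")
```

Because a grid neighbor closer to the target always exists, the distance to the target strictly shrinks each step, so the routing always terminates; the asymptotic separation between r = 2 and the other exponents only becomes dramatic at grid sizes larger than this toy run.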

Let's think about that. Figure 8 gives some idea of how you prove something like this, and then we can go on to how we might extract what we have actually learned, beyond the fact that we are on a grid, beyond the fact that we have one long-range link, and so forth. Even in this very simple system there is obviously something to be proven. Referring to Figure 8, let me tell you how to prove the positive result with exponent 2; I'll skip the negative results for the sake of time. The basic idea: say I'm trying to get to this target t and I'm at this node v. Rather than try to analyze the whole sequence of steps from here on, I want to ask a simpler question: what's the probability that in this very step I halve my distance to the target? To halve my distance to the target, I have to jump from distance d into the ball of radius d/2 around t. You can basically compute the probability mass in this ball and show that it's about 1/log n, independent of d. The d cancels out, which is the key here.

FIGURE 8

Basically, with probability about 1/log n you halve your distance to the target right away. Therefore, every log n steps or so you expect to halve your distance. The distance can only be halved log n times, and so the delivery time is proportional to log n times log n. That's the one-slide outline. My interest here is to try to figure out whether we can conclude something more general about networks, since real networks aren't n × n grids with random jumps and so forth. What is actually going on? Somehow it was key that your probability of halving was independent of d; d just washed out completely. It had something to do with the fact that no matter what distance scale you were at, you could expect to halve your distance. Think about this result qualitatively: consider the exponentially layered distance scales around a node, that is, all nodes within distance 1 to 2, 2 to 4, 4 to 8, 8 to 16, and so on through the powers of 2. There are log n different distance scales in the network, and the great thing about r = 2 (or, if I were in k dimensions, r = k; somehow it's the dimension that matters here) is that nodes have about the same proportion of links to each distance scale. You have about the same probability of a link to somebody very close as to somebody moderately close, somewhat far away, very far away, and so forth. That, in one sentence, is what was going on in the proof. In an early version of this talk I had a hand-drawn picture trying to illustrate distance scales, and then I realized there are people who have done this much better than I have. For example, Saul Steinberg, with his well-known 1976 New Yorker cover, which was designed to show something else, illustrates this perfectly: we devote the same amount of attention to 9th Avenue as we do to the rest of our neighborhood in Manhattan, as we do to all of New York City, to the rest of the United States, and to the rest of the world. These different layers of distance each occupy about the same amount of probability mass, and that's what is going on qualitatively. Now, why is it useful to observe that? It doesn't help us prove anything new, but it does help us go out and look for other models where we might see the same thing happening. For example, we quickly conclude that the grid wasn't really very important to this behavior.
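This uniformity over distance scales can be checked numerically: under the (lattice distance)^(-r) rule with r = 2, the total link probability a node assigns to each scale [2^j, 2^(j+1)) comes out roughly constant. The following is a small illustration of my own (the grid size and corner-node choice are arbitrary, and the innermost and outermost scales are distorted by rounding and the grid boundary):

```python
import math

def annulus_masses(n, r):
    """Total link probability that the corner node v = (0, 0) of an n x n
    grid assigns to each distance scale [2^j, 2^(j+1)) when the long-range
    link goes to w with probability proportional to lattice-distance^(-r)."""
    v = (0, 0)
    nodes = [(i, j) for i in range(n) for j in range(n) if (i, j) != v]
    weights = {w: (w[0] + w[1]) ** (-r) for w in nodes}
    z = sum(weights.values())                 # normalizing constant
    scales = {}
    for w, wt in weights.items():
        j = int(math.log2(w[0] + w[1]))       # which power-of-2 scale
        scales[j] = scales.get(j, 0.0) + wt / z
    return scales

for j, mass in sorted(annulus_masses(64, 2.0).items()):
    print(f"scale [2^{j}, 2^{j+1}): {mass:.3f}")
```

With r = 2 the middle scales carry nearly identical probability mass; rerunning with r = 0 or r = 4 skews the mass toward the farthest or nearest scale, respectively.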
The grid just gave us a system of balls, centered at points, whose areas grew in a certain way. I could just as easily put the nodes on a hierarchy, and this is something people ask about, because we also think about the world hierarchically. We even think about people hierarchically as we break them down, say, by occupation. It’s fine to think about that kind of a model. So, now nodes reside at the leaves of this hierarchy, as shown in Figure 9. How is each node going to generate its random links? Here is one way: I look at my path to the root, pick an ancestor according to some distribution on those ancestors, and then pick a node uniformly at random from that ancestor’s subtree. Suppose that’s my generation process. The analog of searchability is explored in a 2002 paper of mine and also in a paper by Duncan Watts, Peter Dodds, and Mark Newman (2002), which also considered the case of multiple hierarchies. You get the best navigation, again, when you have this uniformity over distance scales, which here means I should choose my ancestor uniformly

at random in this generation process. In a way, for hierarchies, the result is much more intuitive than the somewhat mysterious inverse square law that you then have to explain. Here it makes sense that if I want good searchability to any target, I should be able to funnel down to any level of the tree as the search proceeds. If I’m trying to make my way over to some node, I first have to span the top-level barrier, which requires a really long link. Once I’m into the subtree, I have to span the next-level barrier, and I need to be able to do this at every level. This is actually the first setting where you could potentially go and compare against real measurements.

FIGURE 9

So, with the initial searchability result, you had the grid and you had these long-range links, and people said, okay, take searching the Web. You are browsing on the Web with local information. Where is the grid? Where are the random links? And the answer was, I don’t know. Once you have a hierarchy, you can actually do things like take a hierarchical classification of Web pages. This is something that was done by Menczer at Indiana University. To look at how links are embedded in the structure, he took the Open Directory, which classifies Web pages hierarchically (like science/computer science/algorithms, biology/neuroscience, or art/music/Italian opera). Now it’s not a spatial structure but a

taxonomic structure, and you could say: if I actually wanted to do focused Web crawling, if I wanted to get a page on Italian opera, then as I look around at my links I should try to find ones that lead to pages on art. Once I’m within there, I assume there is going to be a slightly higher density of links into music, from there a higher density into opera, and so forth. Not only did he partially validate this picture of link density decaying with height in the tree, but it also became the basis for some focused Web crawling experiments. Again, this is an example of the turnaround of seemingly simple models rapidly becoming design principles, in this case for something like focused Web crawling. There are a bunch of other proposed models here. For example, Duncan, Peter, and Mark had this model where you actually had several hierarchies. One hierarchy could be, as in the case of the Milgram experiment, geography; another hierarchy could be occupation, and you can choose among these different hierarchies. There were other proposed models, too; maybe the right model for social networks is a hierarchy on top of a lattice. At some point you think that, if these results have very similar proofs, there must be some more general statement that covers all of them. Figure 10 is actually a proposal for something that simultaneously includes searchability in grids, searchability in hierarchies, and a bunch of other things. In the end you could say nodes are embedded in some structure, some reference frame, except it’s imagined now as a collection of finite sets. Nodes form links because they belong to certain groups: I’m friends with you because we live on the same block, or because we work in the same field, or because we are fans of the same science fiction author. Everyone belongs to some overlapping collection of groups.
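The hierarchical link-generation process described a moment ago is easy to write down concretely. This is a sketch under my own conventions, not code from any of the papers: a complete binary tree with leaves numbered 0 to 2^depth - 1, and the ancestor level chosen uniformly at random, which is the searchable case.

```python
import random

def tree_link(depth, leaf, rng):
    """Generate one long-range link for `leaf` in a complete binary
    tree: pick an ancestor level uniformly at random, then pick a
    uniform leaf of that ancestor's subtree.  (The draw can land back
    on `leaf` itself; a real generator would simply redraw.)"""
    level = rng.randrange(1, depth + 1)   # how far up the path to the root
    block = leaf >> level                 # index of the ancestor's subtree
    return (block << level) + rng.randrange(2 ** level)
```

Because the level is uniform, a link is equally likely to land in each of the log n "distance scales" of the tree, which is the hierarchy analog of the inverse-square distribution on the grid.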

FIGURE 10

If I were just in a lattice, the groups could be the balls centered at points. In a hierarchy, they would be the subtrees. In any set of groups, if you just draw some red dots as the nodes and some blue circles as the groups around them, I can generate a network by saying: let v link to node w with probability proportional to 1/k^r. Here r is my exponent again, and k is the size of the smallest group that contains both v and w. That’s now going to be my very general proxy for distance; it generalizes distance in a grid, and also least-common-ancestor height in a hierarchy. How far apart you are is simply the size of the smallest group you share. Sure, we both live in the United States, but the real point is that we are both fans of the same obscure author, and that’s the smallest group. You can’t take just any set system and have this work, but subject to some technical conditions that include all these cases, searchability is achieved when the exponent is 1. This is one very general way to see where the inverse square law is coming from, because if I were on a grid and the groups were all the balls centered at points, then if you are at distance d from me, what’s the smallest group containing both of us? How big is it? Because area grows like the square of the distance, it has about d^2 members. So, 1 over group size to the first power is 1 over d^2, which gives us back the inverse square law. This is a way to see all those results under one common rubric: searchable networks can be achieved if nodes link with probability 1 over the size of the smallest group containing them. We now have a bunch of general kinds of results, and a few things happen in different directions. We have a pattern that we can look for; we

also potentially have a proposal for ways to design networks. Searching for a file with the original Napster was the equivalent of sending a letter not through a chain of acquaintances, but through the U.S. postal system: a central index knew everything, and you just asked it where the file was. But as noted in Figure 11, once Napster went away we were plunged from the world of the postal system, or of directory assistance, into the world of the Milgram experiment: if you want to find something, there is no central directory; you have to ask your network neighbors, who in turn ask their network neighbors, and maybe you find your MP3 file and maybe you don’t. In fact, a lot of the early debates about Gnutella versus Freenet versus other early systems look very much like debates over different ways you might have run that experiment. For example, comparing Gnutella’s flooding solution with Freenet’s targeted solution, to take two of the earliest of these decentralized systems, is analogous to comparing a chain letter to the Milgram experiment. One is much more likely to reach the target, assuming it scales, but it scales very badly. Freenet was able to take advantage of some of the earliest of these results by configuring itself to look like a searchable small world, and more recent work has tried to do this even more explicitly.

FIGURE 11

The online world gives us so much flexibility that there is no need to care that we don’t live on an n × n grid with uniform spacing and random links. Once we build an overlay network on the Internet, as in CAN, for example, if you want the world to look like an n × n grid with

uniform spacing, you simply include in the protocol that each node is assigned two coordinates. That’s the node’s position in the grid, and it can then generate some random links. For a lot of these peer-to-peer systems, it became remarkably easy to generate searchable networks explicitly by embedding them in some kind of Cartesian space, and some of them use these small-world models quite explicitly. That’s all I’m going to say about this as a design principle.

FIGURE 12

Let me go back to data. About a year and a half ago there was a nice study by Lada Adamic and Eytan Adar in which they looked at a particular surrogate social network, shown in Figure 12, on 436 researchers at Hewlett-Packard Labs in the Bay Area. They simply asked Hewlett-Packard (HP) to contribute suitably anonymized logs of who communicated with whom by e-mail, and then built a network joining pairs of people who exchanged at least six e-mails each way. That’s the proxy for a link in the network shown. Now, the question was what we should embed this into, and there is a very natural embedding for the social structure of HP Labs: the organizational hierarchy. The lab director, area managers, and lower-level managers can be seen in the graph in Figure 12; most organizations would look something like this. Because we have these more general ways of thinking about searchability, we can now ask how well this

maps onto some of these models of a searchable network. I can ask: if g(x,y) is the size of the smallest group containing nodes x and y, how does the density of links scale as I go up the hierarchy? As we know, optimal searchability calls for link density of one over group size. When I first saw they were doing this, I was a little concerned, because it wasn’t clear to me that link density would even go down with group size. That was the hope, and in fact, while the data doesn’t give us g^-1, it gives us something in the ballpark: link density scaling as g^-3/4. I don’t want to conclude too much from this, because this is a single group of 436 people. They fit something as best they could, and of course this network was not optimized for searchability; HP Labs didn’t say, let’s set up everything so that we can run the Milgram experiment as efficiently as possible, although you can make qualitative arguments about why that might be a good thing for an organization. What it says is that the organization has something of that character, so it gives us a very general way of talking about how frequency of communication scales as you move up the layers, as you have to span larger and larger layers of an organization. Around this point in time there were these peer-to-peer networks, there was the Open Directory, there was something like a social network, and I began thinking it’s nice that this pattern is appearing in real data. In the end the grid was just a metaphor; the grid with the random long-jumping links was just a way to get us thinking about this, but it was really about hierarchies and groups. There really was no grid, right? That, too, was not quite right. There was a nice experiment done in the past year by David Liben-Nowell, my former student at Cornell, who is now at Carleton College, and a bunch of colleagues of mine, Ravi Kumar, Jasmine Novak, and Andrew Tomkins, most of whom are now at Yahoo. They looked at how links span actual geography.
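The kind of exponent Adamic and Adar report, link density scaling as a power of group size, can be read off with a simple log-log regression. This is a generic sketch with made-up numbers, not their data or their code.

```python
import math

def fit_exponent(sizes, densities):
    """Least-squares slope of log(density) against log(size).
    If density ~ size^alpha, this returns alpha."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(d) for d in densities]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den
```

Feeding in densities that decay exactly as size^-0.75 recovers the exponent -0.75; on real measurements the same fit gives the ballpark answer, to be compared against the ideal -1.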
Remember the question: where was the grid, where are the long-range links? Well, to answer that you would need a large network in which people have somehow identified their friends and provided geographic location information, which five years ago seemed way too much to hope for, but which we now actually have in things like LiveJournal. So, you go to LiveJournal, an online community populated mainly by teenagers, and it’s very easy to write a crawler that collects names, zip codes, and links; this is all out there in public. As shown in Figure 13, you collect 500,000 members of LiveJournal with U.S. zip codes, giving about 4 million links; the average number of friendships declared is about 8. You can ask how friendship probability decays with distance. This gets you remarkably close, but you are still one step short of the goal, because the inverse square law was about the growth rate of balls, and that rested on the assumption that the density is uniform: within distance d, I have about d^2 nodes no matter where I am. You look at the scatter plot of who is using LiveJournal and you see it’s very non-uniform. The

first thing I thought when I saw the scatter plot was, isn’t that weird, no one out west is using LiveJournal. But actually this is the same thing you would see if I just plotted the population density of the United States; the two are indistinguishable at the level of a scatter plot.

FIGURE 13

LiveJournal is sampling pretty broadly, but it’s very non-uniform. The ball around somebody using it out in eastern Nevada is going to grow very gradually compared with the ball around someone using it on a particular block in Manhattan. What do we do about that? One thing we could do is use groups, because groups didn’t require a grid: the groups could be all the balls centered at all the points, the nearest so-many people within some distance of me. Liben-Nowell et al. took something slightly easier to analyze in their context, which is very similar; they called it rank-based friendship. Node v simply rank orders everyone else in the world by proximity. The center of the circle in Figure 14 is node v; the closest people are the small circles within that circle, whose radius is the distance to the person ranked 7th. Then you link with probability decaying in the rank: I’m more likely to link to the nth closest person than to the (n+1)st closest person. What they did was prove a different generalization: rather than generalize to group structures, they generalized to rank-based friendship and proved an analog of the inverse square result, as shown in Figure 14.
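Rank-based friendship as just described can be sketched directly. The helper below is my own illustration, not Liben-Nowell et al.'s code; people are 2-D points, and the `exponent` parameter name is mine.

```python
import random

def rank_link(v, people, rng, exponent=1.0):
    """Draw one friend for v under rank-based friendship: rank every
    other person by distance from v, then link with probability
    proportional to rank^-exponent (exponent 1 is the searchable case)."""
    others = sorted((w for w in people if w != v),
                    key=lambda w: (w[0] - v[0]) ** 2 + (w[1] - v[1]) ** 2)
    weights = [(rank + 1) ** -exponent for rank in range(len(others))]
    return rng.choices(others, weights=weights)[0]
```

Note that nothing here assumes a uniform population: the ranks adapt to whatever the density is, which is exactly why this formulation handles the non-uniform LiveJournal scatter where the plain inverse-square law cannot.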

FIGURE 14

Efficient routing is possible for a nearly arbitrary population density (again, under mild assumptions) if the link probability decays like 1/rank. Why is this a generalization? Because under uniform density a person at distance d has rank about d^2, there being about d^2 people at least as close, and this gives back the inverse square law in that special case. Now they have a result saying what would be optimal for searchability, they have 500,000 people with zip codes, and they can look at the data. The punch line, which made me very happy, is that LiveJournal friendships actually approximate 1/rank quite closely. Time prevents me from getting into a lot of what they did here, but these lines are basically of slope 1 and 1.05, and the chart in Figure 14 is one particular view of that data. For Hewlett-Packard Labs you could have argued that somebody sat down and drew up the org chart; maybe they were thinking about this. This is weirder still: the network somehow configured itself to be like this, and I don’t have a good explanation for that. I should also mention that, independent of searchability, this is a little surprising, because LiveJournal is a purely virtual online community. If someone had asked me what to expect, I would have said a mixture model: you log on and immediately link to all your high school friends who live in the same zip code as you, so I get a big spike there, and then you meet people utterly at random, roughly a uniform distribution over the rest. The point is that that is not what happened. If you lived in Nevada, you were nonetheless more likely to link to somebody in Nebraska or

Wyoming than to someone in New York City, and that’s bizarre. There is a lot more that should be looked at in this data, so one hesitates to draw sweeping conclusions yet, but it’s very intriguing. In the few minutes I have remaining I want to talk not just about spatial embedding, but about how things evolve over time, because here, too, there turn out to be some surprises that make us go back and rethink what we should be looking at. The question we wanted to ask was really a very simple one, stated in Figure 15: take one of these large model systems (we’ll take the arXiv, because it works very nicely for our purposes) and ask how the degrees of separation change as the network grows over time. We defined degrees of separation as the 90th percentile of all pairwise distances, though we have tried other definitions as well. Why in the world has no one studied this on some large-scale system? Well, it’s computationally hard to answer this question. You want a network you can run up to any point in time and stop, and you would like to ask this question every month or every week; every week, that means doing an all-pairs shortest-path computation on a 100,000-node network. This is computationally somewhat demanding, so we took advantage of some nice machinery that one of my co-authors wrote for approximate all-pairs shortest paths in a large graph, building on things Faloutsos worked on, to track this effective diameter in several networks over time.
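The quantity being tracked, here taken as the 90th percentile of pairwise distances, can be computed exactly on small snapshots by breadth-first search from every node; the approximate machinery mentioned above is only needed at 100,000-node scale. A small exact sketch:

```python
from collections import deque

def effective_diameter(adj, q=0.9):
    """q-th percentile of all pairwise shortest-path distances in an
    undirected graph given as {node: set_of_neighbors}; unreachable
    pairs are simply ignored."""
    dists = []
    for s in adj:
        seen = {s: 0}
        queue = deque([s])
        while queue:                      # breadth-first search from s
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen[w] = seen[u] + 1
                    queue.append(w)
        dists.extend(d for v, d in seen.items() if v != s)
    dists.sort()
    return dists[int(q * (len(dists) - 1))]
```

Running this on successive snapshots of a growing network is the experiment in question: a return value that shrinks over time, even as nodes are added, is the surprise being described.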

FIGURE 15

We looked at several model systems beyond the arXiv, so in Figure 16 you will see multiple plots. Time runs on the x axis, and the effective diameter of the network runs on the y axis. The original plan was: we’ll look at this and try to see which slowly growing function of n best fits the diameter. Was it log^2 n, was it log n, or some other function? Given that was the original plan, we found these plots a bit surprising, because slowly growing functions, as you know, go up, and these were all going down. The point is, despite the fact that the network was growing, and some of these networks were growing at an exponential rate over time, the effective diameter, which is basically a smoothed version of the 90th percentile (I can tell you more about this offline), was going down as the network grew. The network was growing, but the degrees of separation were actually getting smaller. I have a few lines here which are probably better explained in question-and-answer form, but basically there is a big missing-past issue here: we come in and observe networks starting at a particular time.

FIGURE 16

With the arXiv, we technically have no missing-past problem, because we have the arXiv from its very first paper. But in a very real sense we do have one, because the arXiv got plopped down in the middle of a lively, thriving research community that had been exchanging preprints for a long time; we only started measuring it at time “zero.” One worries that what these results really show is the process of catching up with a stationary structure the network already had. I can’t say that that’s not what is going on, but one can look at the decrease in diameter in many different ways. For example, one can artificially cut off the past at any number of years: suppose we only started measuring in 1995. That’s what the dotted line in the left-hand graph represents, and you get an initial massive swoop down as the network catches up with links that were essentially already there. That effect quickly goes away, and we still see a steady decline in diameter long after it appears to be negligible. One can also keep the network intact but only start measuring later (I would have to explain this more slowly), and it, too, shows the same catch-up followed by decline. So, long after a lot of the edge effects go away, we are still seeing this steady march downward. Maybe in retrospect I should have seen this coming, because back on Figure 5 I said our gold standard is log n: we have n nodes and we want the diameter to grow like log n. That’s sort of the graph

theorist’s take on what the small world property means: the diameter grows exponentially more slowly than the size of the network. The common-sense take on the small world property is that even as the world is growing, there are forces out there causing us to become more closely connected; in other words, a decreasing diameter. Intuitively, maybe what we see has been consistent with the common-sense intuition behind the small world property all along, and it was simply our random-graph background that caused us to inject this log imagery. Having said that, log n and log^2 n and these slowly growing functions are a very useful modeling tool for the kinds of analyses I talked about up to now, which were not about temporal trends but about static snapshots, with asymptotics as a way of capturing them. But it says there is a lot to do here. A related property, one reason it is easier to believe the diameter might be going down, is that the average degree in the network is not constant: it increases. If you compare now versus 10 years ago, people cite more papers, they author more papers, they co-author with more people. Everything in the arXiv network is densifying as we go forward, and clearly any model for this is going to have to densify if it is to show a decreasing diameter. There are models that we have been thinking about to do this. For example, at least through simulation (this seems very hard to analyze), consider a model where people enter a social network via some point of contact: you are introduced into the network through a sort of epidemic process in which you befriend some of that person’s friends, some of their friends, and so forth. Those are your links, and then we iterate and add a new person. Through simulation, one can set the parameters so that the network densifies and the diameter goes down, but it seems very hard to prove anything about that kind of model.
At least there are models out there that seem plausible, that might capture this kind of thing.
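A toy version of the epidemic-entry idea can be simulated in a few lines. This is my own simplification (one hop of the epidemic, with a coin flip per neighbor), not the model analyzed in the Leskovec et al. paper; the parameter names are mine.

```python
import random

def grow_network(steps=200, p=0.5, seed=1):
    """Grow a graph one node at a time: each new node v picks a
    uniform contact among existing nodes, links to it, and
    independently befriends each of the contact's current neighbors
    with probability p (one hop of the epidemic process)."""
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}
    for v in range(2, steps):
        contact = rng.randrange(v)
        friends = {contact} | {w for w in adj[contact]
                               if rng.random() < p}
        adj[v] = set(friends)
        for w in friends:          # keep the graph undirected
            adj[w].add(v)
    return adj
```

Because well-connected contacts hand each newcomer many links, the average degree drifts upward as the graph grows, which is the densification needed for a shrinking diameter; whether the diameter actually shrinks has to be checked numerically, since, as noted above, this kind of model seems very hard to analyze.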

FIGURE 17

Figure 17 presents some reflections and suggests some other questions. As I said, the set of themes I wanted to trace through this talk is the way network models, even very specific models, become more general patterns and then become design principles; and in the end, with enough patience, you can actually find data sets that start to bear some of these things out. The shrinking-diameters result is again very suggestive of more work that could be done, and in general I think we are only starting to have the computational ability to look at very large networks and snapshot them densely over time. I think it suggests that there are more surprises coming. One network that we would like to try understanding over time is the Web itself. There is currently a large effort going on at Cornell involving some pure scientists, and also some people in the sociology department, like Michael Macy and David Strang; in particular, we are trying to get a handle on Web evolution over time and the questions we can ask about it. In small-world networks there are any number of other interesting questions that I haven’t touched on here. One, for example, which came up in a paper a year ago by Christos Faloutsos, Kevin McCurley, and Andrew Tomkins, is roughly the question of when short paths are meaningful. Intuitively there is this tempting connection between social networks and, for example, homeland security applications, in which you discover statements like: my next-door neighbor once sublet an apartment from a person who took a class with the brother of a 9/11 hijacker. You read in the news that social network analysis is going to let us discover things

like this, and that’s going to be very useful. But we are really lacking any sense of how you would score this. Is this short path meaningful? Isn’t the whole point of the small world phenomenon that I could say something like this about almost anyone? I would say that we are really lacking, and I think this is an interesting direction for the statistics aspect of this workshop, some way of talking about when short paths in a network are meaningful, and when they simply suffer from what you could call the small world problem: you see short paths everywhere. Another area where this blends into economics and game theory (this is some work I have been doing with Prabhakar Raghavan, who is the head of research at Yahoo, where they are very interested in social media) concerns online communities like LiveJournal, Orkut, Friendster, LinkedIn, and any number of others. They attracted a lot of excitement a year ago based on the idea that they would become big marketplaces for information and services. What’s the value to you in joining something like Friendster or LinkedIn or Orkut? Well, maybe a year from now you are going to need to rent an apartment in Boulder, Colorado, so you ask your friends: do you know anyone who has a room to rent? They ask their friends, and gradually an answer percolates back; and it is not just an answer, it has somehow been vetted by your friends. That was some of the appeal of these networks, but once you start thinking about them as a marketplace for information and services, you have to ask about the incentive issues at work. If you are actually offering incentives to people, and they have to act essentially as middlemen for these products and services, when does this function efficiently, and when do things break down simply because of the friction of everybody skimming off some of the money you offer them?
It turns out there are some interesting transitional behaviors here, which relate to branching processes, with sharp transitions between networks that function efficiently in this mode and networks that don’t. There is something qualitatively important about an effective branching factor of 2, which I can’t really get into here, but there is a lot to think about at this interface of small-world networks with the economic, game-theoretic take on them. In general, I think this is a window into a lot of interesting questions about these kinds of networks, many of which can be studied with the model-system data that we have. We are trying to accumulate more data, and in general there is much to be done here. I would be happy to talk about it more offline, but for now I’ll stop here.
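As background on the branching-process connection just mentioned, here is the classical Galton-Watson extinction computation. Note this illustrates only the textbook threshold (mean offspring 1), not the effective-branching-factor-2 transition of the query-incentive result, which requires the paper's own model; the function and parameter names are mine.

```python
def extinction_prob(offspring, iters=500):
    """Extinction probability of a Galton-Watson branching process,
    found by iterating q -> f(q) from q = 0, where f is the
    probability generating function of the offspring distribution
    (given as a dict {k: P(k children)})."""
    q = 0.0
    for _ in range(iters):
        q = sum(pk * q ** k for k, pk in offspring.items())
    return q
```

For example, a cascade in which each recipient forwards a query to 0 or 2 others with probabilities 0.25 and 0.75 dies out only a third of the time, while any offspring distribution with mean at most 1 dies out with certainty: the chain of forwarded requests fizzles.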

QUESTIONS AND ANSWERS

DR. HANDCOCK: I have one question with four related parts, coming back to the HP real-world e-mail data set. First, how did you fit the model? Second, how did you assess the goodness of fit? Third, was there any missing data from the e-mail logs, and how was that dealt with? And fourth, since it’s an ad hoc model based on heuristics, how did it compare with other models that took the organizational structure into account?

DR. KLEINBERG: Those are all good questions, and I will start with an answer that applies to all four parts: I didn’t do any of this; it was the work of Adamic and Adar, so I don’t want to answer in detail on their behalf, because there was thinking behind it that I don’t have full access to. There was actually not that much missing data as far as I understood, because they collected these from logs, with people’s permission. In terms of competing models, I would say there wasn’t really a generative network model here; it was a statement about simply trying to fit a decay of linkage frequency against group size. So it comes down to the goodness of fit for that particular data set, for which you can consult their paper. But there wasn’t an attempt to infer a general model from that.

DR. CROVELLA: I’m Mark Crovella from Boston University. I’m just curious about that example. What do you think: is it more likely that the e-mail messages are the result of the organizational structure, or that the organizational structure is a result of the e-mail messages? If it’s the latter, I would be interested in the social engineering aspects of that.

DR. KLEINBERG: Yes, I think that’s a very interesting question. This may sound like a cop-out, but I don’t think the answer is obvious.
As we know from looking at organizations, they do change: groups get reassigned, groups split, groups merge, and that is clearly in part a result of increased or decreased communication. Here, unfortunately, I’m not going to draw much on social science studies, since I haven’t looked closely at the organizational theory literature. But groups obviously split and merge based on increased or decreased traffic between them, and that’s a sense in which the e-mail traffic, in the long run, is going to feed back into the organizational structure. But it’s a good question. You could also argue that both of these are consequences of some hidden variable, the fact that people are working on similar or different things, which influences both the organizational structure you see and who communicates with whom. In a sense that’s the single root cause of both, but I think it’s true that as it changes over time, you are going to see changes in the communication patterns of the organization.

REFERENCES

Bollobás, B., and F.R.K. Chung. 1988. “The Diameter of a Cycle Plus a Random Matching.” SIAM Journal on Discrete Mathematics 1.

Faloutsos, C., K. McCurley, and A. Tomkins. 2004. “Fast Discovery of Connection Subgraphs.” Proc. 10th ACM SIGKDD Conference. Seattle, WA.

Kleinberg, J. 2000. “Navigation in a Small World.” Nature 406.

Kleinberg, J. 2002. “Small-World Phenomena and the Dynamics of Information.” Advances in Neural Information Processing Systems (NIPS) 14.

Kleinberg, J., and P. Raghavan. 2005. “Query Incentive Networks.” Proc. 46th IEEE Symposium on Foundations of Computer Science.

Leskovec, J., J. Kleinberg, and C. Faloutsos. 2005. “Graphs Over Time: Densification Laws, Shrinking Diameters and Possible Explanations.” Proc. 11th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining.

Liben-Nowell, D., J. Novak, R. Kumar, P. Raghavan, and A. Tomkins. 2005. “Geographic Routing in Social Networks.” Proc. Natl. Acad. Sci. USA 102.

Malkhi, D., M. Naor, and D. Ratajczak. 2002. “Viceroy: A Scalable and Dynamic Emulation of the Butterfly.” Proc. 21st Annual ACM Symposium on Principles of Distributed Computing.

Manku, G.S., M. Bawa, and P. Raghavan. 2003. “Symphony: Distributed Hashing in a Small World.” Proc. 4th USENIX Symposium on Internet Technologies and Systems.

Menczer, F. 2002. “Growing and Navigating the Small World Web by Local Content.” Proc. Natl. Acad. Sci. USA 99(22):14014-14019.

Milgram, S. 1967. “The Small World Problem.” Psychology Today 1.

Ratnasamy, S., P. Francis, M. Handley, R. Karp, and S. Shenker. 2001. “A Scalable Content-Addressable Network.” Proc. ACM SIGCOMM.

Rowstron, A., and P. Druschel. 2001. “Pastry: Scalable, Distributed Object Location and Routing for Large-Scale Peer-to-Peer Systems.” Proc. 18th IFIP/ACM International Conference on Distributed Systems Platforms (Middleware 2001).
Stoica, I., R. Morris, D. Karger, F. Kaashoek, and H. Balakrishnan. 2001. “Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications.” Proc. ACM SIGCOMM.

Watts, D.J., and S.H. Strogatz. 1998. “Collective Dynamics of ‘Small-World’ Networks.” Nature 393.

Watts, D.J., P.S. Dodds, and M.E.J. Newman. 2002. “Identity and Search in Social Networks.” Science 296:1302-1305.

Zhao, B.Y., J.D. Kubiatowicz, and A.D. Joseph. 2001. “Tapestry: An Infrastructure for Fault-Tolerant Wide-Area Location and Routing.” UC Berkeley Computer Science Division, Report No. UCB/CSD 01/1141.

A large number of biological, physical, and social systems contain complex networks. Knowledge about how these networks operate is critical for advancing a more general understanding of network behavior. To this end, each of these disciplines has created different kinds of statistical theory for inference on network data. To help stimulate further progress in the field of statistical inference on network data, the NRC sponsored a workshop that brought together researchers who are dealing with network data in different contexts. This book - which is available on CD only - contains the text of the 18 workshop presentations. The presentations focused on five major areas of research: network models, dynamic networks, data and measurement on networks, robustness and fragility of networks, and visualization and scalability of networks.
