Keynote Address, Day 1

Network Complexity and Robustness
John Doyle, California Institute of Technology
DR. DOYLE: I am going to try to set the stage for this meeting. The work that I am
going to talk about is the result of a lot of collaborations, and I certainly won't do justice to those
people here. All the good ideas and all the good work that I am going to talk about were done with
them, so when I say "we" I mean "they." A lot of them are here today; two of them have posters,
and Jean Carlson is going to talk later. Walter Willinger is here. I am going to talk a lot about
work with them.
There are buzzwords related to network complexity—network-centric, systems biology,
pervasive, embedded. What I am interested in is the core theory challenges that underlie those
common themes. I am going to be a little narrow in my focus in the sense that I am going to
concentrate on biological networks and technological networks: I want to stick with stuff where
we mostly know how the parts work, so that means not much social networks. There is a
remarkably common core of theoretical challenges, so from the math-stat side I think there are
some really common themes here. There has been recent dramatic progress in laying the
foundation, yet there has also been amazingly, at the same time, a striking increase in what I
would call unnecessary confusion. I will talk a little bit about both of these.
One of the common themes I am going to talk about is the fact that we see power laws all
over. I think a lot of people at this workshop are going to be talking about that, because it is a
common trait across all advanced technology and biology. Another common theme is that many
of these systems in biology and advanced technology are robust yet fragile. What I mean by that
is that they work really well most of the time but fail occasionally, and when they do fail it
can be catastrophic. We will hear more about that from other speakers and poster presenters.
What I am going to do today, though, is talk about motivation for new theory and also education.
We need to educate each other about even some fairly elementary aspects of statistics and
mathematics. To get this started on the broadest level possible, I am going to stick with stuff that
only assumes the background provided by a Caltech undergraduate education.
Let’s start with some data. The figures below show the 20th century’s hundred largest
disasters worldwide. What I have plotted here are three kinds of disasters: technological disasters,
in tens of billions of dollars; natural disasters, in hundreds of billions of dollars (and natural
disasters can be even bigger); and power outages over some period of time, in tens of millions of
customers affected. What you see on a log-log chart are roughly straight lines with slopes of minus one.
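A rank plot of this kind can be sketched numerically. The following is a toy reconstruction with synthetic Pareto data (the exponent of 1 is chosen only to match the slope of minus one in the figure), not the actual disaster data:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
alpha = 1.0                        # tail exponent matching the slope of -1
sizes = rng.pareto(alpha, n) + 1   # classical Pareto samples with minimum size 1

# Rank plot: sort sizes descending. The rank r of an event of size s obeys
# r/n ~ P(S > s) ~ s**(-alpha), so log(rank) versus log(size) is roughly a
# straight line with slope -alpha.
sizes = np.sort(sizes)[::-1]
ranks = np.arange(1, n + 1)
loglog_slope, _ = np.polyfit(np.log(sizes), np.log(ranks), 1)
```

On log-log axes a plot of `ranks` against `sizes` reproduces the roughly straight line with slope near minus one described above.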
Figure 2 shows the raw data. I have just taken the event sizes and plotted them; the worst events
are along the bottom, while the more common events are higher up and to the left. The
significance of this is that the worst event is orders of magnitude worse than the median event,
something we have been learning much more about this past year. It also means that demonstrations of these
behaviors (events of the type represented in the upper left corner of Figures 1 and 2) and reality
(which includes the events toward the lower right) can be very different, both in scale and in
character, with the more common behaviors being robust and the rare behavior displaying the
great fragility of these systems. When we build the new network-centric, embedded-everywhere
systems that some of us in this room are going to provide to all of us, they will
degrade, and the reality may be much, much worse. This is also an indication of robust yet
fragile. The typical behavior is much smaller than the worst case, and the worst case is very, very
bad.
FIGURE 1
FIGURE 2
I am not going to focus so much on disasters. Jean Carlson is going to talk about that
tomorrow, but I do want to get some of the ideas behind these kinds of statistics.
These kinds of statistics tend to be either all you study or something you never study at all, and
there is not much conversation back and forth between those two groups of people. The reason
you get straight lines is that taking the logarithm of a power law gives you a straight line, and its
slope is the exponent alpha. That just describes the data, and I
don’t assume any kind of model. If I said I am going to think of this as samples from a stochastic
process, that is a model. What I have written down is just data and I have done no statistics on it.
The fact that I have drawn a line there is just to help your eye visualize things. One important
thing to do is to look at data. One thing that isn't often done in papers is that people don't
show you their data in a way that lets you make judgments about the statistics they have computed.
We will see that there are lots of errors that arise because of this.
Power laws are really ubiquitous. Why is that and what do probability theory and
statistics tell us about power laws? Well, if I were to ask you why Gaussians are so ubiquitous,
you would say it is the central limit theorem. If I asked you why the exponential distribution is so
popular or so common, you would probably talk about the Markov properties or the
marginalization properties, which are mathematically essentially the same thing. They both
occur in situations of low variability, so expect to see Gaussians and exponentials all over the
place.
When it comes to systems with high variability, power laws actually have even
stronger statistical properties than Gaussians or exponentials. In fact, they have all that Gaussians
and exponentials have, and even more. They are in some sense—I think Walter Willinger coined
this phrase—more normal than normal. We should not call Gaussians normal; we should call
power laws normal. They arise for no reason other than the way you measure data, and the way
you look at the world will make them appear all over the place, provided there is high variability.
Much of science goes out of its way to get rid of high variability. We study in our labs
low-variability phenomena, but high variability is everywhere. With the former, you get
convergent moments, while the latter corresponds to divergent moments, so typical notions of
mean and variance don't mean anything. Take the coefficient of variation: it is a well-defined
concept in a Gaussian, and in an exponential it is actually 1, but in power laws,
of course, it is divergent. The issue is that power laws are not exceptional, but the really
important issue is low versus high variability. What about mechanisms? We have large disasters
because we have uncertain environments, we put assets at risk, and we only devote a certain
amount of resources to ameliorating those risks, and thus sometimes those resources get
overwhelmed. So, large events are not really surprising. If we were to plot the large power
outage of August 14, 2003, it would be completely consistent with these data. The ramifications of the
attacks on September 11, 2001, as a technological disaster, are not off the map. Hurricane
Katrina is not off the map. These are all more or less consistent with the largest event. Like I
said, I am not going to discuss disasters very much. High variability is much more fundamental
and is very ubiquitous, particularly in highly engineered or evolved systems. Power laws are
more normal. They shouldn’t be thought of as signatures of any specific mechanisms any more
than Gaussians are. It also means that their statistical properties lead people to find them where
they aren't, through statistical errors. They are very abundant in networks, and I will come back
to biology in particular to talk about why they are everywhere there.
Because of all these properties, there are major errors that are not just isolated instances,
but typical in high-impact journals in science, presumably because mathematicians and
statisticians play little or no role in writing, reviewing, or on the editorial boards of these journals.
One of these errors concerns variability in power laws: people badly misestimate the slope or,
even more profoundly, misunderstand where high variability comes from in the first
place. A typical mistake that is made, taking data that are actually exponential and plotting them
in such a way that suggests a power law, is exemplified in the following series of figures. I
numerically generated random data with the little Matlab code shown, using Matlab’s random
number generator (actually a pseudo-random number generator). I generated exponentially
distributed data, and on Figure 3’s semi-log plot you see it is nearly a straight line. I checked that
I had written my Matlab program right, and that the Matlab random number generator works. It
is linear on a semi-log plot because, obviously, you take logs of an exponential and you get linear
with slope -a.
FIGURE 3
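The Matlab snippet itself is not reproduced here; a minimal Python stand-in for the same experiment (the unit rate a = 1 is an assumption) looks like this:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
a = 1.0                                       # rate of the exponential
x = np.sort(rng.exponential(1 / a, n))[::-1]  # samples, largest first

# The rank r of a sample of size x satisfies r/n ~ P(X > x) = exp(-a * x),
# so on a semi-log plot, log(rank) versus x is a straight line of slope -a.
ranks = np.arange(1, n + 1)
semilog_slope, _ = np.polyfit(x, np.log(ranks), 1)
```

Plotting `ranks` on a log axis against `x` gives the nearly straight line of Figure 3, and the fitted slope recovers -a.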
Instead of these rank plots you might think that we could just look at frequencies.
Frequencies are obviously related to the rank as shown in Figure 4, even in this case where they
are integers. You could write a little program to calculate the frequencies and do a frequency
plot and, lo and behold, you get a power law, as shown in Figure 5, with slope 1.5. All of a sudden, I think I
have a power law here. In this case I know I don’t, but in some sense I have made a classic error.
The idea is that I just plotted in a silly way—again, you are differentiating data—it is noisy and
you are going to get problems. I don’t want to belabor this point too much because it is well
understood why this happens, there is no mystery about it; it is just a matter of proper education.
The statistics community has to explain this to people.
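The error is easy to reproduce numerically. A sketch in Python, with synthetic exponential data, fits both the correct semi-log rank plot and the erroneous log-log frequency plot:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.exponential(1.0, 10_000)

# Correct diagnostic: the semi-log rank plot recovers the exponential rate.
xs = np.sort(x)[::-1]
rate_fit, _ = np.polyfit(xs, np.log(np.arange(1, xs.size + 1)), 1)

# The classic error: round to integers, count frequencies, and regress on
# log-log axes as if the data were a power law. The noisy, differentiated
# tail produces a plausible-looking "power-law exponent" anyway.
vals, counts = np.unique(np.round(x).astype(int), return_counts=True)
mask = vals > 0                     # log of 0 is undefined
freq_slope, _ = np.polyfit(np.log(vals[mask]), np.log(counts[mask]), 1)
```

Here `rate_fit` lands near the true rate of minus one, while `freq_slope` is a negative "power-law exponent" that describes nothing real about the data.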
FIGURE 4
FIGURE 5
FIGURE 6
FIGURE 7
We really have got to stop doing that. You would think that it was an isolated instance but, in
fact, plots like the ones on the right above almost always represent systematic errors. There are a
bunch of networks that have been claimed to have power laws in them where the data that was
presented actually didn't show one.
Figure 8 comes from a paper on a protein-interaction power law, an article from Nature (Han et
al., 2004). I got the data and re-plotted it; this is basically a plot that appears in a supplement. The
paper concluded that it is a power law with slope 1, which doesn't even have finite mean or variance
if you think of it as a probability model. What does it really have? Roughly, it is exponential.
Again, just to comment, this is an undergraduate sort of error. If a Caltech undergraduate did this
they wouldn’t last long. Of course, this is in Nature and in some sense—I have reviewed a few
papers where I have said, no, you have got to plot it properly—they say, no, if we don't plot it the way we
do on the right, then the editors won't accept the paper. I say we have got to change that.
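The slope-1 claim fails a basic sanity check that can be spelled out (taking the fitted slope as the exponent of the complementary distribution):

```latex
P(X > x) \propto x^{-\alpha}
\;\Longrightarrow\;
p(x) \propto x^{-(\alpha+1)},
\qquad
\mathbb{E}[X] = \int_{1}^{\infty} x\, p(x)\, dx \;\propto\; \int_{1}^{\infty} x^{-\alpha}\, dx ,
```

which is finite only for \(\alpha > 1\) and diverges logarithmically at \(\alpha = 1\); the variance similarly requires \(\alpha > 2\).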
FIGURE 8
FIGURE 9
Figures 9-13 deal with an analogous error, this time from Science: does the power grid have a
power law? No, it is almost a perfect exponential. You can't get a better exponential than that,
and yet it was called a power law. You can ask, what about the World Wide Web? Let's re-plot
to make sure of what they were doing. They did logarithmic binning. You might think that helps.
In this case, it really might be a power law, but you get the wrong slope. For a power law, slopes
of 1.1 versus 1.7 are really different. You are off by orders of magnitude. So, once you get into
power laws, the slopes are really important—again the point is, it doesn’t exactly look like a
power law, the real data. That is not the big deal. The big deal is it does have high variability.
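To see how much the slope matters, consider extrapolating the largest event: for a power law with rank-plot exponent alpha, the largest of n events scales roughly like n to the power 1/alpha. A back-of-envelope check (the exponents 1.1 and 1.7 come from the text; the million-event count is illustrative):

```python
# For a power law with rank-plot exponent alpha, the largest of n events
# scales roughly like n ** (1 / alpha). With a million events, exponents of
# 1.1 and 1.7 predict worst cases nearly two orders of magnitude apart.
n = 1_000_000
worst_11 = n ** (1 / 1.1)
worst_17 = n ** (1 / 1.7)
ratio = worst_11 / worst_17
```

The ratio comes out around 84, so a misestimated slope really does put worst-case predictions off by orders of magnitude.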
The Web has high variability. In fact, when you look at the Internet, practically everything has
high variability. It is almost impossible to find something that doesn’t have high variability,
although Internet routers provide an exception. If you look at the router size frequency plot, you
could come up with a power law whereas, in fact, it is clearly exponential in the tail because there
is an excess of small-degree routers. That is probably because either there are a lot of degree-one
routers at the edge of the network or it is an artifact of the way the data was taken (you have to
check this), but it is certainly not a power law. These errors are typical, but
the big one, I think, is badly misunderstanding the origins of high variability, because high
FIGURE 51
Hard bounds: The bounds are achievable under certain assumptions, and there is a
decomposition theorem that is well known in coding theory about how to achieve that. In control
theory there is a corresponding thing which is a little less well known, and I am beginning to
realize the reason it is less well known is that people like me teach this stuff really badly. So,
everybody takes their undergraduate course in controls. How many people took an undergraduate
course in controls? It is always the course you hated most. I don’t understand this, and it is a
pathology and a virus that I have. We have got to get it out of some of this stuff.
51

OCR for page 1

FIGURE 52
Anyway, control theory has a similar thing. You take D and E, you take the transforms
and you can define this thing called a sensitivity function. If you think of the other thing as a
conservation law based on channel capacity, you can’t exceed channel capacity here. You have
another conservation law that says that the total sensitivity is conserved. So, I have this demo,
shown in the following illustration. It is harder to move the tip of the pointer around if it is up
than if it is down. If it is down, I can move it around real easy—same parts, but way
harder when it is up. This theorem tells us why. It turns out that holding it up adds an instability to the problem,
and the instability gets worse as the pointer gets shorter, until eventually I can't do it at all,
and you can do the experiment yourself at home. You can check out this theorem at home—same
thing, hard bounds, achievable solution, decomposable.
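The sensitivity conservation law being invoked here is, I believe, Bode's integral; one standard statement, assuming an open loop with unstable poles \(p_k\) and enough high-frequency roll-off, is:

```latex
\int_{0}^{\infty} \ln \lvert S(j\omega) \rvert \, d\omega \;=\; \pi \sum_{k} \operatorname{Re}(p_k) .
```

Pushing \(|S|\) below 1 in one frequency band forces it above 1 elsewhere, and the total penalty grows with the instability. For the pointer demo, the linearized unstable pole of an inverted pendulum of length \(\ell\) is roughly \(\sqrt{g/\ell}\), so shortening the pointer raises the pole and makes the constraint harsher, which matches the experiment you can do at home.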
FIGURE 53
These two things have been sitting staring at us for 60 years, which is a total
embarrassment. What happens if you stick them together? What would be the best thing that
could possibly be true? You just stick them together? There is the cost of stabilization, there are
the benefits of remote sensing and there is a need for low latency. That can’t possibly be right,
because it is so obvious you would have thought some undergraduate would have done it 50 years
ago. The other weird thing is Shannon worked for Bode at Bell Labs. They actually sat and
talked about this stuff together. Not only that but it is a hard bound; it is achievable under certain
assumptions, with the usual approach: you first prove it for the additive Gaussian case, and then the solution
is decomposable under certain assumptions. It is possible to unify these things, and there are
actually some undergraduate results that come out right away. This hasn’t been published yet,
but it is going to appear at the next conference on decision and control; it has been submitted to
IEEE Transactions on Control, and so on.
FIGURE 54
FIGURE 55
FIGURE 56
Here is a claim or probably, more properly, irresponsible speculation. A lot of the
complexity we see in biology is dominated by dealing with this tradeoff, and the same deal holds
for techno-networks. We are starting to use communication networks to do control. Biology already does
that. Biology is latency driven everywhere. What does the Internet manage? Really latency. We
need to have a latency theory of communications. Go to the FAST web site and read about this.
This is pretty cool. It is a case where a very interesting new theory was done: global stability
with arbitrary numbers of users, arbitrary capacities, and delays, so you get very great robustness,
global stability of a nonlinear system in the presence of time delays. I never thought I would see this. If I
had to do it, I wouldn’t. I have good students who did this. One of the co-authors of the paper is
here, Lun Li, and you can talk to her about it. She, I think, actually understands it. Anyway, it
has been very practical, and it has led to the theory of new protocols. Again, this is macho
stuff; this is breaking world records.
What is the theory? The way you connect these things is that it is really a constrained
optimization problem. It starts out as constrained optimization, but then you really have to redo
optimization theory to get it to work in this layer decomposition way. There is actually a
coherent theory for the first time of these layered, decentralized, asynchronous protocols. It is a
nascent theory. That little theorem I showed you before is an example of the kind of unified
theory that is now coming along, but it is very much in its beginnings because we had these
stovepipe areas chugging along for 50 years with the smartest people I know cranking out
theorems in an isolated way. We have to go back 50 years and start putting them back together
again. It looks doable and as I tried to show you, we are going to have to think about high
variability all over the place. The high-variability statistics is going to be a dominant issue, and it
is not necessarily a bad thing. In our disaster world it is a bad thing, but a lot of times this high
variability can be exploited. TCP exploits it a huge amount. Remember my cell phone story: if that high
variability weren't there, TCP wouldn't work. So, it is a situation where we have got to learn to exploit
that more systematically. I think you are probably going to hear about bits and pieces of this
from others throughout the next couple of days. I will stop and see if you have any quick
questions.
QUESTIONS AND ANSWERS
QUESTION: A very quick question. In your earlier plot, you said that apparently it is not
a power law but more like an exponential. It looks like it is a better
fit with the power law than with the exponential. How do you explain that?
DR. DOYLE: It is a statistical error. So, it is a case where you simply are making a
mistake. Again, I didn’t really explain the details. This is a standard thing. The idea is that the
thing on the right was a differentiated version of the thing on the left. The simple way of thinking
about it is, you differentiate it and you create all this noise. There are two advantages to the
picture on the left, the rank plot. First of all, I show you the data in its raw form, and you can say
whether or not you think it is a power law. There are no statistics done. I show you the raw data.
Then you can do a little bit of statistics by just sort of eyeballing the straight line. The idea is that
you shouldn't think that it has to fit a straight line. The idea is that we use mean and variance to
describe all sorts of low-variability phenomena. We know it isn’t really Gaussian and we don’t
check the other moments. We need similarly to find better ways, in a broader sense, to describe
high-variability data. All of a sudden you get means of 1 and variances of 100—that is not a useful
statistic; it doesn't converge, and if you take the data over again you get something different. Means
and variances don't mean anything in this high-variability world. What you need for robust statistics
is something like a fitted power law, but you would want to use that even when it wasn't exactly a
power law, just like you use mean and variance even when data aren't Gaussian; there are robust
ways to do this.
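One way to see what the robust summaries buy you, as a sketch with synthetic exponent-1 data (not any of the data sets discussed here):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000
x = rng.pareto(1.0, n) + 1   # exponent-1 power law: the mean is divergent

med = np.median(x)           # quantiles stay informative: theoretical median is 2
worst = x.max()              # a single event dominates the sample mean and variance

# med lands near 2 run after run, while worst fluctuates wildly and sits
# orders of magnitude above the median, dragging the sample mean with it.
```

The median is a stable, meaningful summary of this data; the sample mean and variance are not, because they are dominated by the single worst event.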
This is all decades old statistics, and we don't teach it very well. That is one of the
challenges: we need to do a better job just of teaching these things. The problem is that science
has so focused on getting rid of high variability. High variability was thought to be a bad thing,
like you are doing bad experiments. As Jean will point out, high variability exists all over in
nature. In the laboratory, it only exists in very exotic circumstances, so it is associated with
exotica.
DR. HANDCOCK: I would like to pick up on that question a little bit, about the
identification and characterization of power laws. I think one issue that needs to be discussed, or
at least that I would like to hear more discussion of, is the case where statisticians in particular
have looked at questions for a long period of time, but that knowledge has not been used in
other areas of science.
DR. DOYLE: That is absolutely right, it has not been used.
DR. HANDCOCK: In this area, I think it is particularly important. Clearly, statisticians
have had a lot to say about curve fitting and other issues like that, but what I am most reminded
of is the recurrence of these debates. The classic example to my mind is Maurice Kendall's 1961
address to the Royal Statistical Society, where he essentially lambastes work done in the
half century before that time on very similar questions.
DR. DOYLE: This is a very old story, a very old story.
DR. HANDCOCK: I would like to hear people say more, how can the work of
statisticians be more recognized and just routinely used by other sciences, to avoid occurrences of
this kind?
DR. DOYLE: I will tell you that my experience has been: give them good software
tools and make it easy for them to do it right. If you go to Matlab and get the statistics toolbox,
it's all low variability. So, my experience has certainly been, if we want to get good, new
robust control theory into the hands of people, you make software.
If you want to get biologists to be able to share models, you have to have a systems
biology mark-up language, you have to make software. So, you have to turn your theories into
software, and it has got to be usable, it has got to be user friendly. So, we use Matlab and call the
SVD, and how many of us actually know how the SVD works?
Well, you don’t need to. The point is, if you do it right, you need to know what it does,
but you don’t need to know exactly how it does it. We need to do the same thing for high-
variability statistics. It is one thing to lambaste everybody about it. It is another thing to try to
create the tools, and show people, teach people. This is the thing we have got to do, we have got
to teach people. There are people out there who are discovering this high-variability data
everywhere. We are going to hear a lot about it in the next two days, and it is everywhere. The
irony is, because of its strong statistical properties, you end up finding it where it isn’t, and the
stats community can make a lot of contributions. The problem is that I am not sure anybody
studies this any more. It is this old, old topic of classical stable laws; it was hot in the 1920s,
it was hot in the 1950s, and maybe it will come back. I think it should come back.
DR. KLEINFELD: Implicit in your talk was this notion that in engineered systems
things are built from components, and also implicit in your talk, and sort of in the world, is that
the way biology seems to design things is very much in modules. Is there some sort of theorem
that says that one should really build things in modules, because otherwise some complexity diverges, or . . .
DR. DOYLE: Yes and no. I would say we have nascent, little toy theorems that suggest
that these architectures—here the point is, you don’t have modules unless you have protocols. If
you look around, you discover the modules first, you see them, you see the parts, but they are
meaningless unless there are protocols. None of this stuff would work or hook together unless
they had protocols. What we can prove now is that some of these protocols are optimal in the
sense that they are as good as they can be.
For example, TCP, properly run, achieves a global utility maximization, that is, fair sharing
among all the users that use it. What that says is that you can get the same performance as if you
had a centralized non-modular solution, but you can get it with a protocol that is both robust and
evolvable. Now, is there any other way to do it? We don't know. You prove that this is optimal
and that this is robust. So, there is a lot more work to be done. I mean, it is only the last few years
that we have even had a coherent theory of how TCP/IP works. So, this is quite a new area. I
think that is an important question. How much further can we go in proving properties? What we
need to design now, as engineers, we need to design protocols. We need to design protocols that
are going to run our world, and we need to do them and make them robust and verifiable, and
scalable. Now we don’t do that very well.
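The optimality result being described comes from the network-utility-maximization view of congestion control. A toy sketch of the idea, with made-up step sizes and two flows on one link, shows a fully decentralized price/rate iteration converging to the fair share:

```python
# Toy network utility maximization: maximize sum(log(x_i)) over flows subject
# to sum(x_i) <= c on one shared link. Each source reacts only to a price p;
# the link updates p from its own overload. No central coordination, yet the
# iteration converges to the fair, utility-maximizing share c / n_flows.
n_flows = 2
c = 1.0          # link capacity
p = 1.0          # congestion price (think: queueing delay or loss signal)
step = 0.01      # made-up gain for the price update

for _ in range(20_000):
    rates = [1.0 / p] * n_flows        # each source: argmax_x log(x) - p*x  =>  x = 1/p
    excess = sum(rates) - c            # link observes only its total load
    p = max(1e-6, p + step * excess)   # raise price when overloaded, lower when idle

fair_share = rates[0]                  # converges to c / n_flows = 0.5
```

Each source maximizes its own utility given the price, the link prices its own congestion, and the protocol as a whole solves the global problem: that is the sense in which the decentralized, modular solution matches the centralized one.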
In biology, what we are doing is reverse engineering the protocols that evolution has
come up with. The good news is, it looks like biology uses more or less the same kinds of
protocols. So, it is not going to be incomprehensible; otherwise it could have been just an endless parts list. If we
just made a parts list of everything on the Internet, it would be completely bewildering, we would
never make any sense of it. Because we know the architecture, and because we know the
protocols, we know what everything is doing. We have got to do the same thing with biology.
REFERENCE
Han, J.D., et al. 2004. "Evidence for Dynamically Organized Modularity in the Yeast Protein-Protein Interaction Network." Nature 430.