Graph Spectra

Sparsest Cuts in Minimum Time

When chemists want to identify the molecules in an unknown sample—say, a rock from another planet—the first thing they do is measure its spectrum. That is, for instance, how space scientists detected water on the Moon and methane on Saturn’s moon, Titan.

It may seem surprising that mathematical scientists are interested in measuring spectra too, in a whole host of applications that seem to have little to do with the original meaning of the word. To detect the edges of a cancerous region in a mammogram, they can use an “image segmentation” program that computes a spectrum. Similarly, problems like designing a computer chip that runs efficiently, identifying communities within a social network, or creating a computer network that won’t shut down if some of the computers go offline, all involve analyzing the spectra of networks, or what mathematicians call “graphs.”

To understand what a molecular spectrum is, think of a molecule as a collection of balls bound together with the aid of springs. The springs are always vibrating, and there are certain modes of vibration that, if left alone, will keep on going indefinitely. Each mode of vibration corresponds to a specific energy. The molecule prefers to absorb and emit photons with exactly the right energy to excite one particular mode. Those photons can be detected by a spectrometer, and the characteristic frequencies of the photons allow you to identify the molecule.

A network, which is a type of graph (a mathematical structure used to represent relationships between objects), is a more general mathematical version of the balls-and-springs model of a molecule. The most obvious difference, perhaps, is that networks of transistors or people don’t vibrate and therefore don’t have an obvious energy function



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.
Terms of Use and Privacy Statement



Below are the first 10 and last 10 pages of uncorrected machine-read text (when available) of this chapter, followed by the top 30 algorithmically extracted key phrases from the chapter as a whole.
Intended to provide our own search engines and external engines with highly rich, chapter-representative searchable text on the opening pages of each chapter. Because it is UNCORRECTED material, please consider the following text as a useful but insufficient proxy for the authoritative book pages.

Do not use for reproduction, copying, pasting, or reading; exclusively for search engines.

OCR for page 40
/ Graph Spectra Sparsest Cuts in Minimum Time W hen chemists want to identify the molecules in an unknown sample—say, a rock from another planet—the first thing they do is measure its spectrum. That is, for instance, how space scientists detected water on the Moon and methane on Saturn’s moon, Titan. It may seem surprising that mathematical scientists are interested in measuring spectra too, in a whole host of applications that seem to have little to do with the original meaning of the word. To detect the edges of a cancerous region in a mammogram, they can use an “image segmentation” program that computes a spectrum. Similarly, problems like designing a computer chip that runs efficiently, identifying communities within a social network, or creating a computer network that won’t shut down if some of the computers go offline, all involve analyzing the spectra of networks, or what mathematicians call “graphs.” To understand what a molecular spectrum is, think of a molecule as a collection of balls bound together with the aid of springs. The springs are always vibrating, and there are certain modes of vibration that, if left alone, will keep on going indefinitely. Each mode of vibration corresponds to a specific energy. The molecule prefers to absorb and emit photons with exactly the right energy to excite one particular mode. Those photons can be detected by a spectrometer, and the characteristic frequencies of the photons allow you to identify the molecule. A network, which is a type of graph (a mathematical structure used to represent relationships between objects), is a more general mathematical version of the balls-and- springs model of a molecule. The most obvious difference, perhaps, is that networks of transistors or people don’t vibrate and therefore don’t have an obvious energy function FUELING innovation and discovery 40

OCR for page 40
associated with them. However, this difference is only superficial. In fact, mathematicians have derived a way of measuring energy on any network. The way they do this is with something called a Laplacian, which has characteristic frequencies or eigenvalues, just like a vibrating molecule. Those frequencies collectively form a spectrum, and this spectrum is in many cases the best way to gain an overview of the network. For instance, it helps you find cliques (closely connected sets of nodes) and transportation bottlenecks. But there is a problem of scale, because the networks of today are much larger than those in molecules. For example, the users of Facebook form a social network with nearly 1 billion active users. An Intel Xeon chip is a network with 800 million transistors. The Web is a network with 46 billion Web pages (and growing). Mathematical scientists have to design, in effect, a new kind of spectrometer to compute the spectra of such immense graphs. Laplacians have a very long history—in fact, they predate both the concept of molecular spectra and the concept of graph theory. They were first defined in the 19th century to solve problems of heat flow and wave motion. The heat equation says that heat will flow toward any point that is colder than the average of its neighbors and away from any point that is warmer. The wave equation says that a point on a vibrating drumhead will accelerate in the direction of the average of its neighbors. In both cases, the operation “averaging the neighbors” is performed by a differential operator called the Laplacian (see Figure 17). In the drum example, the eigenvalues of the Laplacian correspond to particularly simple modes of vibration, which we hear as the fundamental frequency and the overtones of the drum. In the heat equation, the eigenvalues have a slightly different interpretation. A cooling bar of metal has a relaxation time that governs how rapidly the bar cools off. This time is inversely related to the first (or smallest) eigenvalue of the Laplacian. 17 / Fundamental modes of vibration of a drum. Notice how certain frequencies of vibration isolate certain regions of the drum. Graph spectra are based on a similar idea, with the drumhead being replaced by a network of springs. / in the 21st Century The Mathematical Sciences 41

OCR for page 40
In 1970, mathematician Jeff Cheeger proved that Laplacians are good at detecting bottlenecks in surfaces and higher dimensional objects called manifolds. The reason is especially apparent in the case of heat flow. If a metal bar is shaped like an hourglass, the constriction means that the first eigenvalue of the Laplacian will be small. In turn, this implies that the relaxation time will be longer than for a rectangular bar, and that is indeed what one measures. The Laplacian on a network is defined analogously to the Laplacian on a surface. It is again computed by taking the difference between a function’s values at a point and the average of its values at the neighboring points. In 1985, it was shown that Cheeger’s result applied perfectly to this new setting. The first eigenvalue of the Laplacian, which is relatively easy to compute, is a good proxy for the width of the narrowest bottleneck, which is harder to determine. A network has a bottleneck if it contains two (large) subgraphs with (relatively) few connections between them. (There are some subtleties connected with the words “large” and “relatively.”) A driveway isn’t a bottleneck, because if you cut it you take only one house out of the network. However, if two large cities are joined by only one bridge—such as the Ambassador Bridge between Detroit and Windsor, Canada—that bridge is definitely a bottleneck. Even if you include the Detroit-Windsor Tunnel, creating a second link between the cities, the ratio of the number of cuts required to disconnect the network (two) to the minimum number of people taken out of the network (216,000 on the Windsor side) is still very small (2/236,000, or 0.00001) (see Figure 18). 18 / The Ambassador Bridge connecting Detroit, Michigan, and Windsor, Ontario, is a very visible example of a “miminum cut.” / Detroit-Windsor Tunnel Ambassador Bridge Port of Detroit FUELING innovation and discovery 42

OCR for page 40
In a map of a city it is easy to spot the links that, if cut, have the largest effects on connectivity, especially if there is a river flowing through it. Such a link is called a sparsest cut, and in a typical graph, it is much harder to spot. But there are many instances in which we want to make those cuts, such as when we wish to decompose a network into simpler subgraphs with minimal effect on connections. For instance, a Laplacian-based method has been used to extract features from images. The surface of an object can be approximated by a mesh. The spectrum of this mesh can be used by a computer to identify distinct regions of the image without any human input. Cheeger’s work was important because it related something that was known to be very difficult to compute (sparsest cuts) to something that is much easier to compute (eigenvalues). It should be noted that Cheeger’s theorem is an estimate, not a precise equation. However, the connection has proved robust enough to make the spectrum of Until recently, the Laplacian an important tool in graph partitioning. algorithms for Spectral methods—a class of techniques to numerically solve dynamical (or constantly finding the evolving) systems—can be used to design new networks as well as to study existing ones. For example, networks with the largest possible eigenvalues, called “expander graphs,” eigenvalues did have some remarkable properties. They convey information very rapidly, so it is easy to not scale well and route messages from one point to another. They have no bottlenecks, so they cannot were not practical be disabled by the failure of a few nodes or links. Unfortunately, they are not so easy to for such large design, so research in this area is currently a high priority. Finally, although eigenvalues are easier to compute than sparsest cuts, even they get networks. However, difficult when you are faced with monster networks consisting of millions or billions of it now appears that nodes. Until recently, algorithms for finding the eigenvalues did not scale well and were this barrier has not practical for such large networks. However, it now appears that this barrier has been breached. been breached. In 2004, a method was devised for solving equations involving graph Laplacians that was significantly faster (in theory) than any previously known method. Like many great scientific advances, the strategy was wonderful, completely unexpected, and yet in retrospect completely natural. The idea is to replace the Laplacian of the graph with the Laplacian of a well-chosen subgraph. If this is done carefully, the spectrum is almost undisturbed yet the graph becomes much sparser. Imagine ripping out 95 percent of the streets in Manhattan and having everybody’s commute time remain almost the same! That is roughly what this method managed to do. There is still plenty of room for practical improvement, and some substantial improvements have already been made. The long-standing dream of finding sparsest cuts in approximately linear time—in other words, being able to understand the structure of a network almost as quickly as one can read in the data—now appears much closer. in the 21st Century The Mathematical Sciences 43