The vitality of the U.S. mathematical sciences enterprise is excellent. The mathematical sciences have consistently been making major advances in both fundamental theory and high-impact applications, and recent decades have seen tremendous innovation and productivity. The discipline is displaying great unity and coherence as more and more bridges are built between subfields of research. Historically, such bridges serve as drivers for additional accomplishments, as do the many interactions between the mathematical sciences and fields of application, and so the existence of several striking examples of bridge-building in this chapter is a very promising sign for the future. The programs of the National Science Foundation’s (NSF) mathematical science institutes offer good evidence of this bridge-building, and the large-scale involvement of graduate students and postdoctoral researchers at those institutes suggests that the trend will continue. New tools for scholarly communications, such as blogs and open-access repositories of research, contribute to the excellent rate of progress at the research frontiers. As shown in this chapter and in the separate report Fueling Innovation and Discovery: The Mathematical Sciences in the 21st Century, the discipline’s vitality is providing clear benefits for diverse areas of science and engineering, for industry and technology, for innovation and economic competitiveness, and for national security.
Further down the road, toward 2025, stresses are likely, and these will be discussed in Chapters 4 through 6. The focus of this chapter is to document some recent advances as illustrations of the health and vitality of the mathematical sciences. Indeed, the growth of new ideas and applications in the mathematical sciences is so robust that inevitably it far exceeds the spectrum
of expertise of a modest-sized committee. What appears below is just a sampling of advances, to give a flavor of what is going on, and it is not meant to be either comprehensive or proportionally representative. This chapter is aimed primarily at the mathematical sciences community, and so the examples here presume a base of mathematical or statistical knowledge. The topics covered range from the solution of a century-old problem by using techniques from one field of mathematics to solve a major problem in another, to the creation of what are essentially entirely new fields of study. The topics are, in order:
• The Topology of Three-Dimensional Spaces
• Uncertainty Quantification
• The Mathematical Sciences and Social Networks
• The Protein-Folding Problem and Computational Biology
• The Fundamental Lemma
• Primes in Arithmetic Progression
• Hierarchical Modeling
• Algorithms and Complexity
• Inverse Problems: Visibility and Invisibility
• The Interplay of Geometry and Theoretical Physics
• New Frontiers in Statistical Inference
• Economics and Business: Mechanism Design
• Mathematical Sciences and Medicine
• Compressed Sensing
The modest title of this section hides a tremendous accomplishment. The notion of space is central to the mathematical sciences, to the physical sciences, and to engineering. There are entire branches of theoretical mathematics devoted to studying spaces, with different branches focusing on different aspects of spaces or on spaces endowed with different characteristics or structures.1 For example, in topology one studies spaces without assuming any structure beyond the notion of coherence or continuity. By contrast, in geometry one studies spaces in which, first of all, one can differentiate, leading to notions such as tangent vectors, and, second, for which notions of length and angle are defined for tangent vectors. These concepts were first introduced by Riemann in his 1854 habilitation lecture “On the hypotheses that underlie geometry,” published in 1868, and the resulting structure is called a Riemannian metric. Intuitively, one can imagine that to a topologist spaces are made out of rubber or a substance like taffy, while
BOX 2-1 Mathematical Structures
At various points, this chapter refers to “mathematical structures.” A mathematical structure is a mental construct that satisfies a collection of explicit formal rules on which mathematical reasoning can be carried out. An example is a “group,” which consists of a set and a procedure by which any two elements of the set can be combined (“multiplied”) to give another element of the set, the product of the two elements. The rules a group must satisfy are few: the existence of an identity element, of inverses for each element of the set, and the associative property for the combining action. Basic arithmetic conforms to this definition: For example, addition of integers can be described in these terms. But the concept of groups is also a fundamental tool for characterizing symmetries, such as in theoretical physics. This level of abstraction is helpful in two important ways: (1) it enables precise examinations of mathematical sets and operations by stripping away unessential details and (2) it opens the door to logical extensions from the familiar. As an example of the latter benefit, note that the definition of a group allows the combination of two elements to depend on the order in which they are “multiplied,” which is contrary to the rules of arithmetic. With the explicit recognition that that property is an assumption, not a necessary consequence, mathematicians were able to define and explore “noncommutative groups” for which the order of “multiplication” is significant. It turns out that many situations arising in practice are naturally represented by noncommutative groups.
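The noncommutativity just described can be made concrete in a few lines of code. The sketch below is our own illustration, not part of the report; the names `compose` and `inverse` and the example permutations are invented for this purpose. It realizes the smallest noncommutative group, the permutations of three objects, and checks the group axioms directly.

```python
# Permutations of (0, 1, 2), written as tuples: p[i] is where i is sent.
# Together with composition they form the symmetric group S3,
# the smallest noncommutative group.

identity = (0, 1, 2)

def compose(p, q):
    """Apply q first, then p (function composition p after q)."""
    return tuple(p[q[i]] for i in range(len(p)))

def inverse(p):
    """The unique permutation that undoes p."""
    inv = [0] * len(p)
    for i, j in enumerate(p):
        inv[j] = i
    return tuple(inv)

swap01 = (1, 0, 2)   # exchange the first two objects
cycle  = (1, 2, 0)   # rotate all three objects

# The group axioms hold: identity, inverses, associativity...
assert compose(swap01, inverse(swap01)) == identity
# ...but the order of "multiplication" matters:
print(compose(swap01, cycle), compose(cycle, swap01))
```

Composing the swap and the rotation in the two possible orders gives two different permutations, which is exactly the failure of commutativity the box describes.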
It is possible, of course, to define a mathematical structure that is uninteresting and that has no relevance to the real world. What is remarkable is how many interesting mathematical structures there are, how diverse are their characteristics, and how many of them turn out to be important in understanding the real world, often in unanticipated ways. Indeed, one of the reasons for the limitless possibilities of the mathematical sciences is the vast realm of possibilities for mathematical structures. Complex numbers, a mathematical structure built around the square root of –1, turn out to be rooted in the real world as part of the essential equations describing electromagnetism and quantum theory. Riemannian metrics, the mathematical structure developed to describe objects whose geometry varies from point to point, turn out to be the basis for Einstein’s description of gravitation. “Graphs” (these are not the graphs used to plot functions in high school), consisting of “nodes” joined by “edges,” turn out to be a fundamental tool used by social scientists to understand social networks.
A striking feature of mathematical structures is their hierarchical nature—it is possible to use existing mathematical structures as a foundation on which to build new mathematical structures. For example, although the meaning of “probability” has long vexed philosophers, it has been possible to create a mathematical structure called a “probability space” that provides a foundation on which realistic structures can be built. On top of the structure of a probability space, mathematical scientists have built the concept of a random variable, which encapsulates in a rigorous way the notion of a quantity that takes its values according to a
certain set of probabilities, such as the roll of a pair of dice—one gets a different random variable depending on whether the dice are honest or loaded. There are then certain broad classes of random variables, of which the most famous are Gaussian random variables, the source of the well-known bell curve and the rigorous basis of many of the fundamental tools of statistics. These different classes of random variables can be put together into structures known as probabilistic models, an incredibly flexible class of mathematical structures used to understand phenomena as diverse as what goes on inside a cell, financial markets, and the physics of superconductors.
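The layered construction just described, a probability space underneath, random variables on top, can be mimicked directly in code. The sketch below is our own illustration (the function names and the particular loaded weights are invented): a fair and a loaded die are two different random variables on the same underlying source of randomness, and summing many fair rolls makes the Gaussian bell shape emerge.

```python
import random

rng = random.Random(42)  # a fixed seed: one reproducible "probability space"

def fair_die():
    """A random variable: each face 1-6 with probability 1/6."""
    return rng.randint(1, 6)

def loaded_die():
    """Same faces, different probabilities: 6 comes up half the time."""
    return rng.choices([1, 2, 3, 4, 5, 6],
                       weights=[1, 1, 1, 1, 1, 5])[0]

n = 20000
fair = [fair_die() for _ in range(n)]
loaded = [loaded_die() for _ in range(n)]

mean_fair = sum(fair) / n      # close to the theoretical 3.5
mean_loaded = sum(loaded) / n  # pulled toward 6 (theoretical mean 4.5)

# Sums of many independent rolls pile up in a bell shape,
# an instance of the central limit theorem.
sums = [sum(fair_die() for _ in range(30)) for _ in range(2000)]
```

The two dice illustrate that a random variable is the pair (values, probabilities), not just the set of outcomes; changing the weights changes the variable even though the sample space is unchanged.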
Mathematical structures provide a unifying thread weaving through and uniting the mathematical sciences. For example, algorithms represent a class of mathematical structure; algorithms are often based on other mathematical structures, such as the graphs mentioned earlier, and their effectiveness might be measured using probabilistic models. Partial differential equations (PDEs) are a class of mathematical structure built on the most basic of mathematical structures—functions. Most of the fundamental equations of physics are described by PDEs, but the mathematical structure of PDEs can be studied rigorously independent of knowing, say, what the ultimate structure of space looks like at subnuclear scales. Such study can, for example, provide insight into the structures of the potential solutions of a certain PDE, explaining which phenomenology can be captured by that PDE and which cannot. Computations involving PDEs involve yet another type of mathematical structure, a “discretization scheme,” which transforms what is a fundamentally continuous problem into one involving just a very large but finite set of values. The catch is that finding the right discretization scheme is highly subtle, and discretization is one example of the mathematical depth of the computational wing of the mathematical sciences.
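The subtlety of discretization can be seen in the simplest possible setting. The sketch below is a minimal illustration of our own, not from the report: it discretizes the one-dimensional heat equation u_t = u_xx by explicit finite differences. The scheme behaves well only because the time step respects the stability bound dt ≤ dx²/2, exactly the kind of catch referred to above.

```python
# Explicit finite differences for u_t = u_xx on [0, 1],
# with u = 0 held fixed at both ends.
nx = 51
dx = 1.0 / (nx - 1)
dt = 0.4 * dx * dx          # respects the stability bound dt <= dx^2 / 2

# Initial condition: a "hot spot" in the middle of the rod.
u = [0.0] * nx
u[nx // 2] = 1.0

for _ in range(500):
    # Discrete Laplacian: (u[i-1] - 2 u[i] + u[i+1]) / dx^2
    u = [0.0] + [
        u[i] + dt * (u[i - 1] - 2 * u[i] + u[i + 1]) / (dx * dx)
        for i in range(1, nx - 1)
    ] + [0.0]

# Heat has spread out and decayed: the profile is smooth, small,
# symmetric, and everywhere nonnegative (the discrete maximum principle).
peak = max(u)
```

With a time step even slightly above dx²/2 the same loop produces wild oscillations that grow without bound, which is why choosing the right discretization scheme is, as the box says, highly subtle.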
to a geometer they are made out of steel. Although we have no direct visual representation of spaces of higher dimension, they exist as mathematical objects on the same footing as lower-dimensional spaces that we can directly see, and this extension from physical space has proved very useful. Topological and geometric spaces are central objects in the mathematical sciences. They are also ubiquitous in the physical sciences, in computing, and in engineering, where they are the context in which problems are precisely formulated and results are expressed.
Over 100 years ago Poincaré initiated the abstract and theoretical study of these higher-dimensional spaces and posed a problem about the three-dimensional sphere (a unit sphere in a four-coordinate space whose surface has three dimensions) that motivated topological investigations into three-dimensional spaces over the century that followed.2 The three-dimensional sphere sits in ordinary four-dimensional coordinate space as the collection of vectors of unit length: that is, the space

S³ = {(x₁, x₂, x₃, x₄) : x₁² + x₂² + x₃² + x₄² = 1}.
It has the property that any closed path (one that starts and ends at the same point) on the sphere can be continuously deformed through closed paths, always staying on the sphere, so as to shrink it down to a point path—that is, a path in which no motion occurs. Poincaré asked if this was the only three-dimensional space of finite extent, up to topological equivalence, with this property. For the next 100 years this problem and its generalizations led to enormous theoretical advances in the understanding of three-dimensional spaces and of higher-dimensional spaces, but Poincaré’s original problem remained unsolved. The problem proved so difficult, and the work it stimulated so important, that in 2000 the Clay Mathematics Institute listed it as one of its seven Millennium Prize problems in mathematics, problems felt to be among the hardest and most important in theoretical mathematics.
This is purely a problem in topology. The condition about paths is on the face of it a topological condition (since to make sense of it one only needs the notion of continuity), and the conclusion is explicitly topological. For nearly 100 years it was attacked in purely topological terms. Then in 2002, Grigory Perelman succeeded in answering this question in the affirmative. Resolving a central question that has been the focus of attention for many decades is an exciting event. Here the excitement was accentuated by the fact that the solution required powerful results from other parts of theoretical mathematics, suggesting connections between them that had not been suspected. Perelman invoked deep ideas from analysis and parabolic evolution equations developed by Richard Hamilton. Briefly, Hamilton had introduced and studied an evolution equation for Riemannian metrics that is an analogue of the heat equation. Perelman was able to show that under the topological hypotheses of Poincaré’s conjecture, Hamilton’s flow converged to the usual metric on the 3-sphere, so that the underlying topological space is indeed topologically equivalent to the 3-sphere. But Perelman’s results applied to Riemannian metrics on every three-dimensional space and gave a description of any such space in terms of simple geometric pieces, which is exactly what William Thurston had conjectured some 20 years earlier. Ironically, the ideas that led Thurston to his conjectured description
2 The convention in mathematics is that the surface of a basketball is a 2-sphere (because it has two dimensions) even though it sits in three-dimensional space. The entire basketball including the air inside is called a “ball” or a “3-ball” rather than a sphere.
have their source in other, more geometric, parts of Poincaré’s work, his work on Fuchsian and Kleinian groups.
Perelman’s solution brings to a close a chapter of theoretical mathematics that took approximately 100 years to write, but at the same time it has opened up new avenues for the theoretical study of space. This mathematical breakthrough is fairly recent and it is too early to accurately assess its full impact inside both mathematics and the physical sciences and engineering. Nevertheless, we can make some educated guesses. Even though this is the most abstract and theoretical type of mathematics, it may well have practical import because spaces are so prevalent in science and engineering. What Perelman achieved for the particular evolution equation that he was studying was to understand how singularities develop as time passes. That particular equation is part of a general class of equations, with the heat equation being the simplest. In mathematics, various geometric problems belong to this class, as do equations for the evolution of many different types of systems in science and engineering. Understanding singularity development for these equations would have a huge impact on mathematics, science, and engineering, because the behavior of solutions near singularities can be so important. Already, we are seeing the use of the techniques Perelman introduced to increase the understanding in other geometric contexts, for example in complex geometry. Time will tell how far these ideas can be pushed, but if they extend to equations in other more applied contexts, then the payoff in practical terms could well be enormous.
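Ricci flow itself is far beyond a code snippet, but its guiding intuition (an evolution equation that smooths geometry the way the heat equation smooths temperature) can be caricatured in the plane. In the sketch below, a loose illustrative analogy of our own and not Hamilton's equation, each vertex of a closed polygon repeatedly moves toward the midpoint of its neighbors, a discrete heat flow that rounds the curve while shrinking it.

```python
import math

# A bumpy closed curve: a circle with a perturbed radius.
n = 60
pts = [((1 + 0.3 * math.sin(5 * t)) * math.cos(t),
        (1 + 0.3 * math.sin(5 * t)) * math.sin(t))
       for t in (2 * math.pi * k / n for k in range(n))]

def perimeter(ps):
    return sum(math.dist(ps[i], ps[(i + 1) % len(ps)])
               for i in range(len(ps)))

def step(ps, rate=0.3):
    """Move every vertex toward the midpoint of its two neighbors
    (a discrete curve-smoothing, heat-flow-like update)."""
    out = []
    for i, (x, y) in enumerate(ps):
        (ax, ay), (bx, by) = ps[i - 1], ps[(i + 1) % len(ps)]
        mx, my = (ax + bx) / 2, (ay + by) / 2
        out.append((x + rate * (mx - x), y + rate * (my - y)))
    return out

perims = [perimeter(pts)]
for _ in range(50):
    pts = step(pts)
    perims.append(perimeter(pts))
# The perimeter decreases monotonically as the bumps smooth away.
```

The serious questions in the text, how singularities form as such a flow runs and how to continue past them, are precisely what this toy flow cannot capture; it only conveys the smoothing intuition.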
Central to much of science, engineering, and society today is the building of mathematical models to represent complex processes. For example, aircraft and automotive manufacturers routinely use mathematical representations of their vehicles (or vehicle components) as surrogates for building physical prototypes during vehicle design, relying instead on computer simulations that are based on those mathematical models. The economic benefit is clear. A prototype automobile that is destroyed in a crash test, for example, can cost $300,000, and many such prototypes are typically needed in a testing program, whereas a computer model of the automobile that can be virtually crashed under many varying conditions can often be developed at a fraction of the cost.
The mathematical modeling and computational science that underlies the development of such simulators of processes has seen amazing advances over the last two decades, and improvements continue to be made. Yet the usefulness of such simulators is limited unless it can be shown that they are accurate in predicting the real process they are simulating.
A host of issues arise in the process of ensuring that the simulators are accurate representations of the real process. The first is that there are typically many elements of the mathematical model (such as rate coefficients) that are unknown. Also, inputs to the simulators are often themselves imperfect—for example, weather and climate predictions must be initiated with data about the current state, which is not known completely. Furthermore, the modeling is often done with incomplete scientific knowledge, such as incomplete representation of all the relevant physics or biology, and approximations are made because of computational needs—for example, weather forecasting models being run over grids with 100-kilometer edges, which cannot directly represent fine-scale behaviors, and continuous equations being approximated by discrete analogues.
The developing field for dealing with these issues is called Uncertainty Quantification (UQ), and it is crucial to bringing to fruition the dream of being able to accurately model and predict real complex processes through computational simulators. Addressing this problem requires a variety of mathematical and statistical research, drawing on probability, measure theory, functional analysis, differential equations, graph and network theory, approximation theory, ergodic theory, stochastic processes, time series, classical inference, Bayesian analysis, importance sampling, nonparametric techniques, rare and extreme event analysis, multivariate techniques, and so on.
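A first taste of UQ needs only a few of the ingredients listed above. The sketch below is a toy illustration with invented numbers, not a production UQ workflow: one coefficient of a simple "simulator" is treated as uncertain, and that uncertainty is propagated to the output by Monte Carlo sampling and summarized as a mean and an interval.

```python
import random
import statistics

rng = random.Random(0)

def simulator(x, k):
    """A stand-in computational model whose rate coefficient k
    is not known exactly (a hypothetical example)."""
    return k * x * x

def draw_k():
    """Our knowledge of k is a distribution, not a single number."""
    return rng.gauss(2.0, 0.1)

x = 3.0
outputs = [simulator(x, draw_k()) for _ in range(10000)]

ranked = sorted(outputs)
mean = statistics.fmean(outputs)
lo, hi = ranked[250], ranked[9750]   # central 95% of the simulated runs
```

Real UQ problems add the further complications named in the text, unknown model form, imperfect inputs, and expensive simulators, but the core move is the same: report a distribution over outputs, not a single number.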
UQ research is inherently interdisciplinary, in that disciplinary knowledge of the process being modeled is essential in building the simulator, and disciplinary knowledge of the nature of the relevant data is essential in implementing UQ. Thus, effective UQ typically requires interdisciplinary teams, consisting of disciplinary scientists plus mathematicians and statisticians.
This appreciation for the need for UQ has grown into a clamor; calls for investment in UQ research can be found in any number of science and engineering advisory reports to funding agencies. Good progress has been made, and building the necessary capabilities for UQ is essential to the reliable use of computational simulation. The clamor has also been noticed by the mathematical and statistical societies. Interest groups in UQ have been started by the Society for Industrial and Applied Mathematics and the American Statistical Association. The two societies have also launched a joint publication, the SIAM/ASA Journal on Uncertainty Quantification.
The emergence of online social networks is changing behavior in many contexts, allowing decentralized interaction among larger groups and fewer geographic constraints. The structure and complexity of these networks
have grown rapidly in recent years. At the same time, online networks are collecting social data at unprecedented scale and resolution, making social networks visible that in the past could only be explored via in-depth surveys. Today, millions of people leave digital traces of their personal social networks in the form of text messages on mobile phones, updates on Facebook or Twitter, and so on.
The mathematical analysis of networks is one of the great success stories of applying the mathematical sciences to an engineered system, back to the days when AT&T networks were designed and operated based on graph theory, probability, statistics, discrete mathematics, and optimization. However, since the rise of the Internet and social networks, the underlying assumptions in the analysis of networks have changed dramatically. The abundance of such social network data, and the increasing complexity of social networks, is changing the face of research on social networks. These changes present both opportunities and challenges for mathematical and statistical modeling.
One example of how the mathematical sciences have contributed to this new opportunity is the significant amount of recent work on random-graph models that capture some of the qualitative properties observed in large-scale network data. The resulting mathematical models help us understand many attributes of social networks. One such attribute is the degree of connectivity of a network, which in some cases reveals a small-world phenomenon, in which very distant parts of a population are connected via surprisingly short paths.3 These short paths are also surprisingly easy to find, which has led to the success of decentralized search algorithms.
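The small-world effect is easy to reproduce computationally. The sketch below is a toy model of our own, loosely in the spirit of small-world random graphs: average path lengths, computed by breadth-first search, are compared on a ring lattice before and after a handful of random long-range ties are added.

```python
import random
from collections import deque

def ring_graph(n, k=2):
    """Each node is linked to its k nearest neighbors on each side."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for d in range(1, k + 1):
            adj[i].add((i + d) % n)
            adj[(i + d) % n].add(i)
    return adj

def avg_path_length(adj, src=0):
    """Mean breadth-first-search distance from src to every other node."""
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return sum(dist.values()) / (len(dist) - 1)

n = 200
rng = random.Random(1)

lattice = ring_graph(n)
base = avg_path_length(lattice)      # roughly n/8 for this lattice

shortcut = ring_graph(n)
for _ in range(20):                  # a few random long-range acquaintances
    a, b = rng.randrange(n), rng.randrange(n)
    shortcut[a].add(b)
    shortcut[b].add(a)
small = avg_path_length(shortcut)    # dramatically shorter on average
```

The striking point, visible even in this toy, is how few random shortcuts are needed to collapse the distances between distant parts of the ring.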
Another important direction is the development of mathematical models of contagion and network processes. Social networks play a fundamental role in spreading information, ideas, and influence. Such contagion of behavior can be beneficial when a positive behavior change spreads from person to person, but it can also produce negative outcomes, as in cascading failures in financial markets. Such concepts open the way to epidemiological models that are more realistic than the “bin” models that do not take the structure of interpersonal contacts into account. The level of complexity in influencing and understanding such contagion phenomena rises with the size and complexity of the social network. Mathematical models have great potential to improve our understanding of these phenomena and to inform policy discussions aimed at enhancing system performance.
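A minimal contagion process can be simulated directly. The sketch below is our own toy, not a calibrated epidemiological model: a behavior spreads over a ring-plus-shortcuts contact network, and, unlike in a "bin" model, who is reached and when depends on the structure of interpersonal contacts.

```python
import random

rng = random.Random(7)

# A toy contact network: a ring of 100 people plus 10 random shortcuts.
n = 100
neighbors = {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}
for _ in range(10):
    a, b = rng.randrange(n), rng.randrange(n)
    if a != b:
        neighbors[a].add(b)
        neighbors[b].add(a)

infected = {0}                 # a single initial adopter
history = [len(infected)]
for _ in range(60):
    new = set()
    for u in infected:
        for v in neighbors[u]:
            if v not in infected and rng.random() < 0.5:
                new.add(v)     # contagion passes along an edge
    infected |= new
    history.append(len(infected))
```

Changing the wiring (more shortcuts, clustered communities, hubs) changes the speed and reach of the cascade even with the same per-contact probability, which is the sense in which network structure, not just aggregate counts, governs contagion.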
Knowing the shape of a protein is an essential step in understanding its biological function. In Nobel Prize-winning work, biologist Christian Anfinsen showed that an unfolded protein could refold spontaneously to its original biologically active conformation. This observation led to the famous conjecture that the one-dimensional sequence of amino acids of a protein uniquely determines the protein’s three-dimensional structure. That in turn launched an almost 40-year effort by quantitative scientists to find computational strategies and algorithms for solving the so-called “protein-folding problem,” which is to predict a protein’s three-dimensional structure from its primary sequence information. Some subproblems include how a native structure results from the interatomic forces of the sequence of amino acids and how a protein can fold so fast.4 Although the protein-folding conjecture has been shown to be incorrect for a certain class of proteins—for example, sometimes proteins called “chaperones” are needed to assist in the folding of a protein—scientists have observed that more than 70 percent of the proteins in nature still fold spontaneously, each into its unique three-dimensional shape.
In 2005, the protein-folding problem was listed by Science magazine as one of the 125 grand unsolved scientific challenges. The impact of solving the protein-folding problem would be enormous. It would have a direct and profound effect on our understanding of life: how these basic units, which carry out almost every function in living cells, play their roles at the fundamental physics level. Everything about the molecular mechanism of protein motions—folding, conformational changes, and evolution—could be revealed, and the whole cell could be realistically modeled. Furthermore, such an advance would have a great influence on novel protein design and rational drug design, which may revolutionize the pharmaceutical industry. For example, drugs might be rather accurately designed on a computer without much experimentation. Genetic engineering for improving the function of particular proteins (and thus certain phenotypes of an individual) could become realistic.
Conceptually, the protein-folding problem is straightforward: Given the positions of all atoms in a protein (typically tens of thousands of atoms), one would calculate the potential energy of the structure and then find a configuration that minimizes that energy. However, such a goal is technically difficult to achieve owing to the extreme complexity of the means by which the energy depends on the structure. A more attractive strategy, termed “molecular dynamics,” has a clear physical basis: One uses
4 Dill et al., 2007, The protein folding problem: When will it be solved? Current Opinion in Structural Biology 17: 342-346.
Newton’s laws of motion to write down a set of differential equations, called Hamilton’s equations, that describe the locations and velocities of all the atoms in the protein structure at any instant. One can then numerically solve these equations of structural motion, which not only predict low-energy structures of a protein but also provide information on protein movements and dynamics. To achieve this, one typically discretizes time and approximates the differential equations by difference equations. A molecular dynamics algorithm, such as the leapfrog algorithm, is then used to integrate the equations of motion. However, because the system is large and complex, the time step in the discretization must be kept very small to keep the numerical integration stable, which makes the computational cost extremely high even for simulating the tiny fraction of a second of real time that must be covered.
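The leapfrog scheme and its time-step constraint can be seen on the simplest possible "molecule": a single particle in a harmonic well, our toy stand-in for a real force field. With a small step the energy stays essentially constant for thousands of steps; for this unit-frequency oscillator the scheme is stable only for dt < 2, so a step of 2.5 makes the trajectory explode, the same failure that forces tiny time steps in real molecular dynamics.

```python
def accel(x):
    """Force from a harmonic potential E = x^2 / 2 (unit mass, unit spring)."""
    return -x

def leapfrog(x, v, dt, steps):
    """Kick-drift-kick leapfrog integration of Newton's equations."""
    for _ in range(steps):
        v += 0.5 * dt * accel(x)   # half kick
        x += dt * v                # drift
        v += 0.5 * dt * accel(x)   # half kick
    return x, v

def energy(x, v):
    return 0.5 * v * v + 0.5 * x * x

e0 = energy(1.0, 0.0)

# Small time step: energy is almost exactly conserved over 10,000 steps.
x, v = leapfrog(1.0, 0.0, dt=0.01, steps=10000)
drift = abs(energy(x, v) - e0)

# Time step beyond the stability limit: the trajectory blows up.
x_bad, v_bad = leapfrog(1.0, 0.0, dt=2.5, steps=100)
```

Leapfrog is favored in molecular dynamics precisely for the behavior the small-step run shows: it is time-reversible and keeps the energy error bounded rather than letting it accumulate.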
Another strategy is based on a fundamental principle of statistical mechanics, which states that the probability of observing a particular structural state S follows the Boltzmann distribution, P(S) ∝ e^(−E(S)/kT), where E(S) is the potential energy of structural state S, T is the temperature, and k is Boltzmann’s constant. Thus, one can use Monte Carlo methods to simulate the structural state S from this distribution. However, because of the extremely high dimensionality and complexity of the configuration space and also because of the complex energy landscape, simulating well from the Boltzmann distribution is still very challenging. New Monte Carlo techniques are needed to make the simulation more efficient, and these new methods could have a broader impact on other areas of computation as well.
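The Monte Carlo idea can be sketched with the classic Metropolis rule; the example below is our own toy, and a one-dimensional double-well "energy landscape" stands in for the vastly more complex configuration space of a protein. Moves are accepted with probability min(1, e^(−ΔE/kT)), and the samples settle into the low-energy wells.

```python
import math
import random

rng = random.Random(3)

def energy(x):
    """A double-well landscape: minima near x = -1 and x = +1,
    separated by a barrier at x = 0."""
    return (x * x - 1.0) ** 2

kT = 0.2
x = 0.0
samples = []
for step in range(50000):
    proposal = x + rng.uniform(-0.5, 0.5)
    dE = energy(proposal) - energy(x)
    # Metropolis rule: always accept downhill, sometimes accept uphill.
    if dE <= 0 or rng.random() < math.exp(-dE / kT):
        x = proposal
    if step >= 10000:          # discard burn-in
        samples.append(x)

near_wells = sum(1 for s in samples if abs(abs(s) - 1.0) < 0.4) / len(samples)
mean_abs = sum(abs(s) for s in samples) / len(samples)
```

Even in one dimension the barrier already slows mixing between the wells, a faint shadow of the rugged-landscape problem that makes simulating real protein energy functions so hard.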
Both molecular dynamics and Monte Carlo methods rely on a good energy function E(S). Although much effort has been made and many insights have been gained, it is still an important challenge to accurately model interatomic interactions, especially in realistic environments—for example, immersed in water or attached to a membrane—in a practical way. The overall approach is still not as precise as would be desired. Given the availability of a large number of solved protein structures, there is still room for applying certain statistical learning strategies to combine information from empirical data and physics principles to further improve the energy function.
In recent years, great successes have been made in computational protein structure prediction by taking advantage of the fast-growing protein structure databases. A well-known successful strategy is called “homology modeling,” or template-based modeling, which can provide a good approximate fold for a protein that has a homologous “relative”—that is, one with sequence similarity >30 percent—whose structure has been solved. Another attractive strategy successfully combines empirical knowledge of protein structures with Monte Carlo strategies for simulating protein folds. The idea is to implement only those structural modifications that are supported
by observed structural folds in the database. So far, these bioinformatics-based learning methods are able to predict well the folding of small globular proteins and those proteins homologous to proteins of known structure.
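The "sequence similarity >30 percent" criterion for homology modeling is simple to operationalize. The sketch below uses toy sequences and invented function names, purely for illustration: it computes percent identity between two aligned amino-acid sequences and applies such a threshold to decide whether a solved structure could serve as a template.

```python
def percent_identity(a, b):
    """Fraction of aligned positions with the same amino acid.
    Assumes the sequences are already aligned and of equal length."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)

def usable_template(target, template, threshold=0.30):
    """Homology-modeling heuristic: a solved structure is a plausible
    template when sequence identity exceeds roughly 30 percent."""
    return percent_identity(target, template) > threshold

# Two short made-up sequences differing at three positions.
target   = "MKTAYIAKQRQISFVK"
template = "MKTAHIARQRQISWVK"
```

Real pipelines first compute the alignment itself (with gaps and substitution scores), which is where most of the algorithmic work lies; the identity calculation above is only the final, simplest step.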
Future challenges include the prediction of structures of medium to large proteins, multidomain proteins, and transmembrane proteins. For multidomain proteins, many statistical learning strategies have been developed to predict which domains tend to interact with one another based on massive amounts of genomic and proteomic data. More developments in conjunction with structure modeling will be needed. There are other more technical, but also very important, challenges, such as how to rigorously evaluate a new energy function or a sampling method; how to more objectively represent a protein’s structure as an ensemble of structures instead of a single representative; how to evaluate the role of entropies; and how to further push the frontier of protein design. The potential impact of solving the protein-folding problem in its current form is muted by a grander, much more challenging issue. That is, how to come up with a description of protein folding and structure prediction, as well as of function and mechanism, that accords with a quantum mechanical view of reality. A protein’s dynamic properties, given that they presumably conform to the laws of quantum electrodynamics, may exhibit unexpected, counterintuitive behavior unlike anything that we have ever seen or that can be predicted based on classical physics, which is the prevailing viewpoint now owing to computational limitations. For example, when a protein folds, does it utilize “quantum tunneling” to get around (imaginary) classical energy barriers that would cause problems for a molecular dynamics program? This is a mathematical and algorithmic problem that has been largely left to one side by biologists because it appears to be intractable. Some highly novel, outside-the-box mathematical concepts will likely be required if one is to overcome these limitations. 
An important challenge for mathematical biologists is to help discover additional protein properties, beyond the handful identified thus far, that are nonclassical and thus counterintuitive. Currently, statistical approaches have provided an indirect way of attacking such problems and a way that is analogous to the statistical approaches used by classical geneticists to work around their lack of molecular biological and cytological methods. In a similar way, statistical approaches, applied in an evolutionary context, may augment our current arsenal of experimental and theoretical methods for understanding protein folding and predicting protein structure, function, and mechanisms.
The fundamental lemma is a seemingly obscure combinatorial identity introduced by Robert Langlands in 1979, as a component of what is now
universally called the Langlands program. The program as a whole consists of a set of conjectures that, addressed methodically, could resolve the most fundamental questions of number theory, phrased in the language of automorphic representations of the absolute Galois group.
The fundamental lemma is a statement about symmetries defined by objects called algebraic groups. One of several families of such groups is the group of linear transformations of an n-dimensional vector space. Some idea of the sweeping scope of the lemma and the Langlands program that rests on it can be gleaned from the fact that the proof of Fermat’s Last Theorem, a problem that lay open for over 300 years, rested on just the case n = 2.5 Langlands initially believed that the fundamental lemma would be an easy first step—although even its statement requires too much specialized knowledge and notation to reproduce here—and he assigned it as a thesis problem for a student, Robert Kottwitz. Through the work of Kottwitz and many others, some special cases were proven, but the general case remained unproven year after year. The lack of a proof came to be seen as the great obstacle to carrying out the program.
The fundamental lemma stimulated 30 years of valuable research in representation theory, number theory, algebraic geometry, and algebraic topology, and a proof was completed by Ngô Bao Châu in 2009. Ngô was promptly awarded a Fields medal for this work. (The proof ranked seventh on Time magazine’s list of the top 10 scientific discoveries of the year, not bad for a result whose statement alone must have been out of reach for the vast majority of Time’s readers.)
Any actual statement or sketch of the fundamental lemma would require many pages to be comprehensible. An excellent attempt to provide a rough understanding can be found in a recent expository paper.6 Although it is not practical here to attempt a conceptual picture of the fundamental lemma or the implications of its proof, this topic demonstrates that very difficult problems continue to be solved, while also providing fresh insights for further research. This accomplishment is strong evidence of the continuing vitality of the mathematical sciences.
It is interesting to note that, having grown up in Vietnam and studied in France, Ngô did his greatest work in the United States and is now a professor at the University of Chicago. This is an example of the way in which
5 The proof of Fermat’s Last Theorem was important and exciting for mathematicians because of the way the proof exposed deep connections between different areas of knowledge, not because people doubted the truth of the Theorem. By generalizing beyond the case of n = 2, the Langlands program could similarly reveal exciting and important connections and, with them, deep understanding.
6 David Nadler, 2012, The geometric nature of the fundamental lemma. Bulletin of the American Mathematical Society 49(1):1-50.
the powerful and appealing culture of mathematics in the United States attracts some of the greatest scientists in the world to come and settle here.
The prime numbers—whole numbers divisible only by themselves and 1—have been a focus of mathematical attention since the time of Euclid or before. This is partly because the primes are the building blocks of all numbers in a very precise sense; partly because the collection of primes displays a mixture of regularity and randomness that has been irresistibly tempting to curious mathematicians; and partly, in recent times, because of their amazing applications in computer science, especially cryptography.
Euclid’s work contains a proof that there are infinitely many primes. One very old puzzle is whether there are infinitely many pairs of primes whose difference is only 2, such as 5 and 7, or 11 and 13. In spite of the ease with which this question is stated, it remains unsolved. Another old question, raised by Lagrange and Waring in 1770, concerns primes in arithmetic progression. One form of this question is easy to state: Is there a prime number p and some number q so that every one of the million numbers in the arithmetic progression

p, p + q, p + 2q, p + 3q, . . . , p + 999,999q
is prime? Of course one could ask this same question with any number N in place of a million.
There was very little progress on this question until Ben Green and Terence Tao (who won a Fields medal based largely on his part in the work) proved that, indeed, for any N there are arithmetic progressions of N primes as above. In fact they proved much more; as with virtually every major mathematical advance, there is really a whole family of results, of which this is just an easy-to-state sample.
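For small N, such progressions can be found by exhaustive search. The sketch below (with an arbitrary search bound) simply hunts for the first pair (p, q) that works:

```python
# Brute-force search for an arithmetic progression of N primes,
# illustrating the objects in the Green-Tao theorem for small N.
# The search bound of 10,000 is an arbitrary illustrative choice.

def is_prime(n: int) -> bool:
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def prime_progression(length: int, bound: int = 10_000):
    """First (p, q) such that p, p + q, ..., p + (length - 1)q are all prime."""
    for p in range(2, bound):
        if not is_prime(p):
            continue
        for q in range(1, bound):
            if all(is_prime(p + k * q) for k in range(length)):
                return p, q
    return None

print(prime_progression(5))  # (5, 6): the progression 5, 11, 17, 23, 29
```

For large N this brute force is hopeless; the Green-Tao theorem guarantees existence without supplying an efficient search.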
Why have we included this example to illustrate the vitality of the mathematical sciences? Because a key feature of the advance is the forging of a surprising link between prime numbers and two apparently unrelated fields of mathematics, harmonic analysis and ergodic theory. In this sense, the result of Green and Tao is typical of many great advances in mathematics, which seem to come from nowhere and combine apparently unrelated fields, opening new opportunities in the process.
Indeed, ergodic theory is generally considered a part of the mathematical theory of probability. The fact that it is useful in the study of prime numbers reflects the tension between treating primes as a completely deterministic phenomenon and recognizing that we can deal with them best by pretending that they are, in some ways, very random and thus best viewed
through the lens of probability theory. Green and Tao made progress by showing that even a very disordered set, like the set of primes, can sometimes be decomposed into a highly structured part and a part that behaves in a highly random way.
For a detailed but still relatively comprehensible treatment of this development the reader may consult the 2005 paper of Bryna Kra.7
Hierarchical modeling (HM) is a set of techniques used in two related ways: to estimate characteristics of population distributions such as means and variances, often through combining information from different sources, and to forecast individual characteristics taken from the population. As an intuitive example of how HM works, one can take the problem of ranking batters in baseball after a few games, based on the proportions of times at bat that batters succeeded in getting a hit. The proportion (or batting average) is partly reflective of the ability of the batter, but it also contains a great deal of randomness because only a few games have been played. It is well recognized that the initially very high (or very low) batting averages of many players will revert to the mean as the season progresses, but it is not always recognized that this is an almost inevitable process owing to the large component of randomness that is present initially.
Such situations are naturally modeled by an HM. One assumes that there is an unknown “true” batting average for each batter, based on skill level, and then observes a combination of this true average and measurement error. The measurement error is modeled (typically, in this case, by a simple Bernoulli model), and the true batting averages are also modeled as arising from a “population distribution” (the unknown distribution of true batting averages); it is this second-level modeling that leads to the name “hierarchical modeling.” There are various possible ways to analyze the resulting model, but all can lead to interesting and surprising conclusions, such as a possible “cross-over effect,” wherein a batter with a higher current average but fewer games played can be predicted to have less skill than a batter with a lower current average but more games played (because the random component of this latter batter’s current average would be smaller). Of course, baseball aficionados would find such reversals to be reasonable.
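The shrinkage produced by such a model can be sketched numerically. In this toy calculation (all numbers invented), a fixed pseudo-count stands in for the estimated population distribution, so a batter's raw average is pulled toward the league mean in proportion to how few at-bats he has:

```python
# Empirical-Bayes-style shrinkage of batting averages (invented data).
# The raw average is pulled toward the league mean; the pull is stronger
# when a batter has fewer at-bats, which can produce the cross-over
# effect described in the text.

def shrink(hits, at_bats, league_mean, kappa=100):
    """Raw average shrunk toward the league mean; kappa acts as a prior
    pseudo-count of at-bats encoding the population-level model."""
    return (hits + kappa * league_mean) / (at_bats + kappa)

league_mean = 0.260  # assumed mean of the "true" averages

# Batter A: .400 over 10 at-bats; batter B: .320 over 200 at-bats.
a = shrink(4, 10, league_mean)    # about 0.273
b = shrink(64, 200, league_mean)  # exactly 0.300
print(a < b)  # True: A's higher raw average reverses after shrinkage
```

A full hierarchical analysis would estimate the population distribution (and hence the strength of shrinkage) from the data rather than fixing kappa in advance.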
While HM was introduced over 40 years ago, it became central to statistics and other sciences primarily over the last decade, as its computational demands have become tractable. In various contexts, hierarchical models have been referred to as random effects models, empirical Bayes models, multilevel
7 Bryna Kra, 2005, The Green-Tao theorem on arithmetic progression in the primes: An ergodic point of view. Bulletin (New Series) of the American Mathematical Society 43(1):3-23.
models, random coefficient models, shrinkage methods, hidden Markov models, and many other terms. The surge in development and use of HM coincided with the surge in computational power, as well as an explosion of accompanying theoretical and algorithmic developments going under the name of Markov chain Monte Carlo (MCMC) methods. The following illustrations indicate how HM is being used in science and society today:
• Climate and environmental studies often require the derivation of climate variables such as temperature or precipitation based on data from various sources. For example, in paleoclimate reconstruction, a climate field needs to be recovered from different types of observations such as tree rings and pollen records from lake sediments or ice cores, as well as information about external forcings. (The three main forcings are volcanism, solar irradiance, and greenhouse gases.) Any model that relates tree ring widths to climate conditions is imperfect and thus contains uncertainty; other uncertainties are introduced through factors such as measurement errors and the changing number of trees as a function of time. The HM framework provides a set of probability models to link different observations to the climate process and to model uncertainties at different levels. The estimated latent climate variables together with their corresponding uncertainties can be assessed explicitly. Since the climate and environmental data are typically from multiple sources, it is rarely possible for a single model to accommodate all the different types of data and reflect their intricate relationships. By contrast, HM is particularly useful for modeling their complex structures and integrating all the information to produce a coherent solution for the unknown climate process.
• Computational biologists use HM methods to analyze microarray data and to study patterns in genomic sequences of various species. For example, one can assume a latent hidden Markov structure for a protein sequence and use observations from multiple species or multiple analogous copies from one species to find common conserved parts of the protein, which often correspond to functionally critical regions and can be informative for drug design. A similar structure can also be designed for control regions (called promoters) of multiple coregulated genes so as to discover binding sites of transcription factors. Algorithms such as BioProspector, MEME, AlignACE, and MDscan are all successful implementations of such models and have been widely used by biologists. A very exciting tool resulting from such models is GENSCAN, which is highly successful for ab initio prediction of complete gene
structures in vertebrate, drosophila, and plant genomic sequences. Scientists also create HM structures to relate variances of the expressions of different genes so as to more accurately identify genes that behave (express) differently under different conditions—for example, in a cancerous fashion rather than a normal one.
• An extension of the basic HM structure is the Bayesian network (BN), which has become an important machine learning tool in artificial intelligence. A BN is a graphical approach to encode a probabilistic dependence model, in which each node in a graph represents a random variable that may or may not be observed and a directed link between two nodes indicates that they are dependent. Researchers find this structure very rich and efficient in learning relationships among many factors and making highly accurate predictions in complex situations. For example, scientists and engineers use BN for building spam-filtering tools, for information retrieval, image processing, biosurveillance, decision support systems, and so on.
• Public health researchers, Census Bureau scientists, and geographers use HM for spatial analysis, such as for disease mapping in regions and for demographic estimates in small areas. Actuaries refer to shrinkage estimation as credibility rate making in order to estimate the relative risks of different insured groups. Epidemiologists use these models for multiple comparisons and to help assess environmental risk. Economists and financial researchers improve population estimates via multilevel random regression coefficient models. The Food and Drug Administration uses these models to monitor the complexities of adverse drug reaction reports. The list goes on and on, and these applications will only grow as more data analysts reliably learn to use HMs, and as theoretical, algorithmic, and computational research continues to enlarge the domain of HM’s applicability.
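The dependence structure a Bayesian network encodes can be made concrete with a three-node toy network in which Cloudy influences Rain, which influences WetGrass (all probabilities invented); a query is answered by summing the joint distribution over the unobserved node:

```python
# A three-node Bayesian network (all probabilities invented):
# Cloudy -> Rain -> WetGrass.  The query P(Rain | WetGrass) is answered
# by brute-force enumeration of the joint distribution.

P_CLOUDY = 0.5
P_RAIN = {True: 0.8, False: 0.1}   # P(Rain | Cloudy)
P_WET = {True: 0.9, False: 0.2}    # P(WetGrass | Rain)

def joint(cloudy, rain, wet):
    p = P_CLOUDY if cloudy else 1 - P_CLOUDY
    p *= P_RAIN[cloudy] if rain else 1 - P_RAIN[cloudy]
    p *= P_WET[rain] if wet else 1 - P_WET[rain]
    return p

def prob_rain_given_wet():
    num = sum(joint(c, True, True) for c in (True, False))
    den = sum(joint(c, r, True) for c in (True, False) for r in (True, False))
    return num / den

print(round(prob_rain_given_wet(), 3))  # 0.786
```

Enumeration is exponential in the number of nodes; practical BN software exploits the graph structure to avoid summing over the full joint distribution.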
Underlying much of engineering are algorithms that solve problems, often problems with deep and interesting mathematical structure. In recent years there have been significant improvements in our ability to solve such problems efficiently and to understand the limits of what is solvable.
Classical examples of deep algorithmic questions arise in all parts of the transportation industry, from airlines to package delivery. Algorithmic tools built on mathematical frameworks are used to design and update schedules and to plan routes with low demands on resources. Such algorithmic questions are not limited to transportation but are important in
almost all large-scale business decisions made across most industries. Over the last few decades the mathematical sciences community has made great strides in developing and improving algorithmic tools and making these available for wide use through companies like IBM—for instance, as embodied in its CPLEX optimization software package—or start-ups like Gurobi. The algorithms used for optimization are all based on deep mathematical ideas, although their efficiency has often been established only experimentally. Recent work initiated by Spielman and Teng on smoothed analysis8 gives a new framework for understanding the efficiency of these methods, one that estimates performance probabilistically rather than focusing on the rarely seen worst-case outcomes.
Some algorithmic decisions need to be made instantaneously, without the benefit of information about trade-offs that is commonly part of tools for business optimization decisions. An environment where such optimizations are commonplace is the scheduling of bits on the Internet. An example is the method used by Akamai, which serves almost all of the large Web sites. The company was founded on a deeply theoretical algorithmic idea of how best to distribute content over the Internet.
Perhaps the best-known mathematical algorithm of great commercial interest is the RSA encryption scheme. The RSA system dates back to the mid-1970s, and its developers—Ron Rivest, Adi Shamir, and Leonard Adleman—won the 2002 Turing Award for their discovery. Encryption is now the basis of all Internet commerce, used not only for encrypting messages but also for all forms of sensitive communications including authentication, digital signatures, digital cash, and electronic voting, to name just a few. Our understanding of how to make such applications possible has grown tremendously in recent years. New encryption schemes have been developed, providing encryptions that are more secure or simpler to use, and we better understand the level of security provided by encryption schemes in increasingly complex environments with many new forms of attacks. Maybe the most fundamental of these developments is the discovery of what is known as fully homomorphic encryption, which allows one to compute with the encrypted information without decrypting the messages or learning their content.
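The arithmetic behind RSA fits in a few lines. The sketch below uses deliberately tiny primes (real keys use primes hundreds of digits long, plus message padding), but the encrypt-decrypt round trip is the genuine scheme:

```python
# Toy RSA with tiny primes; illustrative only.

def egcd(a, b):
    """Extended Euclid: returns (g, x, y) with a*x + b*y = g = gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, x, y = egcd(b, a % b)
    return g, y, x - (a // b) * y

p, q = 61, 53               # two secret primes
n = p * q                   # public modulus (3233)
phi = (p - 1) * (q - 1)     # 3120
e = 17                      # public exponent, coprime to phi
d = egcd(e, phi)[1] % phi   # private exponent: e*d = 1 (mod phi)

m = 65                      # the message, encoded as a number < n
c = pow(m, e, n)            # encrypt: c = m^e mod n
assert pow(c, d, n) == m    # decrypt: c^d mod n recovers m
print(c, d)                 # 2790 2753
```

The security rests on the difficulty of recovering p and q from n; for the toy modulus 3233 this is trivial, which is why real keys are enormously larger.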
Coding of information allows us to reliably transmit information across a noisy channel, later correcting the errors at the receiver, and to efficiently store and retrieve information. Such error-correcting codes are the basis of much of the digital communication technology underlying cell phones, videos, CDs, and so on. Shannon’s classical work in this area used probability and statistics to bound the amount of information that can be
8 Daniel A. Spielman and Shang-Hua Teng, 2004, Smoothed analysis of algorithms: Why the simplex algorithm usually takes polynomial time. Journal of the ACM 51(3):385-463.
transmitted in a noisy channel; this bound is now known as the Shannon capacity of the channel. Recent work by Spielman and others in the computer science theory community9 has developed new codes that can be encoded and decoded in linear time and reach the Shannon capacity even against worst-case noise models.

Coding has also had an important impact on many seemingly unrelated fields, such as understanding the limits of what is efficiently computable. Coding can be seen as the key tool in the development of a useful new type of proof system called probabilistically checkable proofs. These are proofs whose correctness can be verified only probabilistically; for example, a randomized algorithm that only inspects a few characters from the proof might show it to be correct with, say, 99 percent certainty. In a stunning development in complexity theory, it has been shown that every correct proof can be converted into a probabilistically checkable proof with only a polynomial increase in the size of the proof, effectively providing for a highly redundant coding of the proof. The theory of probabilistically checkable proofs is a key component in allowing us to prove the computational hardness of finding approximately optimal answers to classical optimization problems. To see the connection, observe that finding the most convincing probabilistically checkable proof is an approximation problem that is hard: true statements have probabilistically checkable proofs, and any probabilistic proof aiming to prove a false statement must fail with high probability.
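The basic error-correcting idea behind these developments can be seen in miniature in the classical Hamming(7,4) code, which protects 4 data bits with 3 parity bits and corrects any single flipped bit (modern capacity-approaching codes are vastly more elaborate, but the principle of structured redundancy is the same):

```python
# Hamming(7,4): encodes 4 data bits into 7 and corrects any single
# flipped bit; a minimal illustration of an error-correcting code.

def encode(d):                        # d = [d1, d2, d3, d4]
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]   # positions 1..7

def decode(c):
    # Syndrome bit k checks the positions whose 1-based index has bit k set;
    # a single error at position j therefore yields syndrome s = j.
    s = 0
    for k in range(3):
        parity = 0
        for pos in range(1, 8):
            if pos & (1 << k):
                parity ^= c[pos - 1]
        s |= parity << k
    c = list(c)
    if s:                              # nonzero syndrome: flip bit s back
        c[s - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]    # recover d1, d2, d3, d4

data = [1, 0, 1, 1]
code = encode(data)
code[4] ^= 1                           # flip one bit in transit
print(decode(code) == data)            # True
```

Two flipped bits defeat this code; richer codes trade longer block lengths and cleverer structure for resilience against far heavier noise.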
Inverse problems are those for which one creates an understanding of the internal structure of a system from external observations.10 The system itself is hidden, a black box that cannot be probed directly: Examples are a patient undergoing a medical imaging procedure, a nontransparent industrial object whose internal structural integrity is being examined, or a structure beneath Earth’s surface, such as a petroleum reservoir. One knows a lot about the physics of the system, but much about the structure and parameters needs to be reconstructed. Based on the way external signals are affected by passing through the black-box system, one needs to recover the unknown parameters of the system. Such problems lie at the heart of contemporary scientific inquiry and technological development. Applications include a vast variety of imaging techniques for purposes such as the early detection of cancer and pulmonary edema, locating oil and mineral
9 See, for example, Dan Spielman, 1996, Linear-time encodable and decodable error-correcting codes. IEEE Transactions on Information Theory 42(6):1723-1731.
10 The committee thanks Gunther Uhlmann of the University of Washington for contributing material for this section.
deposits in Earth’s crust, creating astrophysical images from telescope data, finding cracks and interfaces within materials, optimizing shapes, identifying models in growth processes, and modeling in the life sciences.
A typical inverse problem calls for determining the coefficients of a partial differential equation given some information about the solutions to the equation. Research in this area draws on a diverse array of mathematics, including complex analysis, differential geometry, harmonic analysis, integral geometry, numerical analysis, optimization, partial differential equations, and probability, and it builds strong linkages between applications and deep areas of mathematics.
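As a toy instance of the coefficient-determination problem, using an ordinary rather than partial differential equation and noise-free data, one can recover the decay rate k in u'(t) = -k u(t) from samples of a solution:

```python
# Inverse problem in miniature: infer the coefficient k of u' = -k*u
# from observations of the solution u(t) = u0 * exp(-k*t).
import math

k_true, u0, dt = 1.7, 2.0, 0.1
ts = [i * dt for i in range(20)]
samples = [u0 * math.exp(-k_true * t) for t in ts]

# Since log u(t) = log u0 - k*t, a least-squares line through
# (t, log u) has slope -k.
ys = [math.log(u) for u in samples]
tbar = sum(ts) / len(ts)
ybar = sum(ys) / len(ys)
slope = (sum((t - tbar) * (y - ybar) for t, y in zip(ts, ys))
         / sum((t - tbar) ** 2 for t in ts))
k_est = -slope
print(abs(k_est - k_true) < 1e-9)  # True: exact data, exact recovery
```

Real inverse problems replace this two-line fit with PDE-constrained optimization, and noisy or partial boundary data make regularization essential.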
An archetypal example of an inverse boundary problem for an elliptic equation is the by-now-classical Calderón problem, also called electrical impedance tomography (EIT). Calderón proposed the problem in the mathematical literature in 1980. In EIT one attempts to determine the electrical conductivity of a medium by taking measurements of voltage and current at the boundary of the medium. The information is encoded in the Dirichlet-to-Neumann (DN) map associated with the conductivity equation. EIT arises in several applications, including geophysical prospecting and medical imaging. In the last 25 years or so there has been remarkable progress on this problem; recent advances include the two-dimensional problem, the case of partial data, and the discrete problem.
New inverse problems arise all the time because there are so many applications. For example, in medical imaging, there has been considerable interest in recent years in hybrid methods, also called multiwave methods, which combine a high-resolution modality with a high-contrast one. For example, in breast imaging ultrasound provides a high (submillimeter) resolution, but it suffers from low contrast, whereas many tumors absorb much more energy from electromagnetic waves than do healthy cells, so that imaging with electromagnetic waves provides very high contrast.
Research on Calderón’s problem led to new insight on the problem of making objects invisible to detection by electromagnetic waves, sound waves, and other types of waves. Invisibility has been a subject of human fascination for millennia, from the Greek legend of Perseus and Medusa to the more recent Harry Potter stories. Since 2003 there has been a wave of serious theoretical proposals in the mathematics and physics literature about cloaking devices—structures that would not only render an object invisible but also undetectable to electromagnetic waves. The particular route to cloaking that has received the most attention is that of transformation optics, the design of optical devices with customized effects on wave propagation. Transformation optics was developed first for the case of electrostatics—that is, for Calderón’s problem. Transformation optics takes advantage of the transformation rules for the material properties of optics: the index of refraction for scalar optics and acoustics, governed by
the Helmholtz equation, and the electric permittivity and magnetic permeability for vector optics, as described by Maxwell’s equations. The index of refraction of these materials is such that light rays go around a region and emerge on the other side as if they had passed through empty space. Advances in metamaterials (materials not found in nature) have made possible the realization for microwave frequencies of the blueprints for cloaking found theoretically using transformation optics. There has been a concerted effort in recent years to construct these materials for a broader range of frequencies, including visible light. Metamaterials were identified by Science as one of the top 10 insights of the last decade.11
There is a long history of interactions of mathematics, and in particular geometry, with theoretical physics. At one point in the mid-nineteenth century the fields were one and the same. For example, Dirichlet’s argument for the existence of harmonic functions on the disk with given boundary values on the circle was an appeal to physical intuition about electrostatics. One model for the interaction is seen in both the final formulations of quantum mechanics and of general relativity. By the late 1920s, after many false starts and partial formulations, quantum mechanics was finally formulated in terms of Hilbert spaces and operators acting on these spaces. These mathematical objects had been introduced by Hilbert early in the twentieth century for purely mathematical reasons having nothing to do with quantum mechanics, which had not even been conceived of then. It is interesting to note, though, that Hilbert called the decomposition of his operators the “spectral decomposition” because it reminded him of the spectrum of various atoms, something that was mysterious at the time and finally explained by quantum mechanics. Einstein struggled for many years to formulate general relativity without finding the appropriate mathematical context. Finally, he learned of Riemann’s work on what is now called Riemannian geometry. This was exactly the formalism he was looking for, and soon after learning about this theory he had formulated general relativity in terms of a four-dimensional space-time continuum with an indefinite Riemannian metric which is positive in the time directions and negative in the spatial directions. In each of these cases the mathematics in question had been developed for internal mathematical reasons and its existence predated the need for it in physics. It was there ready to use when physics was ready for it. Interactions such as these led the physicist
11 This paragraph draws on Allen Greenleaf, Yaroslav Kurylev, Matti Lassas, and Gunther Uhlmann, 2009, Cloaking devices, electromagnetic wormholes, and transformation optics. SIAM Review 51 (March):3.
Eugene Wigner to wonder what accounts for the unreasonable effectiveness of mathematics in physics.
A more recent version of the same basic pattern is the Yang-Mills theory. Here again the physicists were struggling to develop a mathematical framework to handle the physical concepts they were developing, when in fact the mathematical framework, which in mathematics is known as connections on principal bundles and curvature, had already been introduced for mathematical reasons. Much of the recent history of quantum field theory has turned this model of interaction on its head. When quantum field theory was introduced in the 1940s and 1950s there was no appropriate mathematical context. Nevertheless, physicists were able to develop the art of dealing with these objects, at least in special cases. This line of reasoning, using as a central feature the Yang-Mills theory, led to the standard model of theoretical physics, which makes predictions that have been checked by experiment to enormous precision. Nevertheless, there was not then and still is not today a rigorous mathematical context for these computations. The situation became even worse with the advent of string theory, where the appropriate mathematical formulation seems even more remote. But the fact that the mathematical context for these theories did not exist and has not yet been developed is only part of the way that the current interactions between mathematics and physics differ from previous ones. As physicists develop and explore these theories, for which no rigorous mathematical formulation is known, they have increasingly used ever more sophisticated geometric and topological structures in their theories. As physicists explore these theories they come across mathematical questions and statements about the underlying geometric and topological objects in terms of which the theories are defined. Some of these statements are well-known mathematical results, but many turn out to be completely new types of mathematical statements.
These statements, conjectures, and questions have been one of the main forces driving geometry and topology for the last 20 to 25 years. Some of them have been successfully verified mathematically; some have not been proved but the mathematical evidence for them is overwhelming; and some are completely mysterious mathematically. One of the first results along these lines gave a formula for the number of lines of each degree in a general hypersurface in complex projective four-dimensional space given by a homogeneous degree-5 polynomial. Physics arguments produced a general formula for the number of such lines where the formula came from a completely different area of mathematics (power series solutions to certain ordinary differential equations). Before the input from physics, mathematicians had computed the answer for degrees 1 through 5 but had no conjecture for the general answer. The physics arguments provided the general formula, and this was later verified by direct mathematical argument.
These direct mathematical arguments gave no understanding of the original physics insight that connected the formula with solutions to an ordinary differential equation. Indeed, finding such a connection is one of the central problems today in geometry. Many mathematicians work on aspects of this problem, and there are partial hints but no complete understanding, even conjecturally. This statement characterizes much of the current input of physics into mathematics. It seems clear that the physicists are on the track of a deeper level of mathematical understanding that goes far beyond our current knowledge, but we have only the smallest hints of what that might be. Understanding this phenomenon is a central problem both in high-energy theoretical physics and in geometry and topology.
Nowadays, sophisticated mathematics is essential for stating many of the laws of physics. As mentioned, the formulation of the standard model of particle physics involves “gauge theories,” or fiber bundles. These have a very rich topology. These topological structures are described by Chern-Simons theories, Index theory, and K-theory. These tools are also useful for condensed matter systems. They characterize topological phases of matter, which could offer an avenue for quantum computing.12 Here the q-bits are encoded into the subtle topology of the fiber bundle, described by Chern-Simons theory. Recently, K-theory has been applied to the classification of topological insulators,13 another active area of condensed matter physics.
String theory and mathematics are very closely connected, and research in these areas often straddles physics and mathematics. One recent development, the gauge gravity duality, or AdS/CFT, has connected general relativity with quantum field theories, the theories we use for particle physics.14 The gravity theory lives in hyperbolic space. Thus, many developments in hyperbolic geometry, and black holes, could be used to describe certain strongly interacting systems of particles. Thinking along these lines has connected a certain long-distance limit of gravity equations to the equations of hydrodynamics. One considers a particular black-hole, or black-brane, solution of Einstein’s equations with a negative cosmological constant. These black holes have long-distance excitations representing small fluctuations of the geometry. The fluctuations are damped since they end up being swallowed by the black hole. According to AdS/CFT, this system is described by a thermal system on the boundary, a thermal fluid of quantum interacting particles. In this formulation, the long-distance perturbations are described by hydrodynamics, namely by the Navier-Stokes equation
12 A. Yu. Kitaev, 2003, Fault-tolerant quantum computation by anyons. Annals of Physics 303:2-30
13 A.P. Schnyder, S. Ryu, A. Furusaki, and A.W.W. Ludwig, 2008, Classification of topological insulators and superconductors in three spatial dimensions. Physical Review Letters B 78:195125.
14 J. Maldacena, 2005, The illusion of gravity. Scientific American October 24.
or its relativistic analog. The viscosity term in this equation produces the damping of excitations, and it is connected with the waves falling into the black hole. Computing the viscosity in the particle theory from first principles is very difficult. However, it is very simple from the point of view of Einstein’s equations, because it is given by a purely geometric quantity: the area of the black hole horizon. This has been used to qualitatively model strongly interacting systems of quantum particles. These range from the quark-gluon fluids that are produced by heavy ion collisions (at the Relativistic Heavy Ion Collider at Brookhaven National Laboratory or at the Large Hadron Collider near Geneva) to high-temperature superconductors in condensed matter physics.
Many examples of AdS/CFT involve additional structures, such as supersymmetry. In these cases the geometry obeys special constraints, giving rise to Sasaki-Einstein spaces, which are closely related to Calabi-Yau spaces. This is merely an example of a more general trend developing connections between matrix models, algebraic curves, and supersymmetric quantum field theory.
A recent development of the past decade has been the discovery of integrability in N = 4 super-Yang-Mills. This four-dimensional theory is the most symmetric quantum field theory. The study of this highly symmetrical example is very useful since it will probably enable us to find some underlying structures common to all quantum gauge theories. Integrability implies the existence of an infinite-dimensional set of symmetries in the limit of a large number of colors. In this regime the particles of the theory, or gluons, form a sort of necklace. The symmetry acts on these states and allows us to compute their energies exactly as a function of the coupling. The deep underlying mathematical structures are only starting to be understood. Integrability in so-called (1 + 1)-dimensional systems has led to the development of quantum groups and other interesting mathematics. The way integrability appears here is somewhat different, and it is quite likely that it will lead to new mathematics. A closely related area is the computation of scattering amplitudes in this theory. A direct approach using standard methods, such as Feynman diagrams, quickly becomes very complicated. On the other hand, there are new methods showing that the actual answers are extremely simple and have a rich structure that is associated with the mathematics of Grassmannians. This has led to another fruitful collaboration.15
The connection between theoretical physics and mathematics is growing ever stronger, and it is supported by the emergence of interdisciplinary centers, such as the Simons Center for Geometry and Physics at Stony Brook, and by math/physics initiatives at various universities, such as Duke University, the University of Pennsylvania, and the University of California at Santa Barbara.
15 See, for example, A.B. Goncharov, M. Spradlin, C. Vergu, and A. Volovich, 2010, Classical polylogarithms for amplitudes and Wilson loops. Physical Review Letters 105:151605.
We live in a new age for statistical inference, in which technologies routinely produce high-dimensional data sets, often with huge numbers of measurements on each of a comparatively small number of experimental units. Examples include gene expression microarrays monitoring the expression levels of tens of thousands of genes at the same time and functional magnetic resonance imaging machines monitoring blood flow in various parts of the brain. The breathtaking increases in data-acquisition capabilities are such that millions of parallel data sets are routinely produced, each with its own estimation or testing problem. This era of scientific mass production calls for novel developments in statistical inference, and it has inspired a tremendous burst of new statistical methodology. More important, the data flood completely transforms the set of questions that need to be answered, and the field of statistics has, accordingly, changed profoundly in the last 15 years. The shift is so strong that the subjects of contemporary research now have very little to do with the general topics of discussion from the early 1990s.
High dimensionality refers to an estimation or testing problem in which the number of parameters about which we seek inference is about the same as, or much larger than, the number of observations or samples we have at our disposal. Such problems are everywhere. In medical research, we may be interested in determining which genes might be associated with prostate cancer. A typical study may record expression levels of thousands of genes for perhaps 100 men, of whom only half have prostate cancer and half serve as a control set. Here, one has to test thousands of hypotheses simultaneously in order to discover a small set of genes that could be investigated for a causal link to cancer development. Another example is genomewide association studies where the goal is to test whether a variant in the genome is associated with a particular phenotype. Here the subjects in a study typically number in the tens of thousands and the number of hypotheses may be anywhere from 500,000 to 2.5 million. If we are interested in a number of phenotypes, the number of hypotheses can easily rise to the billions and trillions, not at all what the early literature on multiple testing had in mind.
In response, the statistical community has developed groundbreaking techniques such as the false discovery rate (FDR) of Benjamini and Hochberg, which proposes a new paradigm for multiple comparisons and has had a tremendous impact not only on statistical science but also in the
medical sciences and beyond.16 In a nutshell, the FDR procedure controls the expected ratio between the number of false rejections and the total number of rejections. Returning to the example above, this allows the statistician to return a list of genes to the medical researcher with the assurance that she should expect at most a known fraction of these genes, say 10 percent, to be “false discoveries.” This new paradigm has been extremely successful, for it enjoys increased power (the ability to make true discoveries) while simultaneously safeguarding against false discoveries. The FDR methodology assumes that the hypotheses being tested are statistically independent and that the data distribution under the null hypothesis is known. These assumptions are not always valid in practice, and much of statistical research is concerned with extending the methodology to these more challenging settings. In this direction, recent years have seen a resurgence of empirical Bayes techniques, made possible by the onslaught of data and providing a powerful framework and new methodologies to deal with some of these issues.
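As a concrete illustration, the Benjamini-Hochberg step-up procedure described above can be sketched in a few lines of code. This is a minimal version on synthetic p-values; the three planted "signal" p-values and the level q = 0.10 are arbitrary illustrative choices.

```python
import numpy as np

def benjamini_hochberg(pvals, q=0.10):
    """Benjamini-Hochberg step-up procedure.

    Returns a boolean array marking which hypotheses are rejected,
    controlling the false discovery rate at level q (assuming
    independent tests, as in the original procedure).
    """
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)                      # sort p-values ascending
    thresholds = q * np.arange(1, m + 1) / m   # BH comparison line: i*q/m
    below = p[order] <= thresholds
    rejected = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])       # largest i with p_(i) <= i*q/m
        rejected[order[: k + 1]] = True        # reject all p-values up to p_(k)
    return rejected

# Toy example: 3 true signals hidden among 1,000 hypotheses (synthetic data).
rng = np.random.default_rng(0)
pvals = rng.uniform(size=1000)
pvals[:3] = [1e-6, 2e-6, 5e-7]
rejected = benjamini_hochberg(pvals, q=0.10)
```

Note the contrast with a Bonferroni correction at level 0.10/m, which guards against even a single false rejection and therefore sacrifices power; BH tolerates a controlled fraction of false discoveries in exchange for many more true ones.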
Estimation problems are also routinely high-dimensional. In a genetic association study, n subjects are sampled and one or more quantitative traits, such as cholesterol level, are recorded. Each subject is also measured at p locations on the chromosomes. For instance, one may record a value (0, 1, or 2) indicating the number of copies of the less-common allele observed. To find genes exhibiting a detectable association with the trait, one can cast the problem as a high-dimensional regression problem. That is to say, one seeks to express the response of interest (cholesterol level) as a linear combination of the measured genetic covariates; those covariates with significant coefficients are linked with the trait.
The issue is that the number n of samples (equations) is in the thousands while the number p of covariates is in the hundreds of thousands. Hence, we have far fewer equations than unknowns, so what shall we do? This is a burning issue because such underdetermined systems arise everywhere in science and engineering. In magnetic resonance imaging, for example, one would like to infer a large number of pixels from just a small number of linear measurements. In many problems, however, the solution is assumed to be sparse. In the example above, it is known that only a small number of genes can potentially be associated with a trait. In medical imaging, the image we wish to form typically has a concise description in a carefully chosen representation.
In recent years, statisticians and applied mathematicians have developed a flurry of highly practical methods for such sparse regression problems. Most of these methods rely on convex optimization, a field that has registered spectacular advances in the last 15 years. By now there is a large literature explaining (1) when one can expect to solve a large underdetermined system by L1 minimization and when it is not possible and (2) when accurate statistical estimation is possible. Beyond the tremendous technical achievements, which have implications for nearly every field of science and technology, this modern line of research suggests a complete revision of the metrics by which the accuracy of statistical estimates is evaluated. Whereas classical asymptotic approaches study the size of errors when the number of parameters is fixed and the sample size goes to infinity, modern asymptotics is concerned with situations in which the numbers of both parameters p and observations n tend to infinity, perhaps in a fixed ratio or with p growing at most polynomially in n. Further, the very concept of error has to be rethought. The question is not so much whether asymptotic normality holds but whether the right variables have been selected. In a sea of mostly irrelevant variables, collected because data are now so readily acquired in many contexts, what is the minimum signal-to-noise ratio needed to guarantee that the variables of true importance can be identified?
16 As of January 15, 2012, Google Scholar reported 12,861 scientific papers citing the original article of Benjamini and Hochberg.
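To make the sparse-regression idea concrete, the following sketch solves the lasso problem, a canonical convex relaxation, by iterative soft-thresholding (ISTA), one of the simplest proximal-gradient methods. The dimensions, penalty level, and synthetic data are illustrative choices, not drawn from any real study; production code would use a tuned solver.

```python
import numpy as np

def ista_lasso(A, b, lam, iters=500):
    """Sparse regression via iterative soft-thresholding (ISTA):
        min_x  0.5 * ||Ax - b||^2  +  lam * ||x||_1
    A minimal proximal-gradient sketch with a fixed step size 1/L."""
    L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        g = A.T @ (A @ x - b)            # gradient of the smooth least-squares part
        z = x - g / L                    # gradient step
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-threshold
    return x

# p = 200 covariates but only n = 50 samples; just 4 coefficients are active.
rng = np.random.default_rng(1)
n, p = 50, 200
A = rng.standard_normal((n, p)) / np.sqrt(n)
x_true = np.zeros(p)
x_true[:4] = [3.0, -2.0, 4.0, 2.5]
b = A @ x_true + 0.01 * rng.standard_normal(n)
x_hat = ista_lasso(A, b, lam=0.05)
```

Even though the system is badly underdetermined (50 equations, 200 unknowns), the L1 penalty drives the irrelevant coefficients to zero and the four true variables stand out.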
Accurate estimation from high-dimensional data is not possible unless one assumes some structure such as sparsity, discussed above. Statisticians have studied other crucial structures that permit accurate estimation, again, from what appear to be incomplete data. These include the estimation of low-rank matrices, as in the famous Netflix problem, where the goal is to predict preferences for movies a user has not yet seen; the estimation of large covariance matrices or graphical models under the assumption that the graph of partial correlations is sparse; or the resolution of X-ray diffraction images from magnitude measurements only. The latter is of paramount importance in a number of applications where a detector can collect only the intensity of an optical wave but not its phase. In short, modern statistics is at the forefront of contemporary science since it is clear that progress will increasingly rely on statistical and computational tools to extract information from massive data sets.
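One widely used approach to the low-rank matrix completion problem mentioned above (the Netflix setting) is iterative singular value soft-thresholding, in the spirit of "soft-impute" methods. The sketch below runs on synthetic data; the matrix sizes, observation rate, shrinkage level, and iteration count are all illustrative choices.

```python
import numpy as np

def soft_impute(M_obs, mask, lam=0.5, iters=200):
    """Low-rank matrix completion by singular value soft-thresholding:
    alternately fill the missing entries with the current estimate and
    shrink the singular values toward zero (a nuclear-norm penalty)."""
    X = np.where(mask, M_obs, 0.0)
    for _ in range(iters):
        filled = np.where(mask, M_obs, X)           # keep observed entries fixed
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        s = np.maximum(s - lam, 0.0)                # soft-threshold the spectrum
        X = (U * s) @ Vt                            # low-rank re-estimate
    return X

# A rank-2 "ratings" matrix with only 60 percent of its entries observed.
rng = np.random.default_rng(2)
m, n, r = 40, 30, 2
M = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
mask = rng.uniform(size=(m, n)) < 0.6
M_hat = soft_impute(M, mask)
```

The 720 or so observed entries comfortably exceed the roughly (m + n) * r degrees of freedom of a rank-2 matrix, which is why the missing 40 percent can be filled in accurately.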
Statistics has clearly taken on a pivotal role in the era of massive data production. In the area of multiple comparisons, novel methodologies have been widely embraced by applied communities. Ignoring statistical issues is a dangerous proposition readily leading to flawed inference, flawed scientific discoveries, and nonreproducible research. In the field of modern regression, methods and findings have inspired a great number of communities, suggesting new modeling tools and new ways to think about information retrieval. The field is still in its infancy, and much work remains to be done. To give one idea of a topic that is likely to preoccupy statisticians in the next decade, one could mention the problem of providing correct inference after selection. Conventional statistical inference requires that a
model be known before the data are analyzed. Current practice, however, typically selects a model after data analysis. Standard statistical tests and confidence intervals applied to the selected parameters are in general completely erroneous. Statistical methodology providing correct inference after massive data snooping is urgently needed.
Mechanism design, the design of games whose rules create incentives to produce a desired outcome (such as revenue or social welfare maximization), is a subject with a long and distinguished history. Recent developments highlight the need to add computational considerations to classic mechanism-design problems.
Perhaps the most important example is the sale of advertising space on the Internet, which is the primary source of revenue for many providers of online services. According to a recent report, online advertising spending continues to grow at double-digit rates, with $25.8 billion having been spent in online advertising in the United States in 2010.17 The success of online advertising is due, in large part, to providers’ ability to tailor advertisements to the interests of individual users as inferred from their search behavior. However, each search query generates a new set of advertising spaces to be sold, each with its own properties determining the applicability of different advertisements, and these ads must be placed almost instantaneously. This situation complicates the process of selling space to potential advertisers.
There has been significant progress on computationally feasible mechanism design on many fronts. Three highlights of this research so far are the following:
• Understanding the computational difficulty of finding Nash equilibria. Daskalakis, Goldberg, and Papadimitriou won the Game Theory Society’s Prize in Game Theory and Computer Science in 2008 for this work.
• Quantifying the loss of efficiency of equilibria in games that do not perfectly implement a desired outcome, which is referred to as the price of anarchy. While initial success in this area has been in the area of games such as load balancing and routing, recent work in this direction that is relevant to online auctions also has great potential.
• Enabling computationally feasible mechanism design by developing techniques that approximately implement a desired outcome.
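The classic example of a mechanism that implements a desired outcome through incentives is the sealed-bid second-price (Vickrey) auction, in which the winner pays the second-highest bid, making truthful bidding a dominant strategy. The toy sketch below illustrates the idea; real online ad auctions use generalized and considerably more complex variants.

```python
def second_price_auction(bids):
    """Sealed-bid second-price (Vickrey) auction.

    The highest bidder wins but pays only the second-highest bid,
    so no bidder can gain by misreporting her true value: the
    mechanism is incentive-compatible by design."""
    ranked = sorted(range(len(bids)), key=lambda i: bids[i], reverse=True)
    winner = ranked[0]
    price = bids[ranked[1]] if len(bids) > 1 else 0.0
    return winner, price

# Bidder 1 values the item most (7.5) and wins, paying the runner-up bid 5.0.
winner, price = second_price_auction([3.0, 7.5, 5.0])
```

The computational difficulty discussed above arises when such simple, exactly optimal mechanisms must be scaled to combinatorial settings with many interdependent items, where even computing the optimal allocation can be intractable.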
17 Data from emarketer.com. Available at http://www.emarketer.com/PressRelease.aspx?R=1008096. Accessed March 14, 2012.
The mathematical sciences contribute to medicine in a great many ways, including algorithms for medical imaging, computational methods related to drug discovery, models of tumor growth and angiogenesis, health informatics, comparative effectiveness research, epidemic modeling, and analyses to guide decision making under uncertainty. With the increasing availability of genomic sequencing and the transition toward more widespread use of electronic health record systems, we expect to see more progress toward medical interventions that are tailored to individual patients. Statisticians will be deeply involved in developing those capabilities.
To illustrate just one way in which these endeavors interact, consider some mathematical science challenges connected to the diagnosis and planning of surgery for cardiac patients. One of the grand challenges of computational medicine is how to construct an individualized model of the heart’s biology, mechanics, and electrical activity based on a series of measurements taken over time. Such models can then be used for diagnosis or surgical planning to lead to better patient outcomes. Two basic mathematical tasks are fundamental to this challenge. Both are much-studied problems in applied mathematics, but they need to be carefully adapted to the task at hand.
The first of these tasks is to extract cardiac motion from a time-varying sequence of three-dimensional computerized tomography (CT) or magnetic resonance imaging (MRI) patient images. This is achieved by solving the so-called deformable image registration problem, a problem that comes up over and over in medical imaging. To solve this problem—to effectively align images in which the principal subject may have moved—one needs to minimize a chosen “distance” between the image intensity functions of images taken at different times. Unfortunately, this problem is ill-posed: there are many different maps that minimize the distance between the two images, most of which are not useful for our purposes. To tease out the appropriate mapping, one must choose a “penalty function” for the amount of stretching that is required to bring the successive images into approximate alignment. Finding the right penalty function is a very subtle task that relies on concepts and tools from a branch of core mathematics called differential geometry. Once a good penalty function has been applied, the work also requires efficient computational algorithms for the large-scale calculations.
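In symbols, one common simplified formulation of the registration problem just described is

```latex
\min_{\varphi}\;\int_{\Omega} \bigl( I_1(\varphi(x)) - I_0(x) \bigr)^2 \, dx \;+\; \lambda\, R(\varphi)
```

where I0 and I1 are the intensity functions of images taken at successive times, the map phi deforms one image toward the other over the image domain Omega, R is the penalty functional measuring the stretching of the deformation, and lambda > 0 weights fidelity against regularity. The ill-posedness noted above corresponds to the fact that, without the penalty term, many different maps achieve near-zero mismatch; choosing R well (for instance, an elastic or diffeomorphic penalty) is where the differential geometry enters.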
The second mathematical task is to employ the extracted cardiac motion as observational data that drive the solution of an inverse problem. In this case, the inverse problem is to infer the parameters for the bioelectromechanical properties of the cardiac model based on the motion observed externally through the imaging. The cardiac biophysical model draws on another area of mathematics—partial differential equations—and must
bring together multiple physics components: elasticity of the ventricular wall, electrophysiology, and active contraction of the myocardial fibers.
The full-blown setting of this problem is analogous to a “blind deconvolution” problem, in the sense that neither the model nor the source is fully known. As such, this presents enormous difficulty for the inversion solvers; as in the image registration case, it requires careful formulation and regularization, as well as large-scale computational solvers that are tolerant of ill-conditioning. Recent research18 is following a hybrid approach that interweaves the solution of the image registration and model determination problems.
The story of compressed sensing is an example of the power of the mathematical sciences and of their dynamic relationship with science and engineering. As is often the case, the development of novel mathematics can be inspired by an important scientific or engineering question. Then, mathematical scientists develop abstractions and quantitative models to solve the original problem, but the conversion into a more abstract setting can also supply insight to other applications that share a common mathematical structure. In other words, there is no need to reinvent the wheel for each instantiation of the problem.
Compressed sensing was motivated by a great question in MRI, a medical imaging technique used in radiology to visualize detailed internal structures. MRI is a wonderful tool with several advantages over other medical imaging techniques such as CT or X-rays. However, it is also an inherently slow data-acquisition process. This means that it is not feasible to acquire high-quality scans in a reasonable amount of time, or to acquire dynamic images (videos) at a decent resolution. In pediatrics, for instance, the impact of MRI on children’s health is limited because, among other things, children cannot remain still or hold their breath for long periods of time, so that it is impossible to achieve high-resolution scans. This could be overcome by, for example, using anesthesia strong enough to stop respiration for several minutes, but such procedures are clearly dangerous.
Faster imaging can be achieved by reducing the number of data points that need to be collected. But common wisdom in the field of biomedical imaging maintained that skipping sample points would result in information loss. A few years ago, however, a group of researchers turned signal processing upside down by showing that high-resolution imaging was possible from just a few samples. In fact, they could recover high-resolution pictures even when an MRI is not given enough time to complete a scan. To quote from Wired, “That was the beginning of compressed sensing, or CS, the paradigm-busting field in mathematics that’s reshaping the way people work with large data sets.”19
18 H. Sundar, C. Davatzikos, and G. Biros, 2009, Biomechanically-constrained 4D estimation of myocardial motion. Medical Image Computing and Computer-Assisted Intervention (MICCAI):257-265.
Despite being only a few years old, compressed sensing algorithms are already in use in some form in several hospitals in the country. For example, compressed sensing has been used clinically for over 2 years at Lucile Packard Children’s Hospital at Stanford, where it produces sharp images from brief scans. The potential of the method is such that both General Electric and Philips have medical imaging products in the pipeline that will incorporate compressed sensing.
However, what research into compressed sensing discovered is not just a faster way of getting MR images. It revealed a protocol for acquiring information, all kinds of information, in the most efficient way. This research addresses a colossal paradox in contemporary science: many protocols acquire massive amounts of data and then discard much of it, without much or any loss of information, through a subsequent compression stage, which is usually necessary for storage, transmission, or processing purposes. Digital cameras, for example, collect huge amounts of information and then compress the images so that they fit on a memory card or can be sent over a network. But this is a gigantic waste. Why bother collecting megabytes of data when we know very well that we will throw away 95 percent of it? Is it possible to acquire a signal in already compressed form? That is, can we directly measure the part of the signal that carries significant information and not the part that will end up being thrown away? The surprise is that mathematical scientists provided an affirmative answer, one that was unexpected and counterintuitive: common sense says that a good look at the full signal is necessary in order to decide which bits to keep or measure and which bits can be ignored or discarded. This view, although intuitive, is wrong. A very rich mathematical theory has emerged showing when such compressed acquisition protocols can be expected to work.
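Mathematically, the decoding step of such a compressed acquisition protocol reduces to recovering a sparse signal from far fewer linear measurements than unknowns by L1 minimization (basis pursuit). The sketch below, on synthetic data, poses the L1 problem as a linear program; the dimensions and sparsity level are illustrative.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, b):
    """Recover a sparse signal from underdetermined measurements b = Ax
    by L1 minimization (basis pursuit):  min ||x||_1  s.t.  Ax = b.
    Cast as a linear program by splitting x = u - v with u, v >= 0,
    so that ||x||_1 = sum(u) + sum(v) at the optimum."""
    m, n = A.shape
    c = np.ones(2 * n)                 # objective: sum(u) + sum(v)
    A_eq = np.hstack([A, -A])          # equality constraint: A(u - v) = b
    res = linprog(c, A_eq=A_eq, b_eq=b, bounds=[(0, None)] * (2 * n))
    return res.x[:n] - res.x[n:]

# A 3-sparse signal in dimension 60 measured with only 25 random projections.
rng = np.random.default_rng(3)
m, n = 25, 60
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[[5, 17, 42]] = [2.0, -1.5, 3.0]
b = A @ x_true                         # 25 measurements, 60 unknowns
x_hat = basis_pursuit(A, b)
```

Although the system has infinitely many solutions, the L1-minimal one coincides (with overwhelming probability for random measurements) with the sparse signal actually measured, which is the mathematical heart of compressed sensing.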
This mathematical discovery is already changing the way engineers think about signal acquisition in areas ranging from analog-to-digital conversion to digital optics and seismology. In communication and electronic intelligence, for instance, analog-to-digital conversion is key to transducing information from complex radiofrequency environments into the digital domain for analysis and exploitation. In particular, adversarial communications can hop from frequency to frequency. When the frequency range is large, no analog-to-digital converter (ADC) is fast enough to scan the full range, and surveys of high-speed ADC technologies show that they are advancing at a very slow rate. However, compressed sensing ideas show that such signals can be acquired at a much lower rate, and this has led to the development of novel ADC architectures aiming at the reliable acquisition of signals that are in principle far outside the range of current data converters. In the area of digital optics, several systems have been designed. Guided by compressed sensing research, engineers have gained design freedom in three respects: (1) they can consider high-resolution imaging with far fewer sensors than were once thought necessary, dramatically reducing the cost of such devices; (2) they can consider designs that speed up signal acquisition time in microscopy by orders of magnitude, opening up new applications; and (3) they can sense the environment with greatly reduced power consumption, extending sensor life. Remarkably, a significant fraction of this work takes place in industry, and a number of companies are already engineering faster, cheaper, and more-efficient sensors based on these recently developed mathematical ideas.
19 Jordan Ellenberg, 2010, Fill in the blanks: Using math to turn lo-res datasets into hi-res samples. Wired, March.
Not only is compressed sensing one of the most applicable theories to come out of the mathematical sciences in the last decade, but it is also mathematically sophisticated. It draws on techniques from probability theory, combinatorics, geometry, harmonic analysis, and optimization to shed new light on fundamental questions in approximation theory: How many measurements are needed to recover an object of interest? How is recovery possible from a minimal number of measurements? Are there tractable algorithms to retrieve information from condensed measurements? Compressed sensing research involves the development of mathematical theories, the development of numerical algorithms and computational tools, and the implementation of these ideas in novel hardware. Progress in the field thus involves a broad spectrum of scientists and engineers: core and applied mathematicians, statisticians, computer scientists, circuit designers, optical engineers, radiologists, and others regularly gather at scientific conferences together. This produces a healthy cycle in which theoretical ideas find new applications and applications renew theoretical mathematical research by offering new problems and suggesting new directions.