to design a network so that it has certain properties, how best to ascertain the structure of a network from the quantities one can actually measure, and how to search the network most effectively. There are purely mathematical questions relating to how to characterize, asymptotically, different classes of large random graphs and, for such classes, potential analogs of the theorems about Erdős-Rényi graphs and the cutoff value of the edge probability p above which the graph has an infinite (giant) connected component.
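
The Erdős-Rényi cutoff can be observed numerically. The sketch below is an illustration only, under assumptions that are not part of the text above: it samples the G(n, p) model with p = c/n, for which the classical threshold is c = 1, and reports the fraction of vertices in the largest connected component. The function name `largest_component_fraction`, the graph size, and the seed are hypothetical choices made for this example.

```python
import random


def largest_component_fraction(n, p, rng):
    """Sample G(n, p) and return the fraction of vertices in its largest component."""
    parent = list(range(n))
    size = [1] * n

    def find(x):
        # Find the representative of x's component, with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Include each of the n*(n-1)/2 possible edges independently with probability p.
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                ru, rv = find(u), find(v)
                if ru != rv:
                    # Union by size: attach the smaller tree to the larger one.
                    if size[ru] < size[rv]:
                        ru, rv = rv, ru
                    parent[rv] = ru
                    size[ru] += size[rv]

    return max(size[find(v)] for v in range(n)) / n


rng = random.Random(0)  # fixed seed (an arbitrary choice) for repeatability
n = 1000
for c in (0.5, 1.0, 2.0):
    frac = largest_component_fraction(n, c / n, rng)
    print(f"c = {c:.1f}: largest component holds {frac:.3f} of the vertices")
```

Below the threshold the largest component should contain only a small fraction of the vertices, while above it a single component should contain a constant fraction. For a finite graph this "giant" component plays the role of the infinite component in the limiting statement.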

With the advent of digital images, the question of how to analyze them (to get rid of noise and blurring, to segment them into meaningful pieces, to figure out what objects they contain, to recognize specific classes of objects such as faces, and to identify individual people or places) poses remarkably interesting mathematical and statistical problems. Core mathematicians are aware of the extraordinary work of Fields medalist David Mumford in algebraic geometry, but many may be unaware of his seminal work in image segmentation (the Mumford-Shah functional, for example). Approaches using a moving contour often involve geometrically driven motion, for example motion by curvature, and analysis-based techniques such as Osher-Vese decompose the image intensity function into two components, one minimizing total variation (this piece should provide the “cartoon”) and one minimizing the norm in the dual of the space of functions of bounded variation (this piece should provide the “texture”).
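
For readers who want the formulas behind these two ideas, the Mumford-Shah approach is usually stated as the minimization of an energy functional, and the cartoon-plus-texture decomposition as a constrained minimization over pairs of components. The block below is a schematic statement, not a quotation from any particular paper: the symbols g (observed image), u (cartoon), v (texture), K (edge set), and the positive weights \mu, \nu, \lambda are notation introduced here, and \|\cdot\|_{*} stands for a norm on a space dual to the bounded-variation setting, whose precise definition varies between authors.

```latex
% Mumford-Shah: approximate the observed image g on the domain \Omega by a
% piecewise-smooth function u whose discontinuities lie on an edge set K.
% \mathcal{H}^1(K) is the one-dimensional Hausdorff measure (length) of K.
E_{\mathrm{MS}}(u, K) = \int_{\Omega \setminus K} |\nabla u|^2 \, dx
  + \mu \int_{\Omega} (u - g)^2 \, dx
  + \nu \, \mathcal{H}^1(K)

% Cartoon-plus-texture decomposition g = u + v: the cartoon u has small
% total variation |u|_{BV}, and the texture v is small in the dual-type norm.
\inf_{u + v = g} \Big\{ |u|_{BV} + \lambda \, \|v\|_{*} \Big\}
```

In the first functional, the three terms penalize roughness of u away from the edges, lack of fidelity to g, and total edge length, respectively.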

In machine learning, the starting point for many algorithms is finding a meaningful notion of distance between data points. In some cases, a natural distance suggests itself: for example, the edit distance for comparing two sequences of nucleotides in DNA that appear in different species, where the expected relationship is by random mutation. In other cases, considerable insight is called for: to compare two brain scans, one needs to “warp” one into the other, requiring a distance on the space of diffeomorphisms, and here there are many interesting candidates. For large data sets, the distance is sometimes found using the data set itself. This underlies the method of diffusion geometry, which relates the distance between two data points to Brownian motion on the data set, where only a very local notion of distance is needed to get started. There are interesting theoretical problems about how various distances can be bounded in terms of one another, and to what extent projections from a high-dimensional Euclidean space to a lower-dimensional one preserve distances up to a bounded constant factor. This is one facet of dimensionality reduction, where one looks for lower-dimensional structures on which the data set might lie.
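
As a concrete instance of a "natural" distance, the edit distance between two sequences can be computed by dynamic programming. The sketch below assumes the simplest variant (Levenshtein distance, with unit cost for each insertion, deletion, or substitution). Practical sequence comparison often uses weighted costs instead, and the function name and the example strings are illustrative choices, not taken from the text above.

```python
def edit_distance(a, b):
    """Levenshtein distance between sequences a and b.

    Counts the minimum number of single-symbol insertions, deletions,
    and substitutions needed to turn a into b (unit costs assumed).
    """
    # prev[j] holds the distance between the current prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty prefix takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


# Two short nucleotide strings that differ by two substitutions.
print(edit_distance("GATTACA", "GACTATA"))
```

Only two rows of the dynamic-programming table are kept at a time, so memory grows with the length of the second sequence rather than with the product of the two lengths.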

Many of these problems are part of large and very general issues—dealing with “big data,” understanding complex adaptive systems, and search and knowledge extraction, to name a few. In some cases, these represent new areas of mathematics and statistics that are in the process of being created and where the outlines of an emerging field can only be glimpsed “through a glass, darkly.” Research in core mathematics has a long track record of bringing the key issues in an applied problem into focus, finding the general core ideas needed, and thereby enabling significant forward leaps in applications. We take this for granted when
