Mathematics is at a pivotal juncture: it can either continue to use digital mathematics literature in ways similar to traditional printed literature, or it can take advantage of new and developing technology to enable new ways of advancing knowledge. This report details how information contained in individual items within the literature could be readily extracted and linked to create a comprehensive digital mathematics information resource that is more than the sum of its contributing publications. That resource can serve as a platform and focal point for further development of the mathematical knowledge base.
This new system, referred to throughout the report as the Digital Mathematics Library (DML), could support a wide variety of new functionalities and services over aggregated mathematical information, including dramatically improved capabilities for searching, browsing, navigating, linking, computing, visualizing, and analyzing the literature.
The Alfred P. Sloan Foundation commissioned this study and charged the committee to:
- Evaluate the potential value of a virtual global library of mathematical science publications;
- Assuming that a stable context for sharing copyrighted information has been achieved, assess the remaining issues to be addressed in setting up such a library;
- Identify a range of desired capabilities of such a library; and
- Characterize resource needs.
While a traditional library is perhaps the oldest formal information resource available, the manifestation of libraries has evolved dramatically over the past few decades. In many cases within mathematics, as for other fields of scholarship, buildings housing paper publications have given way to online collections of downloadable documents. While this increased access is not perfect—not all material is readily available to all researchers, and search tools vary from site to site—widespread digitization has made it easier for many to access the mathematical literature. Overall, a much greater proportion of the mathematical literature is available to more people than at any time before. The research libraries, scholarly societies, and other players that curate and steward this material continue to grapple with issues, such as long-term preservation of digital materials, but it is fair to say there exists a fairly comprehensive, distributed “digital library” for mathematics offering a much improved but not fundamentally different version of what existed in the time of printed books and journals.
The committee has thus taken the term library in its charge to mean a system that accumulates and shares knowledge, rather than the more traditional library that houses documents, either digital or physical. The committee’s focus has been on functionality that can meet the needs of mathematicians facing a rapidly expanding and diversifying knowledge base. The committee has largely ignored traditional issues of assembling and stewardship of those collections, which are being handled well, for the most part, by the existing distributed digital library.
The committee envisions its target digital library users to be working research mathematicians and advanced graduate students beginning their research careers throughout the world (hence the word global). The library discussed does not specifically target students below the advanced graduate student level or researchers outside of mathematics, although both sets would likely constitute some of the library’s user base. Having a clear understanding of the target user base directly impacts the types of content the library targets and the types of services it provides. The committee also believes that the disciplinary scope of the mathematics that this library could provide is best left undefined for now. Mathematics and the mathematical sciences have diffuse boundaries, and this committee takes no stance on where appropriate content lies. However, this is an issue that will have to be addressed by either a future management organization or the community of users.
The committee believes that there is much room for innovation and progress in the mainstream mathematical information services. To determine which potential areas for innovation are of the most interest to the mathematics community, the committee held three meetings where it heard from outside presenters on issues relevant to mathematics (November 27-28, 2012; February 19-20, 2013; and May 30-31, 2013—agendas for these meetings can be found in Appendix A) and two public data-gathering sessions (at the University of Minnesota on May 6, 2013, and at Northwestern University on May 30, 2013), posted questions on two mathematics discussion forums (MathOverflow1 and Math 2.02), and wrote a guest entry on Professor Terry Tao’s mathematics blog.3 The committee also referred to the information shared at the World Digital Mathematics Library workshop held by the International Mathematical Union (IMU) on June 1-3, 2012.4
The committee made an assessment of what computers can do today, what computers can help mathematicians to do, and how rapidly these capabilities are likely to grow, if provided with some ongoing focused research funding. The committee’s consensus is that by some combination of machine learning methods and community-based editorial effort, a significant portion of the information and knowledge in the global mathematical corpus could be made available to researchers as linked open data. Broadly defined, linked open data are structured data that are published in such a way that makes it easy to interlink them with other data, thereby making it possible to connect them with information from multiple sources. This connected data can provide a user with a more meaningful query of a subject by consolidating relevant information from a variety of places (e.g., in different research papers) and pulling out specific components that the user might be particularly interested in. The committee envisions that much of the existing mathematical information can be provided as linked open data through a central organizational entity—referred to in this report as the DML. It should be noted that linked open data are not the only way that this can be accomplished, but they are essentially today’s standard for ontologies and other important representations. The committee believes that the DML should make use of current best practices rather than trying to develop some other alternative, whenever possible.
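The linked open data idea described above can be illustrated with a minimal sketch. It assumes a triple-based representation (subject, predicate, object) in the style of RDF; every identifier, such as `ex:paper/A` or `ex:states`, is a hypothetical placeholder and not part of any actual DML vocabulary.

```python
# Each statement is a (subject, predicate, object) triple, as in RDF.
# All identifiers below are hypothetical placeholders, not real DML URIs.
PAPER_A_TRIPLES = [
    ("ex:paper/A", "ex:hasAuthor", "ex:person/1"),
    ("ex:paper/A", "ex:states", "ex:theorem/T1"),
]
PAPER_B_TRIPLES = [
    ("ex:paper/B", "ex:hasAuthor", "ex:person/2"),
    ("ex:paper/B", "ex:uses", "ex:theorem/T1"),
    ("ex:paper/B", "ex:cites", "ex:paper/A"),
]


def merge(*sources):
    """Combine triples from several sources, dropping duplicates, keeping order."""
    seen, merged = set(), []
    for source in sources:
        for triple in source:
            if triple not in seen:
                seen.add(triple)
                merged.append(triple)
    return merged


def subjects_linked_to(triples, obj):
    """Return every (subject, predicate) pair whose object is obj."""
    return [(s, p) for s, p, o in triples if o == obj]


# Because both sources use the same identifier for the theorem, a single
# query over the merged data finds every paper that states or uses it.
graph = merge(PAPER_A_TRIPLES, PAPER_B_TRIPLES)
links = subjects_linked_to(graph, "ex:theorem/T1")
```

The point of the sketch is only that shared identifiers let data from separate publications be consolidated and queried together; a production system would presumably rely on established linked data tooling rather than plain tuples.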
1 I. Daubechies, “Math Annotate Platform?,” MathOverflow (question and answer site), February 18, 2013, http://mathoverflow.net/questions/122125/math-annotate-platform.
4 Many of the materials presented at the International Mathematical Union’s DML workshop can be found at http://ada00.math.uni-bielefeld.de/mediawiki-1.18.1/index.php/, updated April 23, 2013.
This report consists of five main chapters and several appendices. The rest of this chapter discusses previous digital mathematics library efforts, the universe of mathematical information, relevant conceptual tools, and current mathematical resources. Chapter 2 discusses what is missing from the mathematical information landscape and what gaps the DML would fill, and elaborates on the desired DML capabilities from a user’s perspective. This includes a discussion of what types of features would make the mathematical literature and current resource capability more meaningful to a mathematical researcher. Chapter 3 discusses some of the broad issues that the DML would face during development, including developing partnerships, managing large data sets, navigating open access, and planning for system and data maintenance. Chapter 4 provides a strategic plan for the development of the DML, including a discussion of fundamental principles, the constitution of a governing organization, steps toward initial development, and resources that would be needed. Chapter 5 discusses some details of entity collections and technical considerations for the DML that will be needed to make the features and capabilities discussed in Chapter 2 a reality.
In preparing this report, the committee reviewed many existing digital resources for mathematics, as well as relevant initiatives in some other sciences. A brief discussion of these tools is given in Appendix C.
The idea of a comprehensive digital mathematics library has been around for decades, and there have been several incarnations of the idea with different foci. The first step in this vision was retrospective digitization of the older parts of the literature that did not already exist in digital form, and this has largely been achieved (though the quality, and hence utility, of these converted materials varies widely, ranging from simple page scans to carefully proofread markups).
The Cornell University Digital Mathematics Library Planning Project was funded by the National Science Foundation from 2003 to 2004 as a step “toward the establishment of a comprehensive, international, distributed collection of digital information and published knowledge in mathematics.”5 Its vision statement reads as follows:
In light of mathematicians’ reliance on their discipline’s rich published heritage and the key role of mathematics in enabling other scientific disciplines, the Digital Mathematics Library strives to make the entirety of past mathematics scholarship available online, at reasonable cost, in the form of an authoritative and enduring digital collection, developed and curated by a network of institutions.
5 Cornell University Library, Digital Mathematics Library. S.E. Thomas, principal investigator, R.K. Dennis and J. Poland, co-principal investigators, http://www.library.cornell.edu/dmlib/, last updated December 2, 2004.
A follow-up report from the International Mathematical Union (IMU, 2006) shared this vision of a distributed collection of past mathematical scholarship that served the needs of all science, and it encouraged mathematicians and publishers of mathematics to join together in implementing this vision. However, it was clear within a few years that this vision was not going to become a reality soon. As David Ruddy of Project Euclid wrote (Ruddy, 2009):
The grand vision of a Digital Mathematics Library, coordinated by a group of institutions that establish policies and practices regarding digitization, management, access, and preservation, has not come to pass. The project encountered two related problems: it was overly ambitious, and the approach to realizing it confused local and community responsibilities. While the vision called for a network of distributed, interoperable repositories, the committee approached and planned the project with the goal of building a single, unified library.
At the time of this study, there has been some progress toward this vision of a single, unified library in the form of the European Digital Mathematics Library (EuDML) project.6 The EuDML project, funded from 2010 to 2013 by the European Commission, created a network of 12 European repositories acquiring selected mathematical content for preservation and access and made progress in establishing a single distributed library with a collection of about 225,000 unique items, spanning 2.6 million pages. The EuDML succeeded in creating a unified metadata framework7 that is shared by these repositories and in providing a single point of access to publications in these repositories, albeit with limited rights to search the full text from some sources. The framework covers document attributes such as the title, authors, abstract, comments, report number, category, journal reference, digital object identifier, Mathematics Subject Classification (MSC), and Association for Computing Machinery (ACM) computing classification. Impressive as the EuDML is, when compared to the full size and scope of the universe of published mathematics (described in the next section), and given the essential requirement to integrate with copyrighted materials and the clear desirability and cost-effectiveness of leveraging existing repositories and services, the EuDML experience only emphasizes the difficulties inherent in aiming for a single, centrally managed and truly comprehensive collection of digitized mathematics as the cornerstone for a comprehensive DML. With recent advances in technology and the experience gained on EuDML and other projects, the study committee concluded that a more effective approach going forward would be to partner with existing content providers and focus instead on the innovations and elements of shared infrastructure and knowledge management that are not being adequately addressed by other entities (i.e., rather than on central harvesting and aggregation of primary content). The committee believes that this vision is consistent with the original vision of the EuDML, although it was not realized by that project.
6 T. Bouche, Université de Grenoble, “From EuDML to WDML: Next Steps,” Presentation to the committee on November 27, 2012.
7 European Digital Mathematics Library, “Appendix, EuDML Metadata Schema (Final)/ Tagging Best Practices,” in EuDML Metadata Schema Specification (v2.0-final), https://project.eudml.org/sites/default/files/d36-appendix_uncropped.pdf, accessed January 16, 2014.
Another example of an online resource that helps users connect with knowledge is the National Science Digital Library (NSDL).8 NSDL is an online educational resource for teaching and learning, with current emphasis on the sciences, technology, engineering, and mathematics. NSDL does not hold content directly—instead, it provides structured metadata about Web-based educational resources held on other sites by providers who contribute this metadata to NSDL for organized search and open access to educational resources via NSDL.org and its services.
A discussion of many other efforts and current digital resources can be found in Appendix C.
The Alfred P. Sloan Foundation supported a World Digital Mathematics Library workshop in June 2012,9 which was planned by the IMU’s Committee on Electronic Information and Communication. This workshop provided a wealth of information to the committee on the current state of the art and research efforts aimed at making the World Digital Mathematics Library a reality.
Much of the straightforward work of assembling digital mathematics libraries has been done (e.g., digitizing material, aggregating it into small to medium-sized collections). The difficulties that the EuDML faced in creating a single large aggregation of mathematics literature and the difficulty of other World Digital Mathematics Library efforts in gaining community support indicate that these challenges are unlikely to be overcome soon. The committee notes that there has been sizable ongoing investment from publishers (both commercial and noncommercial) to retrospectively digitize historical runs of their copyrighted journals and also, in many cases, even earlier historical materials that are now out of copyright, in order to capture comprehensive representations of their journals. However, broad services such as Google Scholar now provide much of the functionality that many of these specialized efforts had hoped to achieve in building comprehensive and coherent collections of the mathematical literature. Such services achieve this functionality by searching across a range of repositories, rather than trying to collect all of the material in one (or a very few) repositories. In the committee’s view, efforts to build centralized comprehensive resources are reaching a point of diminishing returns.
9 International Mathematical Union, “The Future World Heritage Digital Mathematics Library: Plans and Prospects,” updated April 23, 2013, http://ada00.math.uni-bielefeld.de/mediawiki-1.18.1/index.php/Main_Page.
Finding: The construction of mathematical libraries through centralized aggregation of resources has reached a point of diminishing returns, particularly given that much of this construction has been coupled with retrospective digitization efforts.
While there is still a substantial amount of historical (mostly out of copyright) mathematical literature that would benefit from retrospective digitization, or higher quality digitization than has been done to date, the committee does not believe that there is justification for a major new program and investment in this area. In particular, although there is value in modest, sustained investment in existing efforts, these will make only incremental contributions. While the fundamental importance of the heritage literature remains, its size, as a fraction of the overall mathematics literature, is diminishing steadily. No amount of additional retrospective digitization will fundamentally change how the mathematical literature can be used or how it can evolve to meet new research needs. Moreover, while the historical (e.g., out of copyright) segments of the mathematical literature are valuable, any genuinely meaningful large-scale change in accessing the mathematical literature and knowledge base must encompass not only heritage but also current literature. Thus, the committee believes that a very different set of investments (as described in this report) is where the transformative opportunities await.
The next section provides some more detailed information on the existing landscape of mathematical literature and how much has been digitized.
Mathematics shares more with the arts than the sciences, in that its primary data are human creations, perhaps representations of ideas in a platonic realm, rather than data derived by observation or measurement of the physical universe. Mathematical information is primarily mined from its own literature or derived by computation. This section describes the state of mathematical publishing and the world of mathematical objects that exist within the publications.
Digital Mathematical Publications
Most of the mathematics literature of the 20th century is now available digitally. Through the Jahrbuch Electronic Research Archive for Mathematics10 project and the independent efforts of publishers and others, much of the most important mathematical research of the last half of the 19th century also has been digitized. Appendix C provides an overview of the many sources for digitized mathematical source material, including repositories and many other types of sources, whether freely accessible or behind paywalls (and thus only accessible to subscribers). A large part of the mathematics literature in electronic form consists of papers written in the past 20 years. This portion of the literature is searchable and navigable by any user of a library with access to the main subscription services controlled by libraries and publishers.
In addition, a considerable body of the heritage literature in mathematics has been digitized over the past 15 years. The most comprehensive listing of the retro-digitized mathematics literature is Ulf Rehmann’s list of Retrodigitized Mathematics Journals and Monographs,11 which is a list of titles of serials and books that have been digitized without metadata.12 Much of this metadata has found its way into indexes maintained by Google, MathSciNet, and Zentralblatt (zbMATH).13
The digital corpus of mathematics literature is extensive. The MathSciNet14 database includes approximately 2.9 million publications from 1940 to the present, with direct links to 1.7 million of them. MathSciNet currently indexes more than 2,000 journal/serial titles and contains about 100,000 books (post 1960). Of the items currently available on MathSciNet, 2.6 million of them are from the 1970s or later, and 1.7 million are from 1990 onward. The American Mathematical Society has kept track of new journal titles in the field since 1997, and there has been an average growth of about 40 new journal titles per year in mathematics.
11 DML: Digital Mathematics Library, http://www.mathematik.uni-bielefeld.de/~rehmann/DML/dml_links.html, accessed January 16, 2014.
12 Metadata are broadly defined as data about data. In the case of a typical mathematics journal digital publication, metadata may include information such as author, journal name and volume, date of publication, time of file creation, size of file.
zbMATH (1931-present) contains more than 3 million publications and currently indexes approximately 3,500 journals. The annual production of mathematics papers is more difficult to quantify. There has been a steady increase in the number of math papers added to arXiv15 over the past 5 years (shown in Table 1-1), although it is not clear from these data if this shows an increase in mathematics publications or an increase in mathematicians’ willingness to post their papers. Annual entries on MathSciNet and the number of mathematics papers listed in Web of Science16 have both remained relatively constant around 90,000 and 20,000, respectively (see Tables 1-2 and 1-3).
Components of the digitized corpus of mathematics are increasingly included in a variety of stable, well-curated repositories, although access to much of this corpus remains limited by copyright or other intellectual rights restrictions. For example, in terms of retrospectively digitized works cataloged under the subject heading (or subheading) of “mathematics,” the HathiTrust Digital Library17 includes approximately 40,000 bibliographically distinct resources.18 Of these, only 6,800 were digitized from public-domain works; the rest were digitized from copyrighted originals. These numbers are a mix of monograph titles and serial titles (a serial title in HathiTrust typically encompasses a complete run of a journal, edited series, or conference publication series). Each serial run could be expected to include tens or even hundreds of issues, with each issue containing at least several articles or papers. In terms of pages, using the HathiTrust repository-wide ratio of pages per bibliographic resource to estimate, this translates to a rough estimate of 25.5 million pages of retrospectively digitized mathematics in HathiTrust with approximately 17 percent (6,800 out of 40,000) digitized from public-domain sources.
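The arithmetic behind these HathiTrust figures can be checked with a short calculation. The three input numbers come from the paragraph above; the pages-per-resource ratio is not stated in the text and is only back-computed here from the page estimate, so it should be read as an implied value rather than a reported one.

```python
# Figures quoted in the text for HathiTrust works cataloged under "mathematics".
distinct_resources = 40_000
public_domain_resources = 6_800
estimated_pages = 25_500_000

# Share of resources digitized from public-domain works.
public_domain_share = public_domain_resources / distinct_resources

# Pages-per-resource ratio implied by the estimate (an inference, not a
# number reported in the text).
implied_pages_per_resource = estimated_pages / distinct_resources

print(f"public-domain share: {public_domain_share:.0%}")
print(f"implied pages per resource: {implied_pages_per_resource}")
```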
The basic trends seem clear: more and more of the corpus of mathematical literature will be in digital form, including some with high-quality markup, specifically those items that are “born” digital or retro-digitized to be in a machine readable format and that use typesetting such as LaTeX or MathML (as opposed to page images of publications). As mentioned before, the fraction of the overall corpus that is pre-1970 is rapidly diminishing due to the relative explosion in the annual rates of publication in recent decades (however, this should in no way be seen as diminishing the fundamental importance of heritage literature).
18 Current as of September 2013.
TABLE 1-1 Number of Mathematics Papers Added to arXiv Annually Between 2008 and 2012
| Year | Mathematics Papers Added to arXiv |
SOURCE: arXiv, http://arxiv.org/, accessed January 16, 2014.
TABLE 1-2 Number of Articles in Research Journals in MathSciNet Annually Between 2006 and 2012
| Publication Year | Entries in MathSciNet |
NOTE: A steady growth of about 3 percent per year is seen.
SOURCE: American Mathematical Society, MathSciNet, http://www.ams.org/mathscinet/, accessed January 16, 2014.
TABLE 1-3 Mathematics Papers Listed in Web of Science Annually Between 2008 and 2012
| Year | Mathematics Papers Listed in Web of Science |
SOURCE: Thomson Reuters, “Web of Science Core Collection,” http://thomsonreuters.com/web-of-science/, accessed January 16, 2014.
Objects in the Mathematical Literature
Information found in the mathematical literature is diverse but largely falls into two main categories:
1. Bibliographic information, such as
a. Documents (e.g., articles, books, proceedings, talks, diagrams, homepages, blogs, videos);
b. People (e.g., authors, editors, referees, reviewers);
c. Events (e.g., discoveries, publications, conferences, talks, births, deaths, degrees, awards);
d. Organizations (e.g., universities, publishers, journals, libraries, service providers);
e. Subjects (e.g., major branches of mathematics such as algebra, geometry, analysis, topology, probability, and statistics, as well as their intersections and interactions and their various sub-branches, down to even finer topics and including ubiquitous mathematical terms like “number” and “set”).
2. Mathematical concepts (e.g., axioms, definitions, theorems, proofs, formulas, equations, numbers, sets, functions) and objects (e.g., groups, rings).
Collecting and aggregating mathematical bibliographic information has been the path many digital libraries and digital resources have taken in the past (Chapter 2 and Appendix C discuss many of these efforts to date). While there are many challenges in collecting this information, the even more difficult work lies in collecting mathematical concepts, which lack the standardization that most bibliographic information has acquired. However, an ability to explore these mathematical objects within the literature offers the potential to uncover currently under-explored connections in mathematics.
The recent National Research Council report The Mathematical Sciences in 2025 (NRC, 2013) discusses the importance of mathematical structures, which are part of the larger mathematical concepts described above:
A mathematical structure is a mental construct that satisfies a collection of explicit formal rules on which mathematical reasoning can be carried out. . . . What is remarkable is how many interesting mathematical structures there are, how diverse are their characteristics, and how many of them turn out to be important in understanding the real world, often in unanticipated ways. Indeed, one of the reasons for the limitless possibilities of the mathematical sciences is the vast realm of possibilities for mathematical structures. . . . A striking feature of mathematical structures is their hierarchical nature—it is possible to use existing mathematical structures as a foundation on which to build new mathematical structures . . . . Mathematical structures provide a unifying thread weaving through and uniting the mathematical sciences. (pp. 29-30)
Given the size, diversity, and inherent nature of mathematics information in categories 1 and 2 above, it is clearly not sufficient to simply provide undifferentiated access to the universe of mathematics monographs, journal articles, and conference papers. Instead, the online research literature of mathematics must be organized into a well-structured network of resources linked together based on a variety of attributes—bibliographic and topical, of course, but also linked in a highly granular fashion on commonalities of mathematical structures and the shared use of mathematical objects, reasoning, and methodologies. The committee believes that the greatest potential for the DML lies in providing mathematicians access to a well-structured network of information and building services that both enhance and utilize this data. In the context of today’s Web environment, a well-structured network implies adherence to the Semantic Web19 and linked open data principles and to community-endorsed standards and best practices. While the foundation for such a well-structured network of digital research mathematics exists in established repositories and component digital libraries, the underlying thesauri and ontologies of mathematical objects do not yet exist (or have not yet been given permanence and formal identity), and the agreements on best practices for interoperability and the implementation of linked open data principles in the context of research mathematics repositories have not yet been reached.
General conceptual tools that are used to structure, organize, represent, and share knowledge include the closely related ideas of ontologies, taxonomies, and vocabularies. There is considerable debate about the precise definitions and differences among these tools, although ontologies (most commonly viewed as a tool for defining some classes of objects, the attributes that these objects may have, and the way in which these objects may be related to each other) are usually seen as the most general formulation (Gruber, 2009). Taxonomies are specific, usually hierarchical, collections of terms that can be used to describe or classify objects in some contexts; examples include subject headings or the naming schemes used in biological systematics. “Controlled” vocabularies are collections of values that can be used to populate specific instances of object attributes within an ontology; in a certain sense, they are equivalent to taxonomies in that they can be used to classify. However, controlled vocabularies are often “flat,” without other internal structure among the possible values, whereas taxonomies commonly include very rich internal hierarchical structure. Ontologies, vocabularies, and taxonomies work together. As a simple example, a part of an ontology might define a specific class of objects called documents; each of these has attributes that include subjects and languages. One might have a list of possible language values (a controlled vocabulary) associated with the ontology and also a tree structure of subject headings (a taxonomy, though it could also be viewed as a simple vocabulary).
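The document example above can be made concrete with a small sketch. It assumes a flat set of language codes as the controlled vocabulary and a child-to-parent mapping as the taxonomy; the class name `Document`, the language codes, and the subject headings are all invented for illustration and are not drawn from MSC2010 or any existing ontology.

```python
from dataclasses import dataclass, field

# Controlled vocabulary: a flat set of allowed values (illustrative only).
LANGUAGES = {"en", "fr", "de", "ru"}

# Taxonomy: child -> parent mapping giving a tree of subject headings.
# The headings are invented for illustration and are not MSC codes.
SUBJECT_PARENT = {
    "mathematics": None,
    "algebra": "mathematics",
    "group theory": "algebra",
    "analysis": "mathematics",
}


def ancestors(subject):
    """Walk up the taxonomy from a subject to the root."""
    path = []
    parent = SUBJECT_PARENT[subject]
    while parent is not None:
        path.append(parent)
        parent = SUBJECT_PARENT[parent]
    return path


@dataclass
class Document:
    """Ontology class 'document' with two attributes: language and subjects."""
    title: str
    language: str
    subjects: list = field(default_factory=list)

    def __post_init__(self):
        # Attribute values must come from the vocabulary and the taxonomy.
        if self.language not in LANGUAGES:
            raise ValueError(f"unknown language: {self.language}")
        for subject in self.subjects:
            if subject not in SUBJECT_PARENT:
                raise ValueError(f"unknown subject: {subject}")

    def broader_subjects(self):
        """All subjects the document falls under, including inherited ones."""
        found = set(self.subjects)
        for subject in self.subjects:
            found.update(ancestors(subject))
        return found


doc = Document("A note on finite groups", "en", ["group theory"])
```

Here the hierarchy is what distinguishes the taxonomy from the flat vocabulary: a document tagged only with "group theory" is also found under "algebra" and "mathematics", whereas the language list supports nothing beyond membership checks.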
For instance, within the mathematical sciences, the widely accepted Bibliographic Ontology20 provides a fairly adequate accounting of the many common relations between objects in categories 1a through 1e listed above. The BibTeX21 schema that describes the structure of BibTeX records defines a similar ontology. The Citation Typing Ontology (CiTO)22 is an ontology for description of the citation relation between documents. The Mathematics Subject Classification (MSC2010)23 provides a very well thought out, largely hierarchical taxonomy for the classification of mathematical documents by subject, and thence for the subjects themselves. OpenMath,24 discussed further in Chapter 5, offers a potential standard for representing the semantics of mathematical objects that is very relevant to the DML’s goals.
The application of such ontologies to a mathematical objects data set can create graphical structures of information that can provide new insights. For instance, citations generate a citation graph, and collaborations generate a collaboration graph. Such graphical structures are commonly embedded in the structure of hyperlinked webpages, thereby connecting literature that was not obviously related otherwise.
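As a minimal sketch of how such graphs arise from bibliographic records, the following builds a citation graph and a collaboration graph from three hypothetical records. The record layout (`id`, `authors`, `cites`) and all names are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical bibliographic records; identifiers and names are made up.
RECORDS = [
    {"id": "P1", "authors": ["Ada", "Bo"], "cites": []},
    {"id": "P2", "authors": ["Bo", "Cy"], "cites": ["P1"]},
    {"id": "P3", "authors": ["Ada", "Bo", "Cy"], "cites": ["P1", "P2"]},
]


def citation_graph(records):
    """Directed graph: paper id -> set of paper ids it cites."""
    return {r["id"]: set(r["cites"]) for r in records}


def collaboration_graph(records):
    """Undirected weighted graph: sorted author pair -> number of joint papers."""
    weights = defaultdict(int)
    for r in records:
        for a, b in combinations(sorted(r["authors"]), 2):
            weights[(a, b)] += 1
    return dict(weights)


def cited_by(graph, paper_id):
    """Reverse lookup: which papers cite paper_id?"""
    return sorted(p for p, cited in graph.items() if paper_id in cited)


citations = citation_graph(RECORDS)
collaborations = collaboration_graph(RECORDS)
```

Both graphs are derived from the same records, which is why well-curated bibliographic data are a precondition for this kind of analysis.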
Development of new ontologies is a complex process requiring a high level of community effort for consensus, even for limited sets of relations. The committee expects that when communities start to curate various digital collections of records of mathematical entities, there will be some “bottom up” development of at least minimal ontologies for these entities, as has already occurred with MSC2010 and OpenMath. The structure of these ontologies will be reflected in the necessary schemas25 for description of the objects they involve, and the graphical relations induced by these ontologies will be of potentially great interest in the process of extracting information and knowledge from mathematical publications.
25 A schema is broadly defined as a representation of a plan or theory in the form of an outline or model.
The management of formal representations of mathematical concepts is known as mathematics knowledge management (Carette and Farmer, 2009). In this report, this issue is viewed more broadly as the management of mathematical information and concepts, both formal and informal, including the bibliographic information and mathematical concepts categories of objects introduced in the previous section, only the latter of which can be usefully regarded as part of mathematics itself.
Bibliographic Resources in Mathematics
Several general bibliographic resources exist, and some of these are described in Appendix C. Among them, mathematicians typically use Google26 and Google Scholar27 most often, although CrossRef28 is “under the hood” whenever a user navigates from one publisher’s site to another by a reference link. While many mathematicians heavily utilize these general information services because of their power and ubiquity, some mathematicians prefer the discipline-specific abstracting and indexing services provided by MathSciNet29 and zbMATH.30 This discipline-specific service preference is partly for historical reasons and partly because the focus and quality of metadata provided by these services in mathematics makes it easier to find publications of interest. Both services offer bibliographic entries in BibTeX,31 which is machine-readable and reusable, for preparation of reference lists for LaTeX32 documents, and, with more technical effort, for publication of online bibliographies in HTML33 or JSON.34 Using search engines with access to well-curated bibliographic metadata and full-text indexing is how most mathematicians find mathematical primary sources today.
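To give a sense of the “more technical effort” involved in reusing BibTeX entries as JSON, here is a minimal sketch that converts one simple BibTeX entry into a JSON object. It assumes a single entry whose field values are unnested braces, double-quoted strings, or bare numbers; real BibTeX (nested braces, string macros, concatenation) needs a proper parser. The function name `bibtex_to_dict` and the citation key are hypothetical; the bibliographic data is taken from this chapter's reference list.

```python
import json
import re

# One entry: @type{key, fields...}
ENTRY = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,(.*)\}\s*$", re.DOTALL)
# A field value is {unnested braces}, "a quoted string", or a bare number.
FIELD = re.compile(r'(\w+)\s*=\s*(?:\{([^{}]*)\}|"([^"]*)"|(\d+))')


def bibtex_to_dict(text):
    """Convert one simple BibTeX entry to a dict (hypothetical helper)."""
    match = ENTRY.match(text.strip())
    if match is None:
        raise ValueError("not a single simple BibTeX entry")
    entry_type, key, body = match.groups()
    record = {"type": entry_type.lower(), "key": key}
    for name, braced, quoted, number in FIELD.findall(body):
        value = braced or quoted or number
        record[name.lower()] = " ".join(value.split())  # normalize whitespace
    return record


sample = """@book{petkovsek1996,
  author = {Petkovsek, M. and Wilf, H. and Zeilberger, D.},
  title = {A = B},
  publisher = {A.K. Peters},
  year = 1996
}"""

record = bibtex_to_dict(sample)
as_json = json.dumps(record, indent=2)
```

All values are kept as strings, including the year, so that the JSON output mirrors the BibTeX source without guessing at types.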
Services such as MathSciNet, zbMATH, and Google Scholar provide complementary and somewhat overlapping services. One distinct difference is that MathSciNet is organized chronologically and referentially, while Google Scholar is based on “importance” as qualified by page ranks or some variant thereof. Both are important and are used in literature searches. MathSciNet is great for tasks such as listing all articles by an author and listing all articles in a specific mathematical field, and it has high-quality metadata that are needed for many purposes. Its search capabilities are limited because it only searches over metadata. Google Scholar is often better for searches because it searches over full text, including reference lists, and has better ranking or returns for most purposes. One issue that some mathematicians have with Google Scholar is that it is not possible to limit searches to math or subfields of math. MathSciNet, zbMATH, and Google Scholar combined do a good job providing conventional discovery over the corpus of traditionally published mathematical literature, but no services currently provide a finer-grain search capability that allows a user to search for mathematical objects or ideas that cannot be easily defined by text search, such as an equation or the evolution of a specific notation. Ideally, a mathematician should have the best of both capabilities through a single interface, but this is challenging because neither MathSciNet nor Google Scholar currently allows its data to be merged with the other’s.
Mathematicians also make extensive use of arXiv as a platform for sharing preprints and keeping up with current research developments. Mathematicians strongly support arXiv in part because the full text is largely indexed and exposed to the Web through search engines. However, arXiv items are not indexed through services such as MathSciNet or zbMATH, which would help connect these items to the rest of the literature. Search tools associated with distinct subsets of the literature, such as arXiv, publisher-based repositories, library catalogs, and academic institutional repositories, provide overlapping access to the mathematical literature. Unfortunately, the present configuration of these discipline-specific tools does not provide a single information source where mathematicians can find and access information from diverse sources, and the more general information sources often lack the mathematical metadata and details that make mathematics literature easy to search and browse.
Combining data from multiple information resources (e.g., Google, MathSciNet, zbMATH) is complicated. Partnering organizations would have to allow their data to be collected, reused, or recombined on a large scale, which many services are hesitant to do. Even seemingly open resources (such as arXiv) may have legal restrictions on outside data aggregation, depending on what is done with the data. This collaboration would have to be negotiated between potential partners with the goal of creating a unified view of the mathematics literature. Some approaches toward developing partnerships and relevant examples are discussed in Chapter 3.
Given the central importance of bibliographic data searches and the repeated use of bibliographic information by researchers in preparation of research articles, it is essential for the DML to provide adequate bibliographic support tools with access to the best available bibliographic data in mathematics and related fields. Ideally, it should support advanced bibliographic data processing to detect and identify the structure of networks of papers, authors, topics, and the like. The foundations of such bibliographic data processing are provided by the larger existing bibliographic services in mathematics and beyond, especially MathSciNet, zbMATH, and Google Scholar, which are the most commonly used by mathematicians. At present, none of these services provides an application programming interface (API) for programmatic access, and none of them allow their data to be downloaded in bulk, except with severe restrictions on what can be done with it. To provide the greatest benefit to users of a DML, that would have to change. Both EuDML and Microsoft Academic Search provide steps in a positive direction with more or less open bibliographic data stores with an API for access, which allows tools and services to be built over the corpus.
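One simple example of the kind of network analysis that openly accessible bibliographic data would permit is bibliographic coupling, a standard bibliometric measure in which two papers are linked with a strength equal to the number of references they share. The sketch below is illustrative only: the paper and reference identifiers are hypothetical, and it is not a description of how any existing service processes its data.

```python
from itertools import combinations

# Hypothetical reference lists: paper id -> set of cited reference ids.
references = {
    "A": {"R1", "R2", "R3"},
    "B": {"R2", "R3", "R4"},
    "C": {"R5"},
}


def coupling_strengths(refs):
    """Return {(paper, paper): shared reference count} for coupled pairs."""
    strengths = {}
    for p, q in combinations(sorted(refs), 2):
        shared = len(refs[p] & refs[q])
        if shared:  # keep only pairs with at least one common reference
            strengths[(p, q)] = shared
    return strengths


strengths = coupling_strengths(references)
```

The pairwise loop is quadratic in the number of papers, which is fine for a sketch; at the scale of a full literature index one would instead invert the data (reference to citing papers) and count pairs per reference.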
To seriously engage the mathematics world with a digital library system, extensive coverage of mathematical information is essential. The committee considered whether the DML could initially focus on out-of-copyright material, but it concluded that there would not be community support or interest in this approach because it is too limited. On the other hand, much progress has been made in digitizing heritage content, and it is essential that this be integrated with the rest of the math literature base.
Specialized Mathematical Information Resources
General bibliographic services provide limited support for navigating and searching mathematical literature below the top five bibliographic classes (documents, people, events, organizations, subjects) discussed above. Beyond these five universal classes, information storage and retrieval for math-specific entities is fragmented and typically does not have links or references to the main indexing services.35
35 MathSciNet and zbMATH share the MSC2010 subject classification, which provides some basic filtering of bibliographic data by subject. ArXiv uses a coarser classification, which is, however, easily mapped to sets of top-level MSC2010 categories.
Research mathematics literature includes a diverse range of special objects (e.g., theorems, lemmas, functions, sequences) that are not represented adequately, or sometimes at all, in full-text indexing and article-level subject classification systems. Currently, these objects are computationally expensive and difficult to recognize through machine-based methods alone. Ontologies of objects, such as reference volumes that enumerate classes of functions, sequences, and other objects, have been developed and curated by mathematicians for centuries. These resources include mathematical handbooks, some of the most famous being the following:
- Abramowitz and Stegun (1972) and the subsequent Digital Library of Mathematical Functions,36
- The Bateman Manuscript,37
- Gradshteyn and Ryzhik (2007),
- Borodin and Salminen (2002), and
- The Princeton Companion to Mathematics (Gowers et al., 2008).
There are also examples of more recently developed resources that provide collections of some mathematical objects, including the following:
- Propositions: Wikipedia’s List of Theorems,38 Mizar39;
- Proofs: Proofs from the Book (Aigner and Ziegler, 2010), Mizar, Coq,40 and others41;
- Numbers: A Dictionary of Real Numbers (Borwein and Borwein, 1990);
- Sequences: The On-Line Encyclopedia of Integer Sequences (OEIS)42;
- Functions: Digital Library of Mathematical Functions,43 Wolfram MathWorld,44 Wolfram Functions Site45;
- Groups, rings, and fields: Wikipedia’s List of Simple Lie Groups,46 Wikipedia’s List of Finite Simple Groups,47 Centre for Inter-
- Identities: Piezas50; Petkovsek et al. (1996);
- Inequalities: Wikipedia’s List of Inequalities,51 DasGupta (2008); and
- Formulas: Springer LaTeX Search,52 Hijikata et al. (2009), Kohlhase et al. (2012).
37 “Bateman Manuscript Project,” Wikipedia, last modified July 24, 2013, http://en.wikipedia.org/wiki/Bateman_Manuscript_Project.
41 “Category:Proof assistants,” Wikipedia, last modified September 21, 2011, http://en.wikipedia.org/wiki/Category:Proof_assistants.
46 “List of Simple Lie Groups,” Wikipedia, last modified March 30, 2013, http://en.wikipedia.org/wiki/List_of_simple_Lie_groups.
47 “List of finite simple groups,” Wikipedia, last modified December 18, 2013, http://en.wikipedia.org/wiki/List_of_finite_simple_groups.
From a review of these lists, as well as the resources discussed in Appendix C, it is clear that authors and editors continue to be motivated to create and publish lists of various kinds of mathematical objects. Some of these lists, especially ones like tables of integrals and lists of sequences, provide very useful tools for mathematicians and other users of mathematics, especially when combined with computational resources. Wikipedia currently plays a key role in supporting distributed creation and maintenance of numerous lists of serious interest to mathematicians.
Lists and tables have been an essential part of mathematical research throughout history, and the vast majority of working mathematicians have made use of appropriate tables (or, more recently, the equivalent numerical or symbolic software) in the course of their research. The most basic are numerical tables (e.g., values of logarithms, trigonometric functions, various special functions, zeros of the zeta function, integer sequences). More sophisticated are lists of mathematical objects (e.g., indefinite and definite integrals, finite simple groups, Fourier transforms, partial differential equations and their solutions). At an even higher level are lists of theorems, concepts, and the like.
49 Sage Development Team, “Finite Fields,” http://www.sagemath.org/doc/reference/rings_standard/sage/rings/finite_rings/constructor.html, accessed January 16, 2014.
At their most basic, tables provide a simple mechanism for speeding up research. Once one identifies that an object under investigation appears in a table, one can make use of prior knowledge about said object, thereby facilitating either applications or new advances in theory. Compiling a table is an important research contribution in its own right, helping codify the knowledge in a field, point out gaps therein, and inspire new research to fill in and extend what is known. Scanning a table often enables one to spot otherwise obscure patterns, leading to new theorems and new directions of research.
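The lookup step described above, recognizing that an object under investigation already appears in a table, can be sketched for integer sequences as follows. The three-entry table is a tiny local stand-in keyed by OEIS-style identifiers, not a query against the OEIS itself, and the function names are hypothetical.

```python
# Tiny local table of integer sequences (first ten terms of each).
TABLE = {
    "A000040": ("The prime numbers", [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]),
    "A000045": ("Fibonacci numbers", [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]),
    "A000079": ("Powers of 2", [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]),
}


def contains_run(terms, query):
    """True if `query` occurs as a contiguous run inside `terms`."""
    n = len(query)
    return any(terms[i:i + n] == query for i in range(len(terms) - n + 1))


def lookup(query):
    """Return the sorted ids of table entries whose terms contain `query`."""
    return sorted(
        key for key, (_, terms) in TABLE.items() if contains_run(terms, query)
    )


matches = lookup([3, 5, 8, 13])
```

A short query such as `[2, 3, 5]` matches more than one entry, which mirrors the practical experience that a few more computed terms are often needed before a table identifies an object unambiguously.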
Sara Billey and Bridget Tenner wrote that a database for mathematical theorems would “enhance experimental mathematics, help researchers make unexpected connections between areas of mathematics, and even improve the refereeing process” (Billey and Tenner, 2013, p. 1093). Extensive lists could also enhance search and retrieval of mathematical information and allow for connections to be made between mathematical topics and objects.
Currently, there are no satisfactory indexes of many mathematical objects, including symbols and their uses, formulas, equations, theorems, and proofs; systematically labeling them is a challenging and as yet unsolved problem. In many fields where there are more specialized objects (such as groups, rings, fields), there are community efforts to index these, but the resulting indexes are typically not machine-readable, reusable, or easily integrated with other tools, and they often lack editorial effort. The issue, then, is how to identify existing lists that are useful and valuable and provide some central guidance for further development and maintenance of such lists.
Chapter 2 of this report discusses some of the user features that could advance mathematics research by increasing connections, and Chapter 5 discusses what collections of entity lists could start making these features and this connectivity a reality.
Abramowitz, M., and I.A. Stegun, eds. 1972. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. Dover Publications, New York.
Aigner, M., and G.M. Ziegler. 2010. Proofs from THE BOOK. 4th edition. Springer-Verlag, Berlin. doi:10.1007/978-3-642-00856-6.
Billey, S.C., and B.E. Tenner. 2013. Fingerprint databases for theorems. Notices of the AMS 60(8):1034-1039.
Borodin, A.N., and P. Salminen. 2002. Handbook of Brownian Motion—Facts and Formulae. 2nd edition. Probability and Its Applications book series. Birkhäuser Verlag, Basel. doi:10.1007/978-3-0348-8163-0.
Borwein, J., and P. Borwein. 1990. A Dictionary of Real Numbers. Wadsworth and Brooks/Cole Advanced Books and Software, Pacific Grove, Calif. doi:10.1007/978-1-4615-8510-7.
Carette, J., and W.M. Farmer. 2009. A review of mathematical knowledge management. Pp. 233-246 in Intelligent Computer Mathematics. Springer.
DasGupta, A. 2008. A collection of inequalities in probability, linear algebra, and analysis. Pp. 633-687 in Springer Texts in Statistics. Springer, New York. doi:10.1007/978-0-387-75971-5_35.
Gowers, T., J. Barrow-Green, and I. Leader, eds. 2008. The Princeton Companion to Mathematics. Princeton University Press, Princeton, N.J.
Gradshteyn, I.S., and I.M. Ryzhik. 2007. Table of Integrals, Series, and Products. 7th edition. Elsevier/Academic Press, Amsterdam. Translated from the Russian, Translation edited and with a preface by A. Jeffrey and D. Zwillinger.
Gruber, T. 2009. Ontology. Encyclopedia of Database Systems (L. Liu and M. Tamer Özsu, eds.). Springer-Verlag. http://tomgruber.org/writing/ontology-definition-2007.htm.
Hijikata, Y., H. Hashimoto, and S. Nishida. 2009. Search mathematical formulas by mathematical formulas. Pp. 404-411 in Lecture Notes in Computer Science. Volume 5617. doi:10.1007/978-3-642-02556-3_46.
International Mathematical Union. 2006. “Digital Mathematics Library: A Vision for the Future.” http://www.mathunion.org/fileadmin/IMU/Report/dml_vision.pdf. Accessed August 20, 2006.
Kohlhase, M., B.A. Matican, and C.-C. Prodescu. 2012. MathWebSearch 0.5: Scaling an open formula search engine. Pp. 342-357 in Lecture Notes in Artificial Intelligence. Volume 7362. Springer, Berlin, Heidelberg. doi:10.1007/978-3-642-31374-5.
National Research Council. 2013. The Mathematical Sciences in 2025. The National Academies Press, Washington, D.C.
Petkovsek, M., H. Wilf, and D. Zeilberger. 1996. A = B. A.K. Peters, Ltd., Wellesley, Mass.
Ruddy, D. 2009. The evolving digital mathematics network. Pp. 3-16 in DML 2009 Towards a Digital Mathematics Library Proceedings (P. Sojka, ed.) Conferences on Intelligent Computer Mathematics, CICM 2009, Grand Bend, Ontario, Canada.