Mathematics is facing a pivotal junction where it can either continue to utilize digital mathematics literature in ways similar to traditional printed literature, or it can take advantage of new and developing technology to enable new ways of advancing knowledge. This report details how information contained in individual items within the literature could be readily extracted and linked to create a comprehensive digital mathematics information resource that is more than the sum of its contributing publications. That resource can serve as a platform and focal point for further development of the mathematical knowledge base.

This new system, referred to throughout the report as the Digital Mathematics Library (DML), could support a wide variety of new functionalities and services over aggregated mathematical information, including dramatically improved capabilities for searching, browsing, navigating, linking, computing, visualizing, and analyzing the literature.

**STUDY DEFINITION AND SCOPE AND THE COMMITTEE’S APPROACH**

The Alfred P. Sloan Foundation commissioned this study and charged the committee to:

- Evaluate the potential value of a virtual global library of mathematical science publications;

- Assuming that a stable context for sharing copyrighted information has been achieved, assess the remaining issues to be addressed in setting up such a library;

- Identify a range of desired capabilities of such a library; and

- Characterize resource needs.

While a traditional library is perhaps the oldest formal information
resource available, the manifestation of libraries has evolved dramatically
over the past few decades. In many cases within mathematics, as for other
fields of scholarship, buildings housing paper publications have given way
to online collections of downloadable documents. While this increased
access is not perfect (not all material is readily available to all
researchers, and search tools vary from site to site), widespread
digitization has made it easier for many to access the mathematical
literature. Overall, a much greater proportion of the mathematical literature
is available to more people than at any time before. The research libraries,
scholarly societies,
and other players that curate and steward this material continue to grapple
with issues, such as long-term preservation of digital materials, but it is
fair to say there exists a fairly comprehensive, distributed “digital library”
for mathematics offering a much improved but not fundamentally different
version of what existed in the time of printed books and journals.
The committee has thus taken the term library in its charge to mean
a system that accumulates and shares knowledge, rather than the more
traditional library that houses documents, either digital or physical. The
committee’s focus has been on functionality that can meet the needs of
mathematicians facing a rapidly expanding and diversifying knowledge
base. The committee has largely set aside the traditional issues of assembling
and stewarding collections, which are being handled well, for the
most part, by the existing distributed digital library.
The committee envisions its target digital library users to be work-
ing research mathematicians and advanced graduate students beginning
their research careers throughout the world (hence the word global). The
library discussed does not specifically target students below the advanced
graduate student level or researchers outside of mathematics, although
both sets would likely constitute some of the library’s user base. Having
a clear understanding of the target user base directly impacts the types of
content the library targets and the types of services it provides. The com-
mittee also believes that the disciplinary scope of the mathematics that this
library could provide is best left undefined for now. Mathematics and the
mathematical sciences have diffuse boundaries, and this committee takes
no stance on where appropriate content lies. However, this is an issue that
will have to be addressed by either a future management organization or
the community of users.

10 DEVELOPING A 21ST CENTURY MATHEMATICS LIBRARY
The committee believes that there is much room for innovation and
progress in the mainstream mathematical information services. To deter-
mine which potential areas for innovation are of the most interest to the
mathematics community, the committee held three meetings where it heard
from outside presenters on issues relevant to mathematics (November 27-28,
2012; February 19-20, 2013; and May 30-31, 2013; agendas for these meetings
can be found in Appendix A) and two public data-gathering sessions (at the
University of Minnesota on May 6, 2013, and at Northwestern University on
May 30, 2013), posted questions on two mathematics discussion forums
(MathOverflow[1] and Math 2.0[2]), and wrote a guest entry on Professor
Terry Tao’s mathematics blog.[3] The committee also referred to the
information shared at the World Digital Mathematics Library workshop held
by the International Mathematical Union (IMU) on June 1-3, 2012.[4]
The committee made an assessment of what computers can do today,
what computers can help mathematicians to do, and how rapidly these
capabilities are likely to grow, if provided with some ongoing focused re-
search funding. The committee’s consensus is that by some combination of
machine learning methods and community-based editorial effort, a signifi-
cant portion of the information and knowledge in the global mathematical
corpus could be made available to researchers as linked open data. Broadly
defined, linked open data are structured data that are published in such a
way that makes it easy to interlink them with other data, thereby making
it possible to connect them with information from multiple sources. This
connected data can provide a user with a more meaningful query of a sub-
ject by consolidating relevant information from a variety of places (e.g.,
in different research papers) and pulling out specific components that the
user might be particularly interested in. The committee envisions that much
of the existing mathematical information can be provided as linked open
data through a central organizational entity—referred to in this report as
the DML. It should be noted that linked open data are not the only way
that this can be accomplished, but they are essentially today’s standard for
ontologies and other important representations. The committee believes
that the DML should make use of current best practices rather than trying
to develop some other alternative, whenever possible.
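
To make the idea of interlinking concrete, the following minimal Python sketch represents statements as subject-predicate-object triples contributed by two separate sources and merges them so that one query can draw on both. All identifiers here (the `ex:` names, the predicates, and the sample statements) are hypothetical placeholders invented for illustration; they are not part of any DML design or of any published vocabulary.

```python
from collections import defaultdict

# Hypothetical triples (subject, predicate, object) published by two
# independent sources. Shared identifiers are what make the data "linked".
SOURCE_A = [
    ("ex:paper1", "ex:states", "ex:theoremX"),
    ("ex:paper1", "ex:author", "ex:person1"),
]
SOURCE_B = [
    ("ex:paper2", "ex:uses", "ex:theoremX"),
    ("ex:theoremX", "ex:about", "ex:topicY"),
]


def merge(*sources):
    """Combine triples from several sources into one index keyed by subject."""
    index = defaultdict(set)
    for triples in sources:
        for subject, predicate, obj in triples:
            index[subject].add((predicate, obj))
    return index


def mentions(index, target):
    """Return sorted subjects that have any statement pointing at `target`."""
    return sorted(s for s, facts in index.items()
                  if any(obj == target for _, obj in facts))


graph = merge(SOURCE_A, SOURCE_B)
papers = mentions(graph, "ex:theoremX")
```

Because both sources use the same identifier for the theorem, the merged index can answer a question ("which papers refer to this theorem?") that neither source could answer alone.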
[1] I. Daubechies, “Math Annotate Platform?,” MathOverflow (question and answer site), February 18, 2013, http://mathoverflow.net/questions/122125/math-annotate-platform.
[2] I. Daubechies, “Math Annotate Platform?,” Math2.0 (discussion forum), February 18, 2013, http://publishing.mathforge.org/discussion/163/.
[3] I. Daubechies, “Planning for the World Digital Mathematical Library,” What’s New (blog by Terence Tao), daily archive for May 8, 2013, http://terrytao.wordpress.com/2013/05/08/.
[4] Many of the materials presented at the International Mathematical Union’s DML workshop can be found at http://ada00.math.uni-bielefeld.de/mediawiki-1.18.1/index.php/, updated April 23, 2013.

STRUCTURE OF THE REPORT
This report consists of five main chapters and several appendices. The
rest of this chapter discusses previous digital mathematics library efforts,
the universe of mathematical information, relevant conceptual tools, and
current mathematical resources. Chapter 2 discusses what is missing from
the mathematical information landscape and what gaps the DML would
fill, and elaborates on the desired DML capabilities from a user’s perspec-
tive. This includes a discussion of what types of features would make the
mathematical literature and current resource capability more meaningful
to a mathematical researcher. Chapter 3 discusses some of the broad issues
that the DML would face during development, including developing partnerships,
managing large data sets, navigating open access, and planning for
system and data maintenance. Chapter 4 provides a strategic plan for the
development of the DML, including a discussion of fundamental principles,
the constitution of a governing organization, steps toward initial develop-
ment, and resources that would be needed. Chapter 5 discusses some details
of entity collections and technical considerations for the DML that will be
needed to make the features and capabilities discussed in Chapter 2 a reality.
In preparing this report, the committee reviewed many existing digital
resources for mathematics, as well as relevant initiatives in some other sci-
ences. A brief discussion of these tools is given in Appendix C.
PREVIOUS DIGITAL MATHEMATICS LIBRARY EFFORTS
The idea of a comprehensive digital mathematics library has been
around for decades, and there have been several incarnations of the idea
with different foci. The first step in this vision was retrospective digitization
of the older parts of the literature that did not already exist in digital form,
and this has largely been achieved (though the quality, and hence utility, of
these converted materials varies widely, ranging from simple page scans to
carefully proofread markups).
The Cornell University Digital Mathematics Library Planning Project
was funded by the National Science Foundation from 2003 to 2004 as
a step “toward the establishment of a comprehensive, international,
distributed collection of digital information and published knowledge in
mathematics.”[5] Its vision statement reads as follows:

In light of mathematicians’ reliance on their discipline’s rich published
heritage and the key role of mathematics in enabling other scientific
disciplines, the Digital Mathematics Library strives to make the entirety of
past mathematics scholarship available online, at reasonable cost, in the form
of an authoritative and enduring digital collection, developed and curated
by a network of institutions.

[5] Cornell University Library, Digital Mathematics Library. S.E. Thomas, principal investigator, R.K. Dennis and J. Poland, co-principal investigators, http://www.library.cornell.edu/dmlib/, last updated December 2, 2004.

A follow-up report from the International Mathematical Union (IMU,
2006) shared this vision of a distributed collection of past mathematical
scholarship that served the needs of all science, and it encouraged math-
ematicians and publishers of mathematics to join together in implementing
this vision. However, it was clear within a few years that this vision was not
going to become a reality soon. As David Ruddy of Project Euclid wrote
(Ruddy, 2009):
The grand vision of a Digital Mathematics Library, coordinated by a group
of institutions that establish policies and practices regarding digitization,
management, access, and preservation, has not come to pass. The project
encountered two related problems: it was overly ambitious, and the ap-
proach to realizing it confused local and community responsibilities. While
the vision called for a network of distributed, interoperable repositories,
the committee approached and planned the project with the goal of build-
ing a single, unified library.
At the time of this study, there has been some progress in this vision of
a single, unified library in the form of the European Digital Mathematics
Library (EuDML) project.[6] The EuDML project, funded from 2010 to 2013 by
the European Commission, created a network of 12 European repositories
acquiring selected mathematical content for preservation and access and
made progress in establishing a single distributed library with a collection
of about 225,000 unique items, spanning 2.6 million pages. The EuDML
succeeded in creating a unified metadata framework[7] that is shared by these
repositories (covering items about a document such as the title, authors,
abstract, comments, report number, category, journal reference, digital
object identifier, Mathematics Subject Classification (MSC), and Association
for Computing Machinery (ACM) computing classification) and in providing a
single point of access to publications in these repositories, albeit
with limited rights to search the full text from some sources. Impressive as
the EuDML is, when compared to the full size and scope of the universe
of published mathematics (described in the next section), and given the
essential requirement to integrate with copyrighted materials and the clear
desirability and cost-effectiveness of leveraging existing repositories and
services, the EuDML experience only emphasizes the difficulties inherent in
aiming for a single, centrally managed and truly comprehensive collection of
digitized mathematics as the cornerstone for a comprehensive DML. With
recent advances in technology and the advantage of experience
gained on EuDML and other projects, the study committee concluded that
a more effective approach going forward would be to partner with existing
content providers and focus instead on the innovations and elements
of shared infrastructure and knowledge management that are not being
adequately addressed by other entities (i.e., rather than on central
harvesting and aggregation of primary content). The committee believes that
this vision is consistent with the original vision of the EuDML, although it
was not realized by that project.

[6] T. Bouche, Université de Grenoble, “From EuDML to WDML: Next Steps,” presentation to the committee on November 27, 2012.
[7] European Digital Mathematics Library, “Appendix, EuDML Metadata Schema (Final)/Tagging Best Practices,” in EuDML Metadata Schema Specification (v2.0-final), https://project.eudml.org/sites/default/files/d36-appendix_uncropped.pdf, accessed January 16, 2014.

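
The metadata items named for the EuDML framework (title, authors, abstract, comments, report number, category, journal reference, object identifier, MSC, and ACM classification) can be pictured as a simple record type. The following is a minimal Python sketch under that assumption; the class name, field names, types, and sample values are hypothetical and are not taken from the EuDML schema specification.

```python
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class DocumentMetadata:
    """Hypothetical record mirroring the metadata items named in the text."""
    title: str
    authors: List[str]
    abstract: Optional[str] = None
    comments: Optional[str] = None
    report_number: Optional[str] = None
    category: Optional[str] = None
    journal_reference: Optional[str] = None
    doi: Optional[str] = None
    msc_codes: List[str] = field(default_factory=list)
    acm_classes: List[str] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """Names of fields left empty, e.g. to flag records needing curation."""
        return [name for name, value in asdict(self).items()
                if value in (None, [], "")]


record = DocumentMetadata(
    title="A placeholder title",
    authors=["A. Author"],
    msc_codes=["00A05"],  # placeholder classification code
)
gaps = record.missing_fields()
```

A shared record shape of this kind is what lets several repositories expose their holdings through one point of access, even when each repository stores its content differently.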
Another example of an online resource that helps users connect with
knowledge is the National Science Digital Library (NSDL).[8] NSDL is an
online educational resource for teaching and learning, with current emphasis
on the sciences, technology, engineering, and mathematics. NSDL does not
hold content directly; instead, it provides structured metadata about
Web-based educational resources held on other sites by providers who contribute
this metadata to NSDL for organized search and open access to educational
resources via NSDL.org and its services.
A discussion of many other efforts and current digital resources can be
found in Appendix C.
The Alfred P. Sloan Foundation supported a World Digital Mathematics
Library workshop in June 2012,[9] which was planned by the IMU’s
Committee on Electronic Information and Communication. This workshop
provided a wealth of information to the committee on the current state of
the art and research efforts aimed at making the World Digital Mathematics
Library a reality.
Much of the straightforward work of assembling digital mathematics
libraries has been done (e.g., digitizing material, aggregating it into small to
medium-sized collections). The difficulties that the EuDML faced in creating
a single large aggregation of mathematics literature and the difficulty
of other World Digital Mathematics Library efforts in gaining community
support indicate that these challenges are unlikely to be overcome soon.
The committee notes that there has been sizable ongoing investment from
publishers (both commercial and noncommercial) to retrospectively digitize
historical runs of their copyrighted journals and also, in many cases,
even earlier historical materials that are now out of copyright, in order to
capture comprehensive representations of their journals. However, broad
services such as Google Scholar now provide much of the functionality that
many of these specialized efforts had hoped to achieve in building
comprehensive and coherent collections of the mathematical literature. Such
services achieve this functionality by searching across a range of repositories,
rather than trying to collect all of the material in one (or a very few)
repositories. In the committee’s view, efforts to build centralized
comprehensive resources are reaching a point of diminishing returns.

[8] National Science Digital Library, http://nsdl.org/, accessed January 16, 2014.
[9] International Mathematical Union, “The Future World Heritage Digital Mathematics Library: Plans and Prospects,” updated April 23, 2013, http://ada00.math.uni-bielefeld.de/mediawiki-1.18.1/index.php/Main_Page.

Finding: The construction of mathematical libraries through centralized
aggregation of resources has reached a point of diminishing returns,
particularly given that much of this construction has been coupled with
retrospective digitization efforts.
While there is still a substantial amount of historical (mostly out of
copyright) mathematical literature that would benefit from retrospective
digitization, or higher quality digitization than has currently been done,
the committee does not believe that there is justification for a major new
program and investment in this area. In particular, although there is value
in modest, sustained investment in existing efforts, these will make only
incremental contributions. While the fundamental importance of the heri-
tage literature remains, its size, as a fraction of the overall mathematics
literature, is diminishing steadily. No amount of additional retrospective
digitization will fundamentally change how the mathematical literature
can be used or evolved to meet new research
needs. Moreover, while the historical (e.g., out of copyright) segments of
the mathematical literature are valuable, any genuinely meaningful large-
scale change in accessing the mathematical literature and knowledge base
must encompass not only heritage but also current literature. Thus, the
committee believes that a very different set of investments (as described in
this report) is where the transformative opportunities await.
The next section provides some more detailed information on the exist-
ing landscape of mathematical literature and how much has been digitized.
THE UNIVERSE OF PUBLISHED MATHEMATICAL INFORMATION
Mathematics shares more with the arts than the sciences, in that its
primary data are human creations, perhaps representations of ideas in a
platonic realm, rather than data derived by observation or measurement of
the physical universe. Mathematical information is primarily mined from its
own literature or derived by computation. This section describes the state of
mathematical publishing and the world of mathematical objects that exist
within the publications.
Digital Mathematical Publications
Most of the mathematics literature of the 20th century is now available
digitally. Through the Jahrbuch Electronic Research Archive for Mathematics[10]
project and the independent efforts of publishers and others, much
of the most important mathematical research of the last half of the 19th
century also has been digitized. Appendix C provides an overview of the
many sources for digitized mathematical source material, including reposi-
tories and many other types of sources, whether freely accessible or behind
paywalls (and thus only accessible to subscribers). A large part of the math-
ematics literature in electronic form consists of papers written in the past
20 years. This portion of the literature is searchable and navigable by any
user of a library with access to the main subscription services controlled by
libraries and publishers.
In addition, a considerable body of the heritage literature in mathematics
has been digitized over the past 15 years. The most comprehensive
listing of the retro-digitized mathematics literature is Ulf Rehmann’s list
of Retrodigitized Mathematics Journals and Monographs,[11] which is a
list of titles of serials and books that have been digitized without
metadata.[12] Much of this metadata has found its way into indexes maintained
by Google, MathSciNet, and Zentralblatt (zbMATH).[13]
The digital corpus of mathematics literature is extensive. The
MathSciNet[14] database includes approximately 2.9 million publications
from 1940 to the present, with direct links to 1.7 million of them.
MathSciNet currently indexes more than 2,000 journal/serial titles and
contains about 100,000 books (post 1960). Of the items currently available
on MathSciNet, 2.6 million of them are from the 1970s or later, and
1.7 million are from 1990 onward. The American Mathematical Society has
kept track of new journal titles in the field since 1997, and there has been
an average growth of about 40 new journal titles per year in mathematics.
zbMATH (1931-present) contains more than 3 million publications and
currently indexes approximately 3,500 journals. The annual production of
mathematics papers is more difficult to quantify. There has been a steady
increase in the number of math papers added to arXiv[15] over the past
5 years (shown in Table 1-1), although it is not clear from these data if this
shows an increase in mathematics publications or an increase in
mathematicians’ willingness to post their papers. Annual entries on MathSciNet
and the number of mathematics papers listed in Web of Science[16] have both
remained relatively constant around 90,000 and 20,000, respectively (see
Tables 1-2 and 1-3).

[10] The Jahrbuch Project, Electronic Research Archive for Mathematics, last modified October 31, 2006, http://www.emis.de/projects/JFM/.
[11] DML: Digital Mathematics Library, http://www.mathematik.uni-bielefeld.de/~rehmann/DML/dml_links.html, accessed January 16, 2014.
[12] Metadata are broadly defined as data about data. In the case of a typical mathematics journal digital publication, metadata may include information such as author, journal name and volume, date of publication, time of file creation, size of file.
[13] zbMATH, http://zbmath.org/, accessed January 16, 2014.
[14] American Mathematical Society, MathSciNet, http://www.ams.org/mathscinet/, accessed January 16, 2014.

Components of the digitized corpus of mathematics are increasingly
included in a variety of stable, well-curated repositories, although access
to much of this corpus remains limited by copyright or other intellectual
rights restrictions. For example, in terms of retrospectively digitized works
cataloged under the subject heading (or subheading) of “mathematics,”
the HathiTrust Digital Library[17] includes approximately 40,000 bibliographically
distinct resources.[18] Of these, only 6,800 were digitized from
public-domain works; the rest were digitized from copyrighted originals.
These numbers are a mix of monograph titles and serial titles (a serial title
in HathiTrust typically encompasses a complete run of a journal, edited
series, or conference publication series). Each serial run could be expected
to include tens or even hundreds of issues, with each issue containing at
least several articles or papers. In terms of pages, using the HathiTrust
repository-wide ratio of pages per bibliographic resource to estimate, this
translates to a rough estimate of 25.5 million pages of retrospectively digi-
tized mathematics in HathiTrust with approximately 17 percent (6,800 out
of 40,000) digitized from public-domain sources.
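
The arithmetic behind these figures can be checked directly. The short Python sketch below recomputes the public-domain share and derives the pages-per-resource ratio implied by the 25.5 million page estimate. The final public-domain page count assumes, purely for illustration, that public-domain and copyrighted items have the same average length, which the report does not state.

```python
# Figures quoted in the text.
total_resources = 40_000          # bibliographically distinct resources
public_domain_resources = 6_800   # digitized from public-domain works
estimated_pages = 25_500_000      # rough estimate of digitized pages

# Share digitized from public-domain sources.
public_domain_share = public_domain_resources / total_resources

# Pages per resource implied by the page estimate (not stated in the text).
implied_pages_per_resource = estimated_pages / total_resources

# ASSUMPTION: public-domain and copyrighted items have equal average length.
public_domain_pages = public_domain_share * estimated_pages

print(f"public-domain share: {public_domain_share:.0%}")
print(f"implied pages per resource: {implied_pages_per_resource:.1f}")
print(f"public-domain pages (under assumption): {public_domain_pages:,.0f}")
```

The high implied ratio is consistent with the remark that many of the counted resources are serial titles, each covering a complete run of a journal or series.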
The basic trends seem clear: more and more of the corpus of math-
ematical literature will be in digital form, including some with high-quality
markup, specifically those items that are “born” digital or retro-digitized
to be in a machine readable format and that use typesetting such as LaTeX
or MathML (as opposed to page images of publications). As mentioned
before, the fraction of the overall corpus that is pre-1970 is rapidly dimin-
ishing due to the relative explosion in the annual rates of publication in
recent decades (however, this should in no way be seen as diminishing the
fundamental importance of heritage literature).
[15] arXiv, http://arxiv.org/, accessed January 16, 2014.
[16] Thomson Reuters, “Web of Science Core Collection,” http://thomsonreuters.com/web-of-science/, accessed January 16, 2014.
[17] HathiTrust Digital Library, http://www.hathitrust.org/, accessed January 16, 2014.
[18] Current as of September 2013.

TABLE 1-1 Number of Mathematics Papers Added to arXiv Annually Between 2008 and 2012

| Year | Mathematics Papers Added to arXiv |
|------|-----------------------------------|
| 2008 | 14,373 |
| 2009 | 16,319 |
| 2010 | 18,765 |
| 2011 | 21,287 |
| 2012 | 24,176 |

SOURCE: arXiv, http://arxiv.org/, accessed January 16, 2014.

TABLE 1-2 Number of Articles in Research Journals in MathSciNet Annually Between 2006 and 2012

| Publication Year | Entries in MathSciNet |
|------------------|-----------------------|
| 2006 | 76,187 |
| 2007 | 81,638 |
| 2008 | 86,533 |
| 2009 | 87,279 |
| 2010 | 87,162 |
| 2011 | 89,638 |
| 2012 | 92,191 |

NOTE: A steady growth of about 3 percent per year is seen.
SOURCE: American Mathematical Society, MathSciNet, http://www.ams.org/mathscinet/, accessed January 16, 2014.

TABLE 1-3 Mathematics Papers Listed in Web of Science Annually Between 2008 and 2012

| Year | Mathematics Papers Listed in Web of Science |
|------|---------------------------------------------|
| 2008 | 20,908 |
| 2009 | 22,390 |
| 2010 | 22,079 |
| 2011 | 22,716 |
| 2012 | 23,760 |

SOURCE: Thomson Reuters, “Web of Science Core Collection,” http://thomsonreuters.com/web-of-science/, accessed January 16, 2014.
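
One way to compare the three series in Tables 1-1 through 1-3 is the compound annual growth rate between the first and last year of each table. The Python sketch below computes it from the tabulated values. This measure is an illustrative choice and not the report's stated method.

```python
def compound_annual_growth_rate(first, last, years):
    """Average yearly growth rate that takes `first` to `last` in `years` steps."""
    return (last / first) ** (1 / years) - 1


# (first-year value, last-year value, number of yearly steps) from the tables.
series = {
    "arXiv (2008-2012)": (14_373, 24_176, 4),
    "MathSciNet (2006-2012)": (76_187, 92_191, 6),
    "Web of Science (2008-2012)": (20_908, 23_760, 4),
}

rates = {name: compound_annual_growth_rate(*values)
         for name, values in series.items()}

for name, rate in rates.items():
    print(f"{name}: {rate:.1%} per year")
```

Under this measure the MathSciNet series grows at roughly 3 percent per year, in line with the note under Table 1-2, while the arXiv series grows considerably faster.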

Objects in the Mathematical Literature
Information found in the mathematical literature is diverse but largely
falls into two main categories:
1. Bibliographic information, such as
   a. Documents (e.g., articles, books, proceedings, talks, diagrams, homepages, blogs, videos);
   b. People (e.g., authors, editors, referees, reviewers);
   c. Events (e.g., discoveries, publications, conferences, talks, births, deaths, degrees, awards);
   d. Organizations (e.g., universities, publishers, journals, libraries, service providers);
   e. Subjects (e.g., major branches of mathematics (algebra, geometry, analysis, topology, probability, statistics), as well as their intersections and interactions and their various subbranches, down to even finer topics and including ubiquitous mathematical terms like “number,” “set”)
2. Mathematical concepts (e.g., axioms, definitions, theorems, proofs, formulas, equations, numbers, sets, functions) and objects (e.g., groups, rings).
Collecting and aggregating mathematical bibliographic information
has been the path many digital libraries and digital resources have taken
in the past (Chapter 2 and Appendix C discuss many of these efforts to
date). While there are many challenges in collecting this information, the
even more difficult work lies in collecting mathematical concepts, which
lack the standardization that most bibliographic information has acquired.
However, an ability to explore these mathematical objects within the litera-
ture offers the potential to uncover currently under-explored connections
in mathematics.
The recent National Research Council report The Mathematical Sci-
ences in 2025 (NRC, 2013) discusses the importance of mathematical struc-
tures, which are part of the larger mathematical concepts described above:
A mathematical structure is a mental construct that satisfies a collection
of explicit formal rules on which mathematical reasoning can be car-
ried out. . . . What is remarkable is how many interesting mathematical
structures there are, how diverse are their characteristics, and how many
of them turn out to be important in understanding the real world, often
in unanticipated ways. Indeed, one of the reasons for the limitless pos-
sibilities of the mathematical sciences is the vast realm of possibilities for
mathematical structures. . . . A striking feature of mathematical structures
is their hierarchical nature: it is possible to use existing mathematical
structures as a foundation on which to build new mathematical structures
. . . . Mathematical structures provide a unifying thread weaving through
and uniting the mathematical sciences. (pp. 29-30)
Given the size, diversity, and inherent nature of mathematics informa-
tion in categories 1 and 2 above, it is clearly not sufficient to simply pro-
vide undifferentiated access to the universe of mathematics monographs,
journal articles, and conference papers. Instead, the online research litera-
ture of mathematics must be organized into a well-structured network of
resources linked together based on a variety of attributes—bibliographic
and topical, of course, but also linked in a highly granular fashion on com-
monalities of mathematical structures and the shared use of mathematical
objects, reasoning, and methodologies. The committee believes that the
greatest potential for the DML lies in providing mathematicians access to
a well-structured network of information and building services that both
enhance and utilize this data. In the context of today’s Web environment,
a well-structured network implies adherence to the Semantic Web[19] and
linked open data principles and to community-endorsed standards and best
practices. While the foundation for such a well-structured network of digi-
tal research mathematics exists in established repositories and component
digital libraries, the underlying thesauri and ontologies of mathematical
objects do not yet exist (or have not yet been given permanence and formal
identity), and the agreements on best practices for interoperability and the
implementation of linked open data principles in the context of research
mathematics repositories have not yet been reached.
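
As a rough illustration of linking resources "in a highly granular fashion" on shared mathematical objects, the Python sketch below builds an inverted index from object identifiers to documents and then lists the document pairs that have at least one object in common. The document identifiers and object labels are invented placeholders, and a real system would depend on the thesauri and ontologies that the text notes do not yet exist.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical annotations: document id -> mathematical objects it uses.
DOCUMENT_OBJECTS = {
    "doc:A": {"obj:group", "obj:ring"},
    "doc:B": {"obj:ring", "obj:field"},
    "doc:C": {"obj:manifold"},
}


def build_inverted_index(document_objects):
    """Map each object identifier to the set of documents that mention it."""
    index = defaultdict(set)
    for doc, objects in document_objects.items():
        for obj in objects:
            index[obj].add(doc)
    return index


def shared_object_links(document_objects):
    """Return {(doc1, doc2): shared objects} for every linked document pair."""
    links = defaultdict(set)
    for obj, docs in build_inverted_index(document_objects).items():
        for pair in combinations(sorted(docs), 2):
            links[pair].add(obj)
    return dict(links)


links = shared_object_links(DOCUMENT_OBJECTS)
```

Going through the inverted index means only documents that actually share an object are ever paired, which avoids comparing every document with every other one.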
CONCEPTUAL TOOLS
General conceptual tools that are used to structure, organize, represent,
and share knowledge include the closely related ideas of ontologies, tax-
onomies, and vocabularies. There is considerable debate about the precise
definitions and differences among these tools, although ontologies (most
commonly viewed as a tool for defining some classes of objects—the attri-
butes that these objects may have and the way in which these objects may
be related to each other) are usually seen as the most general formulation
(Gruber, 2009). Taxonomies are specific, usually hierarchical, collections
of terms that can be used to describe or classify objects in some contexts—
examples of these include subject headings or the naming schemes used in
biological systematics. “Controlled” vocabularies are collections of values
that can be used to populate specific instances of object attributes within
an ontology; in a certain sense, they are equivalent to taxonomies in that
19 W3C, “Semantic Web,” http://www.w3.org/standards/semanticweb/.

they can be used to classify. However, controlled vocabularies are often
“flat,” without other internal structure among the possible values, whereas
taxonomies commonly include very rich internal hierarchical structure.
Ontologies, vocabularies, and taxonomies work together. As a simple ex-
ample, a part of an ontology might define a specific class of objects called
documents; each of these has attributes that include subjects and languages.
One might have a list of possible language values (a controlled vocabulary)
associated with the ontology and also a tree structure of subject headings
(a taxonomy, though it could also be viewed as a simple vocabulary).
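The document example in this paragraph can be sketched in a few lines of Python. This is only an illustration: the class name `Document`, the language codes, and the small subject tree are hypothetical and are not taken from any published ontology or classification scheme.

```python
from dataclasses import dataclass, field

# Controlled vocabulary: a flat set of allowed language values (illustrative).
LANGUAGES = {"en", "fr", "de", "ru"}

# Taxonomy: a small tree of subject headings (illustrative),
# stored as child -> parent; None marks the root.
SUBJECT_PARENT = {
    "Mathematics": None,
    "Algebra": "Mathematics",
    "Group theory": "Algebra",
    "Analysis": "Mathematics",
}


def ancestors(subject):
    """Return the chain of broader subjects, nearest first."""
    chain = []
    parent = SUBJECT_PARENT[subject]
    while parent is not None:
        chain.append(parent)
        parent = SUBJECT_PARENT[parent]
    return chain


@dataclass
class Document:
    """Ontology fragment: a class of objects with two constrained attributes."""
    title: str
    language: str
    subjects: list = field(default_factory=list)

    def __post_init__(self):
        # Attribute values must come from the vocabulary and the taxonomy.
        if self.language not in LANGUAGES:
            raise ValueError(f"unknown language: {self.language!r}")
        for subject in self.subjects:
            if subject not in SUBJECT_PARENT:
                raise ValueError(f"unknown subject: {subject!r}")


doc = Document("A note on finite groups", "en", ["Group theory"])
print(ancestors(doc.subjects[0]))
```

The flat set has no internal structure, while the tree lets a search for a broad heading also reach documents filed under its narrower headings.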
For instance, within the mathematical sciences, the widely accepted
Bibliographic Ontology20 provides a fairly adequate accounting of the many
common relations between objects in categories 1a through 1e listed above.
The BibTeX21 schema that describes the structure of BibTeX records defines
a similar ontology. The Citation Typing Ontology (CiTO)22 is an ontology
for description of the citation relation between documents. The Mathematics
Subject Classification (MSC2010)23 provides a very well thought out, largely
hierarchical taxonomy for the classification of mathematical documents by
subject, and thence for the subjects themselves. OpenMath,24 discussed fur-
ther in Chapter 5, offers a potential standard for representing the semantics
of mathematical objects that is very relevant to the DML’s goals.
The application of such ontologies to a mathematical objects data set
can create graphical structures of information that can provide new in-
sights. For instance, citations generate a citation graph, and collaborations
generate a collaboration graph. Such graphical structures are commonly
embedded in the structure of hyperlinked webpages, thereby connecting
literature that was not obviously related otherwise.
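As a minimal sketch of how such graphs arise from bibliographic records, the following Python builds a citation graph and a collaboration graph from a handful of made-up papers. The record layout (`id`, `authors`, `cites`) and the sample data are assumptions chosen for illustration, not the schema of any existing service.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical records: each paper has an id, its authors, and the ids it cites.
PAPERS = [
    {"id": "p1", "authors": ["Ada", "Bo"], "cites": []},
    {"id": "p2", "authors": ["Bo", "Cy"], "cites": ["p1"]},
    {"id": "p3", "authors": ["Ada"], "cites": ["p1", "p2"]},
]


def citation_graph(papers):
    """Directed graph: paper id -> set of paper ids it cites."""
    return {p["id"]: set(p["cites"]) for p in papers}


def collaboration_graph(papers):
    """Undirected graph: author -> set of coauthors."""
    graph = defaultdict(set)
    for p in papers:
        for author in p["authors"]:
            graph[author]  # touch the key so solo authors appear as nodes
        for a, b in combinations(p["authors"], 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def cited_by(graph):
    """Invert a citation graph: paper id -> set of papers that cite it."""
    inverse = {node: set() for node in graph}
    for source, targets in graph.items():
        for target in targets:
            inverse.setdefault(target, set()).add(source)
    return inverse


citations = citation_graph(PAPERS)
print(cited_by(citations)["p1"])
print(collaboration_graph(PAPERS)["Bo"])
```

Inverting the citation graph is what turns a reference list into a "cited by" view, one simple way in which papers that never mention each other become connected.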
Development of new ontologies is a complex process requiring a high
level of community effort for consensus, even for limited sets of relations.
The committee expects that when communities start to curate various
digital collections of records of mathematical entities, there will be some
“bottom up” development of at least minimal ontologies for these entities,
as has already occurred with MSC2010 and OpenMath. The structure of
these ontologies will be reflected in the necessary schemas25 for description
of the objects they involve, and the graphical relations induced by these
20 The Bibliographic Ontology, “Bibliographic Ontology Specification,” dated November 4,
2009, http://bibliontology.com/specification.
21 BibTeX, http://www.bibtex.org/, accessed January 16, 2014.
22 CiTO, the Citation Typing Ontology, dated March 7, 2013, http://purl.org/spar/cito/.
23 Encoded by the Mathematics Subject Classification (MSC2010), American Mathematical
Society, http://www.ams.org/mathscinet/msc/msc2010.html, accessed January 16, 2014.
24 OpenMath Society, OpenMath, http://www.openmath.org/, accessed January 16, 2014.
25 A schema is broadly defined as a representation of a plan or theory in the form of an
outline or model.

ontologies will be of potentially great interest in the process of extracting
information and knowledge from mathematical publications.
CURRENT MATHEMATICAL RESOURCES
The management of formal representations of mathematical concepts
is known as mathematics knowledge management (Carette and Farmer,
2009). In this report, this issue is viewed more broadly as the management
of mathematical information and concepts, both formal and informal, in-
cluding the bibliographic information and mathematical concepts categories
of objects introduced in the previous section, only the latter of which can
be usefully regarded as part of mathematics itself.
Bibliographic Resources in Mathematics
Several general bibliographic resources exist, and some of these are
described in Appendix C. Among them, mathematicians typically use
Google26 and Google Scholar27 most often, although CrossRef28 is “under
the hood” whenever a user navigates from one publisher’s site to another
by a reference link. While many mathematicians heavily utilize these
general information services because of their power and ubiquity, some
mathematicians prefer the discipline-specific abstracting and indexing services
provided by MathSciNet29 and zbMATH.30 This discipline-specific service
preference is partly for historical reasons and partly because the focus
and quality of metadata provided by these services in mathematics make
it easier to find publications of interest. Both services offer bibliographic
entries in BibTeX,31 which is machine-readable and reusable, for preparation
of reference lists for LaTeX32 documents, and, with more technical
effort, for publication of online bibliographies in HTML33 or JSON.34
Using search engines with access to well-curated bibliographic metadata
and full-text indexing is how most mathematicians find mathematical
primary sources today.
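To illustrate the "more technical effort" involved in reusing BibTeX entries as JSON, here is a deliberately simplified Python sketch. It is not a full BibTeX parser: it assumes each entry closes with a brace on its own line and that field values contain no nested braces. The citation key `billey2013` is invented, and the entry itself is drawn from this chapter's reference list, not from an actual MathSciNet or zbMATH export.

```python
import json
import re

# One entry: "@type{key," then fields, closed by "}" at the start of a line.
ENTRY_RE = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,(.*?)\n\}", re.DOTALL)
# One field: name = {value} or name = "value" (no nested braces supported).
FIELD_RE = re.compile(r"(\w+)\s*=\s*(?:\{([^{}]*)\}|\"([^\"]*)\")")

SAMPLE = """
@article{billey2013,
  author = {Billey, S.C. and Tenner, B.E.},
  title = {Fingerprint databases for theorems},
  journal = {Notices of the AMS},
  year = {2013}
}
"""


def parse_bibtex(text):
    """Return a list of dicts, one per BibTeX entry found in text."""
    entries = []
    for kind, key, body in ENTRY_RE.findall(text):
        record = {"type": kind.lower(), "key": key}
        for name, braced, quoted in FIELD_RE.findall(body):
            record[name.lower()] = (braced or quoted).strip()
        entries.append(record)
    return entries


entries = parse_bibtex(SAMPLE)
print(json.dumps(entries, indent=2))
```

Real BibTeX files use nested braces, string macros, and cross-references, so a production tool would rely on a dedicated parser instead of regular expressions.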
26 Google, https://www.google.com/, accessed January 16, 2014.
27 Google Scholar, http://scholar.google.com/, accessed January 16, 2014.
28 CrossRef, http://www.crossref.org/, accessed January 16, 2014.
29 American Mathematical Society, MathSciNet, http://www.ams.org/mathscinet/, accessed
January 16, 2014.
30 zbMATH, http://www.zentralblatt-math.org/zmath/, accessed January 16, 2014.
31 BibTeX, http://www.bibtex.org/, accessed January 16, 2014.
32 LaTeX, A document preparation system, last revised January 10, 2010,
http://www.latex-project.org/.
33 “HTML,” Wikipedia, http://en.wikipedia.org/wiki/HTML, accessed January 16, 2014.
34 “Introducing JSON,” http://www.json.org/, accessed January 16, 2014.

Services such as MathSciNet, zbMATH, and Google Scholar provide
complementary and somewhat overlapping services. One distinct difference
is that MathSciNet is organized chronologically and referentially, while
Google Scholar is based on “importance” as qualified by page ranks or
some variant thereof. Both are important and are used in literature searches.
MathSciNet is great for tasks such as listing all articles by an author and
listing all articles in a specific mathematical field, and it has high-quality
metadata that are needed for many purposes. Its search capabilities are
limited because it only searches over metadata. Google Scholar is often
better for searches because it searches over full text, including reference
lists, and has better ranking or returns for most purposes. One issue that
some mathematicians have with Google Scholar is that it is not possible to
limit searches to math or subfields of math. MathSciNet, zbMATH, and
Google Scholar combined do a good job providing conventional discovery
over the corpus of traditionally published mathematical literature, but no
services currently provide a finer-grain search capability that allows a user
to search for mathematical objects or ideas that cannot be easily defined
by text search, such as an equation or the evolution of a specific notation.
Ideally, a mathematician should have the best of both capabilities through
a single interface, but this is challenging because neither MathSciNet nor
Google Scholar currently allows its data to be merged with the other’s.
Mathematicians also make extensive use of arXiv as a platform for
sharing preprints and keeping up with current research developments.
Mathematicians strongly support arXiv in part because the full text is
largely indexed and exposed to the Web through search engines. How-
ever, arXiv items are not indexed through services such as MathSciNet
or zbMATH, which would help connect these items to the rest of the
literature. Search tools associated with distinct subsets of the literature,
such as arXiv, publisher-based repositories, library catalogs, and academic
institutional repositories provide overlapping access to the mathematical lit-
erature. Unfortunately, the present configuration of these discipline-specific
tools does not provide a single information source where mathematicians
can find and access information from diverse sources, and the more general
information sources often lack the mathematical metadata and details that
make mathematics literature easy to search and browse.
Combining data from multiple information resources (e.g., Google,
MathSciNet, zbMATH) is complicated. Partnering organizations would
have to allow their data to be collected, reused, or recombined on a large
scale, which many services are hesitant to do. Even seemingly open re-
sources (such as arXiv) may have legal restrictions on outside data aggrega-
tion, depending on what is done with the data. This collaboration would
have to be negotiated between potential partners with the goal of creating

a unified view of the mathematics literature. Some approaches toward
developing partnerships and relevant examples are discussed in Chapter 3.
Given the central importance of bibliographic data searches and the
repeated use of bibliographic information by researchers in preparation
of research articles, it is essential for the DML to provide adequate biblio-
graphic support tools with access to the best available bibliographic data in
mathematics and related fields. Ideally, it should support advanced biblio-
graphic data processing to detect and identify the structure of networks of
papers, authors, topics, and the like. The foundations of such bibliographic
data processing are provided by the larger existing bibliographic services
in mathematics and beyond, especially MathSciNet, zbMATH, and Google
Scholar, which are the most commonly used by mathematicians. At present,
none of these services provides an application programming interface (API)
for programmatic access, and none of them allows its data to be downloaded
in bulk, except with severe restrictions on what can be done with
it. To provide the greatest benefit to users of a DML, that would have to
change. Both EuDML and Microsoft Academic Search provide steps in a
positive direction with more or less open bibliographic data stores with an
API for access, which allows tools and services to be built over the corpus.
To seriously engage the mathematics world with a digital library system,
extensive coverage of mathematical information is essential. The commit-
tee considered whether the DML could initially focus on out-of-copyright
material, but it concluded that there would not be community support or
interest in this approach because it is too limited. On the other hand, much
progress has been made in digitizing heritage content, and it is essential that
this be integrated with the rest of the math literature base.
Specialized Mathematical Information Resources
General bibliographic services provide limited support for navigating
and searching mathematical literature below the top five bibliographic
classes (documents, people, events, organizations, subjects) discussed above.
Beyond these five universal classes, information storage and retrieval for
math-specific entities is fragmented and typically does not have links or
references to the main indexing services.35
Research mathematics literature includes a diverse range of special
objects (e.g., theorems, lemmas, functions, sequences) that are not
represented adequately, or sometimes at all, in full-text indexing and
article-level subject classification systems. Currently, these objects are computationally
35 MathSciNet and zbMATH share the MSC2010 subject classification, which provides
some basic filtering of bibliographic data by subject. ArXiv uses a coarser classification, which
is however easily mapped to sets of top-level MSC 2010 categories.

expensive and difficult to recognize through machine-based methods alone.
Ontologies of objects—such as reference volumes that enumerate classes of
functions, sequences, and other objects—have been developed and curated
by mathematicians for centuries. These resources include mathematical
handbooks, some of the most famous being the following:
• Abramowitz and Stegun (1972) and the subsequent Digital Library
of Mathematical Functions,36
• The Bateman Manuscript,37
• Gradshteyn and Ryzhik (2007),
• Borodin and Salminen (2002), and
• The Princeton Companion to Mathematics (Gowers et al., 2008).
There are also examples of more recently developed resources that
provide collections of some mathematical objects, including the following:
• Propositions: Wikipedia’s List of Theorems,38 Mizar39;
• Proofs: Proofs from the Book (Aigner and Ziegler, 2010), Mizar,
Coq,40 and others41;
• Numbers: A Dictionary of Real Numbers (Borwein and Borwein,
1990);
• Sequences: The On-Line Encyclopedia of Integer Sequences (OEIS)42;
• Functions: Digital Library of Mathematical Functions,43 Wolfram
MathWorld,44 Wolfram Functions Site45;
• Groups, rings, and fields: Wikipedia’s List of Simple Lie Groups,46
Wikipedia’s List of Finite Simple Groups,47 Centre for Interdisciplinary
Research in Computational Algebra: Finite Fields,48
Sage’s Finite Fields49;
• Identities: Piezas50; Petkovsek et al. (1996);
• Inequalities: Wikipedia’s List of Inequalities,51 DasGupta (2008);
and
• Formulas: Springer LaTeX Search,52 Hijikata et al. (2009), Kohlhase
et al. (2012).
36 NIST Digital Library of Mathematical Functions, 2013, http://dlmf.nist.gov/.
37 “Bateman Manuscript Project,” Wikipedia, last modified July 24, 2013,
http://en.wikipedia.org/wiki/Bateman_Manuscript_Project.
38 “List of Theorems,” Wikipedia, last modified December 9, 2013,
http://en.wikipedia.org/wiki/List_of_theorems.
39 Mizar Home Page, last modified January 8, 2014, http://mizar.org/.
40 The Coq Proof Assistant, http://coq.inria.fr/, accessed January 16, 2014.
41 “Category:Proof assistants,” Wikipedia, last modified September 21, 2011,
http://en.wikipedia.org/wiki/Category:Proof_assistants.
42 On-Line Encyclopedia of Integer Sequences® (OEIS®) Wiki, https://oeis.org/wiki/Welcome,
accessed January 16, 2014.
43 NIST Digital Library of Mathematical Functions, 2013, http://dlmf.nist.gov/.
44 Wolfram MathWorld, http://mathworld.wolfram.com/, accessed January 16, 2014.
45 Wolfram Research, Inc., The Wolfram Functions Site, http://functions.wolfram.com/,
accessed January 16, 2014.
46 “List of Simple Lie Groups,” Wikipedia, last modified March 30, 2013,
http://en.wikipedia.org/wiki/List_of_simple_Lie_groups.
47 “List of finite simple groups,” Wikipedia, last modified December 18, 2013,
http://en.wikipedia.org/wiki/List_of_finite_simple_groups.
From a review of these lists, as well as the resources discussed in
Appendix C, it is clear that authors and editors continue to be motivated to
create and publish lists of various kinds of mathematical objects. Some of
these lists, especially ones like tables of integrals and lists of sequences,
provide very useful tools for mathematicians and other users of mathematics,
especially when combined with computational resources. Wikipedia currently
plays a key role in supporting distributed creation and maintenance
of numerous lists of serious interest to mathematicians.
Lists and tables have been an essential part of mathematical research
throughout history, and the vast majority of working mathematicians have
made use of appropriate tables (or, more recently, the equivalent numerical
or symbolic software) in the course of their research. The most basic are
numerical tables (e.g., values of logarithms, trigonometric functions, vari-
ous special functions, zeros of the zeta function, integer sequences). More
sophisticated are lists of mathematical objects (e.g., indefinite and definite
integrals, finite simple groups, Fourier transforms, partial differential
equations and their solutions). At an even higher level are lists of theorems,
concepts, and the like.
At their most basic, tables provide a simple mechanism for speeding
up research. Once one identifies that an object under investigation appears
in a table, one can make use of prior knowledge about said object, thereby
facilitating either applications or new advances in theory. Compiling a table
is an important research contribution in its own right, helping codify the
knowledge in a field, point out gaps therein, and inspire new research to fill
in and extend what is known. Scanning a table often enables one to spot
48 CIRCA, “GAP Instructional Material,” January 2003,
http://www-circa.mcs.st-and.ac.uk/gapfinite.php.
49 Sage Development Team, “Finite Fields,”
http://www.sagemath.org/doc/reference/rings_standard/sage/rings/finite_rings/constructor.html,
accessed January 16, 2014.
50 T. Piezas III, A Collection of Algebraic Identities,
https://sites.google.com/site/tpiezas/Home/, accessed January 16, 2014.
51 “List of Inequalities,” Wikipedia, last modified November 28, 2013,
http://en.wikipedia.org/wiki/List_of_inequalities.
52 Springer, LaTeX Search, http://www.latexsearch.com/, accessed January 16, 2014.

otherwise obscure patterns, leading to new theorems and new directions
of research.
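The act of recognizing that an object under investigation appears in a table can be sketched as a simple lookup. The Python below searches a tiny local table for a run of integers. The table holds only three well-known sequences, labeled with their OEIS A-numbers; the function names are hypothetical, and no OEIS interface is used or implied.

```python
# A tiny illustrative table: identifier -> (name, first few terms).
TABLE = {
    "A000040": ("Prime numbers", [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]),
    "A000045": ("Fibonacci numbers", [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]),
    "A000108": ("Catalan numbers", [1, 1, 2, 5, 14, 42, 132, 429, 1430, 4862]),
}


def contains_run(terms, query):
    """True if query occurs in terms as a contiguous run."""
    n = len(query)
    return any(terms[i:i + n] == query for i in range(len(terms) - n + 1))


def lookup(query):
    """Return (identifier, name) for every table entry containing query."""
    return [
        (key, name)
        for key, (name, terms) in TABLE.items()
        if contains_run(terms, query)
    ]


print(lookup([2, 5, 14, 42]))  # matches the Catalan numbers only
print(lookup([2, 3, 5]))       # ambiguous: matches primes and Fibonacci
```

The second query shows why short fragments are weak evidence: a few terms can match several entries, so a useful table needs enough data per entry for a lookup to discriminate.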
Sara Billey and Bridget Tenner wrote that a database for mathemati-
cal theorems would “enhance experimental mathematics, help researchers
make unexpected connections between areas of mathematics, and even im-
prove the refereeing process” (Billey and Tenner, 2013, p. 1093). Extensive
lists could also enhance search and retrieval of mathematical information
and allow for connections to be made between mathematical topics and
objects.
Currently, there are no satisfactory indexes of many mathematical
objects, including symbols and their uses, formulas, equations, theorems,
and proofs, and systematically labeling them is challenging and, as of yet,
unsolved. In many fields where there are more specialized objects (such as
groups, rings, fields), there are community efforts to index these, but they
are typically not machine-readable, reusable, or easily integrated with other
tools and are often lacking editorial efforts. So, the issue is how to identify
existing lists that are useful and valuable and provide some central guidance
for further development and maintenance of such lists.
Chapter 2 of this report discusses some of the user features that could
advance mathematics research by increasing connections, and Chapter 5
discusses what collections of entity lists could start making these features
and this connectivity a reality.
REFERENCES
Abramowitz, M., and I.A. Stegun, eds. 1972. Handbook of Mathematical Functions with
Formulas, Graphs, and Mathematical Tables. Dover Publications, New York.
Aigner, M., and G.M. Ziegler. 2010. Proofs from THE BOOK. 4th edition. Springer-Verlag,
Berlin. doi:10.1007/978-3-642-00856-6.
Billey, S.C., and B.E. Tenner. 2013. Fingerprint databases for theorems. Notices of the AMS
60(8):1034-1039.
Borodin, A.N., and P. Salminen. 2002. Handbook of Brownian Motion—Facts and Formulae.
2nd edition. Probability and Its Applications book series. Birkhäuser Verlag, Basel.
doi:10.1007/978-3-0348-8163-0.
Borwein, J., and P. Borwein. 1990. A Dictionary of Real Numbers. Wadsworth and Brooks/Cole
Advanced Books and Software, Pacific Grove, Calif. doi:10.1007/978-1-4615-8510-7.
Carette, J., and W.M. Farmer. 2009. A review of mathematical knowledge management. Pp.
233-246 in Intelligent Computer Mathematics. Springer.
DasGupta, A. 2008. A collection of inequalities in probability, linear algebra, and analysis.
Pp. 633-687 in Springer Texts in Statistics. Springer, New York.
doi:10.1007/978-0-387-75971-5_35.
Gowers, T., J. Barrow-Green, and I. Leader, eds. 2008. The Princeton Companion to Math-
ematics. Princeton University Press, Princeton, N.J.
Gradshteyn, I.S., and I.M. Ryzhik. 2007. Table of Integrals, Series, and Products. 7th edition.
Elsevier/Academic Press, Amsterdam. Translated from the Russian, Translation edited
and with a preface by A. Jeffrey and D. Zwillinger.

Gruber, T. 2009. Ontology. Encyclopedia of Database Systems (L. Liu and M. Tamer Özsu,
eds.). Springer-Verlag. http://tomgruber.org/writing/ontology-definition-2007.htm.
Hijikata, Y., H. Hashimoto, and S. Nishida. 2009. Search mathematical formulas by
mathematical formulas. Pp. 404-411 in Lecture Notes in Computer Science. Volume 5617.
doi:10.1007/978-3-642-02556-3_46.
International Mathematical Union. 2006. “Digital Mathematics Library: A Vision for the
Future.” http://www.mathunion.org/fileadmin/IMU/Report/dml_vision.pdf. Accessed
August 20, 2006.
Kohlhase, M., B.A. Matican, and C.-C. Prodescu. 2012. MathWebSearch 0.5: Scaling an open
formula search engine. Pp. 342-357 in Lecture Notes in Artificial Intelligence. Volume
7362. Springer, Berlin, Heidelberg. doi:10.1007/978-3-642-31374-5.
National Research Council. 2013. The Mathematical Sciences in 2025. The National Acad-
emies Press, Washington, D.C.
Petkovsek, M., H. Wilf, and D. Zeilberger. 1996. A = B. A.K. Peters, Ltd., Wellesley, Mass.
Ruddy, D. 2009. The evolving digital mathematics network. Pp. 3-16 in DML 2009 Towards
a Digital Mathematics Library Proceedings (P. Sojka, ed.) Conferences on Intelligent
Computer Mathematics, CICM 2009, Grand Bend, Ontario, Canada.