2

Potential Value of a Digital Mathematics Library

**WHAT IS MISSING FROM THE MATHEMATICAL INFORMATION LANDSCAPE?**

The current mathematical information landscape is complex and diverse, as described in Chapter 1 and Appendix C. Current digital mathematical resources provide services such as electronic access to papers (often with advanced features capable of searching and sorting based on key words, subject areas, text searches, and authors), platforms for discussion, and improved navigation across multiple data sources. What they do not do is allow a user to systematically explore the information captured within the literature and forums and readily explore connections that may not be obvious from looking at the material alone.

The mathematical ideas contained in a paper cannot easily be searched for or explored, which is a detriment to the mathematical community. There is a largely unexplored network of information embedded in the connections of mathematical objects, and formalizing this network (making it easy to see, manipulate, and explore) holds the potential to vastly accelerate and expand current mathematical research. This network would consist of information from traditional resources, such as research papers published in journals, and content dispersed in other Internet-based resources and databases. Initial development of the Digital Mathematics Library (DML) could begin immediately with the aim of providing a foundational platform on which most of the capabilities discussed in this report might conceivably be achieved in a 10- or 20-year time frame. This report discusses how the DML can make this network of information a reality.

**WHAT GAPS WOULD THE DIGITAL MATHEMATICS LIBRARY FILL?**

The real opportunity is in offering mathematicians new and more direct ways, through the Web, to discover and explore relationships between mathematical concepts (such as axioms, definitions, theorems, proofs, formulas, equations, numbers, sets, functions) and objects (such as groups, rings) and broader knowledge (such as the evolution of a field of study; and relationships between mathematical fields, concepts, and objects). Improved discovery and interaction in the proposed DML would make it possible to find and examine material on a much finer scale than what is currently possible, making connections easier to find, shortening the needed start-up time for new research areas, and formalizing some of the logic that mathematicians are already using in their research.
In Probability Theory: The Logic of Science, E.T. Jaynes discusses
the reasoning that many mathematicians go through when approaching
their work. He describes the strong form of reasoning as variations on
the following: “If A is true, then B is true. A is true; therefore, B is true.”
Weaker forms are assertions, such as “If A is true, then B is true. B is true;
therefore, A becomes more plausible.” Jaynes states that
[George] Pólya showed that even a pure mathematician actually uses these
weaker forms of reasoning most of the time. Of course, when he publishes
a new theorem, he will try very hard to invent an argument which uses
only the first kind; but the reasoning process which led him to the theorem
in the first place almost always involves one of the weaker forms (based,
for example, on following up conjectures suggested by analogies). The
same idea is expressed in a remark of S. Banach (quoted by S. Ulam, 1957):
“Good mathematicians see analogies between theorems; great mathemati-
cians see analogies between analogies.” (Jaynes, 2003, p. 3)
The DML could help make these analogies easier to find and use.
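The two forms of reasoning quoted above can be written compactly. The following is only a sketch in probability notation, which the passage itself does not use; it assumes that plausibility is measured by a probability P and that P(B) > 0.

```latex
% Strong form (deduction): from "A implies B" and A, conclude B.
\[
  (A \Rightarrow B),\quad A \;\;\vdash\;\; B .
\]
% Weak form: if A implies B, then P(B \mid A) = 1, so by Bayes' theorem
\[
  P(A \mid B) \;=\; \frac{P(A)\,P(B \mid A)}{P(B)} \;=\; \frac{P(A)}{P(B)} \;\ge\; P(A),
\]
% because 0 < P(B) \le 1. Observing B cannot make A less plausible,
% and it makes A strictly more plausible whenever P(B) < 1 and P(A) > 0.
```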
Box 2.1 provides an example of how a mathematics researcher would
start looking into a new topic, using Gröbner bases as a specific illustra-
tion. It shows some of the initial resources that are typically used and how
their information varies from, complements, and supplements the other
resources. It also shows how useful it would be to be able to pull much of
this information into a unified source and make additional connections to
other, lesser known resources and aspects of the literature.
The DML could aggregate and make available collections of ontolo-
gies, links, and other information created and maintained by human con-
tributors and by curators and specialized machine agents with significant
editorial input from the mathematical community. The DML could afford
functionalities and services over the aggregated mathematical literature.


30 DEVELOPING A 21ST CENTURY MATHEMATICS LIBRARY
BOX 2.1
How a Mathematics Researcher May Currently
Approach Information Gathering
Gröbner bases were first introduced by Bruno Buchberger for solving a range
of problems in computational algebra and became an essential component of
computer algebra software (Buchberger, 2006). Suppose a mathematician wanted
to find out about this topic, perhaps because it was needed for a particular prob-
lem. First, when one types “Grobner basis” into MathSciNet, a list of around 2,400
chronologically ordered items appears, most of which are specialized papers. This
is a potentially good resource for a specialist but is probably not ideal for the
novice. If a similar search is done via Google Scholar, a list of research articles and
books on the subject appears, ordered by “popularity,” which usually reflects
some version of page ranking. While some of the references provided by Google
Scholar can be viewed, including some books on Google books, others are behind
paywalls or are books that must be purchased before reading. In Google itself, the
top five links are to Wikipedia,1 MathWorld,2 Scholarpedia,3 Mathematica code,4
and a survey article by Bernd Sturmfels.5
The Wikipedia article is limited and only contains four references but includes
the book of Cox, Little, and O’Shea (1997), which is widely recognized and a
premier introductory text on the topic. Wikipedia also offers suggested further
reading and external links. Sturmfels’s article, from the “What is . . .” section of
the Notices, is terse and contains only three references, but one of them is the
aforementioned book. MathWorld’s article is short and lacks any specifics, but
it contains a significantly longer list of references, survey articles, and several
links to Amazon for buying books (and at least one dead link). The Scholarpedia
article, written by Bruno Buchberger and Manuel Kauers, is more comprehensive
and includes many illustrations, a wide range of applications, and a long list of
references, including a Gröbner bases bibliography compiled by Buchberger and
his coworkers at the Research Institute for Symbolic Computation.6 Unfortunately,
no links are supplied in the Scholarpedia article to the other references. In many
ways, Scholarpedia, which bills itself as a “peer-reviewed open-access encyclopedia,”
could serve as one possible model for some aspects of the proposed DML.
All of these resources combined, along with the tenacity to pursue the variety
of resources, can result in a good start in understanding Gröbner bases. However,
suppose the researcher was working in an area that led to questions to which
Gröbner bases could be profitably applied, but, not being an algebraist, he/she
did not know that they existed or even how to start to query any of the standard
tools. Conversely, suppose the researcher works in Gröbner basis theory and finds
results that could lead to advances in an area that he/she is not familiar with; how
would the researcher know?
Here’s a real example: Although not well known, in fact, the theory of Gröbner
bases was essentially discovered in 1910-1913 by an obscure Georgian math-
ematician, N.M. Gjunter, in his study of the integrability of overdetermined systems
of partial differential equations (Renschuch et al., 1987). It is not immediately obvi-
ous through reference searching or the standard literature that Gröbner bases are
of importance in partial differential equations (although the Scholarpedia article
does mention some applications to ordinary differential equations). Moreover,
the latter area has resulted in the refined and potentially very useful concept of
an involutive basis. This particular gap could be filled by editing the above-mentioned
articles, but this is simply one of innumerable similar cases. Making such
unexpected links is not currently easy but could become so with a fully functioning
DML, thereby increasing the serendipitous discovery of connections, which
plays a role across research.
1 “Gröbner Basis,” Wikipedia, http://en.wikipedia.org/wiki/Gr%C3%B6bner_basis, accessed January 16, 2014.
2 E.W. Weisstein, “Gröbner Basis,” MathWorld--A Wolfram Web Resource, http://mathworld.wolfram.com/GroebnerBasis.html.
3 B. Buchberger and M. Kauers, Groebner basis, Scholarpedia 5(10):7763, 2010.
4 “GroebnerBasis,” built-in Mathematica symbol, Wolfram Mathematica 9, last modified in Mathematica 6, http://reference.wolfram.com/mathematica/ref/GroebnerBasis.html.
5 B. Sturmfels, What is a Gröbner Basis? Notices of the AMS 52(10), 2005, http://math.berkeley.edu/~bernd/what-is.pdf.
6 B. Buchberger and A. Zapletal, Gröbner Bases Bibliography, http://www.ricam.oeaw.ac.at/Groebner-Bases-Bibliography/search.php.
While it would have to store modest amounts of new knowledge structures
and indices, it would not have to generally replicate mathematical literature
stored elsewhere.
The committee identified a number of basic desired library capabili-
ties, including aggregation and documentation of information, annotation,
search and discovery, navigation, and visualization and analytics. Properly
implemented across the domain of mathematics research literature, these
capabilities and resulting enhanced functionalities would not only facilitate
better and more efficient search and discovery, but also allow mathemati-
cians to interact with the research literature in new ways and at new levels
of granularity. The proposed DML is much more than an indexing service
and aims to create meaningful connections between topics by utilizing
lists of entities and providing coherent access to a range of tools that can
speed up mathematical discovery: for example, comprehensive encyclopedia
articles and review articles, lists of mathematical objects, implication diagrams,
annotated bibliographies, informal annotations, and comments
on articles. These tools and others are discussed in Chapter 5.
The DML would not only result in new efficiencies, thereby freeing up
researcher time, but also enable experimentation with new approaches to
using and getting the maximum benefit out of the mathematics research
literature. The remainder of this section describes each of these desired
capabilities and illustrates how resultant improved functionality could ad-
vance mathematics research.
Aggregation and Documentation
Mathematicians want to be able to make searchable and sharable collec-
tions or lists of various kinds of mathematical objects easily, including bibli-
ographies of the mathematical literature, perhaps with annotations. This is
an area where it should be very easy to make rapid progress. The issues of
mathematical object representations are mostly about who is allowed to create,
view, and update various lists and about resource management. Many
of these types of lists (such as those mentioned in Chapter 1) currently exist,
some with connections to the literature, but their existence is often tied to
the survival of the curator’s personal website. Providing a stable platform
for housing and connecting these lists would also allow for this information
to be incorporated in the collective knowledge of the DML.
The availability of these lists, and the interconnections between them, would
allow a larger network of mathematical information to be developed. This network
would be on a finer scale than what is currently available and would facilitate higher-level
features of advanced search and navigation. The world of mathematical
knowledge goes much deeper than the level of research papers; it goes down
into the content that is discussed within the papers, the knowledge that is
assumed already to be understood by the reader, and the connections that
exist between this information. If the DML could draw on this information,
it would have a much more meaningful view of mathematics.
Lists of Mathematical Objects in New Areas
While many books contain fairly comprehensive descriptions of the-
orems relating to a specific subject and substantial stand-alone lists of
theorems have been prepared, the committee is not aware of any truly
comprehensive list of theorems in any branch of mathematics. Moreover,
“lists” as embodied in books are not necessarily designed to enable all the
functionality envisioned for the DML. There have also been several efforts to establish a formal computer-aided proof capability (Wiedijk, 2007), but these have not had much impact on the larger mathematics community. Mizar has published the largest such collection, of about 50,000 formally checked theorems.1 New mathematical theorems and lemmas are proven and published on a routine basis.2 There have also been efforts to identify and list in order the most important mathematical theorems (such as the list presented in 1999 by Paul and Jack Abad3) based on assessments of their place in the literature, each theorem’s proof, and the unexpectedness of the result. Even if all existing theorems and lemmas were indexed and organized in some way, there would still need to be a way to continually update this list with new work.
1 Mizar Home Page, last modified January 8, 2014, http://mizar.org/.
Although even a list of theorems, or a collection of text articles about each
theorem, would be valuable, modern knowledge representation techniques
offer more ambitious possibilities. For example, collections of represented
facts such as DBpedia4 or Freebase5 permit retrieval of data about
the real world, such as populations or areas of nations and towns. Library
and museum catalogs are being converted to formal Resource Description
Framework (RDF)6 statements. Having a rigorous description of a theorem
enables logical deduction and comparison of that theorem with others. The
generality of mathematics is one of its beauties, and when the same form
appears under two different names, it implies an unsuspected applicability
of each theorem.
It appears within the grasp of modern information management tools
to develop a machine-readable repository of mathematical theorems and
definitions in which theorems are expressed as statements about terms,
terms are linked to definitions, and definitions are constructed from logical
statements about other terms. This is certainly very challenging, but the
first steps in this direction have been made by Wolfram|Alpha for continued
fractions, with a formalism for canonical representations of theorems that
appears simple and flexible enough to be more widely adopted and used
for purposes of search, retrieval, and linking. The Mizar Project also has a
large database of formal theorem statements and formal proofs, although
this is much less easily accessible to a working mathematician. How to do
this on a large scale is still an open problem, but there are indications that
efforts of this kind should be rewarding (Billey and Tenner, 2013).
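The structure described above (theorems as statements about terms, terms linked to definitions, definitions built from other terms) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a proposed DML schema: the class names `Definition`, `Theorem`, and `Repository`, their fields, and the sample entries are all hypothetical, and statements are stored as plain strings rather than in a formal logic.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Definition:
    term: str                    # the term being defined
    statement: str               # informal or formal text of the definition
    uses: tuple[str, ...] = ()   # other terms this definition is built from


@dataclass(frozen=True)
class Theorem:
    name: str
    hypotheses: tuple[str, ...]  # statements about terms
    conclusion: str
    terms: tuple[str, ...]       # terms the statement refers to
    proof: str = ""              # LaTeX source or a citation, not machine-checked


@dataclass
class Repository:
    definitions: dict[str, Definition] = field(default_factory=dict)
    theorems: dict[str, Theorem] = field(default_factory=dict)

    def _require_defined(self, terms):
        missing = [t for t in terms if t not in self.definitions]
        if missing:
            raise KeyError(f"undefined terms: {missing}")

    def add_definition(self, d: Definition) -> None:
        self._require_defined(d.uses)   # definitions may only use known terms
        self.definitions[d.term] = d

    def add_theorem(self, t: Theorem) -> None:
        self._require_defined(t.terms)  # every term must link to a definition
        self.theorems[t.name] = t

    def unfold(self, term: str) -> set[str]:
        """Return every term that the definition of `term` ultimately relies on."""
        seen: set[str] = set()
        stack = list(self.definitions[term].uses)
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.definitions[current].uses)
        return seen


# Hypothetical sample entries, for illustration only.
repo = Repository()
repo.add_definition(Definition("set", "primitive notion"))
repo.add_definition(Definition("binary operation", "a map S x S -> S", uses=("set",)))
repo.add_definition(Definition(
    "group",
    "a set with an associative binary operation, an identity, and inverses",
    uses=("set", "binary operation"),
))
repo.add_definition(Definition(
    "subgroup",
    "a subset of a group that is itself a group under the same operation",
    uses=("group",),
))
repo.add_theorem(Theorem(
    name="Lagrange's theorem",
    hypotheses=("G is a finite group", "H is a subgroup of G"),
    conclusion="the order of H divides the order of G",
    terms=("group", "subgroup"),
    proof="citation or LaTeX source goes here",
))

print(sorted(repo.unfold("subgroup")))  # ['binary operation', 'group', 'set']
```

Rejecting a theorem whose terms have no definition is one possible design choice; a real repository might instead accept the entry and flag the missing links for curators.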
Only the definitions and the theorem statements need to be machine-readable; the proofs can be LaTeX or a citation. Technologies like RDF and OWL7 may be useful for encoding the theorems’ statements and the definitions. These technologies are flexible enough to allow users to extend the ontology, while encouraging reuse of existing terms. The markup languages used by automatic theorem provers could also be useful because they are sufficiently flexible to encode many important theorems, but they might not do enough to encourage reuse of terms.
2 In his 1998 biography of Paul Erdös, Paul Hoffman reports that mathematician Ronald Graham estimated that upwards of 250,000 theorems were being published each year at that time (Hoffman, 1987).
3 P. Abad and J. Abad, The Hundred Greatest Theorems, 1999, http://pirate.shu.edu/~kahlnath/Top100.html.
4 DBpedia, About, last modification September 17, 2013, http://dbpedia.org/About.
5 Freebase, http://www.freebase.com/, accessed January 16, 2014.
6 RDF is a standard model for data interchange on the Web and facilitates data merging even in the case of differing underlying schemas. See W3C Semantic Web, “Resource Description Framework (RDF),” last modified March 22, 2013, http://www.w3.org/RDF/.
The theorem and lemma repository would benefit from being accessible
to programs via an application programming interface, which is a protocol
used to allow software components to easily communicate with each other
and may include specifications for routines, data structures, object classes,
and/or variables.
Researchers will likely submit their theorems through a Web-based
interface if it helps them to get citations and to stake a claim to having
proved a result first. There are many famous cases where theorems were
proven independently by multiple individuals using different terminology.
A machine-readable repository could detect duplicate terms and theorems
so that researchers can focus on new results rather than proving what is
already known. The main benefit, however, may come from granting pro-
grams access to the latest mathematical results through user submissions.
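One way such duplicate detection might work, at least for the simplest cases, is to reduce each submitted statement to a canonical form before comparing. The sketch below is an assumption-laden toy: the synonym table, the function names, and the `SubmissionIndex` class are hypothetical, statements are plain strings, and only exact matches after term substitution are caught. Recognizing genuinely different formulations of the same result would need far more than this.

```python
import re

# Hypothetical synonym table: variant term -> preferred term.
SYNONYMS = {
    "commutative group": "abelian group",
    "one-to-one": "injective",
}


def canonical(statement: str) -> str:
    """Lower-case, collapse whitespace, and replace known variant terms."""
    text = " ".join(statement.lower().split())
    for variant, preferred in SYNONYMS.items():
        text = re.sub(rf"\b{re.escape(variant)}\b", preferred, text)
    return text


def fingerprint(hypotheses, conclusion):
    """Order-insensitive key for a theorem statement."""
    return (frozenset(canonical(h) for h in hypotheses), canonical(conclusion))


class SubmissionIndex:
    def __init__(self):
        self._first_seen = {}  # fingerprint -> name of earliest submission

    def submit(self, name, hypotheses, conclusion):
        """Record a submission; return the earlier matching name, or None if new."""
        key = fingerprint(hypotheses, conclusion)
        earlier = self._first_seen.get(key)
        if earlier is None:
            self._first_seen[key] = name
        return earlier


index = SubmissionIndex()
first = index.submit(
    "submission-1",
    ["G is a finite abelian group"],
    "G is a direct product of cyclic groups",
)
second = index.submit(
    "submission-2",
    ["G is a finite  commutative group"],  # different wording, extra space
    "G is a direct product of cyclic groups",
)
print(first, second)  # None submission-1
```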
Another data type worthy of consideration in a DML is problems.
Good problems spur research advances. Problem lists have been created
and maintained at various times, most famously Hilbert’s list of problems
around the beginning of the 20th century. Some recent efforts at curation of
problem lists are the Open Problem Garden8 and the American Institute
of Mathematics’ Problem Lists.9 A community feature encouraging creation
and maintenance of problem lists with adequate links to the literature and
indications of status could be an important component of the DML.
Annotation
Mathematicians want to be able to annotate mathematical documents
in various ways and share these annotations with collaborators or students
and, in some cases, publish these annotations for the benefit of a wider but
closed group (a set of collaborators, or a seminar, or a cohort of doctoral
students) or the general public. The ability to easily share notes could im-
prove the learning curve for researchers in new areas, provide opportunities
to learn from other researchers interested in similar topics, elucidate logic that is not explicitly stated in papers, allow authors to post corrections, and overall enrich the research discussion. Some mathematicians prefer to keep comments limited to a smaller group, while others are more comfortable posting openly. Either way, this enhancement to the traditional research paper could quicken the path toward understanding and at the same time enhance the DML’s capability to traverse the literature.
7 W3C, “OWL Web Ontology Language Overview,” February 10, 2004, http://www.w3.org/TR/owl-features/.
8 Open Problem Garden, http://www.openproblemgarden.org/, accessed January 16, 2014.
9 American Institute of Mathematics, AIM Problem Lists, http://aimpl.org/, accessed January 16, 2014.
The ability to see others’ annotations as well as create new annotations
would make reading a paper not only easier, but potentially more interest-
ing. Some links could point to other items residing in the digital library,
while others point to popular sites such as MathOverflow and Wikipedia
or other sites outside the DML. For researchers setting out in a new direc-
tion or for researchers in an isolated location, it is often difficult to get
involved in a lively conversation with fellow researchers. Links to discus-
sions and comments on research papers and theorems could be a way to
expand research discussions to a new level. Senior mathematicians could
provide some general background information to research papers, such as
a basic prerequisite for understanding the paper and some suggested read-
ings; this would assist students and people starting out in a new direction.
It should be possible for individual users to tailor the writing and reading
of comments. It could also be useful to be able to select or prioritize, in
several possible ways to be set by each user individually, the comments that
appear on one’s screen while searching (e.g., so as to see most prominently
the comments from other members in an existing collaboration group or
from a commenter one has experienced earlier as particularly insightful on
a particular topic).
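The per-user prioritization described above could be as simple as a scoring rule applied to the comments attached to a paper. The following sketch is illustrative only: the `Annotation` and `ReaderPreferences` types, the scoring weights, and the sample names are all assumptions made for the example, not features of any existing system.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Annotation:
    author: str
    topic: str
    text: str


@dataclass
class ReaderPreferences:
    # Members of the reader's existing collaboration group.
    collaborators: set[str] = field(default_factory=set)
    # topic -> commenters the reader has found insightful on that topic.
    trusted_on_topic: dict[str, set[str]] = field(default_factory=dict)


def prioritize(annotations, prefs):
    """Order annotations for one reader; ties keep their original order."""
    def score(a):
        s = 0
        if a.author in prefs.collaborators:
            s += 2  # assumed weight: collaborators come first
        if a.author in prefs.trusted_on_topic.get(a.topic, set()):
            s += 1  # assumed weight: trusted commenters on this topic
        return s
    return sorted(annotations, key=score, reverse=True)


notes = [
    Annotation("alice", "Gröbner bases", "Typo in Lemma 2."),
    Annotation("bob", "Gröbner bases", "See also involutive bases."),
    Annotation("carol", "Gröbner bases", "We use this in our joint project."),
]
prefs = ReaderPreferences(
    collaborators={"carol"},
    trusted_on_topic={"Gröbner bases": {"bob"}},
)
print([a.author for a in prioritize(notes, prefs)])  # ['carol', 'bob', 'alice']
```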
An important component of successfully providing an annotation fea-
ture within the DML is separating unhelpful comments and deciding which
annotations will be kept in the system. Nearly every system that allows pub-
lic comments also has a way to flag unconstructive comments and responses
as inappropriate for that platform. A system such as this may need to be
developed for the DML and refined based on the kinds of comments and
feedback that the DML receives. One example of this is how MathOverflow
deals with user input that its established users deem to be spam, offensive,
or in need of attention for any other reason.10 Elected community moni-
tors are established within MathOverflow, and experienced users are able
to flag comments and posts for a moderator’s attention. The moderator can
then decide what action is needed (deleting spam, closing off-topic posts,
removing poorly rated posts, and so on). A system like this may work well
for the DML.
10 MathOverflow, Help Center, Reputation and Moderation, “Who are the site moderators, and what is their role here?,” http://mathoverflow.net/help/site-moderators, accessed January 16, 2014.
General support for the creation of basic text annotations has been
available for some time, including for mathematics literature made avail-
able as PDF or in HTML format. Support for more sophisticated forms of
semantic annotation and for the sharing of annotations across disparate
content repositories is rapidly maturing through technology from other
domains,11 but these technologies have yet to be customized for use in
mathematics. Adapting these technologies to the mathematical community
requires adequate support for mathematical markup. Some Web services
are expanding into mathematical markup. For example, Authorea12 uses a
robust source control system in the backend (git) and an engine to understand
LaTeX, Markdown, and most Web formats. Authorea lets users write
articles collaboratively online, and it renders them in HTML5 inside a Web
browser. Authorea is a spin-off initiative of Harvard University and the
Harvard-Smithsonian Center for Astrophysics.
There are numerous other tools available that provide for “wiki-like”
structured discussions with attribution dates and hierarchical organization,
such as PBWorks.13 There are also tools for highlighting, summarizing, pro-
viding video and audio annotation, mapping documents, and collaborative
reading; some are specialized to particular document formats, and some
are not. The Mellon project on Digital Research Tools14 has a list of more
than 500 tools, of which nearly 80 are tagged as annotation systems. Some
are automated (e.g., part of speech tagging), but most are tools for use by
readers or writers, either individually or in groups.
Adding this capability to the readily available digital literature should
not be overly complicated. There would need to be conventions established
for where the annotations are stored and who is responsible for storing
them, and the best default setting for privacy and sharing would also need
to be established. These annotations can provide a bridge to community-
sourced markup of objects or a way to pass information to editors (human-
or software-based) that curate the collection, thereby further enriching the
DML. This is just one way in which user and community input would play
a role in the DML; many others are listed elsewhere in this report. Community support for the new digital library will be essential for its success and also an essential way in which it could be much more than just a collection of mathematical information and links to other repositories and services.
11 World Wide Web Consortium (W3C) Open Annotation Community Group (http://www.w3.org/community/openannotation/), Domeo (life science domain, http://swan.mindinformatics.org/), Shared Canvas (humanities domain, http://www.shared-canvas.org/), Maphub (annotation of maps, http://maphub.github.io/), Pundit (annotation of Web content, http://www.thepund.it/), and LoreStore/Aus-e-Lit (collaborative annotation of literary works, http://www.itee.uq.edu.au/eresearch/projects/aus-e-lit), all accessed January 16, 2014.
12 Authorea, https://www.authorea.com/, accessed January 16, 2014.
13 PBWorks, http://pbworks.com/, accessed January 16, 2014.
14 Andrew W. Mellon Foundation, Bamboo DiRT, http://dirt.projectbamboo.org, accessed January 16, 2014.
Recommendation: A primary role of the Digital Mathematics Library
should be to provide a platform that engages the mathematical com-
munity in enriching the library’s knowledge base and identifies connec-
tions in the data.
Search and Discovery
Mathematicians want to be able to understand mathematical objects—
such as an equation, theorem, or hypothesis—more effectively and with
greater ease. This quest can be aided by having the ability to specify a
mathematical object either in natural language or more formal notation
and get information on where other uses of the object appeared in the lit-
erature, definitions of the object, or related objects of interest. For example,
consider questions of the form: “Given a hypothesis, what theorems involve
this hypothesis?” or “Given a partial list of hypotheses and some conclu-
sion, what additional hypotheses are known to imply the conclusion?”
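To make the two example questions concrete, the sketch below answers them over a tiny, hand-made collection. It is a deliberately naive illustration: hypotheses and conclusions are matched as exact strings, and the `Theorem` record, the function names, and the three sample entries are assumptions introduced for the example. Real queries would need the much richer representations discussed in this chapter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Theorem:
    name: str
    hypotheses: frozenset
    conclusion: str


# Hypothetical miniature collection (functions on metric spaces).
THEOREMS = [
    Theorem("Extreme value theorem",
            frozenset({"f is continuous", "domain is compact"}),
            "f attains a maximum"),
    Theorem("Heine-Cantor theorem",
            frozenset({"f is continuous", "domain is compact"}),
            "f is uniformly continuous"),
    Theorem("Lipschitz functions are uniformly continuous",
            frozenset({"f is Lipschitz"}),
            "f is uniformly continuous"),
]


def theorems_involving(hypothesis, theorems=THEOREMS):
    """Given a hypothesis, which theorems involve it?"""
    return [t.name for t in theorems if hypothesis in t.hypotheses]


def additional_hypotheses(known, conclusion, theorems=THEOREMS):
    """Given some hypotheses and a conclusion, what else does each theorem need?"""
    known = frozenset(known)
    return {
        t.name: set(t.hypotheses - known)
        for t in theorems
        if t.conclusion == conclusion
    }


print(theorems_involving("domain is compact"))
print(additional_hypotheses({"f is continuous"}, "f is uniformly continuous"))
```

The second call returns, for each theorem with the requested conclusion, the hypotheses not already in the user's list: here, compactness of the domain for the Heine-Cantor theorem and the Lipschitz condition for the other entry.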
The ability to ask and receive meaningful information about questions
such as these is largely out of reach of current technology. It will require
considerable research and investment to get even partway there. But the
committee sees first steps toward realizing such capabilities in the innova-
tive work of Wolfram|Alpha in the restricted domain of continued frac-
tions.15 Wolfram|Alpha prototyped and built a technological infrastructure
for collecting, tagging, storing, and searching mathematical knowledge of
continued fractions and presents it through a Wolfram|Alpha-like natural
language interface. The main types of knowledge provided in this work are
theorems, mathematical identities, definitions and concepts, algorithms,
visualizations and interactive demonstrations, and references.
The committee believes there are many other subdomains within math-
ematics where significant advances on such very difficult problems may
be possible with some mixture of modern methods of natural language
processing and machine learning, expert human analysis of the literature
of the subdomain (aided by computer), and knowledge representation ap-
proaches. Beyond hints of broad feasibility, the Wolfram|Alpha experience
suggests the following:
15 M. Trott and E.W. Weisstein, “Computational Knowledge of Continued Fractions,” WolframAlpha Blog, May 16, 2013, http://blog.wolframalpha.com/2013/05/16/computational-knowledge-of-continued-fractions/.
• Key characteristics may be identified to make specific subdomains
more feasible;
• It is possible to understand which of those subdomains are likely to
be valuable to mathematicians, if they are appropriately captured
and represented; and
• It is possible to understand how to encode knowledge so it is not
specific to a single computing platform.
From here, one could imagine funded investments to encode specific math-
ematical subdomains in parallel to investment in work on the more general
problem. Such subdomain-specific campaigns could be carried out as part
of larger literature analysis efforts in the subdomain, which would build up
or enrich the ontology and the link databases of the DML.
Intelligent information extraction and transfer are needed. For instance,
it would be helpful if a user could just highlight a formula and then click
on a button that submits the formula to a DML service that responds to
some obvious questions, such as the following:
• Is this a well-known formula?
• Is it close to one in some curated list of formulas?
• Does it have a name? A homepage?
• Can it be parsed directly into a rigorous format for computation?
If not, can the user be provided with some indications of the am-
biguities encountered in parsing, and make choices as to which
meaning is intended?
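The first three questions in the list above can be illustrated with a small lookup against a curated list. This is a sketch under strong assumptions: the curated table and the function names are hypothetical, formulas are compared as LaTeX source strings with whitespace removed, and "closeness" is plain string similarity from Python's standard `difflib` module rather than any notion of mathematical equivalence.

```python
import difflib

# Hypothetical curated list: normalized LaTeX source -> name.
CURATED = {
    r"e^{i\pi}+1=0": "Euler's identity",
    r"a^2+b^2=c^2": "Pythagorean theorem",
}


def normalize(latex: str) -> str:
    """Drop all whitespace so spacing differences do not matter."""
    return "".join(latex.split())


def lookup(latex: str, cutoff: float = 0.8) -> dict:
    """Report the formula's name if it is listed, else names of close entries."""
    key = normalize(latex)
    name = CURATED.get(key)
    close = []
    if name is None:
        matches = difflib.get_close_matches(key, list(CURATED), n=3, cutoff=cutoff)
        close = [CURATED[m] for m in matches]
    return {"name": name, "close": close}


print(lookup(r"e^{i \pi} + 1 = 0"))  # exact match after normalization
print(lookup("a^2 + b^2 = d^2"))     # not listed, but close to a listed formula
```

String similarity would treat two notations for the same formula as unrelated and two unrelated formulas with similar source as close, so a working service would need to compare parsed structure instead.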
Moreover, it would be useful to be able to do this for more complex objects
such as theorems and hypotheses. The committee does not wish to be too
prescriptive about exactly how such capabilities and services might develop.
In some specific domains, such as special functions and integer sequences,
the necessary database of mathematical information is largely already con-
structed. The remaining issues are as follows:
• Social—Getting data to where they can be machine processed for
development of services, and
• Technical—Building an adequate human-computer interface to
enable users to interact with such databases in their everyday
mathematical work.
The committee sees enormous potential for developments in this area by
some concerted research effort involving a team of people with complemen-
tary expertise in machine learning, natural language processing, human-
computer interaction, and mathematical knowledge representation. The

tion π for a generic permutation of a finite number of items is less likely
to be confused with 3.141592… when the context is already identified as
pertaining to combinatorics rather than analysis or geometry. In practice,
it may well be that once sufficient statistics on usage have been collected,
such disambiguation could be done based on only statistical data.
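The purely statistical disambiguation suggested here can be sketched as a frequency lookup keyed on the symbol and the subject area. The counts below are invented for illustration and are not measured usage data; the table USAGE and the function name are likewise assumptions.

```python
from collections import Counter

# Invented usage counts: (symbol, subject area) -> how often each meaning was seen.
USAGE = {
    ("\\pi", "combinatorics"): Counter({"permutation": 90, "circle constant": 10}),
    ("\\pi", "analysis"): Counter({"circle constant": 95, "permutation": 5}),
}


def most_likely_meaning(symbol, area):
    """Return (meaning, estimated probability), or None if no statistics exist."""
    counts = USAGE.get((symbol, area))
    if not counts:
        return None
    meaning, n = counts.most_common(1)[0]
    return meaning, n / sum(counts.values())
```

The same symbol resolves to a permutation in a combinatorics context and to the constant 3.141592… in an analysis context. A production system would need far richer context than a single subject label.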
Additional problems are posed by the historical literature: as a field
evolves, notations and terminology change, making connections to older
literature treating the same mathematical objects even more daunting.
Notation conventions have also varied over time, with changes often reflecting
increased complexity and deeper understanding. In the mathematics literature,
the gradual evolution in terminology and notation includes disputes that
usually (but not always) get resolved on what to call and how to represent
concepts, theorems, objects, etc. The evolution also reflects the integration
of work spanning many languages and cultures, each with their own idio-
syncrasies. Mapping back to earlier representations and concepts may not
be straightforward or direct.
In order to provide the careful typesetting modern mathematicians
require, precise typesetting and document preparation systems have been
developed, of which the most widely used is TeX,18 together with its
descendants LaTeX, LaTeX2e, etc.19 TeX and its derived systems lead to
nicely typeset formulas (all the examples in Figures 3-1, 3-2, and 3-3 were
realized this way), and they have become an indispensable tool for math-
ematicians (most of whom do their own typesetting for papers they submit
for publication). At first sight, the LaTeX source code for a formula could
be thought a good candidate for an international mathematical formula
identifier. However, LaTeX is a presentation format, and equations in
LaTeX cannot be easily converted to a semantic representation that can be
used in other contexts. As a simple example of this problem, finding a string
in italics might mean, depending on the context and style, that it is a journal
title or a foreign word; to present the document in a different format or cre-
ate metadata, one needs to know the semantic significance underlying the
typographic display. Often, there is no one-to-one correspondence between
a mathematical formula as it appears on the printed page and the LaTeX
instructions leading to it; this nonuniqueness is even more pronounced if
one takes into account small variations in spacing (or changes of names
of variables, as illustrated above) that would not affect the reading of the
mathematical meaning of the formula by a mathematician. In this sense,
the LaTeX code for a formula would seem to fall short as a direct template
for a putative international mathematical formula identifier (as discussed
18 “TeX,” Wikipedia, last modified January 7, 2014, http://en.wikipedia.org/wiki/TeX.
19 LaTeX—A document preparation system, last revised January 10, 2010, http://www.
latex-project.org/.

earlier in this section). However, the National Institute of Standards and
Technology Digital Library of Mathematical Functions20 uses metadata
embedded in the LaTeX code used to typeset the formulas to enable formula
and notation search. This LaTeX metadata search, while not quite a LaTeX
formula search, is fairly successful in dealing with dynamic notation and
terminology change in the literature of special functions.
An option for semantic representation of mathematical formulas can be
provided by MathML,21 which allows for mathematics to be described for
machine-to-machine communication and is formatted so that it can easily
be displayed in webpages. There have already been some research efforts
along the lines suggested above, and there are a limited number of both
experimental and production systems available that involve some kind of
formula search. In particular,
• There is some level of formula search in EuDML, using MIaS/
WebMIaS (Math Indexer and Searcher),22 a math-aware, full-text-
based search engine developed by Petr Sojka and his group (Sojka
and Líška, 2011).23 An approach based on Presentation MathML
using similarity of math subformulae is suggested and verified by
implementing it as a math-aware search engine based on the state-
of-the-art system Apache Lucene.24
• Some type of characterization of formulas is inherent to the searches
underlying the Wolfram|Alpha engine. As part of a project to see
whether mathematicians would find it useful to be able to search
the literature for formulas, Michael Trott and Eric Weisstein of
Wolfram|Alpha implemented some characterization of formulas
for the research literature on continued fractions (essentially pro-
gramming it “manually”). This small, fairly contained body of
literature was chosen because most of the relevant papers are
now in the public domain. However, the field of continued frac-
tions is not very active at this point, and it may be hard to get a
good sample basis of users to assess whether this search capability
would lead mathematicians to new ways of using or searching the
20 National Institute of Standards and Technology (NIST), Digital Library of Mathematical
Functions, Version 1.0.6, release date May 6, 2013, http://dlmf.nist.gov/.
21 W3C, “Math Home,” updated November 26, 2013, http://www.w3.org/Math/, accessed
January 16, 2014.
22 EuDML@MU, “MIaS/WebMIaS,” last change October 28, 2013, https://mir.fi.muni.cz/
mias/.
23 See also Petr Sojka’s webpage at Masaryk University, Brno, last updated December 3,
2013, http://www.fi.muni.cz/usr/sojka/.
24 Apache Software Foundation, “Apache Lucene Core,” http://lucene.apache.org/core/,
accessed January 16, 2014.

literature. It should be noted that the Mathematica-based formula
characterization/search underlying Wolfram|Alpha is proprietary,
in contrast to the completely nonproprietary nature of the InChI,
which would also be desirable for an international mathematical
formula identifier.
• Springer LaTeX Search25 allows researchers to search for LaTeX-
formatted equations in all of Springer’s journals. In an issue of
“Author Zone,”26 Springer’s eNewsletter for authors, Springer
reports that this free tool, which searches a corpus of 120,000
Springer articles in mathematics and related fields, required
8 months of engineering to develop a process that normalizes LaTeX equations.
An open tool such as this would be valuable to the DML and
to other mathematical indexing services.
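To make the nonuniqueness of LaTeX source concrete, the sketch below reduces a formula to a canonical token string by dropping spacing commands and renaming single-letter variables in order of first appearance. It is a toy normalization written for this illustration, not the process used by any of the systems named above; the spacing list and the naming scheme v0, v1, ... are assumptions.

```python
import re

# Spacing commands that affect layout only (a deliberately incomplete list).
SPACING = {r"\,", r"\;", r"\:", r"\!", r"\quad", r"\qquad"}

# A token is a control word, a control symbol, or any single non-space character.
TOKEN = re.compile(r"\\[A-Za-z]+|\\.|\S")


def normalize(latex: str) -> str:
    """Canonical form: no spacing, variables renamed v0, v1, ... by first use."""
    names = {}
    out = []
    for tok in TOKEN.findall(latex):
        if tok in SPACING:
            continue  # layout only, no mathematical meaning
        if len(tok) == 1 and tok.isalpha():
            # The same letter always maps to the same canonical name.
            tok = names.setdefault(tok, f"v{len(names)}")
        out.append(tok)
    return " ".join(out)
```

Two sources that differ only in spacing and in the choice of variable names, such as `x^2 + y^2` and `a^2+\,b^2`, receive the same canonical string and could share an index key. The sketch has obvious limits: it also renames letters that denote constants, and it treats optional braces as significant.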
Finding: While fully automated recognition of mathematical concepts
and ideas (e.g., theorems, proofs, sequences, groups) is not yet possible,
significant benefit can be realized by utilizing existing scalable methods
and algorithms to assist human agents in identifying important math-
ematical concepts contained in the research literature—even while fully
automated recognition remains something to aspire to.
Navigation
Mathematicians want the ability to navigate and explore the corpus
of mathematical documents available to them, be it through institutional
library services or through free services. This goes well beyond accessing
electronic versions of papers by following citations. The ability to click on
an object in a document and be able to quickly find additional information
about that object might help a mathematician decide whether to exam-
ine it further. Such additional information on an object might include the
following:
• Other articles discussing the same object, or perhaps slightly more
general or specific objects (and not necessarily with the same names);
• A description of when and where that object was first defined in
the literature;
• A list of reference resources (textbooks, encyclopedia entries, sur-
vey articles) with information about the object; and
25 Springer, LaTeX Search, http://www.latexsearch.com/, accessed January 16, 2014.
26 Springer, “LaTeXSearch.com: Introducing the latest Springer eProduct in the field of Math-
ematics,” http://www.springer.com/authors/author+zone?SGWID=0-168002-12-693906-0,
accessed January 16, 2014.

• Different representations of the object (such as a LaTeX fragment
or as Mathematica® code).
This is an area where it should be possible to make rapid progress, given a
foundational DML investment in ontologies and links.
Improved navigation of the mathematics literature would enhance re-
search capabilities in several ways. It would allow a researcher to find
different resources and publications more easily and to find seemingly
unrelated but relevant topics within the literature. It would also help a
researcher to address the simply stated but inherently complex question,
“Has this been done before?” Being able to answer this question would save
valuable research time and simplify the problem-solving track, all while
making the existing literature’s structure more transparent and easy to use.
The Citation Graph
Research articles can be viewed as the vertices in a large directed graph
in which article A “points to” article B if A cites B. This citation graph is
mostly tree-like: references are typically to older articles, although there
are certainly cases of more or less contemporaneous articles that cite each
other; some larger loops probably exist as well. Researchers interested in
learning about a new direction or subject typically explore this graph; they
start reading a particular research paper of interest and then climb back
along the branches, reading some of its references and then some of the
references of those papers, and so on. The creation of a citation index,
as provided by MathSciNet within mathematics and by Google Scholar,
Scopus,27 and Web of Science across many more fields, allows the user to
traverse the graph in the reverse direction, that is, to find for each paper
all the articles that cite it. This very useful search tool makes it possible to
easily find recent developments based on a paper of interest. Users would
then be able to easily integrate or compare such information with whatever
could be provided by other indexing services.
Making such comparisons or aggregations is at present very difficult.
An expert user can do it in a few clicks by cutting and pasting from one
browser window to another, but it is a few clicks for each resource, per-
haps 12 clicks to compare returns from all three of these services. But
with modern browser extension capabilities, such as those provided by
Scholarometer,28 which harvests data from Google Scholar, it is straight-
forward to write a dedicated browser extension for mathematical search
27 Elsevier B.V., Scopus, http://www.scopus.com/home.url, accessed January 16, 2014.
28 Indiana University, Scholarometer, http://scholarometer.indiana.edu/, accessed January 16,
2014.

and retrieval that would take a reference string from almost any source.
The committee sees this kind of on-the-fly querying and aggregation of data
from multiple services as the solution to the vexing compartmentalization
problem for indexing services.
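The on-the-fly aggregation described here can be sketched as a merge of records returned by several services, keyed on a shared identifier. Everything concrete below is an assumption made for illustration: the record layout, the two stand-in services, and the example DOIs are invented, and no real indexing service's interface is being described.

```python
def aggregate(reference, services):
    """Query each service and merge the returned records by DOI.

    services: dict mapping a service name to a callable that takes a reference
    string and returns a list of {"doi": ..., "title": ...} records.
    """
    merged = {}
    for name, search in services.items():
        for record in search(reference):
            key = record["doi"].lower()  # DOIs are case-insensitive
            entry = merged.setdefault(key, {"title": record["title"], "found_in": []})
            entry["found_in"].append(name)
    return merged


# Stand-in services returning invented records (hypothetical, for the demo only).
def service_a(reference):
    return [{"doi": "10.1000/X1", "title": "Paper one"}]


def service_b(reference):
    return [
        {"doi": "10.1000/x1", "title": "Paper one"},
        {"doi": "10.1000/y2", "title": "Paper two"},
    ]


RESULT = aggregate("some reference string", {"service_a": service_a, "service_b": service_b})
```

The merged result lists, for each article, which services returned it. This is the comparison that currently takes several clicks per resource.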
A DML navigating tool could incorporate some mechanism for sort-
ing and prioritizing the references it produces. A desirable feature of an
open service is that such algorithms for ranking could be adjusted if so
desired by the user, based on some special search criterion tailored by the
searcher right then, or possibly influenced by the searcher’s past preference
behavior that is recorded by the system. Other basic questions that can be
addressed by integration of DML data with data from various more-or-less-
cooperative search service providers include the following:
• Which articles are cited in this paper? (This information is typically
provided in the paper’s list of references.)
• Which articles cite this paper? (This is a search that looks forwards
in time, looking for papers that list this paper as a reference.)
• Which articles cite both papers A and B?
• Which articles are cited in both A and B?
Techniques for data analysis using methods such as bibliographic cou-
pling and citation analysis are well established, and available software
could be deployed for the benefit of DML users. A significant amount
of citation data in mathematics and related fields is already more or less
openly available from various open-access sources.
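The four questions listed above map directly onto set operations over a citation graph stored in both directions. The following is a minimal sketch with invented article labels; the class and method names are hypothetical and are not taken from any existing DML software.

```python
from collections import defaultdict


class CitationGraph:
    """Directed graph in which an edge A -> B means that article A cites B."""

    def __init__(self):
        self.cites = defaultdict(set)     # article -> articles it cites
        self.cited_by = defaultdict(set)  # article -> articles citing it

    def add_citation(self, citing, cited):
        self.cites[citing].add(cited)
        self.cited_by[cited].add(citing)

    def references(self, paper):
        """Which articles are cited in this paper?"""
        return set(self.cites.get(paper, ()))

    def citing(self, paper):
        """Which articles cite this paper? (the reverse, citation-index lookup)"""
        return set(self.cited_by.get(paper, ()))

    def co_citing(self, a, b):
        """Which articles cite both a and b?"""
        return self.citing(a) & self.citing(b)

    def shared_references(self, a, b):
        """Which articles are cited in both a and b? (bibliographic coupling)"""
        return self.references(a) & self.references(b)


# Invented example data.
g = CitationGraph()
for citing, cited in [("C", "A"), ("C", "B"), ("D", "A"),
                      ("A", "X"), ("B", "X"), ("B", "Y")]:
    g.add_citation(citing, cited)
```

Storing the reverse index alongside the forward one is what makes the "which articles cite this paper" query as cheap as reading a reference list.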
It should be possible to assemble accessible enhanced visualizations
and graphical displays that capture features of a bibliographic data set that
are not easy to find in a textual representation, and to make these features
useful for search. Interactions between objects in a data set can be revealed
by graphical displays within a browser (MacGillivray, 2013). Search re-
sults can be visualized in open formats, such as Scalable Vector Graphics
(SVG),29 and can be obtained from open search systems such as Lucene30
or ElasticSearch.31 Because today’s widespread availability of all kinds of
data is drawing increasing attention to the need for better visualization tools, the
committee anticipates that greatly improved open-source tools for graphical
displays will become widely available and easily deployable to demonstrate
interesting and novel features of the graphical relations in bibliographic
29 “Scalable Vector Graphics,” Wikipedia, http://en.wikipedia.org/wiki/Scalable_Vector_
Graphics, accessed January 16, 2014.
30 Apache Software Foundation, “Welcome to Apache Lucene,” http://lucene.apache.org/,
accessed January 16, 2014.
31 Elasticsearch, http://www.elasticsearch.org/, accessed January 16, 2014.

data sets, not just those derived from citation graphs, but also those from
collaboration graphs32 and other graphs associated with relations between
mathematical entities, such as implications or similarities.
As more data about the citation and collaboration graphs in various
disciplines have become available, they have also been used as a tool for
ranking the impact of specific scholarly journals over time and have begun
to be factored into the evaluation of individual researchers within the
tenure and promotion process, where enthusiasm about their quantitative
and “objective” nature has increasingly overcome very real concerns about
their limitations and inaccuracies as a measure of the impact of a given
scholar. A good deal of work has been done proposing various so-called
alternative metrics (“alt-metrics”)33 for scholarly impact both at the article
level and aggregated to characterize the contributions of a scholar (e.g.,
the h-index34). Analytics of these sorts are more likely to be useful to track
topics than to measure the worth of theorems, journals, or individuals
because often they are easy to manipulate and do not accurately reflect the
community’s view of importance (Arnold and Fowler, 2011; López-Cózar
et al., 2013).
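For concreteness, the h-index mentioned above can be computed in a few lines. Under the definition of Hirsch (2005), a researcher has index h if h of his or her papers have at least h citations each. The function name and the sample counts are illustrative.

```python
def h_index(citation_counts):
    """Largest h such that h papers have at least h citations each."""
    h = 0
    ranked = sorted(citation_counts, reverse=True)
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break  # counts only decrease from here, so h cannot grow
    return h
```

For citation counts of 10, 8, 5, 4, and 3 the function returns 4. The metric depends only on raw citation counts, which is part of the reason such measures are easy to manipulate.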
There is also real interest among working scholars in the possibility
of tracking the evolution of these graphs (probably in conjunction with
other data, such as popularity of articles) in order to help allocate precious
reading time by identifying emergent, potentially high-impact or high-
interest articles within or across specific subdisciplines, and a hope that
article-based metrics can be developed to assist with this. The availability
of citation and collaboration graph data, in combination with other infor-
mation provided by the DML, would be an important step in advancing
these research programs.
Tracking Article-to-Article Reading
Beyond simply exploring the citation graph, it may be desirable to
obtain and exploit information about what other users of the DML have
found useful as they explored the graph. For instance, what is the answer to
the question, “Which articles did readers like, who are (like me) interested
in A1, A2, and A3?” This way, one could find papers that do not specifi-
cally reference each other but concern the same topic. (This type of linking
32 Collaboration graphs are already attractively viewable on Microsoft Academic Search
with the proprietary Microsoft Silverlight software.
33 Altmetrics, “Altmetrics: A Manifesto,” v 1.01, September 28, 2011, http://altmetrics.
org/manifesto/.
34 The h-index is an index that attempts to measure both the productivity and impact of the
published work of a researcher based on the set of his/her most cited papers and the number
of citations that they have received in other publications (Hirsch, 2005).

is a routine task, practiced by many online stores: “others who liked this
also liked. . . .”) It does, however, rely on a large user base to traverse the
various graphs involved. Such a user base could be developed only with
strong incentives for users to participate, such as superior navigation and
search tools, so it is to be expected that such methods will be useful only
late in DML development.
Recommender systems, whether based on user tracking (like the one described
in the previous paragraph) or on users “liking” a paper or topic
within a system, are not new and are currently employed by Google Scholar
and Elsevier, among others. They could also be developed within other
information resources such as arXiv and MathSciNet.
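A recommender of the "others who liked this also liked" kind can be sketched with simple co-occurrence counting over reading histories. The data layout, the user identifiers, and the weighting by overlap are assumptions made for this example; deployed systems use considerably more elaborate models.

```python
from collections import Counter


def also_read(histories, interests, top_n=3):
    """Rank articles read by users whose history overlaps `interests`.

    histories: dict mapping an (anonymized) user id to a set of article ids.
    interests: set of article ids the current reader cares about.
    """
    scores = Counter()
    for articles in histories.values():
        overlap = len(articles & interests)
        if overlap == 0:
            continue
        for article in articles - interests:
            scores[article] += overlap  # weight by how similar the reader is
    # Sort by score, then by id so that ties are resolved deterministically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [article for article, _ in ranked[:top_n]]


# Invented reading histories.
HISTORIES = {
    "u1": {"A1", "A2", "P"},
    "u2": {"A1", "A2", "A3", "Q"},
    "u3": {"A3", "P", "R"},
    "u4": {"Z"},
}
```

A reader interested in A1, A2, and A3 is pointed to P, Q, and R, none of which needs to cite or be cited by those articles. The sketch also shows why a large user base matters: with few histories the scores carry little signal.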
These methods also raise privacy issues as users navigate a network of
DML information. Concerns about privacy issues can often be addressed
with customizable privacy settings (e.g., private navigation without login,
public navigation with some anonymization of users, and possibly public
navigation with public identity). It is important that the different models for
maintaining user privacy are examined and assessed, and that a meaningful
approach toward privacy be established for the DML.
Widely available machine learning algorithms can be used to predict the
preference rating of as-yet-unseen articles by a customer for whom only a
very partial profile is available, based on (often equally partial) profiles of
other customers. A highly publicized recent success was achieved through
the Netflix Prize competition in which Netflix “sought to substantially
improve the accuracy of predictions about how much someone is going
to enjoy a movie based on their movie preferences.”35 The final winning
algorithm in that contest was an intelligent combination of strategies that
alone produced insufficient improvement. This demonstrated that substan-
tial progress can be achieved by combining different approaches that may
be less spectacular when evaluated independently of one another. Such
incremental improvements may not be very interesting from the perspective
of machine learning research, but they are potentially useful in production
applications of machine learning algorithms that the DML could provide.
The Mathematical Concept Graph
Mathematical research can also be aided by considering mathematical
objects other than papers, through exploration of their connections in a
directed graph. For instance, in the answer to the question, Which theorems
or papers use theorem T?, the different links would likely be references to
classical results and to later improvements that were made since theorem
T first appeared. The committee imagines both supervised and unsupervised
35 Netflix, “Netflix Prize,” http://www.netflixprize.com/, accessed January 16, 2014.

learning approaches to these problems. In supervised learning, the
machine starts from a list of known concepts, say functions or theorems,
and then attempts to identify various instantiations of that concept. This is
similar to automated library cataloging with a fixed structure of categories.
Unsupervised learning is instead a process of clustering of instances—for
example, deciding which theorems are essentially the same. At the level of
LaTeX encoded formulas, some version of this capability, and a consequent
search-and-discovery mechanism, is already achieved by Springer’s LaTeX
Search capability.
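Once "uses" links between results have been extracted, by whatever means, the question posed above (which theorems or papers use theorem T?) becomes a reachability query on a directed graph. The sketch below assumes such links are already available as pairs; the result labels are invented.

```python
from collections import defaultdict, deque


def dependents(uses, target):
    """All results that use `target` directly or through intermediate results.

    uses: iterable of (result, ingredient) pairs meaning `result` uses `ingredient`.
    """
    used_by = defaultdict(set)
    for result, ingredient in uses:
        used_by[ingredient].add(result)

    seen = set()
    queue = deque([target])
    while queue:  # breadth-first search along "is used by" links
        current = queue.popleft()
        for result in used_by.get(current, ()):
            if result not in seen and result != target:
                seen.add(result)
                queue.append(result)
    return seen


# Invented dependency data.
EDGES = [
    ("Lemma 2", "Theorem T"),
    ("Theorem 5", "Lemma 2"),
    ("Corollary 6", "Theorem 5"),
    ("Theorem 7", "Theorem S"),
]
```

The hard part, which this sketch takes as given, is producing the pairs in the first place; that is where the supervised and unsupervised methods described above would be applied.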
As further motivation for such efforts, which may be very challenging,
the committee notes that Don Swanson identified useful public, yet undis-
covered, knowledge in the biomedical domain by examining under-explored
connections between clinical observations (Swanson, 1986, 1987). Despite
efforts over the past few decades to automate the discovery of new scientific
hypotheses based on literature analysis, insight from a human researcher is
still needed. Ganiz et al. (2005) suggested that domains other than medicine
should be explored. The committee believes that similar “literature discov-
ery” methods could lead to interesting (and underexploited) connections
between different mathematical fields or results.
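The core of this kind of literature-based discovery is often described as an A-B-C pattern: term A appears with B in one literature, B appears with C in another, and A and C never appear together. The sketch below implements that pattern over a toy co-occurrence table loosely modeled on Swanson's fish-oil example. The table is an invented simplification, not data drawn from the cited papers, and the function name is hypothetical.

```python
from collections import defaultdict


def hidden_links(cooccurrence, start):
    """Swanson-style A-B-C search over term co-occurrence.

    cooccurrence: dict mapping a term to the set of terms it appears with
    in at least one paper. Returns, for each term C never seen with `start`,
    the intermediate terms B that connect them.
    """
    direct = cooccurrence.get(start, set())
    bridges = defaultdict(set)
    for b in direct:
        for c in cooccurrence.get(b, set()):
            if c != start and c not in direct:
                bridges[c].add(b)
    return dict(bridges)


# Toy co-occurrence table (illustrative only).
LITERATURE = {
    "fish oil": {"blood viscosity"},
    "blood viscosity": {"fish oil", "Raynaud's syndrome"},
    "Raynaud's syndrome": {"blood viscosity"},
}
```

In a mathematical setting the terms might be objects or theorem identifiers from the DML ontology rather than clinical vocabulary. As the text notes, the candidate links produced this way still need a human researcher to judge them.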
Visualization and Analytics
One way to help mathematicians learn from the large, complex, and
rapidly growing and evolving literature base is to employ tools that are
being developed to analyze data in a wide variety of settings, including
both visualization tools and other analytical and statistical approaches.
These tools could exploit the natural graphical structure of co-authorship
and citation graphs and the relations among various kinds of mathemati-
cal objects and the parts of the literature that discuss these objects (as
described in the previous section). The availability of an ontology for
mathematical objects is important, and new tools are being developed that
perform visualization guided by both an ontology and a set of data tagged
according to the ontology (such as a collection of papers, or theorems, in
a mathematical scenario). Note that in most cases, the committee expects
that general-purpose graph analysis and visualization tools will be used, not
tools developed by the DML.
The DML’s role would be to help mathematicians find the right tools
and ensure that data from the mathematical literature and knowledge base
are available in forms and formats, and through interfaces, that make it
easy to use these general purpose tools. Presumably, progress in this area
would be quick, given the availability of the DML’s underlying ontology
and link collections, because it can build on other large investments that
are under way already.

The committee expects the DML to be a testbed for deploying methods for
visualizing data rather than a contributor of such methods. There are
many widely deployed methods that can be applied to bibliographic data
on the scale envisioned for the DML, which is modest compared to many
big data projects. Microsoft Academic Search36 already provides attractive
displays of the collaboration graphs across its corpus using its proprietary
Silverlight™ software. While open-source alternatives would be more
attractive, either the DML or other agents could easily offer such displays
over DML data as soon as they are collected. This would provide an ad-
vantage over the quality of text data displays offered by the mathematical
reviewing services. Similar displays could easily be provided for navigation
and indication of relations between subjects at the level of MSC2010, which
would greatly improve on past efforts.
Computational Capabilities
The committee wishes to promote cooperation between the DML and
computational service providers to allow users functionality, such as being
able to cut a formula out of a mathematical document and paste it into a
computing environment. This can already be done to some extent for simple
formulas by cutting, massaging, and pasting a formula into Wolfram|Alpha,
which uses natural language processing methods to match natural language
queries with more formal knowledge representations.
The mathematics community uses a variety of simulation software—
both numerical (such as Matlab,37 Octave,38 Python,39 R,40 Origin41) and
symbolic (such as Maple,42 Mathematica,43 Sage44). Most software tools
have different formatting requirements, and these would have to be taken
into account when transporting formulas to and from them.
36 Microsoft Academic Search, http://academic.research.microsoft.com/, accessed January 16,
2014.
37 MathWorks, MATLAB, “Overview,” http://www.mathworks.com/products/matlab/,
accessed January 16, 2014.
38 GNU Octave, http://www.gnu.org/software/octave/, accessed January 16, 2014.
39 Python Software Foundation, “Python Programming Language—Official Website,” http://
www.python.org/, accessed January 16, 2014.
40 R Project for Statistical Computing, http://www.r-project.org/, accessed January 16, 2014.
41 OriginLab Corporation, “Origin,” http://www.originlab.com/index.aspx?go=Products/
Origin, accessed January 16, 2014.
42 Maplesoft, “Maple 17,” http://www.maplesoft.com/products/maple/, accessed January 16,
2014.
43 Wolfram, Mathematica, http://www.wolfram.com/mathematica/, accessed January 16,
2014.
44 Sagemath, homepage, http://www.sagemath.org/, accessed January 16, 2014.

Recommendation: The Digital Mathematics Library should rely on
citation indexing, community sourcing, and a combination of other
computationally based methods for linking among articles, concepts,
authors, and so on.
Other Useful Features
Application programming interfaces, which allow for add-on applica-
tions to be built by independent users and groups, are useful for experimen-
tation with the processing of and understanding of mathematics. There are
likely other tools that the DML could support that would be useful to the
mathematics community.
For instance, there is still a need for a good PDF reader for mathematics.
Most mathematicians still print out papers they really want to read,
even if they own and mostly use an e-book reader for their other reading
needs. When asked why they prefer reading mathematics from a print-out,
researchers told the committee that they want to be able to flip back and
forth, have difficulty concentrating on an electronic version, and miss the
ability to annotate the paper with a pen or pencil. The DML could provide
an environment to try out experimental readers.
Even prior to the existence of the DML, one could gain experience and
better understanding of the feasibility and value of these technologies with
the help of testbed platforms. These could serve as a framework for research
programs to explore promising technologies and services, including extrac-
tion and identification of mathematical objects and applications of tagging
or classification (including, perhaps, community-sourced approaches).
Experiments with structuring math knowledge into Wolfram|Alpha
have been very promising and provocative. These are worth extending into
other areas to gain additional understanding of effectiveness and limits. It
would be of interest to select areas that are of active research interest. A key
issue here, however, is understanding how to extend or share this beyond
just Wolfram|Alpha and to make the investment reusable in other settings.
REFERENCES
Arnold, D.N., and K.K. Fowler. 2011. Nefarious numbers. Notices of the AMS 58(3):434-437.
Billey, S.C., and B.E. Tenner. 2013. Fingerprint databases for theorems. Notices of the AMS
60(8):1034-1039.
Bosch, A., A. Zisserman, and X. Muñoz. 2007. Image classification using random forests and
ferns. Pp. 1-8 in IEEE 11th International Conference on Computer Vision. doi:10.1109/
ICCV.2007.4409066.
Buchberger, B. 2006. Bruno Buchberger’s PhD thesis: 1965: An algorithm for finding the basis
elements of the residue class ring of a zero dimensional polynomial ideal. Journal of
Symbolic Computation 41(3-4):475-511.

Cox, D., J. Little, and D. O’Shea. 1997. Ideals, Varieties, and Algorithms: An Introduction
to Computational Algebraic Geometry and Commutative Algebra. Springer, New York.
Ganiz, M.C., W.M. Pottenger, and C.D. Janneck. 2005. Recent Advances in Literature-Based
Discovery. Technical Report LU-CSE-05-027. Lehigh University, Bethlehem, Pa.
Heller, S., A. McNaught, S. Stein, D. Tchekhovskoi, and I. Pletnev. 2013. InChI the worldwide
chemical structure identifier standard. Journal of Cheminformatics 5(7).
Hoffman, P. 1987. The man who loves only numbers. Atlantic Monthly 260(5):60.
Hirsch, J.E. 2005. An index to quantify an individual’s scientific research output. Proceedings
of the National Academy of Sciences U.S.A. 102(46):16569-16572.
Jaynes, E.T. 2003. Probability Theory: The Logic of Science. Cambridge University Press.
Lee, R., and D.M. Wilczyński. 1997. Representing homology classes by locally flat surfaces of
minimum genus. American Journal of Mathematics 119:1119-1137.
Linnaeus, C. 1758. Systema Naturae per Regna Tria Naturae, secundum Classes, Ordines,
Genera, Species, cum Characteribus, Differentiis, Synonymis, Locis [System of Nature
through the Three Kingdoms of Nature, according to Classes, Orders, Genera and Spe-
cies, with Characters, Differences, Synonyms, Places]. 10th edition. http://www.biodi-
versitylibrary.org/item/10277.
López-Cózar, E.D., N. Robinson-Garcia, and D. Torres-Salinas. 2013. “The Google Scholar
Experiment: How to Index False Papers and Manipulate Bibliometric Indicators.” http://
arxiv.org/abs/1309.2413.
Lowe, D. 2004. Distinctive image features from scale-invariant keypoints. International Jour-
nal of Computer Vision 60(2):91-110.
MacGillivray, M. Open Citations—Doing Some Graph Visualisations. Open Citations
blog. Posted on March 28, 2013. http://opencitations.wordpress.com/2013/03/28/
open-citations-doing-some-graph-visualisations/.
Renschuch, B., H. Roloff, and G.G. Rasputin. 1987. Contributions to Constructive Polyno-
mial Ideal Theory XXIII: Forgotten Works of Leningrad Mathematician N.M. Gjunter
on Polynomial Ideal Theory. Wiss. Z. d. Pädagogische Hochschule Potsdam 31:111-126.
Translated by Michael Abramson, ACM SIGSAM Bulletin 37(2), June 2003.
Sojka, P., and M. Líška. 2011. Indexing and searching mathematics in digital libraries. Pp.
228-243 in Intelligent Computer Mathematics. Lecture Notes in Computer Science,
Volume 6824. Springer Berlin Heidelberg.
Suzuki, N. 2013. “The Chern Character in the Simplicial de Rham Complex.” http://arxiv.
org/abs/1306.5949.
Swanson, D. 1986. Fish-oil, Raynaud’s Syndrome, and undiscovered public knowledge. Perspectives
in Biology and Medicine 30(1):7-18.
Swanson, D. 1987. Two medical literatures that are logically but not bibliographically connected.
Journal of the American Society for Information Science 38(4):228-233.
Ulam, S. 1957. Marian Smoluchowski and the theory of probabilities in physics. American
Journal of Physics 25:475-481.
von Ahn, L. 2011. “Massive-scale Online Collaboration.” TED Talk (video). http://www.ted.
com/talks/luis_von_ahn_massive_scale_online_collaboration.html.
Wiedijk, F. 2007. The QED manifesto revisited. Studies in Logic, Grammar, and Rhetoric
10(23):121-133. http://mizar.org/trybulec65/8.pdf.
Zwegers, S. 2008. “Mock Theta Functions.” http://arxiv.org/abs/0807.4834.