Like most areas of scholarship, mathematics is a cumulative discipline: new research is reliant on well-organized and well-curated literature. Because of the precise definitions and structures within mathematics, today’s information technologies and machine learning tools provide an opportunity to further organize and enhance discoverability of the mathematics literature in new ways, with the potential to significantly facilitate mathematics research and learning. Opportunities exist to enhance discoverability directly via new technologies and also by using technology to capture important interactions between mathematicians and the literature for later sharing and reuse.
In most scientific disciplines, including mathematics, Web-based access to digital resources representing the disciplinary literature is now mature and quite effective. Through a mixture of open and proprietary tools, mathematicians are able to search the enormous and very rapidly growing literature using attributes such as subjects, titles, authors, dates, and keywords; they can follow chains of citations among works backward and forward in time. While much information is contained in individual items in the mathematical literature, a greater amount of information is represented by the way they are linked. This is not just via references but through the interrelation of concepts, insights, and techniques as they are developed, refined, and spread from one mathematical discipline to another. For example, if mathematicians were able to search the literature for instances where a specific equation was used or solved, it would allow them to consider alternative approaches toward solving their own research questions. This search capability could be facilitated through the use of a database of machine-generated and human-cultivated information about the mathematical literature, which would also allow a variety of other capabilities to be built.
This report discusses how information about what the mathematical literature contains can be formalized and made easier to express, encode, and explore. Many of the tools necessary to make this information system a reality will require much more than indexing and will instead depend on community input paired with machine learning, where mathematicians’ expertise can fill the gaps left by automation. The Committee on Planning a Global Library of the Mathematical Sciences proposes the establishment of an organization; the development of a set of platforms, tools, and services; the deployment of an ongoing applied research program to complement the development work; and the mobilization and coordination of the mathematical community to take the first steps toward these capabilities.
Mathematics today has the opportunity to expand and redefine the way in which mathematical knowledge is represented and used, the character of the mathematical literature and how it evolves, and the way that mathematicians interact with this collection of knowledge. This new relationship with the literature and the mathematical knowledge corpus goes beyond new forms of access and analytical tools; it must also include the tools and services to accommodate the creation, sharing, and curation of new kinds of knowledge structures.
To be clear, what the committee proposes builds on the extensive work done by many dedicated individuals under the rubric of the World Digital Mathematics Library,1 as well as many other community initiatives.2 Comparing desired capabilities going forward with what has been achieved by these efforts to date, the committee concludes that there is little value in new large-scale retrospective digitization efforts or further aggregations of mathematical science publications (both traditional journal articles and newer preprint, blog, video, and similar resources) beyond the federation of distributed repositories already achieved through existing search services. Nor is another bibliographically based secondary indexing service needed at this time. Necessary incremental improvements will likely continue to occur in these areas, but they do not require an initiative on the scale of what is being called for in this report.
1 The World Digital Mathematics Library rubric has been used by a variety of organizations for many distinct projects. A history of many of these efforts and the current state of the art can be found on the wiki page from the International Mathematical Union’s Digital Mathematics Workshop in June 2012, http://ada00.math.uni-bielefeld.de/mediawiki-1.18.1/index.php/.
2 Examples include the On-Line Encyclopedia of Integer Sequences, the NIST Digital Library of Mathematical Functions, and the Guide to Available Mathematical Software.
The real opportunity is in offering mathematicians new and more direct ways to discover and interact with mathematical objects and mathematical knowledge through the Web. The committee’s consensus is that by some combination of machine learning methods and community-based editorial effort, a significantly greater portion of the information and knowledge in the global mathematical corpus could be made available to researchers as linked open data3 through a central organizational entity, referred to in this report as the Digital Mathematics Library (DML).
The DML would aggregate and make available collections of ontologies, links, and other information created and maintained by human contributors, curators, and specialized machine agents, with significant editorial input from the mathematical community. The DML would enable functionalities and services over the aggregated mathematical information that go well beyond simply making publications available, to include capabilities for annotating, searching, browsing, navigating, linking, computing, and visualizing both copyrighted and openly licensed content. While the DML would store modest amounts of new knowledge structures and indices, it would not generally replicate mathematical literature stored elsewhere. Instead, it would strive to represent the mathematical knowledge presented within a publication and illustrate how it is connected with other resources.
While the committee believes that the DML could begin development soon, it notes that this work would need to be complemented by an ongoing research program to fill in gaps, improve quality and performance, increase the robustness of available technologies, and increase the automation of processes that still rely heavily on human intervention.
The DML would facilitate discovery of and interaction with mathematical information from diverse sources with varying levels of copyright. The committee envisions the DML as a growing corpus of public-domain and openly licensed mathematical information, Web services, and software agents, which would coexist with present mathematical publishing and indexing services for the foreseeable future.
3 Broadly defined, linked open data are structured data that are published in such a way that makes it easy to interlink them with other data, therefore making it possible to connect them with information from multiple sources. These connected data can provide a user with a more meaningful query of a subject by consolidating relevant information from a variety of places (e.g., in different research papers) and pulling out specific components that the user might be particularly interested in.
A key early issue for the DML organization is how to establish constructive and effective partnerships with existing publishers, Web services, and other resources, both those specific to mathematics and those serving the much broader scholarly community. Some of these partnerships might be challenging because of copyright concerns. However, establishing fruitful partnerships is essential to the success of the DML. While the DML would sometimes provide services and functional features that overlap with existing services and tools provided by both commercial and not-for-profit entities, the committee suggests partnering with current service providers whenever possible rather than replicating capabilities of existing resources.
For example, on MathOverflow,4 a question-and-answer website for research mathematicians, answers often reference research articles and papers. While the DML would not want to replicate the interface and social networking features of MathOverflow, it would be wholly appropriate for the DML to instigate and participate in a multi-party collaboration with MathOverflow and publishers of research mathematics to automatically capture citations entered in MathOverflow answers and republish them as linked open data annotations. In this scenario, the DML could help broker standard practices for interoperability and help maintain the software agents and annotation repositories that would allow publishers to make mathematicians coming to their websites aware of MathOverflow discussions potentially relevant to the papers they are viewing. The converse could also be supported. Posts on MathOverflow could be automatically annotated when errata or other commentary are added to the publisher’s website for an article mentioned in the MathOverflow post. This illustrates the potential for chains of annotations as a new mode of scholarly discourse (Sukovic, 2008). To visualize how an annotation chain might come about, begin by assuming that a post in MathOverflow referencing a particular article is automatically added as an annotation to this article on the publisher’s website. A subsequent reply to this annotation made by a reader of the publisher website is then automatically added to the thread on MathOverflow. A new reply subsequently added to the thread on MathOverflow is then automatically added as a further annotation on the publisher’s website, and so on.
This would allow users of two disparate services—i.e., one scholar using MathOverflow and the other using only the publisher’s website—to nonetheless carry on a substantive discourse about published mathematics research in spite of the fact that each is using a different utility to access the publication being discussed.
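The annotation chain described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a design proposed by the committee: the `Annotation`, `Service`, and `AnnotationBroker` names, the in-memory threads, and the placeholder article identifier are all hypothetical, and a real deployment would exchange linked open data between independent websites rather than share objects in one process.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Annotation:
    """One comment about an article, tagged with the service it was written on."""
    article_id: str
    origin: str
    author: str
    text: str


@dataclass
class Service:
    """A site (Q&A forum or publisher page) that shows one thread per article."""
    name: str
    threads: dict = field(default_factory=dict)

    def receive(self, annotation: Annotation) -> None:
        # Append the annotation to the thread for its article.
        self.threads.setdefault(annotation.article_id, []).append(annotation)


class AnnotationBroker:
    """Hypothetical relay: each annotation is delivered once to every service."""

    def __init__(self, *services: Service) -> None:
        self.services = services

    def publish(self, annotation: Annotation) -> None:
        for service in self.services:
            service.receive(annotation)


forum = Service("MathOverflow")
publisher = Service("publisher site")
broker = AnnotationBroker(forum, publisher)

article = "example-article-1"  # placeholder identifier, not a real DOI
broker.publish(Annotation(article, forum.name, "reader_a", "Answer citing the article."))
broker.publish(Annotation(article, publisher.name, "reader_b", "Reply left on the article page."))
broker.publish(Annotation(article, forum.name, "reader_a", "Follow-up posted in the forum thread."))

# Both services now hold the same three-step chain, in the same order.
chain = [a.origin for a in publisher.threads[article]]
```

Routing every annotation through a single relay is one way to keep the two threads identical without either site needing to know about the other's interface; it is an assumption of this sketch rather than a requirement stated in the report.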
Similarly, MathSciNet and Zentralblatt Math (zbMath) already classify research papers according to the Mathematics Subject Classification (MSC)5 schedule. The DML would not want to replicate this indexing. However, it might be beneficial for the DML to provide complementary indexing on other dimensions, for example by the occurrence in articles of well-known special functions (hierarchies of which are maintained by the National Institute of Standards and Technology (NIST)6 and by Wolfram Research7). With the two kinds of indexing used in concert, one could then envision a collaboratively built interface that allows refinement of an initial MSC search via attributes such as which special functions are used in the articles that appear in the results from the MSC search.
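A minimal sketch of such a refinement, assuming a hypothetical in-memory index in which each record carries both MSC codes and a list of special functions found in the article, might look as follows. The record layout, the function names (`search_by_msc`, `facet_counts`, `refine_by_function`), and the sample data are illustrative assumptions only; they do not reflect the interfaces of MathSciNet, zbMath, NIST, or Wolfram Research.

```python
from collections import Counter

# Illustrative records only: identifiers, codes, and function labels are made up
# for this sketch and are not drawn from any real index.
records = [
    {"id": "paper-1", "msc": ["33C10", "35J05"], "functions": ["BesselJ"]},
    {"id": "paper-2", "msc": ["33B15"], "functions": ["Gamma"]},
    {"id": "paper-3", "msc": ["11M06"], "functions": ["Gamma", "Zeta"]},
    {"id": "paper-4", "msc": ["33C10"], "functions": ["BesselJ", "Gamma"]},
]


def search_by_msc(items, prefix):
    """Initial search: keep records with at least one MSC code starting with prefix."""
    return [r for r in items if any(code.startswith(prefix) for code in r["msc"])]


def facet_counts(items):
    """Count how many result records mention each special function."""
    return Counter(name for r in items for name in r["functions"])


def refine_by_function(items, name):
    """Refinement step: keep only the results that use the chosen special function."""
    return [r for r in items if name in r["functions"]]


initial = search_by_msc(records, "33")            # subject-based search
facets = facet_counts(initial)                    # options offered to the user
refined = refine_by_function(initial, "BesselJ")  # user picks one facet
```

The facet counts are computed only over the initial result set, so the user is offered only those special functions that actually occur in articles matching the subject search.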
Such partnerships and collaborations are essential. It is vital that users see a well-integrated interface that incorporates both the DML services and commercial services for those affiliated with institutions that have access to the commercial services. The committee envisions the resources, services, and tools offered by the DML as coexisting with, and often enhancing, the offerings from existing players in the mathematical information landscape. The committee hopes that relevant organizations will contribute to the work of the DML in various ways, such as by providing financial support, allowing appropriate access to their content and services, or by participating in the collaborative development, with the shared goal of enhancing the value of the mathematics literature. Building these partnerships would likely require significant negotiations and collaborations, and the DML organization would have to allocate much time and effort to their planning and execution.
The biggest challenge, however, will be in establishing the technical, organizational, and community-coordinating capabilities to deliver on the construction of the resources, services, and tools described earlier in this summary and then planning and implementing the development and deployment of the necessary systems. Some of the technologies required to build the requisite tools and services do not exist today or are not sufficiently mature. The committee sees the DML as having a minimal direct research role; rather, the committee believes that the establishment of the DML needs to be complemented by a long-term (5 to 10 years) commitment to a focused and applied research program that would encompass both needed technology, tools, and services and (to a lesser extent) independent research to understand how the DML is being used and how well it is working. Ideally, the commitment to fund this program could come in parallel with the commitment for the initial funding for the DML itself (whether from one or multiple sources). These research programs need to be well connected to the work of the DML. This could be achieved either by ensuring that the DML is deeply involved in the development of the calls for proposals and the subsequent proposal evaluation or by actually placing the DML in the role of a re-granting organization (although the committee sees some potential bureaucratic complications with the latter option).
ORGANIZATION AND RESOURCES NEEDED
The committee’s vision of an incremental development of the DML starts with the creation of a small nonprofit organization, referred to here as the DML organization. The DML organization will need a small and dedicated paid staff, including a well-respected mathematician in a senior role, to ensure its development and growth. Other staffing needs may become necessary as the needs and status of the DML evolve, although much of the software development and operations could be contracted out. Ideally, the DML would be attached to and draw support from some host institution (a university, a research laboratory, or other organization) in order to facilitate sharing of services and to reduce overhead. The DML organization could be governed ultimately by the mathematical sciences community through organizations such as the International Mathematical Union and, thence, through their member organizations.
The first and foremost challenge that the DML will face is finding a set of primary funding sources that could support its initial development and early operations (a period of between 5 and 10 years). It is the committee’s hope that the DML would become a self-sustaining entity once some of its key capabilities are established and a potential sustainable business model is chosen from among options.8
For the first few years, perhaps the best approach would be to split operational governance from high-level, longer-term policy governance, because these two tasks will be quite distinct. Both in the short and the longer term, appropriate connections are needed between funding and revenue sources and governance, and these connections may well need to shift over time. Particularly in the early days, a light and agile governance mechanism is crucial. Upon launching the DML effort, there would likely be a coalition of partners with a commitment to the DML concept.
8 There are many lessons on sustainability to draw upon, including experiences with digital libraries (such as arXiv) and open or community source software as well as work on research data curation.
Like other scientific disciplines, mathematics is now completing a complex multi-decade transition from print to a digital system that closely emulates print for authors and readers. The mathematics community is thus at an inflection point where it has the opportunity to think about how its collective knowledge base is going to be constructed, used, structured, managed, curated, and contributed to in the digital world and how that knowledge base will be related to the existing literature corpus, to authoring practices in the future, and to the social and community practices of doing and learning mathematics. Colleagues in other disciplines (astronomy, molecular biology, genomics, chemistry) are in many cases well advanced in formulating their own disciplinary-specific answers that take into account disciplinary practices (such as the mix of experimental, observational, theoretical, and computational approaches) and the conceptual models that underlie disciplinary thinking.
Mathematics is unusual in many ways; it maintains a healthy and constructive relationship with its past, as documented in the literature of the field going back hundreds of years, and some of its literature has a long “shelf life.” The committee believes that refreshing and restructuring the corpus of mathematical literature and abstracting it into a knowledge base for future centuries is a valid and sound investment in the future of mathematical scholarship. The DML proposed in this report provides a platform and a context to achieve this and also offers a critical point of focus for the mathematical community in a genuinely digital environment to engage in discussions about the creation, curation, and management of mathematical knowledge.
Sukovic, S. 2008. Convergent flows: Humanities scholars and their interactions with electronic texts. Library Quarterly 78(3):263-284, doi.org/10.1086/588444.