This chapter proposes a strategic plan for incremental and modular development of the Digital Mathematics Library (DML), with the aim of providing the mathematical community with at least some of the specific capabilities described in Chapter 3, as well as some further capabilities that should follow as corollaries of the basic development. The committee’s strategic plan contains the following elements:
- Fundamental principles of the DML vision;
- Constitution of a nonprofit organization committed to development of the DML collection and services, called the DML organization;
- Initial development;
- Priorities for collections and service development;
- Technical considerations; and
- Resources needed.
Each of these elements is discussed in detail in the following sections.
The committee envisions the next step in advancing mathematics to go beyond traditional mathematical publications and take advantage of the mathematical information and knowledge stored in those publications to create a network of information that can be easily explored and manipulated. There is a compelling argument that through a combination of machine learning methods and editorial effort by both paid and volunteer editors, a significant portion of the information and knowledge in the global mathematical corpus could be made available to researchers as linked open data through the DML. The DML would help index and make discoverable collections of information created and maintained by distributed editors and specialized machine agents, much as Google now indexes and makes available information drawn from across the Web, but without the centralized processing and caching. But the DML would also need to engage substantial editorial input from the mathematical community. The DML would afford functionalities and services over the aggregated information, including capabilities for searching, browsing, navigating, linking, computing, visualizing, and analyzing, over both copyrighted and openly licensed content.
Some, but by no means all, of the proposed additional services and knowledge management utilities will rely on analysis of full content, done in a coordinated fashion. Other services will rely on analysis of metadata, which are often accessible with fewer or no restrictions. The committee feels that today—through reliance on a broad, distributed community, adherence to emerging standards and best practices, the use of new distributed collaboration and editing workflow models, and reliance on the affordances of emerging technologies such as linked open data and machine learning methods—these content and metadata analyses can be accomplished successfully in a distributed fashion—that is, without having to acquire, process, or store the entire universe of all mathematics publications centrally. While the approach outlined would require the central (or at least centrally coordinated) maintenance of key concept vocabularies and ontologies, large-scale, centralized processing and storage of mathematical publications would not be necessary.
The committee has identified a compelling opportunity for the following:
- The DML as a large, open collection of mathematical bibliographic information and mathematical concepts (e.g., axioms, definitions, theorems, proofs, formulas, equations, numbers, sets, functions) and objects (e.g., groups, rings) aggregated from diverse sources;
- Integrating and organizing the DML with existing repositories of publications and with indexing and computing services (as discussed in the Chapter 3 section on “Developing Partnerships”);
- Encouraging, facilitating, and supporting the development and promulgation of novel Web and desktop services, including annotation, collection, and collaboration tools, and tools for search and literature-based discovery, that can be utilized within the DML;
- Supporting experimental and production applications of machine learning methods for the extraction of various mathematical entities, including topics, formulas, equations, and theorems, by data mining and large-scale data analysis of suitable portions of the mathematical corpus; and
- Supporting a combination of community input and traditional editorial workflows for validation of outputs of such machine processing and contribution of such outputs to the DML.
The committee believes that it is necessary for the people and organizations involved in the DML to adopt some basic principles to guide the DML to reach its full potential.
Adherence to Best Practices and Standards
The proposed DML would benefit from adhering to broad technical standards and built-in interoperability, both for encouraging partnerships and for taking advantage of non-mathematics-specific Web technologies that become available (Aalbersberg and Kähler, 2011; Gill and Miller, 2002). The DML would benefit from being developed with a modular architecture, allowing various technical development efforts to proceed in parallel with minimal coordination. There should be some initial agreements in principle about the nature of inputs and outputs of various components and Web services. One illustration of the importance of technical standards in mathematics is the value of TeX (and LaTeX), which standardized mathematical typesetting and revolutionized research mathematics publications.
The DML architecture should adhere as much as possible to contemporary and evolving Web architecture standards for all its services, especially the standards of linked open data for publishing structured data on the Web so that it can be interlinked and become more useful. Linked open data allow a webpage to dynamically pull relevant information from related websites. For example, a website that displays local weather could pull information from an unrelated local traffic monitoring site to alert users to delays or road conditions, and it could pull from the local school district’s website to alert users about potential closures. Linked open data are particularly valuable within the proposed DML because much of the value of the information comes from its connections with outside existing data. If these connections can be strengthened, the network of mathematical information will solidify, providing a clearer picture of the realm of mathematical research.
The DML, as proposed, would not be collecting large amounts of copyrighted material; however, it would be amassing its own data collection of connections and understanding of mathematical information. These data (i.e., vocabularies, ontologies, annotations) and the DML-developed/supported software would benefit from being open source so that other researchers and developers could build upon them. The DML would need to respect and recognize copyright limitations and work with publishers to make sure these can stay in place even while having minimal impact on the ability of users to discover and learn about resources.
The DML would also benefit from adhering to accepted norms for citations and evaluations. This may take the form of systematic application and support of the San Francisco Declaration on Research Assessment (American Society for Cell Biology, 2012) about emerging practices related to the evaluation of research articles.
Recommendation: The Digital Mathematics Library should serve as a nexus for the coordination of research and research outcomes, including community endorsements, and encourage best practices to facilitate knowledge management in research mathematics.
Competition and Cooperation with Other Organizations
To the greatest extent consistent with its goals and principles, the DML should seek to cooperate with and not to compete with existing information services and communication and desktop tools that are widely used by the mathematical community. Cooperation would include the following:
- Agreements on the structure of suitable data schemas for representation of bibliographic and mathematical information, including standards for representation of mathematics on the Web (MathML, MathJax, etc.);
- Agreements on systematic use of identifiers and openly accessible Web services supported by other organizations (e.g., DOIs, Handles, ORCIDs, MR and ZMATH identifiers, OCLC identifiers) instead of replication of these identifiers and associated services by the DML;
- Provision of agreements and conversion services, as needed, to ensure metadata interoperability and aggregation of data from various services; and
- Support for interfaces between the DML and existing information resources listed in Appendix C—for example, bibliographical, encyclopedic, content, social environments.
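As one illustration of reusing externally maintained identifiers rather than minting new ones, the sketch below checks the shape and check character of an ORCID iD before it is stored in a record. The check character follows the ISO 7064 MOD 11-2 scheme that ORCID documents for its identifiers; the function names are hypothetical and not part of any DML specification.

```python
def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if a (possibly hyphenated) ORCID iD has a valid shape and checksum."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16:
        return False
    base = compact[:15]
    if not (base.isascii() and base.isdigit()):
        return False
    return orcid_check_character(base) == compact[15]


if __name__ == "__main__":
    # 0000-0002-1825-0097 is the sample iD used in ORCID's own documentation.
    print(is_valid_orcid("0000-0002-1825-0097"))
```

A check of this kind only confirms that an identifier is well formed; whether it resolves to the intended person would still depend on the service that issues it.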
This cooperation also applies to arXiv, Wikipedia, MathSciNet, zbMATH, Google, Microsoft, and the general abstracting and indexing services, as well as to various companies with proprietary interests in mathematical communication and computation whose products the DML should seek to enhance and make more openly accessible and reusable. This list of companies includes the following:
- Springer (with large amounts of mathematical information in SpringerLink and its proprietary LaTeX search);
- Wolfram (with large amounts of mathematical information embedded in Mathematica and Wolfram|Alpha);
- Elsevier; and
- Maplesoft, a subsidiary of Cybernet Systems Co. Ltd. in Japan and a provider of software tools for engineering, science, and mathematics, especially Maple, a powerful mathematical computation engine.
In areas where data standards are well established, such as for basic bibliographic data elements, such cooperation may be achieved by the DML organization with different data sources and services individually. For more complex data objects, especially those representing mathematical concepts, a community process, such as those commonly conducted by the World Wide Web Consortium, should be involved in the selection and adoption of data standards by the DML. It is recognized that such data standards may typically start as ad hoc standards that eventually become codified and formalized through widespread use (e.g., Microformats Wiki). The committee recognizes that some existing agents may be reluctant to cooperate with the DML in development of data schemas, sharing of data, or both. In those cases, the DML should not spend administrative effort negotiating cooperation but should instead find alternative agents who are willing to cooperate in providing the needed data or services in a manner consistent with DML principles.
Collection from Diverse Sources
The DML should commit to support curation and management of mathematical information from diverse sources and facilitate access to mathematical information even though the sources are stored in different organizations. Similarly, CrossRef currently tells users how to find items
- Previously unindexed or partially indexed information about mathematical publications—including traditional journal papers, books, and other electronic resources—and their contents, such as their reference lists, names of their sections or chapters (table-of-contents data), their formulas, equations, theorems, and conjectures;
- Information relating to the relations of such data elements within various publications and the relations of these elements to various standardized lists of such elements; and
- Information from mathematicians’ homepages, blogs, and discussion forums.
The DML should accept inputs of such data from all sources, commercial and noncommercial, subject only to copyright and licensing requirements indicated earlier, the judgment of DML-appointed editors that the material is suitable for inclusion in the DML, and the resources to process the data for ingestion into the DML. In particular, the DML should invite contributions of such content from both copyrighted and open-access sources. In all cases, the DML should commit to appropriate acknowledgement of the source and to inclusion of agreed indications of provenance in its data records.
Support for Multiple Formats, Conversion Tools, and Best Practices
In many instances the cost of negotiating cooperation in schema standards may greatly exceed the potential reward of doing so. In such cases, it will be best for the DML to move ahead with lowest-common-denominator standards that are good enough for most applications and to which it is possible to map data from multiple alternative formats. Current examples of such standards are BibTeX, or slight enhancements thereof like BibJSON and BibXML, to which it is possible to map almost any reference text string that can be recognized as such by a human. A somewhat higher standard is provided by the European Digital Mathematics Library (EuDML) metadata schema specification9 for typical mathematical article metadata supplied by a cooperative publisher. The DML should research and support multiple tools and services for the acquisition of data in diverse native formats and its conversion to higher-quality bibliographic formats such as those mentioned above. It should also provide guidance for best practices in managing various data and metadata formats and support basic communication spaces, such as an email list or help desk for data managers encountering issues in cleaning and converting diverse data sets of interest to the DML community. Examples of conversion tools for bibliographic data that are already very useful, although relatively unknown, are pdftotext, MREF, EJP-ECP Reference List Formatter, INSPIRE-HEP,13 BibSonomy Scrapers, Google Refine, and Beautiful Soup.
9 European Digital Mathematics Library, EuDML Metadata Schema Specification (v2.0-final), https://project.eudml.org/eudml-metadata-schema-specification-v20-final, accessed January 16, 2014.
The creation of such data-conversion tools is typically a fairly straightforward programming task in which the difficulty depends on the complexity of the tool. However, such tools and their derivatives do impose a progressive maintenance burden to keep them compliant with changing data formats and expectations for both inputs and outputs, and with new versions of underlying software libraries and implementations. But the maintenance of such low-cost, high-reward data conversion and cleaning services, or links to the best maintained of these services and documentation of how to use them for DML purposes, is among the things the DML should commit to supporting.
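To give a sense of the scale of such a conversion tool, the following is a minimal sketch that maps one simple BibTeX entry to a BibJSON-like dictionary. It is an illustration under stated assumptions, not a full BibTeX parser: the sample entry is invented, the output keys only approximate BibJSON conventions, and the regular expressions cover plain braced, quoted, or numeric field values only.

```python
import json
import re

# Matches "@type{key, ...body... }" for a single entry (assumption: one entry per string).
ENTRY_RE = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,(.*)\}\s*$", re.DOTALL)
# Matches "name = {value}", "name = "value"", or "name = 1234", followed by a comma or end.
FIELD_RE = re.compile(r'(\w+)\s*=\s*(?:\{(.*?)\}|"(.*?)"|(\d+))\s*(?:,|$)', re.DOTALL)

# Hypothetical sample entry used only for illustration.
SAMPLE_ENTRY = """
@article{example2014,
  author = {A. Author and B. Coauthor},
  title = {An Example Theorem},
  journal = {Journal of Examples},
  year = 2014
}
"""


def bibtex_to_record(entry: str) -> dict:
    """Convert one simple BibTeX entry into a BibJSON-like dictionary."""
    match = ENTRY_RE.match(entry.strip())
    if match is None:
        raise ValueError("not a recognizable BibTeX entry")
    entry_type, key, body = match.groups()
    record = {"type": entry_type.lower(), "id": key}
    for name, braced, quoted, bare in FIELD_RE.findall(body):
        value = " ".join((braced or quoted or bare).split())  # collapse whitespace
        name = name.lower()
        if name == "author":
            # BibTeX separates author names with the word "and".
            record["author"] = [{"name": a.strip()} for a in value.split(" and ")]
        else:
            record[name] = value
    return record


if __name__ == "__main__":
    print(json.dumps(bibtex_to_record(SAMPLE_ENTRY), indent=2))
```

Even a sketch this small shows where the maintenance burden comes from: every unhandled variant of the input format (string macros, concatenation, unusual name forms) needs its own rule and its own tests.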
Flexibility and Extensibility of Schemas and Services
Recognizing the systemic compartmentalization problems caused by traditional database schemas and implementations, all DML schemas should adhere to current and emerging best principles of flexibility and extensibility. In particular, DML architecture should allow and encourage the following:
- Inclusion of data in a virtual collection from an essentially unlimited number of disparate and distributed resources of greatly varying sizes. Examples would include data stored on individual webpages and marked up with information, as is done with COinS, the emerging standards of schema.org or similar math-specific standards that might be developed by the DML community, or data available from various data providers via application programming interfaces (APIs) or periodic data dumps; and
- Creation of new features, tools, and services over DML data by individual and organizational participants, such as those outlined in Chapter 3, or by yet unimagined services that will develop in the future.
13 INSPIRE, “Generating Your Bibliography,” http://inspirehep.net/info/hep/tools/bibliography_generate?ln=en, accessed January 16, 2014.
Relation of the DML to Computer Algebra Systems and Formalization of Mathematics
There is a community of mathematical knowledge management, built largely around the development of formal theorem provers and reasoners (Carette and Farmer, 2009). This community proposed an ambitious program of formalization of mathematics, following earlier efforts by Whitehead and Russell (1910, 1912, 1913), Hilbert’s program, and others. Some notable successes of this school are computer-automated proofs of a number of important mathematical theorems, such as the famous four-color theorem. The committee anticipates further advances in this field, and perhaps some eventual synthesis of computer algebra systems (Mathematica, Maple, Sage, etc.) with the theorem provers. However, progress in this area has been slow, and there are deep cultural impediments, principally the fact that the dominant computer algebra systems are proprietary and likely to remain so for the foreseeable future.
Summary of Principles
Consistent application of the principles in this section to the representation of mathematical information and conceptual knowledge in the World Wide Web will enable the mathematical community to achieve the most effective instantiation of the DML as an openly navigable representation of the universe of mathematical concepts, formulas, and relations. To achieve this, the DML would be just as accessible to human users as Wikipedia is today, with the same open license for text contributions and a public domain license for bibliographic and mathematical facts; it would be properly structured for machine access and reuse in discovery services; and it would be connected directly, through desktop software and Web services, to the mathematical research literature, current and future abstracting and indexing services, computational services such as Wolfram|Alpha, and desktop programs such as Mathematica, Maple, and Sage.
The first step in this process is creating an organization that can manage and encourage the creation of a knowledge-based library of mathematical concepts and advocate for the needs of the mathematical community. The committee believes the DML effort would benefit from being spearheaded by a small centralized agent to keep the project from failing because of competing time commitments of its founders, which has happened in several cases mentioned in Appendix C. It is hoped that the DML can reach beyond this initial startup hurdle and ultimately succeed because of its core of dedicated staff, collaborators, and funders, creating a strong, stable, and meaningful resource that is worthy of continued investment from the mathematics community.
Recommendation: A Digital Mathematics Library organization should be created to manage and encourage the creation of a knowledge-based library of mathematical concepts such as theorems and proofs.
Recommendation: The Digital Mathematics Library organization should be an advocate for the mathematics community and help develop plans for development and funding of open information systems of use to mathematicians.
The DML organization would benefit from being a small organization with minimal central agency and control. It is also important that the DML be able to operate in an environment of much larger organizations with big budgets and capability for sustained legal actions to achieve their ends. To survive as a small operation in a big information universe, it is important that the DML be organizationally nimble, quick to initiate pilot projects, and generally quick to learn from the experiences of both successful and unsuccessful efforts, both its own and those of others aiming to develop domain-specific knowledge bases. The DML could be largely reliant on other organizations to provide hosting for such organizational essentials as
- Basic computing and networking infrastructure, support, and services;
- Archiving (to be achieved in collaboration with existing scientific data and library archiving organizations); and
- Office space and administrative and support services of all kinds.
Management overhead can be minimized, for example, by making the executive director of the DML an employee of a supporting institution, most likely a major university library, whose time is funded either completely or in large part by a grant from the initial DML funder to that university. A modest number of initial staff positions could be funded similarly. This could be a good approach for the DML because many of the technical skills it will need are specialized and may be needed only on a part-time or fluctuating basis as various projects are taken on by the DML. The DML could at least initially avoid the management responsibility of having a large number of employees, but rather work on a contractual basis with staff employed by a variety of partner organizations with a commitment to various aspects of the DML effort.
The DML organization may also wish to consider other names before finalizing its constitution, both for itself as an organization and for the collection and services it plans to create. One of the early administrative efforts of the DML organization would be to evaluate a number of legal and economic considerations involving branding and trademarks related to the choice of name. The committee envisions the DML organization as a coalition of member partners with commitment to the DML concept (the creation of a substantial digital representation of an open collection of mathematical information and knowledge) and to the DML development principles. The DML organization could be governed by the mathematical sciences community through an organization such as the International Mathematical Union (IMU) and, thence, through the member organizations of that union.
The DML constitution can support the general principles outlined above by including the following elements:
- Acquire and maintain a collection of digital representations of mathematical objects (e.g., theorems, functions, sequences) in machine processable formats;
- Advance mathematics by provision of useful information services over the collection;
- Maintain the DML data collection with stable URLs, an underlying Web-based open architecture, and APIs so new tools can be contributed, linked, and shared;
- Support development of a large community of users who will also help curate and contribute to the collection and its services;
- Support a community of developers of tools and services over the collection; and
- Collaborate with publishers and information providers to provide superior mathematical and information services built over the collection.
Governance of the DML could be overseen by an organization such as the IMU, with invitations to representatives from partner organizations. Initial funding of the DML for a 10-year period would be beneficial, during which long-term models for sustainable operations could be examined. The DML may benefit from including as many of the relevant organizations as are willing to participate. Some examples include MathSciNet (American Mathematical Society), the Society for Industrial and Applied Mathematics, the International Council for Industrial and Applied Mathematics, the European Mathematical Society, the Cornell University Library, FIZ Karlsruhe/Springer, Wolfram, Microsoft, Google, Wikipedia, OEIS, EuDML, Elsevier, and Thomson Reuters. Publishers and volunteers will see the DML as more accurate and more tailored than other services and should recognize the gains possible from a coordinated approach to merging mathematical knowledge. As the DML grows, the community will accord respect to the volunteers who help build it. To protect itself from legal obligations regarding copyright infringement, the DML could consider a variety of approaches, including not claiming copyright on any DML material and requiring contributions to be licensed by the contributor, or using a Creative Commons license.
The first step to confirm feasibility of this DML concept is to announce a proposal to the community, confirm that enough parties are willing to participate in the DML by contribution of data, expertise, or services to make the project viable, and, if so, support a meeting to resolve a basic constitution for the organization to establish its legal status in a suitable location.
Initial development of the DML would benefit from focusing on recruiting partners with potential data sources and resources, beginning a collection of mathematical entities to achieve some of the desired capabilities described in Chapter 3, and providing a foundational platform on which most of these capabilities might imaginably be achieved in a 10- or 20-year time frame.
The committee sees value in separate groups working on the technological infrastructure and on the administration of these projects, because they require different kinds of technical expertise, community input, and project management for their success.
The DML’s cultivation of partnerships would benefit from being strategic rather than opportunistic. As a first step, the DML will need to assess potential partnerships in terms of the potential of the partnership to help the DML meet its goals, the likely incentives on both sides for the partnership, the maturity and stability of any technical standards required to make the partnership work, and the likely obstacles to consummating the partnership. A diversity of partnerships will be important. The advice and help of existing elements of community infrastructure could be valuable in this: for example, the IMU (in particular the CEIC) and its member societies, FIZ Karlsruhe, the European Mathematical Society, the Association of Research Libraries (ARL) and similar organizations outside North America, existing mathematics digital libraries (such as HathiTrust and the EuDML), and prominent and influential mathematicians who have expressed an interest in the mission of the DML. Finally, while keeping in mind long-range goals and objectives, it is important to identify and pursue high-likelihood, high-potential-benefit, low-risk, near-term partnerships and agreements, even if somewhat limited in scope, as long as such partnerships can help illustrate the longer-term potential of partnering with the DML. For example, a productive, beneficial partnership with arXiv might be achievable in relatively short order and at the same time be useful to illustrate some of the potential benefits of partnerships between the DML and content providers.
Even in advance of construction of a central repository, work could proceed immediately on development of adequate object classes for description and discovery of mathematical content in ways that complement existing capabilities—for example, at finer granularity—and on the aggregation of the lists of object instances for inclusion in the DML. The committee believes that the following mathematical objects and bibliographic entities are good targets for early DML development (each of which is discussed in more detail in Chapter 5):
- Mathematical objects: subject topics, sequences, functions, transforms, identities, symbols, formulas, and assorted mathematical media; and
- Bibliographic entities: people, homepages, journals, books, and bibliographies.
The committee recognizes that progress on aggregation, cleaning, and deduplication of these various lists will move at very different rates. Some of these lists may be completed quickly, while others that require input from many sources will mature slowly over time, and some might never be regarded as truly complete. Those in data-rich areas may be ripe for initial developments. Still, the committee believes that the difficulty of completing some of these lists should not deter contributors from starting them or from converting what is already available into machine-readable formats, which can then feed various linking, navigation, search, and discovery services. Chapter 5 outlines which entity types should be targeted, at least initially, and gives some indication of the efforts required for each.
Planning for More Complex Entities
Planning should start for the development of more complex lists where possible. These lists are outlined in Chapter 5, and some may be difficult to create and maintain. Wolfram|Alpha has a significant start on this with its continued fractions project. The potential rewards in terms of discovery and cross-linking are greatest if these mathematical objects can be adequately formalized and managed, even on a modest scale. These lists may benefit from starting small and growing slowly, to reduce the maintenance challenges before they become too burdensome, and by development of machine learning techniques for extraction of these entities from the literature.
The committee anticipates a fairly loose structure in cooperation with Wikipedia, with input from the Wolfram experience with continued fractions and others in managing problem lists.
Initial effort is best invested in choosing an adequately flexible and extensible data structure, which needs to be easily expressible and exportable to handle diverse types of objects. The experience of Wolfram|Alpha, EuDML, and others working with metadata standardization will be essential input for this process. It is important to quickly codify the workflow for initiation of new lists of this kind and to gain a realistic assessment of the incremental cost of developing and maintaining new lists of various sizes and complexities. The intention is to lower the barriers to creation and maintenance of such lists to a point where there is substantial community enthusiasm for the activity. Simple user interfaces for the input of new entries and editing of existing ones consistent with schema restrictions are an essential requirement. The interface should be generic, much the same for all object classes, with customization as necessary for particular classes.21
21 Prototype interfaces are provided by BibSonomy (http://www.bibsonomy.org/, accessed January 16, 2014), Zotero (http://www.zotero.org/, accessed January 16, 2014), and various library catalog tools.
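One way to picture a data structure that is generic across object classes yet customizable per class is sketched below. It is a hypothetical illustration only: the class names, field names, and identifier format are assumptions, and a production schema would more likely be expressed in a shared schema language than in application code.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ObjectClass:
    """A list type (for example, sequences or symbols) with a minimal set of required fields."""
    name: str
    required: frozenset


@dataclass
class Entry:
    """One list entry. Fields beyond the required ones are allowed, which keeps the schema extensible."""
    object_class: ObjectClass
    identifier: str
    fields: dict = field(default_factory=dict)
    provenance: str = "unspecified"  # acknowledgement of the contributing source

    def problems(self) -> list:
        """Return human-readable schema violations; an empty list means the entry is acceptable."""
        missing = sorted(self.object_class.required - set(self.fields))
        return [f"missing required field: {name}" for name in missing]


# Hypothetical object class and entry, used only for illustration.
SEQUENCE = ObjectClass("sequence", frozenset({"name", "initial_terms"}))

if __name__ == "__main__":
    entry = Entry(SEQUENCE, "seq:example-1", {"name": "Fibonacci numbers"},
                  provenance="hypothetical contributor")
    print(entry.problems())
    entry.fields["initial_terms"] = [0, 1, 1, 2, 3, 5, 8]
    entry.fields["keywords"] = ["recurrence"]  # extra field, accepted without a schema change
    print(entry.problems())
```

The same `Entry` type serves every object class, so a single input form could be generated from the `required` set, with class-specific customization layered on top.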
Growth and Cross-linking
Some initial effort will need to be expended on planning for eventual cross-linking of a substantial number of entries in different lists through semantic relations, such as connections between lists of authors and journals or mathematical symbols and equations. This initial step is not intended to build a complete ontology of mathematics, but obvious semantic links will need to be supported to the greatest extent possible. This would aid in the creation of a Web of mathematical information that supports further processing by modern methods of graphical data analysis and may yield unexpected visualizations and insights into the structure of the mathematical universe. The proposed development would likely benefit from starting small, demonstrating the successful ingestion of data and exposure of various facets incrementally, leveraging available ontologies and services, and building new ones as needed.
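The kind of cross-linking described here can be pictured as a set of subject-relation-object statements that are indexed in both directions. The sketch below is a small in-memory stand-in for a linked-data store; the identifiers and relation names are invented for illustration, and a real deployment would presumably use established linked open data tooling instead.

```python
from collections import defaultdict


class LinkStore:
    """A tiny in-memory store of (subject, relation, object) links between list entries."""

    def __init__(self):
        self._by_subject = defaultdict(set)
        self._by_object = defaultdict(set)

    def add(self, subject: str, relation: str, obj: str) -> None:
        """Record one link and index it from both ends."""
        self._by_subject[subject].add((relation, obj))
        self._by_object[obj].add((relation, subject))

    def outgoing(self, subject: str, relation=None) -> list:
        """Entries that `subject` points to, optionally filtered by relation."""
        links = self._by_subject.get(subject, ())
        return sorted(o for r, o in links if relation in (None, r))

    def incoming(self, obj: str, relation=None) -> list:
        """Entries that point to `obj`, optionally filtered by relation."""
        links = self._by_object.get(obj, ())
        return sorted(s for r, s in links if relation in (None, r))


if __name__ == "__main__":
    store = LinkStore()
    # Hypothetical identifiers and relation names.
    store.add("formula:euler-identity", "usesSymbol", "symbol:pi")
    store.add("formula:euler-identity", "attributedTo", "person:euler")
    store.add("formula:circle-area", "usesSymbol", "symbol:pi")
    print(store.incoming("symbol:pi", "usesSymbol"))
```

Indexing each link from both ends is what lets a user start from a symbol and reach every formula that uses it, without any list needing to know about the others in advance.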
A workflow is a sequence of connected steps where each step concludes immediately before the next step begins. Workflow management systems in computer systems manage and define a series of tasks to produce a final outcome or outcomes. Once the task is complete, the workflow software ensures that the individuals responsible for the next task are notified and receive the data they need to execute their stage of the process. These systems can also automate redundant tasks and ensure that uncompleted tasks are followed up, as well as reflect the dependencies required for the completion of each task.
The DML could provide support for schema development and production software for editorial workflows involved in creation and maintenance of structured lists. It is to be expected that these workflows will evolve over time as different data sources and editorial agents become involved, and that somewhat different workflows may be required for different lists.
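A hedged sketch of such an editorial workflow follows. The step names are hypothetical, and the example uses the standard-library `graphlib.TopologicalSorter` (Python 3.9 or later) only to show how dependencies between tasks determine which steps become ready once earlier ones are complete; it is not a proposal for a particular workflow system.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps that must finish before it can start.
# Step names are hypothetical.
EDITORIAL_STEPS = {
    "extract_entities": set(),
    "deduplicate": {"extract_entities"},
    "editor_review": {"deduplicate"},
    "publish_to_list": {"editor_review"},
    "notify_contributor": {"editor_review"},
}


def run_in_waves(steps: dict) -> list:
    """Return the steps grouped into waves; every step in a wave has all its prerequisites done."""
    sorter = TopologicalSorter(steps)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies are circular
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # steps whose prerequisites are complete
        waves.append(ready)
        for step in ready:
            sorter.done(step)  # marking a step done releases its dependents
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(run_in_waves(EDITORIAL_STEPS), start=1):
        print(number, wave)
```

In a real system the call to `done` would be triggered by an editor or a machine agent finishing the task, at which point the people responsible for the newly ready steps would be notified.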
As discussed throughout the report, the DML will require a small paid staff, technical infrastructure, a funded research portfolio to support relevant projects, and a governing board to ensure that the DML’s components continue to function and develop properly. This section describes what is needed in each of these areas.
While it is difficult to accurately assess the necessary financial resources for the DML at this early stage, this section gives a general sense of the scale of the necessary human and technical resources. Some of these resources might be shared, too, depending on the particular arrangement developed for the DML. However, the amount of financial resources necessary is obviously an important component of evaluating the future development of the DML, and the committee provides the following recommendation for evaluating these resources before DML development.
Recommendation: The initial DML planning group should set up a task force of suitable experts to produce a realistic plan, timeline, and prioritization of components, using this report as a high-level blueprint, to present to potential funding agencies (both public and private).
The cost of development and upkeep for the DML will not be trivial but is currently too uncertain to be specified in this report. For some perspective on operating costs, arXiv may provide a reasonable example. In calendar year 2012, arXiv spent nearly $800,000 in expenses relating to the following:22
• Personnel costs (including benefits)—totaling $492,061
—User support (2.70 full-time equivalent and 0.36 student)
—Programming and system maintenance (2.13 full-time equivalent)
—Management (0.50 full-time equivalent)
• Nonpersonnel costs—totaling $71,807
—Servers (physical and virtual), hardware maintenance, storage and backup—$24,240
—Network bandwidth and telephony—$10,867
—Staff computers, software, and supplies—$2,700
—Staff and arXiv Board travel—$34,000
• Indirect and in-kind costs—$208,631
—College and department administration, staff support (26 percent of direct costs)—$146,606
—Facilities (11 percent of direct costs)—$62,025
—arXiv moderation (130+ moderators, varying time commitments)—volunteer efforts
22 Cornell University, “Arxiv Projected Budget—Calendar Year 2012,” August 29, 2012, https://confluence.cornell.edu/download/attachments/127116484/arXiv2012budget.pdf.
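The figures above can be cross-checked with a few lines of arithmetic. The sketch below uses only the amounts listed in the arXiv budget breakdown; the variable names are chosen here for illustration.

```python
# Amounts in U.S. dollars, taken from the arXiv 2012 budget breakdown above.
personnel = 492_061

nonpersonnel = {
    "servers, hardware maintenance, storage and backup": 24_240,
    "network bandwidth and telephony": 10_867,
    "staff computers, software, and supplies": 2_700,
    "staff and arXiv Board travel": 34_000,
}

indirect = {
    "college and department administration, staff support": 146_606,
    "facilities": 62_025,
}

nonpersonnel_total = sum(nonpersonnel.values())
indirect_total = sum(indirect.values())

# Direct costs are personnel plus nonpersonnel; indirect rates apply to these.
direct = personnel + nonpersonnel_total
total = direct + indirect_total

# Indirect items expressed as a share of direct costs.
admin_rate = indirect["college and department administration, staff support"] / direct
facilities_rate = indirect["facilities"] / direct

print(f"Nonpersonnel: ${nonpersonnel_total:,}")
print(f"Indirect:     ${indirect_total:,}")
print(f"Total:        ${total:,}")
print(f"Admin rate: {admin_rate:.0%}, facilities rate: {facilities_rate:.0%}")
```

The itemized amounts sum to $772,499, consistent with the "nearly $800,000" figure, and the two indirect items correspond to the stated 26 percent and 11 percent of direct costs.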
To give some perspective on the potential costs of developing capabilities, the committee would like to draw attention to some of the resources recently devoted to developing and deploying the Wolfram|Alpha continued fractions work23 discussed in the Chapter 2 section “What Gaps Would the Digital Mathematics Library Fill?” Wolfram received a 1-year grant from the Alfred P. Sloan Foundation to prototype and build a technological infrastructure for collecting, tagging, storing, and searching a representative subset of mathematical knowledge (including definitions and theorems) and presenting it through a Wolfram|Alpha-like natural language interface. This work required some 3,000 hours of work from a team consisting of four professionals and one intern.
The subject of continued fractions was selected for this project because much of the relevant literature is older (and therefore more representative of the type of content that could be used in a future system such as the DML) and is distinct from Wolfram’s main computational expertise (so as to lessen the bias in the results). The individuals who worked on it had no detailed prior knowledge of continued fractions, which made the work go more slowly than it would have if performed by an expert in the field, but this example is likely representative of how the DML would be approached. However, three of the four team members have written multi-volume books about mathematics, as well as websites each having more than 10,000 pages, so they had some experience in covering a wide range of mathematics.
There was not enough time in a 1-year project to cover the 100,000 pages of printed continued fraction literature, so the team tried to explore and cover various content and presentation aspects to see what might be possible in future efforts. In most ways, this project succeeded in meeting its objectives, but in some parts, especially fully computational representations of the content, the system still needs improvement.
In addition to having qualified people, two software infrastructure components were important in carrying out this project: Mathematica and Wolfram|Alpha. Mathematica allowed the team to check the mathematics and to generalize it, and Wolfram|Alpha allowed them to collect the information in such a way that one can access it through free-form language inputs and deliver the information in various formats, from Web to TeX.
This project is a meaningful example of how various DML features can be developed within a larger infrastructure. The following sections outline the human and technical resources needed to make the rest of the DML possible.
23 M. Trott and E.W. Weisstein, “Computational Knowledge of Continued Fractions,” WolframAlpha Blog, May 16, 2013, http://blog.wolframalpha.com/2013/05/16/computational-knowledge-of-continued-fractions/.
Lastly, the committee would like to note the importance of sustained investment and commitment from the DML’s potential funders. The committee believes that a ramped investment pattern, starting as a prototype and scaling up, may be more beneficial than a large initial investment. The DML will require a long and sustained effort to be successful.
A small paid staff will work to develop the DML vision, address issues that arise, pursue fruitful partnerships, and manage the day-to-day operations of the DML. The following is a list of staff functions that the committee sees as essential during the initial phases of the DML. These staffing needs will change as the DML grows and matures. The committee believes it is essential to include a distinguished mathematician in the senior management of the DML to provide credibility to the academic mathematics community and to gain startup funding and respectability in the nonprofit world.
- Academic director. A well-respected leader in both the technical and social aspects of the DML who is able to make editorial decisions and can engage and appoint editors and curators for their domain knowledge and reputation. This could be a part-time position (e.g., half-time of a senior mathematician).
- Executive director. A manager with knowledge of large-scale data methods and digital libraries. This person would be responsible for directing the project manager, budget allocations, promotion of the project, and negotiations with partners, and also consulting with the academic director about priorities.
- Project manager. This person would be in charge of the creation of the DML. He/she would interface with programmers, contracting organizations, and technical partners.
- System manager. This person would be responsible for setting up adequate server infrastructure for day-to-day DML operations and for expanding operations as needed.
- Data wrangler. This person would work on an ongoing stream of specific data conversion projects and provide documentation of best practices. He/she would engage and oversee other volunteer, or possibly paid, staff and also set up and experiment with crowdsourcing tasks and implementations.
- Rights management and legal. This person would provide guidance on critical licensing and copyright choices for both data and software, and for possible negotiations of agreements with data and service providers. This may be a consultant position.
- Research analyst. This person would be responsible for keeping abreast of emerging technologies, researching solutions for identified problems, assisting the executive director with technology choices, and preparing white papers to explain proposals and processes.
- Community liaison. This person would be responsible for community building, advocacy of the project, intelligent responses to incoming emails, blog development, negotiations to engage and persuade partners to contribute data, and other such activities. This would likely be a full-time staff person or contractor.
A mathematics digital library requires a technical infrastructure. This infrastructure needs to support storage, backup, search, retrieval, and at least some support for analysis and visualization. Storage is needed for some documents, the software component, and the management data on the system. In general, a different storage solution will be needed for each type of data due to differences in usage associated with size, security issues, speed required, and level of backup needed. Security, in particular, will need to be carefully planned and assessed throughout the DML development to ensure that the data it stores will be well protected. Storage can be handled in-house by purchasing a number of servers or outsourced to server farms or cloud storage. The key is to plan for growth and to consider, for the operations of interest, whether it is more cost-effective to store data in multiple formats to facilitate search or to minimize storage and do data conversion on the fly. At this point it is not clear which option will be more economical.
Other resources required include machines for developing and testing software, backup facilities, and machines for monitoring and managing the system. For development, the key resource needs include high-end desktop computers, access to storage devices, Internet connectivity, backup facility, and a test bed environment for trying new features before they are launched. For managing and maintaining the ongoing system, handling the business tasks and associated financial issues, basic desktop machines with Internet connectivity, printers and associated fax, and backup to machines off the Internet for security are needed.
Necessary Research Areas
Many of the technological capabilities envisioned for the proposed DML are not currently possible. To help accelerate the needed technological developments, the committee believes that several research areas can be targeted by the DML organization.
Recommendation: The Digital Mathematics Library needs to build an ongoing relationship with the research communities spanning mathematics, computer science, information science, and related areas concerned with knowledge extraction and structuring in the context of mathematics, and to help translate developments in these areas from research to large-scale application.
Some of these players include the following:
- National Institute of Standards and Technology,
- Cornell University Library (both Project Euclid and arXiv),
- American Mathematical Society (MathSciNet),
- Wolfram, and
- European technical partners in EuDML, including FIZ Karlsruhe (zbMATH).24
These organizational partners would be the employers of some people engaged in DML work, funded by contracts approved by DML central administration and funded through some arrangement with DML funding sources. It may be best for the DML to collaborate with its partners to complete such work, rather than directly employing large numbers of its own people. All of the above partners have existing capabilities and services of this kind, which should not be threatened, but rather enhanced, by DML developments.
Aalbersberg, I.J.J., and O. Kähler. 2011. Supporting science through the interoperability of data and articles. D-Lib Magazine 17(1/2), doi:10.1045/january2011-aalbersberg.
American Society for Cell Biology. 2012. The San Francisco Declaration on Research Assessment (DORA). http://am.ascb.org/dora/.
Carette, J., and W.M. Farmer. 2009. A review of mathematical knowledge management. Pp. 233-246 in Intelligent Computer Mathematics. Springer.
Gill, T., and P. Miller. 2002. Re-inventing the wheel? Standards, interoperability and digital cultural content. D-Lib Magazine 8(1). http://www.dlib.org/dlib/january02/gill/01gill.html.
Whitehead, A.N., and B. Russell. 1910-1973. Principia Mathematica, 3 volumes. Second edition, 1925 (Volume 1), 1927 (Volumes 2, 3). Abridged as Principia Mathematica to *56, 1962. Cambridge, U.K.: Cambridge University Press.
24 FIZ Karlsruhe is a nonprofit corporation and the largest nonuniversity institution for information infrastructure in Germany (http://www.fiz-karlsruhe.de). FIZ (together with EMS and Springer) is one of the joint owners/controllers of zbMATH (http://zbmath.org/).