Formal Abstracts – a Vision

The Formal Abstracts (FABSTRACTS) project will establish a formal abstract service that will express the results of mathematical publications in a computer-readable form that captures the semantic content of publications.

Specifically, the service will

  • give a statement of the main theorem of each published mathematical paper in a language that is both human and machine readable.
  • link each term in theorem statements to a precise definition of that term (again in human/machine readable form), and
  • ground every statement and definition in the system in some foundational system for doing mathematics (the Lean theorem prover).

What is the long-term vision for Formal Abstracts (FABSTRACTS) in mathematics?

FABSTRACTS enables machine learning in math

Many mathematicians dream that computers will someday fundamentally
transform how mathematics is practiced, and FABSTRACTS expresses
in concrete terms how that transformation might be achieved.

Automation of mathematics has been a major aim of artificial
intelligence from its inception. This broad aim includes all aspects
of mathematical activity such as conjecturing, generating examples,
finding counterexamples, computing, learning,
collaborating, proving, generalizing, refereeing and checking,
searching, exploring, and transforming the literature.

For many of these activities, it is important or even essential to
have a machine-readable semantic representation of mathematical
objects. This is achieved by FABSTRACTS.

We foresee tools for exploration such as a Google Earth for
mathematics, providing an intuitive visual map of the entire world of
mathematics, combining formal abstracts, computation, and other
networked content — all displayed at user-selected resolution.

It gets easier

Just as the FABSTRACTS project will enable AI in mathematics, the
development of AI will enable FABSTRACTS.

Looking forward beyond the challenging initial groundwork to get
FABSTRACTS up and running, the generation of formal abstracts will
become nearly fully automated. The production of formal abstracts
will become easier and more efficient with every passing year.

Tools for the natural language processing of mathematics are under
development. As these tools develop, automation will replace the
manual preparation of formal abstracts. Automation will make it feasible
to construct hundreds of thousands (and eventually millions)
of formal abstracts.

For example, Kofler and Neumaier have developed a powerful parser for common mathematical language, and Neumaier is currently rewriting some of his
published math books in multilingual editions so as to be parsed and
translated by their software.  In other promising work,
Ganesalingam’s thesis gave a brilliant linguistic analysis of the
language of mathematics. The current Naproche (Natural language
Proof Checking) system accepts a rich subset of mathematical language
including TeX-style formulas.

We expect mathematicians to be able to supply their own formal abstracts
in a structured English format, which will be parsed and translated to the Lean language.

We foresee future cooperation between FABSTRACTS and a major math
abstract service (such as zbMATH or MathSciNet).
The abstract service will provide statements of
theorems from publications in a form suited for fast formal
abstraction, and with FABSTRACTS offering enhancements to those services.

Progress in document analysis will eventually permit the automated
extraction of formal abstracts directly from the documents (building
on the work of M. Suzuki and collaborators at Kyushu University).

FABSTRACTS as a permanent service

FABSTRACTS will become integrated with the workflow of
several tens of thousands of mathematicians.

We foresee a TeX package that allows mathematicians to cite formal
definitions and theorem statements for the convenience of readers, and
for the automatic processing of texts:

Let $M$ be a \formal{banach manifold}[fabstracts.org/10.1109/5.771073]. %TeX

We foresee further integration of FABSTRACTS with computation (in the
style of the current SageTeX package, which performs SageMath
computation and graphics from within a TeX document).

We foresee the translation of formal abstracts into many languages
and formats. Lewis has constructed a two-way bridge between Mathematica and Lean. Computer programs already exist that give automated translations between some of the major proof assistants. The work of Kohlhase, Rabe, and
their coworkers develops general infrastructure for the translation of
formal mathematics between different systems. The point is
that before we translate, we first need any formal representation,
and that is achieved by FABSTRACTS.

There are many expected applications,
including training sets for machine learning, enhancement of automated
theorem proving technologies, machine translation of mathematical
texts into foreign languages, detecting errors in the mathematical
literature, and intelligent mathematical search
tools.

Finally, we foresee that FABSTRACTS will inspire a broader movement
for the increased precision in science, with mathematicians taking up
the challenge to formalize the proofs of formal abstracts, and
scientists finding precise ways to give formal expression to the
objects of their research.