Bezout’s Theorem and an Irrelevant Point

In the textbook I’m using for a first course in algebraic geometry, the proof of Bezout’s theorem is awful. Looking around, I find an abundance of awful proofs. A good proof is one that I would want to commit to memory. Here is a good proof of Bezout’s theorem, which is due to Gurjar and Pathak.

Bezout’s theorem states that a curve of degree m in a projective plane meets another projective curve of degree n in mn points, counted with multiplicity (over an algebraically closed field), assuming the curves have no irreducible component in common.

The idea of the proof is as follows. The projective plane is defined as the set of lines in 3-space through the origin. Bezout’s theorem can then be interpreted as saying that two homogeneous polynomials f,g of degrees m and n in three variables X,Y,Z have common zero set given by a union of mn lines through the origin, counted with multiplicity.

Let L be the multiset of lines determined by f=g=0. To prove Bezout, we must show that the multiset L contains mn lines, counted with multiplicity. By applying a linear transformation if necessary, we may assume without loss of generality that Z is not a factor of the polynomial fg and that no line of L lies in the plane Z=0. In three-dimensional affine space, each line of L meets the plane Z=0 at a unique point: the origin (0,0,0), which is known as the irrelevant point. Thus, the number of lines in L is equal to the number of intersections of L with the plane Z=0, counted with multiplicity.

This intersection is given by the system of equations f(X,Y,0)=0, g(X,Y,0)=0 in the XY-plane Z=0. The polynomial f(X,Y,0) is a homogeneous polynomial of degree m. By the fundamental theorem of algebra, it factors as a product of m linear factors. Similarly, the polynomial g(X,Y,0) is a homogeneous polynomial of degree n that factors as a product of n linear factors. Thus, the given system of equations describes the intersection of a multiset of m lines with a multiset of n lines, for a total of mn intersections, counted with multiplicity. Thus, the number of lines in L is mn, which proves Bezout’s theorem.
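To see the counting in action, here is a small worked check over the complex numbers (my own toy example, not taken from the Gurjar-Pathak paper). Take the two conics f = X^2 + Y^2 - Z^2 and g = XY - Z^2, so m = n = 2. Restricting to the plane Z=0,

f(X,Y,0) = X^2 + Y^2 = (X+iY)(X-iY), \qquad g(X,Y,0) = XY.

The multiset of two lines \{X+iY=0,\ X-iY=0\} meets the multiset of two lines \{X=0,\ Y=0\} in 2\cdot 2 = 4 intersections in the plane Z=0, all at the irrelevant point and counted with multiplicity, matching the Bezout count mn=4.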

The four intersection points of two ellipses in the projective plane can be viewed as four lines through the irrelevant point in 3D affine space.

Here is the pdf version of the post, which goes into more detail.

Reference: R. V. Gurjar and A. K. Pathak, “A short proof of Bezout’s theorem in P^2”, Communications in Algebra 38:7 (2010), pp. 2585–2587.

Robert A. Millikan, Japanese internment, and Eugenics

A new preprint on the arXiv discusses Caltech’s decision to repudiate the Nobel laureate Robert Millikan. The preprint takes the California Institute of Technology (Caltech) to task for its decision. What follows is a digest of that 64-page preprint.

Nobel laureates Michelson, Einstein, and Millikan.

Nobel Laureate in physics

Robert A. Millikan (1868-1953) was the second American to win the Nobel Prize in physics. At the peak of his influence, no scientist save Einstein was more admired by the American public.

Millikan’s greatest scientific achievement was the isolation of the electron and the measurement of its charge. Millikan was awarded a Nobel Prize in 1923 for the famous oil-drop experiment, which measured the electron charge. These measurements also led to reliable estimates of Avogadro’s number.

His second greatest scientific achievement was the experimental verification of Albert Einstein’s photoelectric equation, which was also recognized by the Nobel Prize. Einstein’s and Millikan’s Nobel Prizes are closely linked: Einstein’s for the theoretical derivation of the photoelectric equation and Millikan’s for its experimental verification. (Contrary to what some might expect, Einstein’s Nobel Prize was not for the theory of relativity.) As a bonus, Millikan obtained what was then the best experimental value of Planck’s constant.

The photoelectric effect gave one of the first strong indications of wave-particle duality. Even today, wave-particle duality poses puzzling philosophical questions. Einstein hypothesized the existence of what is now called a photon – a hypothesis that flew in the face of the settled science of wave optics. This unsettling concept of light motivated Millikan in his experiments, and he eventually confirmed Einstein’s equation. The photon and the electron were the first in the timeline of many elementary particles. Millikan’s student Anderson later won a Nobel prize for the discovery of the positron.

Millikan on the cover of Time in 1927

Head of Caltech

Millikan, the head of Caltech during its first 24 years, oversaw its rapid growth into one of the leading scientific institutions of the world. Millikan was appointed the chief executive of Caltech in 1921, the year after an unexceptional manual training school was renamed the California Institute of Technology. He was offered the title of president but instead chose an organizational structure that made him the chairman of an eight-member executive council.

When Millikan first arrived at Caltech, “the faculty and graduate students were still a small enough group so that at the first faculty dinner we could all sit around a single long table in the basement of the Crown Hotel.” Through Millikan’s vigorous leadership, Caltech swiftly grew. “Millikan was everywhere planning, deciding, admonishing.” Linus Pauling wrote, “Millikan became a great public figure, who in the minds of the people of the country represented the California Institute of Technology….” The book jacket to Goodstein’s history of Caltech states, “To the public in the early days, Caltech soon became known as ‘Millikan’s school’ for Millikan, who functioned as the school’s president, brought to Pasadena the best and the brightest from all over the world….” According to Goodstein, “by 1930, Caltech was ranked as the leading producer of important physics papers in the country.”

Millikan’s School

In education, Millikan and Gale’s “First Course in Physics” is perhaps the best selling English-language physics textbook of all time. Including all editions and title variations, Millikan and Gale sold over 1.6 million copies between 1906 and 1952. It is particularly remarkable that these numbers were achieved in the first half of the twentieth century, when the physics textbook market was much smaller than it is now. In the international textbook market, “The Feynman Lectures on Physics” (another Caltech icon) holds the record with more than 1.5 million English-language copies and many more in foreign translation.

Demands for social justice

In response to demands for social justice following the murder of George Floyd, Caltech launched an investigation into Millikan. Caltech reached a decision to strip Millikan of honors (such as the library named after him), following accusations from various sources that he was a sexist, racist, xenophobic, antisemitic, pro-eugenic Nazi sympathizer. In short, Millikan’s School threw the book at Millikan.

Japanese internment camps

An article in Nature about Caltech’s decision accused Millikan of collaborating with the US military to deprive Japanese Americans of their rights during their forced relocation to internment camps during the Second World War. The Caltech report on Millikan draws a similar conclusion. An examination of original historical sources shows that this accusation is false. On the contrary, Millikan actively campaigned during the war to promote the rights of Japanese Americans. The preprint traces the stages of misrepresentation that led to current false beliefs about Millikan.

Here is some relevant context.

  • After the end of World War II, the US Army, Navy, and Air Force organized large expeditions to Japan to evaluate Japan’s scientific capabilities. The physicist Waterman (remembered today for the Waterman Award) asked Millikan for names of Japanese scientists and engineers who had studied at Caltech to assist in the scientific expeditions to Japan. The Nature article and the Caltech report completely misunderstood the historical context of the scientific expeditions, conflating the scientific expeditions to Japan after the war with Japanese internment in the United States during the war. On the basis of this glaring misunderstanding, Caltech unjustly criticized Millikan for his treatment of Japanese-Americans.
  • In fact, during the war, Millikan was an officer in the Fair Play Committee, an organization that was formed in Berkeley to defend the rights of Japanese Americans. The day after the proclamation from FDR that ended internment, the chairman of the Japanese community council at one of the internment camps wrote a personal thank-you note to Millikan, saying “We realize that you played no small part in realizing this very important move” (of lifting the restriction on Japanese). Millikan received the Kansha award, which recognizes “individuals who aided Japanese Americans during World War II.”

Eugenics

The preprint also treats Caltech’s central accusation against Millikan: he lent his name to “a morally reprehensible eugenics movement” that had been scientifically discredited in his time. The preprint considers the statements in the Caltech report purporting to show that the eugenics movement had been denounced by the scientific community by 1938. In a dramatic reversal of Caltech’s claims, all three of Caltech’s scientific witnesses against eugenics – including two Nobel laureates – were actually pro-eugenic to varying degrees. Based on available evidence, Millikan held milder views on eugenics than Caltech’s witnesses against him. The preprint concludes that Millikan’s beliefs fell within acceptable scientific norms of his day.

The history of eugenics has become a significant academic discipline, and a few paragraphs cannot do justice to the topic. The word eugenics evokes many connotations; a starting point is the Oxford English Dictionary definition of eugenics: “the study of how to arrange reproduction within a human population to increase the occurrence of heritable characteristics regarded as desirable.” From this starting point, the definition diverges in many directions. In California, eugenic practice took the form of a large sterilization program. The state government ran several hospitals for the care of those with mental illness or intellectual disabilities. During the years 1909-1979, over twenty thousand patients at those hospitals were sterilized. The “operations were ordered at the discretion of the hospital superintendents,” as authorized by California law. About three-quarters of sterilizations in California in 1936 were performed by request or consent of the patient or guardian (though the legal and ethical standards of informed consent at the time fell short of what they are today).

The preprint devotes several pages to Millikan’s relationship with the eugenics movement.

  • Where did Millikan fit into the eugenics movement? Millikan was a bit player. His name does not appear in the definitive histories of eugenics, and biographies of Millikan do not mention eugenics. Caltech emeritus professor Daniel J. Kevles, who is a leading authority on both Millikan and eugenics, did not mention Millikan in his history of eugenics. Overwhelmingly, Millikan’s most significant contribution to the biological sciences was his part in establishing Caltech’s division of biology. Caltech was his legacy.
  • As reported in the Los Angeles Times, Millikan was strongly opposed to eugenics in 1925. He denounced the race degeneration theories of Lothrop Stoddard. Millikan maintained, “We can’t control the germ plasm but we can control education” and consequently, education was the “supreme problem” and a “great duty.”
  • The only other sentence in Millikan’s own words about eugenics appeared in 1939 in an article in which he made forecasts about how science might change “life in America fifty or a hundred years hence.” He wrote, “I have no doubt that in the field of public health the control of disease, the cessation of the continuous reproduction of the unfit, etc., big advances will be made, but here I am not a competent witness, and I find on the whole those who are the most competent and informed the most conservative.” There is no mention of specific methods such as sterilization. There is no suggestion of the use of coercion. He speaks in the future tense, “advances will be made” in the coming fifty or hundred years. He is cautious. He humbly professes his lack of expertise.
  • That is all that we have in the form of direct statements from Millikan on eugenics. The 1925 and 1939 statements from Millikan had not yet surfaced when Caltech issued its report. These direct statements from Millikan supersede the indirect case that was made in the Caltech report.
  • Millikan joined the board of a Pasadena-based pro-sterilization organization called the Human Betterment Foundation in 1938. His reason for joining is not known. Millikan did not attend board meetings. His non-participation in the organization is documented in the form of signed proxy-vote slips that are in the Caltech archives for those years that the board met. There is no evidence that he read or endorsed the pamphlets issued by the foundation. Board members were free to disagree with the pamphlets, and sometimes they did. When the foundation closed down in 1942, its assets were donated to Caltech. It was Millikan who redirected the funds away from sterilization research.
  • The section about eugenics in the Caltech report contains major historical falsehoods. The report used fabricated quotations from scientists of that era to make it appear that eugenics had been abandoned by the scientific community by the 1930s. To give one example, the report claims that in 1931 the Nobel laureate H. J. Muller denounced eugenics as an “unrealistic, ineffective, and anachronistic pseudoscience.” He made no such denunciation. In fact, Muller’s message was quite different: he said that it “is of course unquestionable” “that genetic imbeciles should be sterilized.” His 1936 book “Out of the Night” is a classic in eugenic writing. Muller continued to make a public case for eugenics until 1967, shortly before his death. In brief, Caltech got its history very wrong.

To be clear, nobody proposes a return to the California sterilization practices of the 1930s. According to science historian Alex Wellerstein, sterilization rates in California declined sharply in the early 1950s. The practice died not with a bang but a whimper. “No one took credit for killing the practice, and no one at the time appears to have noticed that it had ended.” “The horror we attach to sterilizations today, and to eugenics in general, did not become widespread until the 1970s, with the rise of interest in patient autonomy, women’s rights,” among other reasons (Wellerstein). During earlier decades, the largest institutional force in moral opposition to eugenics had been the Roman Catholic Church. The moral outrage at Caltech grew out of the Black Lives Matter movement and was directed toward the cause of “dismantling Caltech’s legacy of white supremacy.” The landscape has changed in other irreversible ways: birth control and genetic engineering have advanced far beyond the capabilities of the sterilization era.

Postscript on Diversity at Caltech

The preprint was written with a focus on Millikan. However, the larger message of the Caltech report is diversity. The word diversity and its inflections appear 78 times in the 77-page Caltech report; on some pages the word appears as many as nine times. The report, which is hosted by the Caltech diversity website, repeatedly reminds the reader that Caltech has an “ongoing effort to forge a diverse and inclusive community.”

The Pulitzer Prize-winning journalist Daniel Golden has written on admission practices at elite American universities. Caltech was unique among the most elite. Not long ago, Caltech boasted that on matters of admission, it made “no concessions to wealth, and it won’t sacrifice merit for diversity’s sake” (Golden, p. 278). David Baltimore, who was the president of Caltech, assured Golden that “Caltech would never compromise its standards. ‘People should be judged not by their parentage and wealth but by their skills and ability,… Any school that I’m associated with, I want to be a meritocracy’” (Golden, p. 284).

Never say never. The era of uncompromising standards at Caltech has come to an end. The Los Angeles Times reported on August 31, 2023, that Caltech is making historic changes to its admission standards. “In a groundbreaking step, the campus announced Thursday that it will drop admission requirements for calculus, physics, and chemistry courses for students who don’t have access to them and offer alternative paths….” “Data … showed a significant racial gap in access to those classes.” Caltech’s executive director of undergraduate admissions explained the new policy in these terms: “‘I think that we’re really in a time where institutions have to decide if everything that they’ve been saying about diversity and inclusion is true,’ she said, noting that the challenge is especially acute now that the U.S. Supreme Court has banned affirmative action. ‘Is this something fundamental about who we are as an institution … or is this something that was just really nice window dressing.’” (LA Times, 2023).

The action against Millikan has been one campaign within a much larger political movement against standards of merit. Millikan himself had this to say about those who engage in mean-spirited attacks against America’s finest: “To attempt to spread poison over the United States with respect to the characters and motives of the finest, ablest and most public spirited men whom America has recently produced is resorting to a method which, it seems to me, all men of honesty and refinement can only abhor and detest.”

To be sure, Caltech has stirred up a hornets’ nest.

Nobel laureates Nernst, Einstein, Planck, Millikan, and Laue in 1931.

Operation Crossroads

Today I watched the film "Oppenheimer," and it piqued my curiosity about a collection of 8x10 photographs in my possession from the Operation Crossroads nuclear bomb tests that took place in July 1946 at Bikini Atoll. These were the first nuclear bomb tests after the Trinity nuclear test, depicted in the film. The ABLE day explosion was a plutonium bomb, dropped on July 1. The BAKER day detonation was on July 25.

Vice Admiral William H.P. Blandy was the head of the operation. Blandy sent the photographs to Richard Kent Margetts, who was the commanding officer of USS Begor, a vessel that participated in the operation. From Margetts, the photographs passed into my family's hands.

Baker Day explosion from the top.

Photo from photographic tower on nearby island.

Ships at Bikini are silhouetted in the glare an instant after the ABLE day bomb explodes.

Cauliflower head taken within several minutes of the detonation of ABLE. This aerial view was one of the first to be released, on July 5, 1946, and was flown to the United States for publication.

Mushroom cap photographed seconds after the ABLE day explosion.

Another patrol bomber photograph taken within several minutes of the ABLE day explosion. One of the photographs released on July 5.

BAKER day explosion showing water and steam boiling out of the Bikini lagoon.

Automatic photo from photographic tower on a nearby island.

Late stage mushroom cloud shortly before its disintegration

BAKER day explosion from a plane almost directly overhead at the instant of detonation. The BAKER day explosion was the first underwater nuclear test.

The BAKER day blast as it comes down. Photo from photographic tower on a nearby island.

The Saratoga sank approximately seven and one-half hours after the BAKER day detonation. "After the war, Saratoga was considered surplus and was assigned to Operation Crossroads at Bikini Atoll to test the effect of the atomic bomb on naval vessels. Saratoga survived the first blast, an air blast, but was mortally damaged by a second blast, an underwater detonation." (source)

ABLE day explosion. This photograph taken from a Crossroads photo plane shows the radioactive cloud still boiling up toward its maximum height of approximately 35,000 feet. Note the shock wave circle sweeping out around the lagoon.

Rising column of water enters the first phase of characteristic mushroom.

BAKER day explosion from the air, taking the form of a derby hat.

Cloud formation at peak.

ABLE day bomb. The black band in the water is a pressure wave moving out from the explosion center. Fires can be seen aboard several ships.

Record of the atomic fireball, beginning microseconds after the ABLE day detonation. The pictures were taken by an electrically operated aerial camera.

ABLE day mushroom.

Another of the ABLE day photographs that were released in July 1946 for publication.

Conway’s sphere of infinite radius and Viazovska’s magic functions

Thomas Hales

Download scan of 1991 notes

The sphere packing problem is the second Fields medal problem I have worked on. The first was the Fundamental Lemma, solved by Ngô. Those who solve hard problems that I tried unsuccessfully to solve earn my undying admiration.

Inspired by Viazovska’s recent Fields medal, I searched my basement for any calculations of historical value. In fact, back in 1991, Conway suggested to me that to solve the sphere packing problem, we should view flat space as a sphere of infinite radius. I saw potential in the idea and made a few calculations between May and August that year.

I hesitate to show unpolished work that was never prepared for publication, having a distaste for publishing partial results. Nevertheless, here is a scan of a few handwritten notes from June 15, 1991, because they have a direct bearing on the eventual solution of the sphere packing problem in eight dimensions. I have not mentioned these notes to anyone until now. These many years later, the notes are in a rather presentable form. They anticipate the later work by Cohn-Elkies (1999-2003) that established the framework for Viazovska’s work. They describe the key role of the Fourier-Bessel transform (known also as the Hankel transform and equivalent to the Fourier transform for radial functions). The condition of the non-negativity of the Fourier-Bessel transform is stated. They study properties of functions q (now called Viazovska’s magic functions) and make a first guess at a formula for them in eight and twenty-four dimensions.
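For reference, since the notes hinge on it (this formula is standard and is my addition, stated in the e^{-2\pi i\langle x,\xi\rangle} convention; the notes’ normalization may differ): the Fourier transform of a radial function f(x) = f_0(|x|) on {\mathbb R}^n reduces to the Fourier-Bessel (Hankel) form

\hat f(\xi) = 2\pi\, |\xi|^{-(n-2)/2} \int_0^\infty f_0(s)\, J_{(n-2)/2}(2\pi s|\xi|)\, s^{n/2}\, ds.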

In the notes, the “guess for E8 is q(x) = sin(π x^2)^2/(π^2 x^4 (1-x)), but it doesn’t work.”

Adjusting by √2 to compensate for our different conventions, the incorrect guess of the magic function in dimension eight from 1991 has the right schematic form, as shown by the schematic diagram on page 8 in Cohn’s Fields medal laudatio for Viazovska. Of course, it was a long journey from Conway’s idea of a sphere of infinite radius to Viazovska’s exact formula for the magic functions.

Graph of the first guess q(x/√2). As with the true magic function, this function has zeros at √n, for n=2,4,…, is positive for x<√2 and is non-positive for x≥√2. Normalization is q(0)=1.

The idea of the notes is simple. Odlyzko and Sloane solved the kissing number problem in dimensions eight and twenty-four by using positive definite spherical functions on the unit sphere. Rescaling the unit sphere to infinite radius (or equivalently, letting the radius of the spherical caps tend to zero), the Odlyzko-Sloane bound translates into a bound on the density of sphere packings in Euclidean space.

An easy lemma shows that the bound on the density of sphere packings asserted in the scanned notes is the same as the bound given by the main result of Cohn-Elkies (Theorem 3.1). The derivation of the bound in the notes is incomplete.
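For comparison, here is the Cohn-Elkies bound as I recall it (their Theorem 3.1, stated loosely; see their paper for the precise admissibility hypotheses and the Fourier normalization): if f\colon {\mathbb R}^n \to {\mathbb R} is admissible with f(0) = \hat f(0) > 0, with f(x) \le 0 for |x| \ge r, and with \hat f(t) \ge 0 for all t, then every sphere packing in {\mathbb R}^n has density at most

\mathrm{vol}(B^n)\, (r/2)^n,

where B^n is the unit ball.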

Download scan of 1991 notes

Kepler’s laws of planetary motion

PDF version here

Erich Vogt used to present to his introductory physics students at UBC a slick proof of Kepler’s first law: every planet moves along an elliptical orbit with the sun at a focus.

We consider three variable quantities of a planet along its orbit (in blue): let r be the positive distance from the sun (or more precisely, the center of mass) to the planet, let v be the speed of the planet, and let h be the positive distance from the sun to the tangent line \ell of the planetary trajectory. The demonstration of Kepler’s first law is a one-sentence proof, based on the conservation of energy

E = \frac{1}{2} m v^2 - \frac{GMm}{r}

(the first term on the right is the kinetic energy and the second term is the gravitational potential energy) and the conservation of angular momentum

L = mvh

The only thing that matters to us about energy E, angular momentum L, the gravitational constant G, and the masses M,m is that they are constants with L,G,M,m positive and E negative (for bounded orbits).

Here is the sentence: solve the angular momentum equation for the variable v = L/(m h) and use it to eliminate v from the energy equation, which, after dividing both sides by the constant -E, takes the form of the so-called pedal equation of an ellipse:

\frac{b^2}{h^2} = \frac{2a}{r} - 1,

where the physical constants

a = \frac{GMm}{2|E|}, \qquad b = \frac{L}{\sqrt{2m|E|}}

have been consolidated. This rearrangement is purely algebraic. No physics or differential equations are involved. Here a,b are positive constants, while h,r vary along the planetary orbit, related to each other by this pedal equation.
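As a quick sanity check (my addition, not in the original post): for a circular orbit of radius r, we have h = r, E = -GMm/(2r), and L = m\sqrt{GMr}, so the consolidated constants become a = b = r, and the pedal equation reads

\frac{b^2}{h^2} = 1 = \frac{2r}{r} - 1 = \frac{2a}{r} - 1,

as it should.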

This is a splendid derivation of Kepler’s law if you already know the pedal equation for an ellipse. Otherwise, it leaves you wondering what the equation has to do with an ellipse. We give an elementary proof that the ellipse satisfies the pedal equation, to complete the derivation of Kepler’s law. The statements and proofs are so elementary, and the literature on conic sections so vast, it is probably futile to search for their historical source. Pedal equations first appeared during the 17th century (with Gilles de Roberval), and Besant’s book on conic sections in 1881 states and proves our main proposition.

We use standard notation for an ellipse as shown in the diagram. In particular, a,b are the semi-major and semi-minor axes, and a^2 = c^2 + b^2, where 2c is the distance between the foci of the ellipse. Two key properties are used. If r_1 and r_2 are the distances from any point P on the ellipse to the two foci F_1, F_2, then r_1 + r_2 = 2a. Also, a ray of light emanating from one focus reflects off the ellipse back to the other focus; that is, the angles θ at P between the tangent line \ell and the segments to the foci F_1, F_2 are equal.

Proposition. Let P be any point on the ellipse. Let h_1, h_2 be the distances from the foci F_1, F_2 to the tangent line \ell at P as shown. Then

h_1 h_2 = b^2.

Proof. The proof of the proposition is based on the law of cosines and the double angle formula in the form \cos(2\theta)=-\cos(\pi-2\theta) = 1 - 2\sin^2\theta. Also \sin\theta = h_1/r_1 = h_2/r_2 (the sine is the opposite leg over the hypotenuse of a right triangle).

We apply the law of cosines at P for the triangle PF_1F_2 with edges 2c, r_1, r_2:

(2c)^2 = r_1^2 + r_2^2 - 2 r_1 r_2 \cos(\pi - 2\theta) = r_1^2 + r_2^2 + 2 r_1 r_2 (1 - 2\sin^2\theta) = (r_1 + r_2)^2 - 4 r_1 r_2 \sin^2\theta = 4a^2 - 4 h_1 h_2.

Dividing by 4 gives c^2 = a^2 - h_1 h_2, so that h_1 h_2 = a^2 - c^2 = b^2.

QED.

Corollary. The pedal equation holds for an ellipse.

Proof. Eliminate the two variables r_2, h_2 from the three equations b^2 = h_1 h_2, r_1 + r_2 = 2a, and h_1/r_1 = h_2/r_2 to yield the pedal equation

\frac{b^2}{h_1^2} = \frac{2a}{r_1} - 1.

QED

Is the ellipse with parameters a,b the only curve that satisfies the pedal equation? Indeed, it is, when r lies in the open interval (a-c,a+c). If we fix F, then the pedal equation assigns to each point P in the plane at distance r from F, the direction of the tangent to the curve at P, up to a four-fold ambiguity in determining an angle θ from |\sin\theta|=h/r. A coherent choice resolving this ambiguity gives a unit-length vector field. By uniqueness of solutions to ordinary differential equations, the ellipse with parameters a,b through a given initial point P is the only solution (with the same four-fold ambiguity, generated by the reflectional symmetry through the line FP and reversal in the direction of traversal of the curve).

Kepler’s other laws are even more elementary; we briefly review them for the sake of completeness. Kepler’s second law asserts that a planetary orbit sweeps out equal areas in equal times. In other words, the law asserts that the area-sweeping rate A’ is constant. By the method of related rates, the rate A’ is the area of the triangle in the first diagram, whose base is the rate v directed along the tangent line \ell, whose apex is the focus F, and whose height is h. The area formula for a triangle (half the base times height) gives

A' = \frac{1}{2} v h = \frac{L}{2m},

which is indeed constant. In retrospect, Kepler’s second law is just a geometric way of expressing the conservation of angular momentum.

The third law states that the orbital period T is proportional to a^{3/2}. The proof is based on the area formula A = \pi a b of an ellipse. The total area A of an ellipse is equal to the area-sweeping rate A’ times the sweeping time T:

\pi a b = A'\, T = \frac{L}{2m}\, T.

Looking at the dependence of the consolidated constants a, b defined above on the planetary parameters m, E, L, we see that a is proportional to m/|E| and b\,m/L is proportional to (m/|E|)^{1/2}. Combining these facts, we get the third law:

T = \frac{2\pi a b m}{L} = \frac{2\pi}{\sqrt{GM}}\, a^{3/2} \sim a^{3/2},

where \sim denotes proportionality. The constant of proportionality 2π/√GM is independent of the planet, but depends on the solar mass and universal constants.
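As a numerical check of the final formula (my addition), plug in the standard values GM \approx 1.327\times 10^{20}\ \mathrm{m^3\,s^{-2}} for the sun and a \approx 1.496\times 10^{11}\ \mathrm{m} for the Earth:

T = \frac{2\pi}{\sqrt{GM}}\, a^{3/2} \approx \frac{2\pi\,(1.496\times 10^{11}\,\mathrm{m})^{3/2}}{\sqrt{1.327\times 10^{20}\,\mathrm{m^3\,s^{-2}}}} \approx 3.16\times 10^{7}\ \mathrm{s} \approx 365\ \mathrm{days}.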

PDF version here.

A Review of the Ferguson Imperial Model of COVID-19 Infection

Abstract

This is a review of the Ferguson Imperial model in mathematical epidemiology. The article begins with a review of the now-familiar political significance of the model. It then describes the general mathematical structure of the model and the computer code that implements it. Finally, it gives a detailed analysis of the bugs in a particular section of the computer code. This is an expanded version of a presentation to the University of Pittsburgh’s Math Biology seminar on October 8, 2020.

In early March 2020, the U.K. government had decided to let the “virus run its course … with the exception of the elderly, who were to be kept indoors.” Policies were quickly reversed in mid-March, when on Tuesday March 17, major newspapers delivered the shocking news of a computational model predicting millions of deaths.

Model

The article gives a mathematical description of the Imperial model and a detailed exposition of its equations.

It is a “stochastic, spatially structured, individual based discrete time simulation.”

Individual-based means that every individual from a country’s population (as given by census data) is represented as a separate individual in the model. If the US has a population of 330 million, then the US model has 330 million individuals.

Spatially-structured means that each individual lives at a particular location in the country, determined by realistic demographic data obtained from Oak Ridge National Laboratory with resolution at a scale under one kilometer for the United States and Great Britain.

Stochastic means that any susceptible individual might become infected at any time, according to random variables in the model. Different runs of the software produce different outcomes. In one run the infection might fail to spread, and in another run there might be millions of deaths.

Discrete time (and space) means that there are no differential equations or other forms of continuous modeling.

It is a survival model in the sense of hazard theory.

The Imperial model is very far from being a simple SIR model. Indeed, the Imperial model is orders of magnitude more complex and depends on many more empirically determined parameters. SIR is deterministic, but Imperial is stochastic. Unlike SIR, the Imperial model has no differential equations: in place of the continuous functions of SIR, time and the number of individuals are discrete. However, just like the SIR models, the Imperial model partitions the population into three compartments: the susceptibles, the infected, and the removed. Like the SEIR model, there is a latent period during which the infected are not infectious. It is therefore useful to keep the SIR picture in mind when studying the Imperial model.
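To fix the vocabulary (stochastic, individual-based, discrete time, hazard-style), here is a minimal toy sketch in Haskell. It is my own illustration, not the Imperial C code; it keeps only a synchronous one-day infection update and omits latency, recovery, households, and spatial structure.

import System.Random (StdGen, mkStdGen, randomR)

data Status = Susceptible | Infected | Removed deriving (Eq, Show)

-- One simulated day: each susceptible individual is independently
-- infected with probability p = 1 - exp(-beta * prevalence), a
-- hazard-style update computed from the current prevalence.
step :: Double -> StdGen -> [Status] -> (StdGen, [Status])
step beta gen pop = foldr update (gen, []) pop
  where
    n          = fromIntegral (length pop) :: Double
    prevalence = fromIntegral (length (filter (== Infected) pop)) / n
    p          = 1 - exp (negate (beta * prevalence))
    update Susceptible (g, acc) =
      let (u, g') = randomR (0, 1) g :: (Double, StdGen)
      in  (g', (if u < p then Infected else Susceptible) : acc)
    update s (g, acc) = (g, s : acc)

-- Ten days of a toy population of 100 individuals, one initially infected.
main :: IO ()
main = do
  let pop0 = Infected : replicate 99 Susceptible
      run k g pop
        | k <= (0 :: Int) = pop
        | otherwise       = let (g', pop') = step 0.3 g pop in run (k - 1) g' pop'
  print (length (filter (== Infected) (run 10 (mkStdGen 42) pop0)))

Different runs (different seeds) give different outcomes, which is the point of the stochastic design.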

Bugs

I have thoroughly analyzed the bugs in one function (which is 168 lines of computer code in C) of the Imperial code. The function AssignHouseholdAges is part of the initialization code for the model. It takes an input parameter householdSize giving the number of individuals in the household and it assigns ages to each member of the household, subject to a long list of constraints. For example, parents must have a certain minimum age, parents must not be too different in age from each other, their children must not be too close in age to or too distant in age from the parents, and so forth.

I picked this function partly because of its manageable length, but also because of its relative independence from other parts of the code; I can analyze it in isolation. I have no reason to suspect that it has more bugs than other parts of the Imperial code base.

Serious bugs destroy the overall integrity of the code. A separate question is whether these bugs materially change the predictions of the code. I do not know. It is possible that the other uncertainties are so large, and the model so insensitive to its own internal details, that the bugs are obscured.

An Argument for Controlled Natural Languages in Mathematics

What follows are the opening paragraphs of a pdf document giving an argument for controlled natural languages in mathematics.

At the recent Big Proof 2 conference in Edinburgh, I realized that a case must be made for developing a controlled natural language for mathematics. There is little consensus on this issue, and mathematicians and computer scientists tend to line up on opposite sides. Some are outright dismissive of the idea. Little research is happening. While making the case below (giving the mathematician’s side), I’ll also review and respond to some of the computer scientists’ objections.

Controlled Natural Languages (CNL)

By controlled natural language for mathematics (CNL), we mean an artificial language for the communication of mathematics that is (1) designed in a deliberate and explicit way with precise computer-readable syntax and semantics, (2) based on a single natural language (such as Chinese, Spanish, or English), and (3) broadly understood at least in an intuitive way by mathematically literate speakers of the natural language.

The definition of controlled natural language is intended to exclude invented languages such as Esperanto and Lojban that are not based on a single natural language. Programming languages are meant to be excluded, but a case might be made for TeX as the first broadly adopted controlled natural language for mathematics.

Perhaps it is best to start with an example. Here is a beautifully crafted CNL text created by Peter Koepke and Steffen Frerix. It reproduces a theorem and proof in Rudin’s Principles of mathematical analysis almost word for word. Their automated proof system is able to read and verify the proof.

Theorem [text in Naproche-SAD system] If x\in {\mathbb R} and y\in{\mathbb R} and x > 0 then there is a positive integer n such that n\cdot x > y.

Proof Define A=\{n\cdot x \mid n\ \hbox{is a positive integer}\}. Assume the contrary. Then y is an upper bound of A. Take a least upper bound \alpha of A. \alpha- x < \alpha and \alpha - x is not an upper bound of A. Take an element z of A such that not z \le \alpha - x. Take a positive integer m such that z = m\cdot x. Then \alpha - x < m\cdot x (by 15b).

\alpha = (\alpha -x) + x < (m\cdot x) + x = (m+1)\cdot x.


(m+1)\cdot x is an element of A. Contradiction. Indeed \alpha is an upper bound of A. QED.

In my view, this technology is undervalued. I feel like a stock advisor giving tips here, but I see it as an opportune time to invest heavily in CNLs.

The argument

Here is an outline of the argument that I will develop in the paragraphs that follow.

  1. Technology is still far from being able to make a semantic reading of mathematics as it is currently written.
    • Machine learning techniques (in particular, deep neural networks) are still far from a semantic reading of mathematics.
    • Linguistic approaches are still far from a semantic reading of mathematics as it is currently written.
  2. Mathematicians are still far from the mass adoption of proof assistants.
    • Adoption has been gradual.
    • Structural reasons hinder the adoption of proof assistants.
  3. There is value in bridging the gap between (1) and (2).
  4. CNL technology works now and can help to bridge the gap.

1. Technology is still far from making a semantic reading of mathematics as it is currently written

In my view, the current champion is WolframAlpha, in its ability to answer natural language queries about mathematics.

1a. Machine learning (neural networks) are still far from a semantic reading of mathematics

Recent advances in machine learning, such as AlphaZero, have been spectacular. However, some experts point to major unsolved technological problems.

I visited Christian Szegedy’s group at Google in Mountain View in April and attended their talks at AITP in Austria the same month. They have ambitions to solve math (to use their mind-boggling phrase) in the coming years. Their ambitions are inspiring, and the resources and research methodologies at Google are grandiose, but I would not personally bet on their time frame.

Their current focus is to use deep neural networks to predict a sequence of tactics that will successfully lead to new formal proofs of lemmas. These are all lemmas that have been previously formalized by John Harrison in his HOL Light libraries on real and complex analysis. As things stand now, other long-established technologies (such as hammers) have had similar success rates in formal proof rediscovery. Thus far, Google is not ahead of the game, but they are progressing quickly.

David Saxton (at DeepMind) spoke at the Big Proof 2 conference in May. He described a new large machine-learning dataset of school-level problems in areas such as arithmetic, algebra, and differential calculus. Currently the machine learning success rates are poor for many of these categories. The dataset is meant to stimulate machine learning research on these topics.

Deep neural networks have not been successful thus far in learning simple algorithms. In particular, deep neural networks cannot currently predict with much accuracy the answer to elementary sums of natural numbers. What is 437+156? Can a machine learn the algorithm to add? School-level math problems require more than parsing mathematical text. Nevertheless, parsing of mathematical text remains an unsolved challenge. They write in the introduction to their recent paper:

One area where human intelligence still differs and excels compared to neural models is discrete compositional reasoning about objects and entities that ‘algebraically generalize’ (Marcus, 2003). Our ability to generalise within this domain is complex, multi-faceted, and patently different from the sorts of generalisations that permit us to, for example, translate new sentence of French into English.

Here is an example of a challenging problem for machine learning from their paper. What is g(h(f(x))), where f(x) = 2x + 3, g(x) = 7x - 4, and h(x) = -5x - 8?
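For the record (my computation, not from the paper), the answers to the two sample questions above are immediate for anyone who can manipulate symbols:

437 + 156 = 593, \qquad g(h(f(x))) = 7\bigl(-5(2x+3) - 8\bigr) - 4 = 7(-10x - 23) - 4 = -70x - 165.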

Based on their paper, it seems to me that semantic parsing of research level mathematics is still years away.

1b. Linguistic approaches are still far from a semantic reading of mathematics as it is currently written.

The most widely cited work here is Mohan Ganesalingam’s thesis “The language of mathematics.” The thesis uses Discourse Representation Theory from linguistics to analyze mathematical language. Ganesalingam analyzes the language of mathematics as it is written in real mathematical publications, in all of its organic complexity.

There is far too little research in this area. Here is his assessment.

Mathematics differs sufficiently from human languages that little linguistic theory can be used directly; most of the material must be rebuilt and reconstructed. But linguistics, at the very least, gives us general ideas about how we can formally analyse language: it gives us a mindset and a starting place. Thus, in very broad terms, our work may be regarded as `the application of linguistics to mathematical language’. This is almost untrodden ground; the only linguist to analyse mathematics, Aarne Ranta, notes that “Amazingly little use has been made of [mathematical language] in linguistics; even the material presented below is just an application of results obtained within the standard linguistic fragment […].” Ranta (1994)

Ganesalingam – Principia conference

Furthermore,

We demonstrate that no linguistic techniques can remove the ambiguity in mathematics. We instead borrow from mathematical logic and computer science a notion called the `type’ of an object… and show that this carries enough information to resolve ambiguity. However, we show that if we penetrate deeply enough into mathematical usage, we hit a startling and unprecedented problem. In the language of mathematics, what an object is and how it behaves can sometimes differ: numbers sometimes behave as if they were not numbers, and objects that are provably not numbers sometimes behave like numbers.

Ganesalingam – Principia conference

As I understand it, Mohan’s work was mostly theoretical, and although there were once plans to do so, it has not been implemented in any available software.

2. Mathematicians are still far from the mass adoption of proof assistants

2a. Adoption has been gradual


I have watched first-hand the gradual adoption of proof assistants by mathematicians.

In 2001, when I first took up formalization, I knew only a few mathematicians who were even aware of proof assistants (Dan Bernstein, Carlos Simpson, Doron Zeilberger, …). That was the year I moved to Pittsburgh, and CMU already had a long tradition of formal proof among logicians and computer scientists (Dana Scott, Peter Andrews, Frank Pfenning, Bob Harper, Jeremy Avigad, and so forth).

In 2008, Bill Casselman and I were guest editors of a special issue of the Notices of the AMS on formal proof that helped to raise awareness among mathematicians. This included a cover article by Georges Gonthier on the formal proof of the four color theorem. (All the authors of this special issue were together again at the BigProof2 conference.)

Homotopy type theory (HoTT) became widely known through a special program at the Institute for Advanced Study in 2012-2013, organized by V. Voevodsky, T. Coquand, and S. Awodey. A popular book on HoTT resulted from that program. Other programs followed, such as the IHP program on the semantics of proof and certified programs in 2014.

Major publishers in mathematics now publish on formal proofs. There have been notable success stories (formalizations of the Feit-Thompson theorem and the Kepler conjecture). Equally, there have been major industrial success stories (the CompCert verified compiler and the seL4 microkernel verification).

In the last few years, several more mathematicians (or perhaps dozens if we include students) have become involved with Lean (but not hundreds or thousands).

Formal proofs are now regular topics online in places such as reddit and mathoverflow.

In summary, the landscape has changed considerably in the last twenty years, but it is still nowhere near mass adoption.


2b. Structural reasons hinder the adoption of proof assistants


There are reasons for lack of mass adoption. The reason is not (as some computer scientists might pretend) that mathematicians are too old or too computer-phobic to program.

If mathematicians are not widely adopting formalization, it is because they have already sized up what formalization can do, and they realize that, no matter its long-term potential, it has no immediate use to them. Homotopy type theory and related theories are closely watched by mathematicians even if not widely used.

I realize that I am an exception in this regard, because I had a specific real-world situation that was solved through formalization: the referees were not completely able to certify the correctness of the proof of the Kepler conjecture, and eventually they reached a point of exhaustion. It fell back on my shoulders to find an alternative way to certify a proof of the Kepler conjecture.

Wiedijk has pinpointed the most important reason that proof assistants have not caught on among mathematicians: in a proof assistant, the obvious ceases to be obvious.

Kevin Buzzard has spoken about his frustrations in trying to get Lean to accept basic facts that are obvious to every mathematician. Why is 1\ne 2? He has faced further frustrations in dealing with mathematically irrelevant distinctions made by proof assistants. For example, different constructions of a localization of a ring can lead to canonically isomorphic (but unequal) objects, and it is no small matter to get the proof assistant to treat them as the same object.

3. There is value in bridging the gap between (1) and (2).

I’ll quote a few mathematicians (from Ursula Martin’s BigProof2 conference slides, which cite the Breakthrough Prize interviews, 2014).

“I would like to see a computer proof verification system with an improved user interface, something that doesn’t require 100 times as much time as to write down the proof. Can we expect, say in 25 years, widespread adoption of computer verified proofs?” -J. Lurie

“I hope [we will eventually be able to verify every new paper by machine]. Perhaps at some point we will write our papers… directly in some formal mathematics system.” – T. Tao

Timothy Gowers was part of a panel discussion at the BigProof1 conference in 2017. He went through a list of automation tools that he would find useful and another list of tools that he would not find useful at all. One dream was to develop an automated assistant that would function at the level of a helpful graduate student. The senior mathematician would suggest the main lines of the proof, and the automated grad student would fill in the details. He also suggested improved search for what is in the literature, when you do not know the name of the theorem you are searching for, or even whether it exists. For this, full formalization is not needed.

Michael Kohlhase and others in his group have advocated “flexiformal” mathematics, as an interpolation between informal and formalized mathematics. They are also developing technology to translate between different proof assistants (or formal languages).

The Logipedia project also implements general methods to translate mathematics between formal languages, using a Logical Framework (Dedukti) and Reverse Mathematics.

An intended application of a large corpus of mathematics in a CNL would be big data sets for machine learning projects. Currently there is far too little data that aligns natural language mathematics with formal mathematics.

My dream is to someday have an automated referee for mathematical papers.

4. CNL technology works and can help to bridge the gap.

I believe strongly in the eventual mechanization of large parts of mathematics. According to a divide and conquer strategy for accomplishing this, we want to strike at the midpoint between current human practice and computer proof. In my view, CNLs do that. A million pages of this style of CNL is achievable.

4a. CNL has a long history

A brief history of the CNL Forthel (an acronym of FORmal THEory Language) is sketched in K. Vershinin’s note. Koepke told me that Forthel (that is, fortel’, or фортель) means “trick” in Russian and that this is an intended part of the acronym.

Research started with V. Glushkov in the 1960s. A system of deduction was part of the system from the beginning (even if my interest is primarily in the language). A description of the language first appeared in the Russian paper “On a language for description of formal theories” Teoretischeskaya kibernetika, N3, 1970.

Research continued with V. Bodnarchuk, K. Vershinin and his group through the late 1970s. Most of the early development was based in Kiev.

The language was required to have a formal syntax and semantics. To allow for new definitions, “the thesaurus of the language should be separated from the grammar to be enrichable. On the other hand the language should be close to the natural language of mathematical publications.”

The development of the Forthel language seems to have been intermittent. Andrei Paskevich’s thesis in Paris gives a description of the language and deduction system as of the end of 2007.

Afterwards, development (renamed Naproche-SAD) shifted to Bonn with Peter Koepke and his master’s student Steffen Frerix. The current Haskell code is available on GitHub. In this document, I will use Forthel and Naproche-SAD as synonyms, although Forthel is the name of the controlled natural language, whereas Naproche-SAD includes a reasoning module connected to the E theorem prover.

In May 2019, Makarius Wenzel implemented a mode of Isabelle/PIDE for Naproche-SAD. It can be downloaded. This is currently the best way to interact with Naproche-SAD.

4b. The Forthel language has an elegant design

The text we produce is intentionally verbose and redundant. The aim is to produce texts that can be read by mathematicians without any specialized training in the formalization of mathematics. We want mathematicians (say after reading a one page description of our conventions) to be able to read the texts.

The ability to understand a CNL is an entirely different matter from the ability to write or speak one. The CNL might have invisible strict grammatical rules that must be adhered to. As Stephen Watt put it, there is a difference between watching a movie and directing a movie.

Our main references are Paskevich’s paper on Forthel (The syntax and semantics of the ForTheL language, 2007), his thesis at Paris XII and the Naproche-SAD source code.

Naproche-SAD achieves readability through a collection of tricks. The linguistics behind Naproche-SAD is actually quite elementary, and it is remarkable how English-friendly it is with so little linguistic machinery. One trick is the generous use of stock phrases (or canned phrases) that the user can supply.

Another trick is the use of free variants (synonyms), such as making set and the plural sets fully interchangeable synonyms. There are no grammatical notions of gender, case, declension, conjugation, agreement, etc. in Forthel. The needed forms are introduced as variants and the system does not care which are used.

Another trick is the use of filler words. These are words that are ignored by Forthel. For example, in many contexts indefinite and definite articles a, an, the are disregarded. They are there only for human readability.

The Naproche-SAD parser is written in Haskell. It is basically a simplified knock-off of the Haskell parsec parser library.
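To make the synonym and filler-word tricks concrete, here is a toy sketch written against the real parsec library (my own illustration, not the Naproche-SAD source): “set” and “sets” are interchangeable variants, and the articles a, an, the are consumed and ignored.

import Control.Monad (void)
import Text.Parsec
import Text.Parsec.String (Parser)

-- "set" and "sets" are treated as interchangeable variants.
setNoun :: Parser ()
setNoun = void (try (string "sets") <|> string "set")

-- Filler articles are consumed and discarded.
filler :: Parser ()
filler = skipMany (try (article >> spaces))
  where article = choice (map (try . string) ["an", "a", "the"])

-- Accepts phrases such as "set", "the set", "a sets" (the article is ignored).
phrase :: Parser ()
phrase = spaces >> filler >> setNoun >> spaces >> eof

-- For example, parse phrase "" "the sets" evaluates to Right ()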

The grammar is not a context-free grammar (CFG). Some parser generators (such as LR(k) parsers) are therefore not appropriate for our project.

Although the grammar is extensible, we can still describe the language by a finite list of production rules in BNF format. The collection of nonterminals cannot be changed. However, some of the nonterminals are designated as primitive. Primitive nonterminals differ from ordinary nonterminals in that new replacement rules can be added (by the user) to a primitive nonterminal. In this way, as mathematical text progresses, the grammar can be enriched with new constructions and vocabulary.
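Here is one toy way to model that extensibility (my own sketch; the representation and the nonterminal names are invented, not Forthel’s actual implementation): only primitive nonterminals accept user-supplied production rules.

import qualified Data.Map as Map

-- Terminal and nonterminal symbols of the grammar.
data Symbol = T String | N String deriving (Eq, Show)

-- A production rule is a right-hand side; a grammar maps each
-- nonterminal to its list of productions.
type Rule    = [Symbol]
type Grammar = Map.Map String [Rule]

-- Only "primitive" nonterminals may be extended by the user.
primitives :: [String]
primitives = ["notion", "verbPhrase", "symbolicTerm"]

-- Add a user-supplied production to a primitive nonterminal;
-- requests to change ordinary nonterminals are ignored.
addRule :: String -> Rule -> Grammar -> Grammar
addRule nt rhs g
  | nt `elem` primitives = Map.insertWith (++) nt [rhs] g
  | otherwise            = g

-- Example: a definition in the text could register the new production
-- notion ::= "group" as the document is parsed.
exampleGrammar :: Grammar
exampleGrammar = addRule "notion" [T "group"] Map.empty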

4c. Forthel language design principles provide a template for future CNLs

This is work in progress. The point is that the Forthel design is sufficiently generic that the language can be readily adapted to support the back-end semantics of Lean’s dialect of CiC.

For this project, we recommend using the Haskell parsec parser library. Parsec is written as a library of parser combinators, with lazy evaluation, in a continuation style, with state stored in a parser monad.

If Lean4 has sufficiently powerful parsing capabilities, we can eventually port the project into Lean4. By using Haskell for our initial implementation, we make porting easier.

Actually, we combine two different parsing technologies: parsec-style combinators and a top-down operator-precedence parser. The operator-precedence parser is used for purely symbolic formulas. We describe below the rule used to extract sequences of symbolic lexemes that are shipped to the operator-precedence parser. By including an operator-precedence parser, we are able to add new symbolic notations with fine-grained levels of precedence.
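As a sketch of the symbolic side, here is a tiny formula parser built with parsec’s buildExpressionParser combinator standing in for a hand-written top-down operator-precedence parser (my own illustration; the actual project code is assumed to differ). The precedence table is exactly the kind of structure that a user declaration of new notation would extend.

import Text.Parsec
import Text.Parsec.String (Parser)
import Text.Parsec.Expr

-- A tiny symbolic-formula grammar: variables combined by infix operators.
data Expr = Var String | BinOp String Expr Expr deriving Show

term :: Parser Expr
term = Var <$> (many1 letter <* spaces)

-- Precedence table, highest level first: '*' binds tighter than '+'.
expr :: Parser Expr
expr = buildExpressionParser table term
  where
    table       = [ [binary "*"], [binary "+"] ]
    binary name = Infix (string name *> spaces *> pure (BinOp name)) AssocLeft

-- For example, parse expr "" "a + b * c" evaluates to
-- Right (BinOp "+" (Var "a") (BinOp "*" (Var "b") (Var "c")))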

We expect the grammar to be ambiguous. The parser should be capable of reporting ambiguous user input.

It seems to me that the biggest difficulty in writing a CNL for Lean will be the “last mile” semantics of Lean. By this, I mean the automatic insertion of the correct coercions and casts along equality to produce a pre-expression that can be successfully elaborated and type checked by Lean. (I expect that automatically generated pre-expressions will frequently be off by casts along equality.)

Objections and counterarguments

At the Big Proof 2 conference, several researchers made the argument that controlled natural languages are a step backwards from precise programming languages.

We should not abandon centuries of mathematical notational improvements

A sneering comparison was made between CNLs and the medieval practice of writing formulas and calculations out long-form in natural language. Significant progress in mathematics can be attributed to improved symbolic representations for formulas and calculations. It would be terrible to be forced in our language to write \pi/\sqrt{18} as pi divided by the square root of eighteen.

I agree. In fact, our controlled natural language is designed as a superset of Lean syntax. Lean is both a language of mathematics and a pure functional programming language similar to Haskell. Our augmented syntax loses none of that. We continue to write \pi/\sqrt{18} as such.

Nobody advocates abandoning notations that mathematicians find useful. A fully developed CNL includes all the great notational inventions of history such as decimal, floating point and other number formats, polynomial notation, arithmetic operations, set notation, matrix and vector notation, equations, logical symbolism, calculus, etc.

A CNL is also more than that, by providing precise semantics for mathematics in natural language.

Knuth, who has thought long and hard about mathematical notation, instructs that it is especially important “not to use the symbols \therefore,\ \Rightarrow,\ \forall,\ \exists,\ \ni; replace them by the corresponding words. (Except in works on logic, of course.)” The symbols are less fluent than the corresponding words. Knuth gives this example of a bad writing style:
\exists_{n_0\in N_0} \forall_{p\in P}\ p\ge n_0 \Rightarrow f(p)=0.
Another bad example from Knuth appears below in the section on examples.

We should not return to COBOL

Continuing with the criticism of CNLs, some argue that CNLs are an unpleasant step in the direction of COBOL. Among the BigProof2 conference crowd, COBOL evokes Dijkstra’s assessment: “The use of COBOL cripples the mind; its teaching should, therefore, be regarded as a criminal offence.” Still today, there is a strong distaste among some theoretical computer scientists (and principled programming language designers) for designs based on natural language.

Like COBOL, a CNL is self-documenting, English-oriented, and more verbose than ordinary programming languages. However, the features of COBOL that incite wrath have nothing to do with CNLs: the GOTO statement, lack of modularity, no function calls, no type checking on subroutine calls, no libraries, and so forth.

Lean is already readable

Some say that Lean is already readable and that a CNL is therefore wasted effort.

I find the level of readability of Lean to be similar to that of Coq. It is somewhat less readable than scripts in Mizar and Isabelle/HOL/Isar.

For those trained in Lean, the statements of definitions and theorems are readable, but most users find it necessary to jump incessantly from file to file within an environment such as VSCode (or emacs) because relevant context is spread through many files.

Generally, even those who write the scripts in Lean find it necessary to jump around in this manner to make sense of their own scripts. It is wonderful to have tools that allow us to jump around, but when that becomes the only possible mode of interaction, our thinking about mathematics becomes fragmented. (I may be old-fashioned in this worry of fragmentation.)

Proofs in almost all proof assistants are unreadable (in any sort of linear text-oriented fashion). Lean is no exception.

My impression is that few mathematicians would be willing to invest the necessary effort to read Lean scripts.

Here are some samples of Lean code. I do not mean to single out the authors of these files for criticism. The files are meant to be representative of Lean.

In a CNL, there is no semantic penalty for writing the fundamental theorem of algebra as follows.

Assume that f is a polynomial over {\mathbb C} of positive degree. Then there exists a complex number z that is a root of f.
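For comparison, here is a hedged sketch of how the same statement reads in Lean syntax. The names polynomial, degree, and is_root follow mathlib conventions and are assumptions on my part; the proof is omitted.

-- A sketch only, assuming mathlib's polynomial and complex number libraries
-- are in scope; mathlib does prove this statement (the fundamental theorem
-- of algebra), but the proof term is omitted here.
example (f : polynomial ℂ) (hf : 0 < f.degree) : ∃ z : ℂ, f.is_root z :=
sorry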

Documentation should be generated from source code, not vice versa.

Another argument is that we should generate natural language documentation from the formal text rather than the other way around. There are many successful examples of this approach, such as Mizar to LaTeX translation. These are useful tools, and I have no criticism of this.

The direction of code generation depends on the purpose. The source format is, generally speaking, the cleanest representation, and a human will be directly involved in writing it. If the primary purpose is formalization, then formal scripts should be the source format from which the others are generated. (Subtle differences in how mathematical concepts are represented in code can make a big difference in how painful the formal proofs are.) If the primary purpose is mathematical communication, then human-oriented language should be the source format. (Humans are easily annoyed by the stilted language generated from formal representations.) Who should we ask for tolerance, the human or the computer?

Other variations appear in practice. In some projects there are two source formats and semantics-preserving translation from one to the other. For example, computer software can have a specification from which executable code might be automatically extracted, but for reasons of efficient execution it might be better to run an optimized hand-crafted version of the code that is verified to be equivalent to the extracted code.

Another approach was used in the Flyspeck project, which used an English text as the blueprint for the formal proof scripts. Both the blueprint and proof scripts were written by humans, and the alignment between them was also a human effort. I wish to avoid that labor-heavy style in the formal abstracts project.

Also, in the proof of the Kepler conjecture, the C++ code to rigorously test nonlinear inequalities by interval arithmetic was automatically generated from the formal specification of the inequalities in HOL Light.

The foundational back ends of CNLs are too weak

The Forthel language compiles down to first order logic. The reasoning in the earlier CNL example from Rudin is not based on axiomatic reasoning using the axioms of Zermelo-Fraenkel set theory. Rather, the authors made up their own system of axioms that were designed to prove exactly what they wanted to prove. For this reason, the example is really just a toy example that is foundationally ungrounded. (Yet it is a working technology that makes us see what is concretely realizable.)

None of the successful proof assistants are based on first order logic. Most of the widely used proof assistants are based on some form of type theory (simple type theory or the calculus of inductive constructions). Proof assistants based on set theory (such as Mizar) add a layer of soft typing on top of the set theory.

I agree with this criticism. I believe that a more powerful back end is needed for a CNL if it is to be widely adopted. I propose the Lean theorem prover as a back end. This will require some changes to the grammar of the CNL.

We already have too many proof assistants and incompatible research directions.

Another CNL will further fragment research on formalization.

Interoperability is important. It emerged as a notable theme at the BigProofs2 conference. I would hope that we can achieve some interoperability with other projects such as Flexiformal math and OMDoc, Logipedia and Dedukti, and the Lean theorem prover math libraries.

A long term project might be to design a CNL that is sufficiently universal to work with different proof assistants on the back end.

In fact, there is too little activity in the CNL space right now. I hope that we can be compatible with Arnold Neumaier’s group and Peter Koepke’s.

Examples

The examples in this section are hypothetical. They are based on CNL grammar rules that have not yet been programmed into a computer. They illustrate what we hope soon to obtain. The changes to the existing production rules of Forthel are relatively minor.

To continue reading, open the pdf.

I thank Arnold Neumaier and Peter Koepke for many discussions and for getting me interested in this topic.

New Proof of the Fundamental Lemma

A new proof of the Fundamental Lemma has been announced by
Groechenig, Wyss, and Ziegler [GWZ].

The fundamental lemma is an identity between integrals on p-adic
reductive groups that was conjectured by Langlands in his Paris
lectures on the stable trace formula (1979). The name
“fundamental lemma” was a misnomer, because for decades it was only a
conjecture. These conjectures were later put into more precise form
by Langlands and Shelstad in 1987, then extended by Kottwitz and Shelstad (Foundations of Twisted Endoscopy, 1999) and Arthur in the twisted case.

The fundamental lemma was studied intensively by numerous researchers
for decades before a solution was found. Over the years the
fundamental lemma was successively recast as a question about
counting points in Bruhat-Tits buildings (Langlands, Rogawski),
asymptotics and Igusa theory (Langlands and Hales), affine Springer
fibers (Kazhdan and Lusztig) and equivariant cohomology (Goresky,
Kottwitz, MacPherson), compactified Jacobians (Laumon, Ngo), and the
Hitchin fibration, stacks, and perverse sheaves (Ngo). Over time,
Waldspurger provided major simplifications of the original
conjecture. This list of mathematicians includes many that I greatly
admire.

These identities were finally established by Ngo Bao Chau (and by
Chaudouard-Laumon in the twisted case) in a spectacular proof that led
to his Fields medal.

Although I am no longer an active researcher in the Langlands program,
Robert Langlands was my thesis advisor, and my primary research
interest was the fundamental lemma and related identities during the
years 1984-1993. I gave a Bourbaki report on Ngo’s proof of the
fundamental lemma.

The fundamental lemma provides the identities between integrals that
are absolutely crucial for the existence of the stable form of the
Arthur-Selberg trace formula. The stable trace formula has become a
major tool in the study of automorphic representations, which lie at
the heart of the Langlands program.

Many applications in number theory follow from the fundamental lemma,
through the stable trace formula. For example, one of the earliest
cases of the fundamental lemma — base change for GL(2) — yielded
a trace formula that led to the Langlands-Tunnell theorem, an
essential ingredient in the proof of Fermat’s Last Theorem. Arthur’s
book on the classification of automorphic representations of classical
groups is another major application of the trace formula and
fundamental lemma.

As mentioned above, a recent preprint GWZ on the arXiv gives a new proof, which we now describe.

Hitchin fibration

One of the main ideas that led to a solution to the fundamental lemma
was Ngo’s realization that the Hitchin fibration is the natural global
analogue of affine Springer fibers. The starting point for the GWZ
proof of the fundamental lemma is also the Hitchin fibration. In
this, they do not differ from Ngo. They also follow Ngo in making
heavy use of a Picard stack that acts on the Hitchin fibration.

To appreciate the new proof, it is necessary to look at the details of
Ngo’s proof to see how it has been simplified by GWZ. Notably,
the perverse sheaves, support theorems, and decomposition theorem, which were
essential to Ngo’s approach, have been entirely eliminated. (The support theorems
take up a very technical 25-page section in Ngo.) In fact, in the GWZ
proof there are no sheaves at all, and only light use of cohomology
(for the Brauer group and Tate duality, which are needed even to state
the fundamental lemma).

In my mind, the ugliest part of Ngo’s (otherwise beautiful) proof came in his use of a
stratification by a numerical invariant. The construction of the
Hitchin fibration depends on a line bundle, and Ngo’s proof relies on
complicated relations between the stratification and the line bundle
as the degree of the line bundle tends to infinity. This very
technical part of the proof of the fundamental lemma has been
eliminated in the GWZ proof.

Another part of Ngo’s proof that has been eliminated by GWZ is the
explicit calculation of orbital integrals. Overall, the GWZ proof is
simpler in two ways: it eliminates the most complex machinery (such as
the decomposition theorem) and the most complex calculations (such as the
stratification estimates). However, the Hitchin fibration and stacks
are still needed in both proofs.

GWZ proof of the fundamental lemma

The GWZ approach originated with work on the topological mirror
symmetry conjecture for the smooth moduli space of Higgs bundles,
which uses p-adic integration to prove identities of stringy Hodge numbers. They realized that similar techniques should work to give a new proof of the fundamental lemma.

Ngo’s work shows that the numbers of points on Hitchin fibers over finite
fields are related across different groups. The key idea of the GWZ proof is
that these point counting problems can be expressed as a p-adic
integral (over the Hitchin fibration). The advantage
of working with integrals is that they are insensitive to sets of
measure zero, and this allows badly behaved Hitchin fibers to be
ignored. This is crucial, because the badly behaved fibers are the
source of all the woes. Some of the ideas here have been influenced
by motivic integration, especially a paper by Denef
and Loeser on Batyrev’s form of the McKay correspondence.

In formulating the fundamental lemma as a p-adic identity, the GWZ
proof is the latest zag in a curious series of zig-zags between global
fields and local fields: the trace formula
(global) led to the fundamental lemma and affine Springer fibers
(local), which was reformulated on the Hitchin fibration (global), as
identities of p-adic integrals (local).

The fundamental identity in GWZ takes the form of a remarkable duality
of integrals between a reductive group G and its Langlands dual G’:

(1)  ∫_M(G,t,O) χ(s) dG = ∫_M(G',s,O) χ(t) dG'

(after dropping some subscripts and superscripts) where s and t
run over appropriate character lattices, O is the set of
integer points of a local field, M(G,t) are twists of the Hitchin
fibration M on G (and G’), and χ are functions defined through the Brauer
group (related to the κ in Langlands’s original formulation). This identity directly implies the fundamental lemma, and it
is cleaner than other ways of stating the fundamental lemma. For that
reason, I would propose that we accept the GWZ formula as the
primal form of the fundamental lemma.

The fundamental identity (1), when the functions χ are constant, expresses the
fact that dual abelian varieties have the same volume. When χ is nontrivial, the
identity says that a nontrivial character integrated over an abelian group is
zero. Thus, the GWZ paper exposes the bare structure of
the fundamental lemma as a duality for abelian varieties.
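In symbols, keeping the abbreviated notation of (1) and taking χ constant equal to 1, the identity reduces to an equality of volumes:

vol M(G,t,O) = ∫_M(G,t,O) dG = ∫_M(G',s,O) dG' = vol M(G',s,O).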

In summary, the GWZ paper gives a new proof of the fundamental lemma,
which bypasses the most difficult parts of the earlier proof and exposes
its essential structure.

Birth of the Non-standard Fundamental Lemma

The fundamental lemma (FL) is a key ingredient in the stabilization of
the Arthur-Selberg trace formula, which is part of the heavy machinery
used in the Langlands program. Ngo Bao Chau received a Fields medal
for his proof of the FL. His proof is an amazing mathematical
achievement.

The FL has come to my attention again, because of a new proof
of the FL posted on the arXiv. An interesting feature of this
proof is the way that non-standard and standard identities become intertwined.

Twisted Endoscopy

I became an assistant professor at the University of Chicago in 1990,
just down the hall from Kottwitz. He freely shared with me the early
versions of his research with Shelstad on the foundations of twisted
endoscopy as the manuscripts became available. Through Kottwitz’s
mentoring, I was one of the first to think about the general theory of
twisted endoscopy.

Descent

The paper by Langlands and Shelstad on descent for transfer factors
was the first paper I was ever asked to referee. The paper was
related to my thesis work under Langlands, and I went over
their paper in detail. Descent shows that identities of
orbital integrals can be deduced from identities of orbital integrals
on smaller reductive groups, by taking centralizers.

Combining Twisted Endoscopy with Descent

Clozel invited me to spend three months at Orsay in the fall of 1992.
Computing many twisted centralizers, I found that modulo standard
endoscopy, descent always gave the same conjectural stable identities
relating root systems G2 with G2, F4 with F4, or Bn with Cn,
exchanging root lengths. For GL(4) and GL(5), the
relevant identities are between B2 and C2. Calculations yielded
an elliptic curve for B2 and another
elliptic curve for C2. An isogeny between the elliptic curves
gave the transfer of orbital integrals.

I completed these calculations in time for my lecture in Orsay on
November 17, 1992, where I described the conjectural dualities of
stable orbital integrals exchanging root lengths for the non-simply laced
root systems, as well as the evidence provided by the elliptic curve
isogenies for B2-C2. By my records, that was the first public description of
the non-standard identities.

I repeated these non-standard conjectures in other lectures, notably
in lecture series at IAS in 1993 and 1995. For example, my notes
for the third lecture at IAS in February 1993 contain the statement,
“Hope: the theory of twisted endoscopy is no more than standard
endoscopy combined with the stable dualities: (G2 ↔ G2), (F4 ↔ F4),
(Bn ↔ Cn), long roots ↔ short roots.” I never pushed these as my
conjectures because I have always viewed them as a direct consequence of
(and the most reasonable route to the proof of) the deeper conjectures
of Langlands, Kottwitz, and Shelstad. Nonetheless, a precise idea was
in place by late 1992.

A Review of the Lean Theorem Prover

Thomas Hales

This review gives a mathematician’s perspective on Lean.

What is great about Lean?

I turned to Lean when I started looking for a system with more foundational strength than HOL-Light, a proof assistant I have been happily using for many years. HOL-Light is great at what it does (such as real and complex analysis), but it falls short in its treatment of big objects, such as the category of all sets.

Both Coq and Lean are based on the logical foundations of CiC (calculus of inductive constructions). Unlike HOL, CiC can readily encode most structures in widespread use in mathematics.

I chose Lean (over Coq and other competitors) for various reasons.

1. Lean sounds wonderful:  open source, a small trusted kernel, a powerful elaboration engine including a Prolog-like algorithm for type-class resolution, multi-core support, incremental compilation, support for both constructive and classical mathematics, successful projects in homotopy type theory, excellent documentation, and a web browser interface.  Lean source files can be edited in Emacs, Visual Studio Code, or CoCalc.

Lean seems to have learned from the past decade of research into the Coq system.

2. In more detail, a “minimalist and high performance kernel” was an explicit design goal of Lean. Independent implementations of the kernel have been given (Selsam in 2000 lines, etc.), alleviating any concerns about a bug in the C++ implementation of Lean.

3. The semantics of Lean are now completely spelled out (thanks to Mario Carneiro, building on Werner). In particular, Carneiro has built a model of Lean’s logic (CiC with non-cumulative universes) in ZFC set theory (augmented by a countable number of inaccessible cardinals).

4. Lean has a clean syntax. For example, to add two elements in an abelian group, one can simply write x+y and Lean correctly infers the group in which the addition is to be performed. I have more to say about Lean’s syntax later.
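Here is a minimal sketch of that point (Lean 3, assuming mathlib's algebra classes are in scope): the + below elaborates to the addition of whatever abelian group instance is found on α, without naming the group anywhere.

-- A minimal sketch: the instance [add_comm_group α] is found by type-class
-- resolution, so x + y needs no further annotation.
example {α : Type*} [add_comm_group α] (x y : α) : x + y = y + x :=
add_comm x y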

5. Lean makes it easy to switch from constructive to classical logic (you just open the classical logic module). Lean makes quotient types easy (unlike Coq, which tends to work with awkward setoids).
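For instance, a quotient type can be formed directly from a setoid, in the style of mathlib's multiset. The following is a minimal sketch; the names parity_setoid and parity are hypothetical, not library code.

-- A minimal sketch: the natural numbers up to parity, as a quotient type.
def parity_setoid : setoid ℕ :=
⟨λ m n, m % 2 = n % 2,
 ⟨λ m, rfl, λ m n h, h.symm, λ m n k h₁ h₂, h₁.trans h₂⟩⟩

def parity : Type := quotient parity_setoid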

6. Lean is its own metalanguage. I find this very appealing. Contrast this with HOL-Light, which has OCaml as its metalanguage, or Coq, which has the domain-specific language Ltac for tactics.

7. Finally, there was a personal reason. Carnegie-Mellon University is a center of Lean library development. I live in Pittsburgh and am a regular participant in CMU’s Lean group meetings.

What’s not so great about Lean?

1. The kernel is bloated. Specifically, from what I hear, for performance reasons, mutually inductive types will soon be moved into the kernel. This destroys the former claims of a minimalist kernel.

2. Lean is not backwards compatible. Lean 3 broke the Lean 2 libraries, and old libraries still haven’t been ported to Lean 3. After nearly 2 years, it doesn’t look like that will ever happen. Instead new libraries are being built at great cost. Lean 4 is guaranteed to break the Lean 3 libraries when it arrives.  In short, Lean is experimental, evolving, and unstable.

3. The learning curve is steep. It is very hard to learn to use Lean proficiently. Are you a graduate student at Stanford, CMU, or Pitt writing a thesis on Lean? Are you a student at Imperial being guided by Kevin Buzzard? If not, Lean might not be for you.

4. Lean is its own metalanguage. Lean is new, and the language libraries are almost nonexistent.  Hardly any major programs have been written in Lean.   Java programmers outnumber Lean programmers by a million to one.  To first approximation, it is impossible to do any serious programming in Lean.

I miss HOL-Light, whose metalanguage is an established programming language (OCaml) with large supporting libraries. Whenever I needed a tactic, I just sat down and wrote it in OCaml. I can’t do this in Lean, and it is extremely frustrating to work in an environment that is not equipped for serious programming.

In fact, I could live with almost all my other criticisms, if I could write my own code in Lean to fix what I don’t like. I have discussed Lean programming issues with experts, and the tools simply are not available.

5. Typing in Lean is nominal rather than structural.  As a mathematician and OCaml programmer, I would have preferred design decisions that favor a structural approach.  Mathematicians are used to working in set theory, and decisions should be made that push the type theory to be more set-like, by making types more interoperable.

6. There are performance issues. It is not clear (to me or perhaps even to anyone) why performance is such a big problem, because Lean was implemented in C++ for the sake of performance. Nevertheless, compilation of the math libraries is currently very slow. Something is wrong here.

7. Ugly projection chains are required. Consider the definition of semigroup (a mathematical structure with an associative binary operation)

universe u
class semigroup (α : Type u) extends has_mul α :=
(mul_assoc : ∀ a b c : α, a * b * c = a * (b * c))

This looks simple enough, and it has an appealing syntax that allows the symbol * to be used for multiplication because the class extends has_mul. However, the elaborator expands this into something ugly

@[class, priority 100]
structure semigroup : Type u → Type u
fields:
semigroup.mul : Π {α : Type u} [c : semigroup α], α → α → α
semigroup.mul_assoc : ∀ {α : Type u} [c : semigroup α] (a b c_1 : α),
  @eq α
    (@has_mul.mul α (@has_mul.mk α (@semigroup.mul α c))
       (@has_mul.mul α (@has_mul.mk α (@semigroup.mul α c)) a b) c_1)
    (@has_mul.mul α (@has_mul.mk α (@semigroup.mul α c)) a
       (@has_mul.mul α (@has_mul.mk α (@semigroup.mul α c)) b c_1))

Every time the multiplication symbol is used, the semigroup is explicitly converted to another structure has_mul, which is responsible for handling the multiplication.

I have intentionally selected a very simple example. Things are much much worse when you look at the ugly expansions of a more complicated mathematical structure (such as an ordered ring).

8. Structure depends on notation. Lean has a library of results about multiplicative groups and a separate library of results about additive groups. The only difference is that one uses the symbol * for the group operation and the other uses the symbol + for the group operation. Mathematicians find it absurd that the theorems in group theory have been made to depend on the symbol used for composition.
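A hedged illustration: the following two mathlib lemmas state the same fact, once multiplicatively and once additively (the lemma names are as I recall them and may have been renamed).

-- The same theorem twice, welded to two different notations.
example {G : Type*} [group G] (a b : G) : (a * b)⁻¹ = b⁻¹ * a⁻¹ :=
mul_inv_rev a b

example {A : Type*} [add_group A] (a b : A) : -(a + b) = -b + -a :=
neg_add_rev a b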

The problem grows worse in structures that depend on groups. For example, there are four types of products of two groups, because the first factor can be either a +-group or a *-group and the second factor can also be either of the two.

There is an interesting history behind this design flaw.  Originally, type classes were introduced into functional languages to make ad hoc polymorphism less ad hoc. What started as a way to clean up operator overloading notation, ended up welding notation to structures where it is not welcome.

9. No diamonds are allowed. From a programming language point of view, I understand the issues surrounding diamonds. However, for mathematicians, diamonds are extremely natural and they occur in great abundance under many names (pullbacks, fibered products, Cartesian squares, etc.). It seems to me that a system designed for doing mathematics should do more than just declare them illegal.

Read the full discussion that led to the diamond ban, where it is argued (unpersuasively) that even though the algebra library used diamonds “in many different places,” the ban would only “be a minor annoyance.” In fact, I find it to be a major impediment. I’m told that I can just give my own implementation of classes with diamonds, but see item 4 above — I effectively have no computer language to use for this.

10. Structures are meaninglessly parameterized from a mathematical perspective. To briefly introduce the topic of parameters and bundling: the library designer chooses whether data appears as an external parameter or is bundled inside the structure. Here is an example of a magma, in both parametric and bundled form:

structure magma_with_parameter (α : Type*) :=
(op : α → α → α)
structure magma_bundled :=
(α : Type*)
(op : α → α → α)

I think of the parametric versus bundled variants as analogous to currying or not; are the arguments to a function presented in succession or as a single ordered tuple? However, there is a big difference between currying functions and currying structures. Switching between curried and uncurried functions is cheap, but it is nearly impossible in Lean to curry a structure. That is, what is bundled cannot be later opened up as a parameter. (Going the other direction towards increased bundling of structures is easily achieved with sigma types.) This means that library designers are forced to take a conservative approach and expose as a parameter anything that any user might reasonably want exposed, because once it is bundled, it is not coming back.
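For example, re-bundling the parametric magma above is a one-liner with a sigma type (a sketch; magma_rebundled is a hypothetical name).

-- Pair a carrier type with a magma structure on it.
def magma_rebundled := Σ α : Type*, magma_with_parameter α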

Mathematicians tend to be flexible about what information they choose to expose about a structure, according to whatever information is relevant to the given context. They don’t want a predetermined bundling decision that is hard-wired into the libraries.

How are Lean structures meaninglessly parametrized? As an illustration, the type α : Type* of the underlying carrier of the group is passed as a parameter (as in the magma_with_parameter example above). For a mathematician this is entirely meaningless information that is passed as a parameter. Who cares about the particularities of the underlying set of the group? What matters is the group structure. A mathematician says, let G be a group, but never let α be a set and let G be a group constructed on carrier α. Lean requires the user to give both α and G every damn time. Other language designers fight on behalf of the users to save them keystrokes and boilerplate. I expect the same from Lean.

This gets worse as the algebraic structures grow in complexity. You cannot just say, let M be a topological module over a topological ring R. I count 7 different structures that first have to be explicitly declared every time a topological module is introduced in Lean. This is intolerable. I haven’t made the count for tensor products of topological bi-modules, but you get the idea. And in case you are wondering, yes, mathematicians do take tensor products of topological bi-modules.
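To make the complaint concrete, the preamble looks roughly like this (a hedged sketch in Lean 3; the mathlib class names are approximate assumptions and may not match the library exactly).

-- Everything below must be declared before "topological module over a
-- topological ring" even makes sense; the exact class names are assumptions.
variables (R : Type*) (M : Type*)
variables [ring R] [topological_space R] [topological_ring R]
variables [add_comm_group M] [topological_space M] [topological_add_group M]
variables [module R M] [has_continuous_smul R M]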

(Lean does it this way for a reason: it uses the exposed parameter α as an index for type-class resolution, as a cheap way to avoid diamonds, etc. But why should mathematicians be forced to wander into Lean internals?)

11. Lean discards valuable information that is later reconstructed (at a cost) by its type class resolution engine.

For example, we write G:group α in Lean to say that G is a group with carrier α. Then, when we write the associative law, ∀ (x y z : α), x * y * z = x * (y * z), valuable information is lost because all typing information refers to the carrier α instead of the group G, so Lean must hunt around for a group G associated with the raw type α.
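In Lean 3 notation, the situation is this (a minimal sketch): the instance is named G, but the statement mentions only the carrier α, so G must be recovered by class resolution at every occurrence of *.

-- G appears in the binder but nowhere in the statement itself.
example (α : Type*) [G : group α] :
  ∀ x y z : α, x * y * z = x * (y * z) :=
λ x y z, mul_assoc x y z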

If there are two groups on the same underlying carrier (which happens for example with semi-direct products), then Lean gives unpredictable results. There is a way to set priorities here, but that is not advised, because these settings have global scope.

The Mizar approach seems saner. We write G:group (bundled in Mizar style) and

∀ (x y z : G), x * y * z = x * (y * z).

The difference with Lean might appear small, but now the statement has not discarded the group G and we do not have to hunt for it. Since G is not a type, z : G coerces to z : G.carrier, or more verbosely, z : group.carrier G, and the group is still visible.