Bertrand Russell Prize of the AMS

(The following interview originally appeared in the Line about the Bertrand Russell Prize of the AMS.)

Professor Thomas C. Hales of the University of Pittsburgh has established the Bertrand Russell Prize of the AMS. The prize looks beyond the confines of the profession to research or service contributions of mathematicians or related professionals that promote good in the world. Prof. Hales talked about his inspiration for the prize.

1. What about Bertrand Russell inspired you to endow a prize in his name?

Russell wrote that in his youth, “mathematics was my chief interest and my chief source of happiness.” Yet, his legacy extends far beyond mathematics. On the whole, Russell’s writings promote rational analysis in pursuit of humanitarian ends. The prize has related aims.

2. If you were to award a prize to Bertrand Russell, which of his achievements would you recognize?

I would award Russell a prize for his efforts during the Cold War to warn the world of the dangers of nuclear catastrophe.

The Russell-Einstein manifesto, which was published in 1955, led to the founding of the Pugwash conference, an organization devoted to the reduction of nuclear weapons and to the social responsibility of scientists.

In 1995, the Nobel Peace Prize was shared by Joseph Rotblat (one of the 11 signatories of the Russell-Einstein manifesto) and the Pugwash conference for their work.

3. In what ways has your mathematics intersected Russell’s?

Bertrand Russell is the father of type theory.  Historically, there were two resolutions of Russell’s paradox: axiomatic set theory and the theory of types. There is a direct chain of influence from Russell’s type theory, to Alonzo Church’s simply-typed lambda calculus, to Dana Scott’s logic of computable functions (LCF), to higher-order logic (HOL). The formal proof of theorems in HOL is one of my primary research interests.

4. What is an example from the past of a person, a mathematical achievement or application you feel might warrant this prize?

I will pick an old example that has not lost its relevance.

Norbert Wiener, the creator of cybernetics and the mathematical theory of communication (with Shannon), campaigned for the ethical use of these technologies. With remarkable prescience, Norbert Wiener educated the public about the social disruptions (such as widespread unemployment) that would result from automation, unless mitigated through sound public policy.

More about the Prize

Prize Description:  This prize looks beyond the confines of our profession to research or service contributions of mathematicians or related professionals that promote good in the world.  It recognizes the various ways that mathematics furthers fundamental human values.

Prize Details:  The $5,000 prize will be awarded every three years.

About this Prize:  The mission of the AMS includes

  1. promoting the uses of mathematical research,
  2. advancing the status of the profession of mathematics, and
  3. fostering an awareness and appreciation of mathematics and its connections to other disciplines and everyday life.

This prize, proposed and funded by Thomas Hales, is designed to promote these goals.  Mathematical contributions that further world health, our understanding of climate change, digital privacy, or education in developing countries are some examples of the type of work that might be considered for the prize.

Nominate a Candidate



Bad Packings

Thomas C. Hales (Pittsburgh)

[This blog post is based on a colloquium I gave in the Pitt math department on February 3, 2017. The arXiv paper (1703.01352) giving technical details appeared March 7, 2017.]

Some shapes fill the plane more efficiently than others.

For example, some shapes tile the plane, so that there is no unoccupied space. Shapes that tile include the equilateral triangle, the square, the parallelogram, and the regular hexagon. Fish also tile the plane, as shown by Escher.

Shapes that do not tile the plane include circles and pentagons. It is known that the optimal density of a circle packing in the plane is \pi/\sqrt{12}\approx 0.9069. The pentagon packing problem was solved by Kusner and me in 2016. The optimal density is about 0.921.
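Both densities are easy to check numerically. In the Python sketch below, the closed form (5 − √5)/3 for the pentagonal ice-ray density is a widely quoted value taken as an assumption here; the text above quotes only the approximation 0.921.

```python
import math

# Optimal planar packing densities quoted above.  The closed form
# (5 - sqrt(5))/3 for the pentagonal ice-ray is an assumption of this
# sketch; the text quotes only the approximation 0.921.
circle_density = math.pi / math.sqrt(12)
pentagon_density = (5 - math.sqrt(5)) / 3

print(f"circle:   {circle_density:.4f}")    # about 0.9069
print(f"pentagon: {pentagon_density:.4f}")  # about 0.9213
```

Note that the pentagon, which does not tile, nevertheless packs more densely than the circle.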

When studying optimal packing densities, we exclude non-convex disks, because otherwise we could hollow out the interior of a disk so that its best packing has arbitrarily small density. For example, hollowed-out squares have low packing density.


Central Symmetry

A region in the plane is centrally symmetric if it is the same when rotated by 180 degrees. A camel is not centrally symmetric because we can tell which camel is upside down. Similarly,  triangles and generic quadrilaterals are not centrally symmetric.

Circles and parallelograms are centrally symmetric. Escher’s hands give an algorithm for drawing centrally symmetric figures.

Enter Reinhardt

This blog post is about the problem of bad packings: what centrally symmetric convex disk in the plane has the property that its densest packing is the worst? Reinhardt asked this question and conjectured an answer in 1934. The conjecture is still open.

The problem can be described as follows. A contract stipulates that a miser must make a payment with a tray of identical gold coins filling the tray as densely as possible. The contract requires the coins to be convex and centrally symmetric. What shape coin should the miser choose to part with as little gold as possible?

The answer is not a pentagon, for two reasons: the pentagon’s best packing beats the circle’s, and the pentagon is not centrally symmetric.

Tiling with hexagons

Centrally symmetric hexagons are obtained by cutting two opposite corners from a parallelogram.  Centrally symmetric hexagons always tile the plane.


Fejes Tóth found the best way to pack any centrally symmetric disk: put it in the smallest possible centrally symmetric hexagon, then tile the plane with the hexagons. (In degenerate cases, the hexagon might become a parallelogram.)  Reinhardt himself only considered lattice packings when he formulated his problem, but because of Fejes Tóth’s result, the lattice and non-lattice densities are the same, so this does not matter.

The density is the ratio of the area of the disk to the area of the hexagon. This ratio is not affected by rescaling both the disk and the hexagon by the same factor. To normalize the problem, we rescale the hexagons so that they have area \sqrt{12}. This is convenient because it is the area of the smallest centrally symmetric hexagons that contain a circle of radius 1.
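A quick computation confirms the normalization: the smallest centrally symmetric hexagon containing the unit circle is the regular hexagon with inradius 1, and its area is 2√3 = √12.

```python
import math

# The smallest centrally symmetric hexagon containing the unit circle is
# the regular hexagon with inradius 1.  Its side is 2/sqrt(3), and its
# area is (3*sqrt(3)/2) * side^2 = 2*sqrt(3) = sqrt(12).
side = 2 / math.sqrt(3)
area = (3 * math.sqrt(3) / 2) * side ** 2

print(area, math.sqrt(12))   # both about 3.4641
print(math.pi / area)        # recovers the circle packing density pi/sqrt(12)
```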

We see that Fejes Tóth’s strategy works for circles: we obtain the densest packing of circles by fitting each circle as tightly as possible into a hexagon, then tiling the hexagons.

Could the answer to the Reinhardt problem be long and skinny, like the ellipses shown below? A simple argument allows us to restrict to disks that are close to circular. An invertible linear transformation of the plane does not change the ratio of areas (and hence the density). So a linear transformation can be applied to make the answer to the Reinhardt problem nearly round. In fact, after applying a linear transformation, we may assume without loss of generality that the boundary of the disk solving the Reinhardt problem passes through the six vertices of a regular hexagon.


Clipping Squares down to Octagons

Here is how Reinhardt might have discovered his conjecture. One way to search for the worst disk is to start with a square and cut away unneeded portions (in a centrally symmetric way). Let’s cut away two opposite corners. This doesn’t help, because a square with two clipped corners is a centrally symmetric hexagon and still tiles.

So let’s cut away all four corners of a square. As we cut away more and more from the corners, the best packing density becomes worse and worse. But if we cut away too much, we get a smaller square, and we are back where we started. So let’s stop midway, at an octagon. Small further improvements are possible if we make tiny clips to smooth out the corners of the octagon.

Reinhardt’s conjecture (1934): The smoothed octagon is the worst centrally symmetric shape for packing.

This conjecture is still open.
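For reference, the smoothed octagon’s packing density has a widely quoted closed form, (8 − 4√2 − ln 2)/(2√2 − 1); it is not stated in this post, so treat it as an assumption of this sketch. A few lines of Python confirm that it sits just below the circle’s π/√12.

```python
import math

# A widely quoted closed form for the packing density of the smoothed
# octagon (an assumption of this sketch; the post gives no formula):
density = (8 - 4 * math.sqrt(2) - math.log(2)) / (2 * math.sqrt(2) - 1)
circle = math.pi / math.sqrt(12)

print(f"smoothed octagon: {density:.6f}")   # about 0.902414
print(f"circle:           {circle:.6f}")    # about 0.906900
```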

Reinhardt made an important observation. A shape can be clipped further (and so is not the worst) unless a centrally symmetric hexagon of smallest area passes through each point of the boundary. Reinhardt conjectured that the optimal smoothing of the corners is by arcs of hyperbolas.


This means that the solution to the Reinhardt problem, whatever it is, is hexagonally symmetric in this sense: each point on the boundary can be associated with five other points, for a total of six points.  If we take any of these sets of six points and draw the tangents to the curve at these six points, we get a centrally symmetric hexagon, and its area does not depend on the six points.  We can imagine this as an Escher print with six hands in a ring drawing the figure.

The smoothed octagon has this hexagonal symmetry. (Hexagonal symmetry is a surprising property of an octagon.)  To draw the smoothed octagon with six hands, at any moment, two opposite hands are drawing rounded corners and the remaining four hands are drawing straight edges.    Greg Egan has produced some spectacular animated graphics of this.  The density of the packing does not change as the smoothed octagons roll.


Circles roll in a circle packing in the same way that smoothed octagons roll in theirs, except that as the circles roll, there is no visual change in the packing, because of their symmetry.

Digression: Ulam conjecture

We can ask the same question in three dimensions: what is the worst shape for packing?

Ulam conjectured that in three dimensions the sphere is the worst shape for packing. Nobody knows if this is true (but some recent progress has been made). The conjecture first became known through a column by Martin Gardner.

Why is the Ulam problem hard? You first have to solve the sphere packing problem to know the density of sphere packings. Then you have to show everything else is worse.  In that sense, the Reinhardt/Ulam problem is much harder than any packing problem.

The sphere packing problem has only been solved in dimensions 2, 3, 8, and 24. (For more about packings in 8 and 24 dimensions, see Cohn’s article.)

Some non-Euclidean geometry

(I’m going to make a few digressions, but we will eventually tie up the loose ends.)

In his Elements, Euclid proposed the parallel postulate for the geometry of the plane: given a line L and a point not on that line, there exists a unique line through the point that is parallel to L.

Students of Euclid tried for centuries to give a proof of the parallel postulate. Finally, mathematicians started to believe that no proof of the parallel postulate is possible.

To prove that no proof of the parallel postulate is possible, it was necessary to construct a model of non-Euclidean geometry. In these models, all of the fundamental postulates hold except for the parallel postulate, which fails. In these models, the meaning of primitive terms such as point and line is changed.

Eventually several models of non-Euclidean geometry were found.

The first was the upper-half plane. A second model is the open unit disk. A third model is a branch of the hyperboloid.  As it turns out, all three of these models are useful for the Reinhardt problem.

Dubins car problem

Here is a seemingly unrelated problem in the plane that will turn out to be the key to understanding the Reinhardt conjecture.

The Dubins problem is to take a car that is at a certain position and direction in the plane and navigate it to an ending position and direction in such a way that the absolute value of the curvature of the path is at most 1, and to find the shortest such path.  (See the illustration below.)

In 1957, Dubins showed that the shortest path always consists of straight segments and segments of maximum curvature. According to Wikipedia,

The Dubins path is commonly used in the fields of robotics and control theory as a way to plan paths for wheeled robots, airplanes and underwater vehicles.

Let us juxtapose a solution to the Dubins car problem (with its straight edges and circular arcs) and a few segments of a smoothed polygon (with its straight edges and hyperbolic-arc-rounded corners). See any similarity?


Dubins car solutions have straight segments and segments of maximum curvature.

Reinhardt’s smoothed octagon has straight segments and segments of maximum curvature, subject to the hexagonal symmetry constraints. (It is an easy fact that the segments of maximum curvature subject to hexagonal symmetry constraints are exactly arcs of hyperbolas.)  We can think of Reinhardt’s problem as a variant of the Dubins car problem, where hexagonal symmetry is imposed and the steering wheel turns only to the left.

We now have a well-defined research program to resolve the Reinhardt conjecture: solve the Reinhardt conjecture in the same way that the Dubins car problem was solved.

I’m willing to bet that this 1934 conjecture will soon be solved because of this connection with the Dubins car problem.

Here are some features of the Dubins car problem (and now features of the Reinhardt problem).

  • It is a problem in optimal control theory. Until we recently noticed the analogy with the Dubins problem, we did not even know that the Reinhardt problem is an optimal control problem.  It was misclassified for decades as a problem in discrete geometry, which impeded progress towards a solution.
  • It is a bang-bang solution.
  • It is studied using Pontryagin’s maximum principle (PMP). In particular, the smoothed octagon is a Pontryagin extremal.

I will explain each of these points in turn.

A Problem in Optimal Control Theory

An optimal control problem has the form of minimizing the cost of x, where x is a solution to a differential equation that depends on a control function u.  The problem of optimal control is to pick the control function u, such that after solving the differential equation for x with given boundary conditions, the cost of x is minimized.

To make this concrete, here is an example of a control problem that we take from non-Euclidean geometry.

The solution x to the differential equation will be a path in the hyperboloid model of non-Euclidean geometry.  We take the cost of x to be the number we get by integrating the height function on the hyperboloid along the path x.

The control function u:[0,T]\to U takes values in an equilateral triangle U (shown here with a control stick).  The differential equation is shown, and the coefficients (a,b,c) depend on the control u.


Here is the connection between the Reinhardt problem and non-Euclidean geometry.

Theorem: The solution to this optimal control problem in non-Euclidean geometry gives the solution of the Reinhardt problem. 

The connection between non-Euclidean geometry and bad packings comes through the group SL(2,R) which acts on both the Euclidean and non-Euclidean planes.

In particular (and quite remarkably) the integral of the height function on the hyperboloid is equal to the area of the convex disk D in the plane (up to a fixed constant 1.5).

Also,  the motion along the path x gives the animation of the rolling disks D (in Egan-style animations).

Bang-bang controls

Suppose U is a convex set. A bang-bang control u:[0,T]\to U is one that takes values in the set of extreme points of U. A feature of many optimal control problems is that the control that minimizes cost is a bang-bang control.

Here are some standard examples of bang-bang solutions.

Parking a car in a garage in minimal time

The optimal control problem is to park a car in a garage in minimal time.  The car is directly in front of the garage, so no turning is needed.  There is a single control variable with range [-1,1] for acceleration (negative values for the brake pedal and positive values for the gas pedal).  The optimal solution is bang-bang: to park a car as fast as possible, floor the gas pedal, driving at reckless speed towards the garage. Then, just at the moment when crashing into the back wall of the garage seems inevitable, slam on the brakes, coming to a screeching halt just as the front bumper touches the wall.
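This parking problem is the classical minimum-time control of a double integrator, and the bang-bang solution can be checked numerically. In the sketch below, the distance 9 and the time step are arbitrary choices; with acceleration bounded by 1, the optimal rest-to-rest time over distance d is 2\sqrt{d}.

```python
import math

def park_time(d):
    # Minimum time for a double integrator (|acceleration| <= 1) to move a
    # distance d, starting and ending at rest: accelerate for sqrt(d) time
    # units, then brake for sqrt(d), giving 2*sqrt(d) in total.
    return 2 * math.sqrt(d)

def simulate(d, dt=1e-4):
    # Simulate the bang-bang policy: gas pedal (u = +1) to the halfway
    # point, then brakes (u = -1) until the car stops.
    x, v, t = 0.0, 0.0, 0.0
    while x < d / 2:
        v += dt               # u = +1
        x += v * dt
        t += dt
    while v > 0:
        v -= dt               # u = -1
        x += v * dt
        t += dt
    return t, x

t_opt = park_time(9.0)
t_sim, x_final = simulate(9.0)
print(t_opt, t_sim, x_final)   # the simulation lands close to (6, 9)
```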

Think of bang-bang control as the extreme stuff of Hollywood movies: the bomb that is deactivated just as the timer is reaching zero; the getaway car that evades police on its tail by flying across the train track the instant before the long freight train blocks all further traffic; and the lover who professes love and saves the imperiled relationship when it is all but too late.  (In fact, Bang-Bang is a film title.)

Dubins Car Problem

The circular segments in the Dubins car problem are given by bang-bang controls: the signed curvature is as large or as small as possible.  (However, the straight segments in the Dubins car problem are not bang-bang.  They are called singular arcs.)

Optimal investment strategies

Think of bang-bang optimal financial strategies as the Bill Gates lifetime investment strategy.   Bill Gates the Microsoft CEO ruthlessly builds a corporate empire. Then bang, positions change, and Bill and Melinda Gates the philanthropists plan to donate 95 percent of their wealth to charity.

Many financial investment strategies come as solutions to control problems, jumping from one extreme position to another in an optimal way, according to a bang-bang control.

The smoothed octagon has a bang-bang control

The smoothed octagon has a bang-bang control with finitely many switches.  The smoothed octagon is divided into segments, each of which is  as straight as possible or as curved as possible, with abrupt transitions between the segments.  In terms of the control triangle U described above, the control remains at one vertex for a time, then suddenly jumps to another vertex, and so on.

Enter Pontryagin

Pontryagin gave some necessary conditions that the optimal solution to a control problem must have.

Pontryagin’s conditions are similar to the method of Lagrange multipliers taught to calculus students.  The typical problem is to minimize a function f of x subject to constraints G(x)=0. Lagrange tells us to write \lambda_0 f + \lambda\cdot G with Lagrange multipliers (\lambda_0,\lambda).  (Generally, \lambda_0=1 and is omitted, but for optimal control, to which we turn next, it cannot be omitted.)

The approach is to calculate all critical points and to pick the best. The computation of the multipliers is part of calculating the critical points.
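As a toy illustration of the multiplier recipe (a hypothetical example, not taken from the paper): minimize f(x,y) = x^2 + y^2 subject to G(x,y) = x + y - 1 = 0. The stationarity conditions can be solved by hand and cross-checked numerically.

```python
# A toy Lagrange-multiplier computation (a hypothetical example, not from
# the paper): minimize f(x, y) = x^2 + y^2 subject to G(x, y) = x + y - 1 = 0.
# Stationarity grad f = lam * grad G gives 2x = lam and 2y = lam, so x = y;
# the constraint then forces x = y = 1/2 and lam = 1.
x = y = 0.5
lam = 2 * x
f_min = x ** 2 + y ** 2        # = 1/2

# Cross-check numerically by scanning points on the constraint line.
grid = [i / 1000 for i in range(-1000, 2001)]
best = min(((t, 1 - t) for t in grid), key=lambda p: p[0] ** 2 + p[1] ** 2)
print(f_min, lam, best)
```

Computing the multiplier lam is part of computing the critical point, exactly as described above.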

The same idea applies to optimal control, with some small adaptations.  The function f to be minimized is called the cost.  Now, x is a path rather than a point in Euclidean space, and the constraints G(x)=0 are differential equations.  No matter.  Now the multiplier \lambda is also a path rather than a point in Euclidean space, and is itself a solution to a differential equation.

The multiplier \lambda takes values in the cotangent bundle T^*M of M = SL(2,R) x H, where H is the hyperboloid model.   Cotangent bundles of manifolds are examples of symplectic manifolds, and we can specify the differential equation that \lambda satisfies by giving a real-valued function h on T^*M, called the Hamiltonian.  (Recall that a function h on T^*M has a differential dh, which becomes a vector field on T^*M using the canonical symplectic form.  The integral curves \lambda of that vector field are the solutions to the differential equation.)

The Pontryagin maximum principle is a list of necessary conditions for optimality.  These conditions include an explicit Hamiltonian, which is used to describe the differential equation that \lambda must satisfy.  We call \lambda a Pontryagin extremal if it satisfies all the necessary conditions enumerated in the Pontryagin maximum principle.

Theorem: The smoothed octagon is a Pontryagin extremal.

Theorem: The smoothed octagon is a (micro) local minimum.

By micro-local, I mean that the neighborhoods we consider are in the cotangent bundle.

The differential equations are solved exactly and explicitly in Mathematica as part of the proofs. We have used a second derivative test of local minimality.  Briefly put, interval arithmetic was used on a computer to show that the second derivative is positive.

There are in fact infinitely many Pontryagin extremals (smoothed (6k+2)-gons) converging to the circle of radius 1. The circle is also a Pontryagin extremal trajectory (a singular arc).  A switch in a bang-bang solution is defined as the jump that the control function u makes from one extreme point of the control space to another.  The number of switches in these smoothed polygons tends to infinity with k.

If, for example, we could prove that the smoothed (6k+2)-gons are the only extremals (and this seems plausible to me), then that would complete the proof of the Reinhardt conjecture.

There also appear to be infinitely many Pontryagin extremals for local cost maximization (smoothed (6k+4)-gons), also converging to the circle. (The 4-gon is degenerate and has corners.)

We graph the cost of the known extremals as a function of the number of straight edges in the smoothed polygon.  The smoothed octagon is the deep dip at n=8.



The paper contains many calculations that I have not described in this blog post.

Here is a summary of our main points.

  • In 1934 Reinhardt proposed that the smoothed octagon is the worst packer among centrally symmetric convex disks.
  • The Reinhardt problem can be formulated and studied as a problem in optimal control.
  • In this formulation, the problem is to minimize the total height of paths in the hyperboloid model of non-Euclidean geometry.
  • The shape of the smoothed octagon is completely explained by a bang-bang control.
  • The Reinhardt problem is very similar to other problems in optimal control theory that have been solved (such as Dubins car).
  • The smoothed octagon is a Pontryagin extremal trajectory and a micro-local minimizer.
  • The solution has been confined to a 5-dimensional manifold (which can be searched numerically).
  • I’m willing to wager that the Reinhardt problem will be solved using optimal control theory and that this will happen soon!


Our indebtedness to Escher is apparent.  The spectacular animated graphics of smoothed octagons were developed by Greg Egan.  I have used graphics from a wide variety of sources.  We acknowledge earlier funding from NSF grant 1104102.

Aristotle’s Tetrahedral Blunder

Aristotle falsely believed that congruent regular tetrahedra tile space:

Among surfaces it is agreed that there are three figures which fill the place that contain them–the triangle, the square and the hexagon: among solids only two, the pyramid and the cube. -De Caelo

The tetrahelix does not extend to a tiling.

(Here the pyramid means the regular tetrahedron.) Even more remarkable than Aristotle’s false belief is that “it took 1800 years for Aristotle’s error to be resolved,” according to the fascinating history documented by Lagarias and Zong. In partial defense of Aristotle, they write that “of course, at the time of Aristotle, methods of geometric measurement and computation were more limited and computers were not available!”

Here we prove that tetrahedra do not tile space, using only math that Aristotle might have understood. This means it took 1800 years more than it should have to set the record straight. Aristotle’s referees should have caught his error before publication!

We limit our proof to the case when the tetrahedra are arranged face-to-face, but a small adaptation of our argument, which we leave to the reader, shows that no tiling is possible even without the face-to-face constraint.

Let us assume for a contradiction that congruent tetrahedra tile space. By placing tetrahedra face-to-face, they are also edge-to-edge and vertex-to-vertex. All the tetrahedra around a fixed vertex V form a Platonic solid with n equilateral triangular faces, where n is the number of tetrahedra meeting at the center V. There are only three Platonic solids with triangular faces: the tetrahedron itself, the octahedron, and the icosahedron. We can immediately rule out the tetrahedron and the octahedron because the angle from the center V to two adjacent vertices of the solid is not an acute angle. (The angle would have to be 60 degrees to be part of a tetrahedron tiling.) Thus, the tiling must consist of n=20 tetrahedra around the center V, forming an icosahedron.

a diameter of the icosahedron and one face of a tetrahedron around V

To complete the proof we rule out the icosahedron. Consider a plane containing the diameter (the black vertical line) of the icosahedron, and one triangular face as shown. If we let each edge be 1 unit in length, then the diameter is 2, and the vertex V’ has height 1/2 above the center. By symmetry, the icosahedron has five vertices at height 1/2 and five vertices at height -1/2 below center (as shown below).  The distance between the planes at heights +1/2 and -1/2 is 1.  There is a unique point (the orthogonal projection, shown as a blue dot in the figure to the left) on the plane of height -1/2 at distance 1 from V’. This leads immediately to a contradiction: following the two edges on the tetrahedron from V’ to V1 and V2, we find that there are at least two such points!

In the icosahedron, the vertex V’ at height 1/2 cannot have distance 1 from two distinct points V1, V2 at height -1/2.
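The same contradiction can be checked with modern arithmetic (certainly not the elementary route above, and not a tool Aristotle had). The standard circumradius formula for a regular icosahedron shows that twenty unit-edge regular tetrahedra cannot close up around a common vertex.

```python
import math

# A regular icosahedron with circumradius R has edge length
# a = 4R / sqrt(10 + 2*sqrt(5)) (a modern formula).  Twenty unit-edge
# regular tetrahedra sharing a vertex would force circumradius 1 (the
# spokes) and edge 1 (the outer edges) simultaneously, which fails:
edge = 4 / math.sqrt(10 + 2 * math.sqrt(5))     # edge when R = 1
central_angle = 2 * math.degrees(math.asin(edge / 2))

print(edge)           # about 1.0515, not 1
print(central_angle)  # about 63.43 degrees, not the 60 a tetrahedron needs
```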

Acknowledgement: The tetrahelix image is from a post by Brian Hayes

Elliptic Curve Addition Without Tears

[This post is based on a recent preprint of mine about addition for Edwards elliptic curves, building on work by Friedl, Bernstein, Lange, and their collaborators.]

Every math or cryptography student should know two fundamental facts about elliptic curves.

Fundamental Fact 1. The addition rule for an elliptic curve is exactly the same as the addition rule for the circle.

The addition rules are not merely similar.  They are exactly the same rule applied to different curves.

The addition of angles was already a familiar notion to Euclid, with a history extending over centuries.  Today, there are many ways to describe the addition of angles on a circle, including trigonometry and the multiplication of complex numbers.   Many of these descriptions are unsatisfactory, because they somehow presuppose that the circle is round, and are therefore special to the circle.

The description that I am about to give of addition of angles on the circle is one that I have never encountered before, but because it is so elementary, this construction must be well-known, even if I cannot find a reference.

circle addition
The red plus blue dots equal the purple dot.

It is best illustrated.  The red dot plus the blue dot equals the purple dot.  The sum of the red, blue, and green dots is the identity (the point at 3 o’clock on the circle).   The black dot is fixed at 9 o’clock.  The moving curve that passes through the red, blue, green, and black dots is a rectangular hyperbola, with asymptotes parallel to the coordinate axes.   The moving cross is the center of the hyperbola.  When the blue and black dots collide, the hyperbola becomes a pair of lines and the blue and black dots move to a different branch. The colored bars show the angles of the blue, red, and purple dots, confirming that red plus blue equals purple.

This is not some new addition rule on the circle.  It is the familiar clock arithmetic used to show that 2am comes four hours after 10pm (except that mathematicians add counterclockwise starting at 3 o’clock on the dial).  We just normally leave out the hyperbola.

Simple algebra shows that this construction works.    To continue, we need to set aside all other descriptions of angle addition, and accept this construction as the one true addition rule.  We apply this addition rule to an elliptic curve.  Elliptic curves can be represented in many forms.  The most usual form is called the Weierstrass form of the curve, but we prefer Edwards’s form of elliptic curves, which look like warped circles in the figures below.

Edwards curve addition
The red plus blue dots equal the purple dot

We use exactly the same addition rule for these Edwards elliptic curves (the warped circle): the red dot plus the blue dot equals the purple dot. The sum of the red, blue, and green dots is the identity (the point at 3 o’clock).  The moving curve that passes through the red, blue, green, and black dots is a rectangular hyperbola, with asymptotes parallel to the coordinate axes, as before. To add two points (red and blue), fit them to a hyperbola, and use the hyperbola to determine the green dot, which is paired with the answer (in purple).
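The hyperbola construction is equivalent to explicit rational formulas, which can be checked numerically. The sketch below uses the standard Edwards addition formulas with x and y interchanged (as noted at the end of this post), so that the identity is (1,0) and D = 0 recovers angle addition on the circle; the curve parameter D = -10 and the sample x-coordinates are arbitrary choices for illustration.

```python
import math

D = -10.0   # a hypothetical curve parameter; any D < 0 gives a complete curve

def on_curve(p, tol=1e-9):
    x, y = p
    return abs(x * x + y * y - 1 - D * x * x * y * y) < tol

def point(x):
    # A point on x^2 + y^2 = 1 + D x^2 y^2 with the given x-coordinate.
    return (x, math.sqrt((1 - x * x) / (1 - D * x * x)))

def add(p, q):
    # Standard Edwards addition, with x and y interchanged so that the
    # identity is (1, 0) and D = 0 is ordinary rotation of the circle.
    (x1, y1), (x2, y2) = p, q
    t = D * x1 * x2 * y1 * y2
    return ((x1 * x2 - y1 * y2) / (1 - t),
            (x1 * y2 + y1 * x2) / (1 + t))

p, q, r = point(0.3), point(-0.5), point(0.7)
s = add(p, q)
print(on_curve(s))                        # the sum lands back on the curve
lhs, rhs = add(add(p, q), r), add(p, add(q, r))
print(max(abs(lhs[0] - rhs[0]), abs(lhs[1] - rhs[1])))  # associativity defect
print(add(p, (1.0, 0.0)))                 # adding (1, 0) returns p itself
```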

A deformation from the circle to elliptic curves

That is all there is to it!  We have described the addition rule on elliptic curves.   It looks strange until we recognize that this is secretly how addition works for the circle too.  We illustrate the circle and elliptic curves being deformed into each other to see that the same addition rule applies to both.  Our proof of the associative law works without case splits, uniformly for the circle and elliptic curves.  We take this up in the next section.

Fundamental Fact 2. No advanced mathematics is required to prove that addition on the elliptic curve is associative.

Working with Edwards curves, the associative law can be proved using little more than polynomial division.

Typically, proofs of the associative law for elliptic curves use one or more of the following tools: intersection multiplicities, Bezout’s theorem, divisors, the Picard group, the Riemann-Roch theorem, and the Weierstrass P-function. These tools are all completely unnecessary.

The proof of associativity is given by a short  Mathematica calculation. (I used Mathematica, because that is what I use, but other computer algebra systems should work equally well.)  First, we write an explicit formula for the Edwards curve and take three points on the curve:

e[x_, y_] := x^2 + y^2 - 1 - d x^2 y^2;\quad e_1 = e[x_1,y_1];\quad e_2 = e[x_2,y_2];\quad e_3 = e[x_3,y_3].

Then we create an explicit formula for addition (\oplus) using our description with the hyperbola.  The formula for addition is a rational function (ratio of polynomials) of \{x_1,y_1\} and \{x_2,y_2\} (using Mathematica’s curly braces for ordered pairs). We test for failure of the associative law by setting

\{g_1,g_2\} = (\{x_1,y_1\}\oplus\{x_2,y_2\})\oplus \{x_3,y_3\} - \{x_1,y_1\}\oplus (\{x_2,y_2\}\oplus \{x_3,y_3\})

We check associativity with the Mathematica command PolynomialReduce:


This command uses a naive polynomial division algorithm to show that each g_i is a combination g_i = r_1 e_1 + r_2 e_2 + r_3 e_3.  Thus, if the three points \{x_i,y_i\} lie on the Edwards curve, then e_1 = e_2 = e_3 = 0, so also g_i = 0.  Since the g_i vanish exactly when associativity holds, we have proved the associative law for affine Edwards curves.

We might worry that the denominators that appear in the formulas for the addition rule or in g_i might sometimes be zero, causing exceptions that we are glossing over.  However, the beauty of affine Edwards curves is that even though formulas have denominators, these denominators never vanish (on the rational points of the Edwards curve) provided the constant d appearing in the formula for the Edwards curve is not a nonzero square.  (This fact can also be easily checked using Mathematica’s PolynomialReduce.)  As a bonus, these lines of Mathematica code also cover the case of a circle (d=0) as a special case. In summary, we can really verify the associative law with a few simple lines of Mathematica!
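The same group law can also be verified exhaustively over a small finite field, where all arithmetic is exact and there are no floating-point worries. This Python sketch (the original computation is symbolic, in Mathematica) uses the hypothetical parameters p = 13 and d = 2, with 2 a non-square mod 13 so that the denominators never vanish, and the addition formulas in the x-y interchanged convention of this post.

```python
# Exhaustive check of the Edwards group law over GF(P).  The parameters
# P = 13 and D = 2 are hypothetical choices; 2 is not a square mod 13,
# so the denominators below never vanish on curve points.
P, D = 13, 2

def inv(a):
    # Modular inverse by Fermat's little theorem (P is prime).
    return pow(a % P, P - 2, P)

def add(p1, p2):
    # Edwards addition in the x-y interchanged convention; identity (1, 0).
    (x1, y1), (x2, y2) = p1, p2
    t = D * x1 * x2 * y1 * y2 % P
    return ((x1 * x2 - y1 * y2) * inv(1 - t) % P,
            (x1 * y2 + y1 * x2) * inv(1 + t) % P)

# All affine points of x^2 + y^2 = 1 + D x^2 y^2 over GF(P).
curve = [(x, y) for x in range(P) for y in range(P)
         if (x * x + y * y - 1 - D * x * x * y * y) % P == 0]

closed = all(add(p1, p2) in curve for p1 in curve for p2 in curve)
assoc = all(add(add(p1, p2), p3) == add(p1, add(p2, p3))
            for p1 in curve for p2 in curve for p3 in curve)
print(len(curve), closed, assoc)
```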

Projective Curves

Inversion swaps one red dot with the other

The preprint also handles the case of projective Edwards curves (which corresponds to d a nonzero square).    An automorphism of the curve, called inversion, swaps one red dot with the other.  Inversion sends the four blue dots to the four points at infinity (which can be identified with the four dashed asymptotes).   The projective curve is covered by two copies of the affine Edwards curve that are glued together using this inversion.   We conveniently represent points on the projective curve as pairs (P,i), where P is a point on copy i of the affine curve. The formula for inversion is simply

(P,i) \mapsto (P,i+1 \mod 2).

The addition rule on the projective curve takes a very simple form

(P,i)+(Q,j) = (P\oplus Q,i+j \mod 2),

where P\oplus Q is addition on the affine curve, using the hyperbola as before. This formula reduces verifications to the affine chart, and the general associative law (in characteristic different from two) can again be verified by simple polynomial division in Mathematica.
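The two-chart bookkeeping can be transcribed directly into code. In this Python sketch (not the preprint's code), the affine addition rule is a plug-in parameter, and circle addition, the d = 0 case from earlier in the post, stands in as a toy example:

```python
# A point of the projective curve is modeled as a pair (P, i): P lies on
# copy i of the affine curve, with i in {0, 1}.

def invert(point):
    """Inversion swaps the two affine charts: (P, i) -> (P, i+1 mod 2)."""
    P, i = point
    return (P, (i + 1) % 2)

def proj_add(affine_add, point1, point2):
    """(P, i) + (Q, j) = (P (+) Q, i + j mod 2)."""
    (P, i), (Q, j) = point1, point2
    return (affine_add(P, Q), (i + j) % 2)

def circle_add(P, Q):
    """Toy affine rule standing in for Edwards addition: the unit circle (d = 0)."""
    (x1, y1), (x2, y2) = P, Q
    return (x1 * x2 - y1 * y2, x1 * y2 + y1 * x2)

P = ((1.0, 0.0), 0)       # identity of the circle, on chart 0
Q = ((0.0, 1.0), 1)       # a quarter turn, on chart 1
assert invert(invert(P)) == P                         # inversion is an involution
assert proj_add(circle_add, P, Q) == ((0.0, 1.0), 1)  # chart indices add mod 2
```

This transcription only illustrates the bookkeeping; it glosses over the exceptional points where an affine denominator vanishes, which the preprint handles by moving the computation to the other chart.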

Acknowledgements and References

This work was motivated by the need for a simple proof of the group axioms on elliptic curves for an undergraduate course in cryptography I was teaching.  Edwards curves are known to avoid the exceptional cases that Weierstrass curves encounter, so it was natural to ask whether the proof of the associative law for Edwards curves also avoids exceptional cases.

We have interchanged the x and y coordinates on the Edwards curve to make the addition rule compatible with addition on the circle.

Friedl was the first to use a computer algebra system to prove the associative law of addition for elliptic curves.  He worked with the Weierstrass form of the curve, which needlessly complicates the calculations.

The properties of Edwards curves have been extensively documented in a series of papers by Bernstein, Lange, et al.  We have simplified some of their formulas.  Bernstein and Lange motivated elliptic curve addition by analogy with the circle in a presentation to the 31st CCC in 2014.  Arene, Lange, et al. were the first to introduce hyperbolic addition on elliptic curves. I found that the same rule works for the circle after reading their article.

Pentagon Packings

by Thomas Hales and Wöden Kusner


The pentagonal ice-ray is a dense packing.

A new preprint of ours proves that the pentagonal ice-ray gives a densest packing of congruent regular pentagons in the plane.  It is a computer-assisted proof that will potentially take referees months to check.  Before describing the ideas of the proof, we give some history.

Pentagon packings in history.

Pyramid of Unas

Patterns of five-pointed stars appear in the Egyptian pyramid of Unas (Fifth dynasty, 24th century BCE).  With a little imagination, we can enclose each star in a pentagon to obtain a periodic packing of regular pentagons.

Dürer’s packing
Shifted layers in Dürer’s packing.

In 1525, Albrecht Dürer described some pentagon arrangements, including one that is relevant to our work. Variants of Dürer’s packing can be obtained by translating its layers. Later, Kepler produced some pentagon arrangements, but they are of limited interest to us because of their low density.

Chinese window lattices

Daniel Sheets Dye, who lived in Chengdu, China, spent decades of his life collecting and cataloguing Chinese lattice designs. (Here we mean lattice in an architectural rather than mathematical sense.) Lattice designs were once a pervasive part of Chinese architecture, particularly in Sichuan province, reaching an artistic pinnacle around 1900 and then fading somewhat from the urban landscape.

One type of lattice design was called the ice-ray by Chinese artisans, a name possibly derived from patterns of cracks in sheets of ice.  The pentagon packing of optimal density (shown at top) is an ice-ray design that was known to the Chinese. It first appeared in Chinese lattice work in Chengdu around 1900. Dye's book of 1937 calls the design a masterpiece. Writing of the symmetrical ice-rays as a whole, Dye observes that

In many ways it is the finest, richest, and most chaste group of Chinese lattice we have seen. -Dye 1937

A Penrose tiling
The blue dome of Maragha

Interest in pentagon packings exploded in the 1970s with Penrose tilings. Retrospectively, pentagonal patterns in the twelfth-century blue dome of Maragha inspired new aperiodic pentagonal patterns. Shechtman et al. observed the first evidence of quasicrystals in 1982 and published it in 1984; quasicrystals were soon connected with Penrose tilings. The physics of pentagons continues to be explored, including the self-assembly of pentagonal platelets. The 2011 announcement of Shechtman's Nobel Prize in Chemistry noted that

An important discovery that helped pave the way for the understanding of the discovery of quasicrystals was the construction and analysis by Penrose of his famous pentagonal tiling. – 2011

Directly inspired by Shechtman's quasicrystals and Penrose tilings, Henley published an article on pentagon packings in 1986 that includes both Dürer's packing and the pentagonal ice-ray. Referring to the pentagonal ice-ray, which he calls 8(c), he writes

The packing in 8(c) seems to be the closest packing of pentagons. Indeed in a gas of hard pentagons which was slowly frozen, we find a large single domain of Fig. 8(c) packing. -Henley

Based on the Chinese name for the packing, we call this the pentagonal ice-ray conjecture. Our research proves the conjecture.

Dürer's packing deforms to the ice-ray

In 1990, Kuperberg and Kuperberg made a general study of high density packings of convex bodies in the plane. They showed that any convex body can be placed in a one-parameter family of double lattice packings and proved that this family always includes the densest of all double-lattice packings. The Kuperberg family of pentagon packings undulates between the shifting layers of Dürer's packing and the pentagonal ice-ray. Based on these results, they too made the pentagonal ice-ray conjecture. The Kuperberg bathroom floor is a pentagonal ice-ray.

Proof Strategy: Delaunay triangles

A Delaunay triangle in the optimal circle packing

Our proof of the pentagonal ice-ray conjecture is a modification of Fejes Tóth’s proof of the circle-packing problem in the plane. The circle-packing problem is to show that the hexagonal circle packing has the greatest density. For purposes of studying packings of greatest density, we can assume that the packings are saturated, meaning that there is no room to add further circles. The centers of the circles in any circle packing are the vertices of a triangulation called the Delaunay triangulation.

The equilateral triangle minimizes area among triangles with fixed lower edge and upper vertex in the blue region.

Smaller triangles have higher circle density. The optimality of the hexagonal circle-packing follows if we show that the area of each triangle in an arbitrary saturated packing is at least that of the equilateral triangles in the hexagonal circle-packing. If we normalize the radius of each circle to be 1, then the edges of each triangle will have length at least 2. A minimizing triangle has an edge of length exactly 2, which we place along the x-axis. The other two sides have length at least 2 and the entire triangle has circumradius at most 2. (If the triangle has circumradius bigger than 2, there is room to pack an additional circle, which is contrary to the saturation condition.) This constrains the third vertex of the triangle to lie in the blue region. There are three points in this region that minimize the area of the triangle, and all three triangles have area equal to those in the hexagonal packing. One of the three triangles is equilateral and the other two are obtuse with angles 30-30-120. This completes Fejes Tóth’s proof.
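A quick numerical check (a Python illustration, not part of the proof) confirms the claims about the minimizing triangles: the obtuse 30-30-120 triangle with two sides of length 2 has the same area as the equilateral triangle of side 2, and its circumradius is exactly the saturation bound of 2.

```python
import math

# Area from two sides of length 2 and the included angle.
equilateral = 0.5 * 2 * 2 * math.sin(math.radians(60))
obtuse      = 0.5 * 2 * 2 * math.sin(math.radians(120))
assert math.isclose(equilateral, math.sqrt(3))
assert math.isclose(obtuse, math.sqrt(3))      # the three minimizers have equal area

# Third side of the obtuse triangle by the law of cosines, then the
# circumradius R = abc / (4 * area).
c = math.sqrt(2**2 + 2**2 - 2 * 2 * 2 * math.cos(math.radians(120)))
assert math.isclose(c, 2 * math.sqrt(3))
R = (2 * 2 * c) / (4 * obtuse)
assert math.isclose(R, 2.0)   # right at the circumradius bound from saturation
```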

Our strategy is to adapt Fejes Tóth’s proof to the pentagonal ice-ray conjecture. We take a saturated packing of pentagons and form the Delaunay triangulation with vertices at the centers of the pentagons.

Some Delaunay triangles in the Dürer packing are smaller than triangles in the pentagonal ice-ray.

Almost immediately we see that the proof falls apart when applied to pentagons. In contrast with Fejes Tóth’s proof for circle packings, the Delaunay triangles in the pentagonal ice-ray do not minimize area. In fact, if we track the area of the smallest triangle as we undulate through the Kuperberg family of packings, it turns out that the pentagonal ice-ray maximizes the area of the smallest triangle, which is exactly the opposite of what we would hope for. The minimum occurs for a packing very close to the Dürer packing. (We conjecture that the triangle that minimizes area among the Kuperberg double lattices is the triangle that minimizes area among all acute Delaunay triangles in any pentagon packing.)

This problem is easily fixed by pairing up Delaunay triangles along their longest edges. We call these pairs of Delaunay triangles dimers.   Wöden Kusner started serious calculations on pentagon packings in early 2013.  His 2014 thesis contains a proof of the local optimality of the dimer in the pentagonal  ice-ray among all pentagon packings.

Obtuse Delaunay triangle of small area

Yet, this is not the end of our headaches. Obtuse Delaunay triangles can have alarmingly small area. To handle this problem, again we combine triangles into larger clusters and show that on average these clusters have larger area than the ice-ray triangles.  In the end, we need to consider clusters containing at most four Delaunay triangles.



Triangle with large knob
Triangle with small knob

Another strategy is to modify the areas of triangles slightly by adding a knob along the longest edge of some triangles and cutting a corresponding hole in the adjacent triangle, like pieces from a jigsaw puzzle. Our knobs come in two sizes of areas about 0.008 and 0.055 as shown. The larger knob is barely large enough to make the Dürer triangle with one knob slightly larger than the ice-ray triangle with no knobs or holes. Our rules permit a triangle to have at most one knob and at most three holes (one on each edge).

We prove the pentagonal ice-ray conjecture by showing that the average area of each cluster of triangles (including their knobs and holes) is at least the area of the ice-ray triangle (with no knobs or holes).

Why Pentagons?

The most natural packing problem is perhaps the problem of determining the densest packing of spheres in n-dimensional Euclidean space. The two-dimensional case (circles in the plane) is easy. The three-dimensional problem is hard, and the problem in 8 and 24 dimensions is remarkably deep. Outside those dimensions, the sphere packing problem is still unsolved.

After spheres, the most natural shape to consider is regular polygons in the plane. Equilateral triangles tile, so there is nothing to do here. Squares and regular hexagons also tile. The regular polygon with the fewest edges that does not tile is the pentagon. This was a natural place for us to start. (Inspired by Beth Chen’s work, we played with the problem of regular tetrahedra, but quickly retreated to pentagons.)

The next unsolved case is seven edges: regular heptagons. The local optimality of the corresponding Kuperberg double lattice has been established. Kallus conjectures that among all convex disks in the plane, the heptagon is the shape whose densest packing has lowest density.  David Mount has shown that the densest double-lattice packing of a given convex polygon can be computed in linear time in the number of edges.

Computer Calculations

In the end, the solution of the pentagon-packing problem is a hefty 60 pages of text and 5000 lines of bespoke OCaml computer code that takes about 60 hours to run on a laptop computer. We use a wonderful interval arithmetic package to control for computer roundoff errors.  The proof is computer-assisted, but is not formally verified in a proof assistant.
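The idea behind interval arithmetic can be shown with a toy Python class (an illustration only; the actual proof uses a dedicated OCaml interval package). Endpoints are rounded outward with math.nextafter (Python 3.9+), so the true real-number result always stays inside the computed interval despite roundoff.

```python
import math

class Interval:
    """Toy interval arithmetic with outward rounding of endpoints."""

    def __init__(self, lo, hi=None):
        self.lo = lo
        self.hi = lo if hi is None else hi

    def __add__(self, other):
        # Round the lower endpoint down and the upper endpoint up.
        return Interval(math.nextafter(self.lo + other.lo, -math.inf),
                        math.nextafter(self.hi + other.hi, math.inf))

    def __mul__(self, other):
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval(math.nextafter(min(products), -math.inf),
                        math.nextafter(max(products), math.inf))

    def __contains__(self, x):
        return self.lo <= x <= self.hi

# The enclosure absorbs roundoff: 0.1 + 0.2 != 0.3 in floating point,
# but the interval sum still contains 0.3.
assert 0.3 in Interval(0.1) + Interval(0.2)
```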

We use automatic differentiation to prove the local optimality of the pentagonal ice-ray. First-derivative calculations show that the area of a dimer decreases as it is deformed into the Kuperberg family of dimers. Second-derivative calculations show that, among dimers in the Kuperberg family, the ice-ray dimer has least area.
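The flavor of forward-mode automatic differentiation can be conveyed with a toy dual-number class in Python (an illustration of the technique only, not the proof's OCaml code, which differentiates interval-valued area functions):

```python
# Forward-mode automatic differentiation: carry (value, derivative) pairs
# through arithmetic, applying the sum and product rules exactly.

class Dual:
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.der + other.der)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (fg)' = f'g + fg'.
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)
    __rmul__ = __mul__

def derivative(f, x):
    """Evaluate f'(x) exactly (to machine precision) in one forward pass."""
    return f(Dual(x, 1.0)).der

# Example: f(x) = 3x^2 + 2x has f'(2) = 14.
assert derivative(lambda x: 3 * x * x + 2 * x, 2.0) == 14.0
```

Unlike finite differences, this computes derivatives without truncation error, which matters when derivative signs must be certified rigorously.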

One novelty is our use of meet-in-the-middle (MITM) algorithms to simplify the computer calculations of cluster areas. Cryptography has had a direct influence on our work through its frequent use of MITM algorithms. In particular, in 2015 Hales attended a lecture by Adi Shamir at the Royal Society that described applications of MITM outside cryptography (such as the Rubik's cube). In our approach, information from each triangle is stored separately in a hashtable, and the condition that two adjacent triangles fit together is treated as a collision in the hashtables.
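The MITM pattern can be illustrated with a toy Python example (the function name and data here are invented for illustration): hash one half of the search space, then scan the other half for collisions, replacing a quadratic pairwise search with two linear passes. In the pentagon proof the "collision" is instead two adjacent triangles whose shared-edge data agree, but the hashtable pattern is the same.

```python
def meet_in_the_middle(left, right, target):
    """Find indices (i, j) with left[i] + right[j] == target, or None.

    Instead of testing all len(left) * len(right) pairs, hash one half
    and probe it with the other half.
    """
    seen = {a: i for i, a in enumerate(left)}      # hash the first half
    for j, b in enumerate(right):
        if target - b in seen:                     # a collision: the halves fit
            return seen[target - b], j
    return None

left, right = [3, 9, 14, 20], [1, 6, 11, 17]
assert meet_in_the_middle(left, right, 31) == (3, 2)   # 20 + 11 == 31
```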

Primate Privacy

Thomas Hales (Pittsburgh)

Governments and Silicon Valley are becoming brazen in their denial of privacy. Vint Cerf, a father of the internet and Google’s chief internet evangelist, has gone so far as to claim that privacy was a temporary by-product of the industrial revolution and urban growth [11]. If privacy is a recent accident, mere fumes from an industrial smokestack, then we might conclude that its loss means naught.

Science supports a grander view of privacy, giving it an integral place in our evolutionary history, extending throughout the animal kingdom, and culminating in the unique capabilities of our species and other great apes.

My exposure to animal behavior came through my ex-wife Kerri, a University of Chicago trained primatologist. For years, she told me stories of the regular agonisms of the baboons of Brookfield Zoo and Amboseli, Kenya. Listening to her stories did not turn me into an expert, yet they attuned me to a divide separating primatology research from Silicon Valley fictions.

What I write might be obvious to ethologists and attentive animal lovers. But to my dismay, privacy debates are often formulated abstractly, without reference to biology. I see this as a fatal mistake. A notable exception is Westin’s classic work, which connects the animal’s defense of home territory with human privacy, viewed as a protected domestic space [18]. Scholarly works typically trace the concept of privacy only as far back as Aristotle. This truncated perspective disregards anthropological evidence — perhaps after a quick nod to Mead’s Coming of Age in Samoa — and neglects millions of years of evolutionary history [13]. Long before Aristotle and pointing to a distant past, our early epic literature described widespread practices of secrecy, disguises, and anonymity in the pursuit of autonomy.

A definition of privacy

Setting aside moral, legal, and political theories, let us turn to privacy as a biological capacity, especially among primates, and even more especially among great apes.

Following Moglen, privacy can be defined to have three components: (1) secrecy – our ability to selectively hide the content of messages; (2) anonymity – our ability to selectively hide the identity of the sender and receiver of messages; and (3) autonomy — our freedom to advance plans without coercion that might result from the violation of our secrecy and anonymity [16]. We extend Moglen’s definition to other species.

In this definition, messages are broadly construed in the sense of information theory: “A message need not be the result of a conscious human effort for the transmission of ideas. For example, the records of current and voltage kept on the instruments of an automatic substation are as truly messages as a telephone conversation” [20].

Secrecy and anonymity are abilities to act, rather than the actual performance of the act. The hiding must be selective in the sense that an organism may switch it on or off. So by this definition, genetically invariable camouflage in a species does not qualify as privacy. However, the hiding need not be intentional, permitting examples like the Kallima butterfly, which is richly colored when in flight but which hides itself by imitating a dead leaf when at rest with its wings folded [7,p.47].

As we will see, the practice of secrecy is well documented in primates, especially chimpanzees, including hiding or feigning injury, the masking of emotions, concealing sexual interest and arousal, furtive mating, misdirecting attention, and concealing objects behind the back, in the mouth, or between the fingers. Primate communities can be hierarchical with tremendous advantages for those that dominate, and far fewer freedoms for subordinates. We will give numerous examples of secrecy as an enabler of autonomy among subordinates.

Biological precedents of anonymity can be found in camouflage and mimicry, and in the safety of the herd. In interspecific mimicry, the kingsnake mimics the coral snake and the swallowtail butterfly mimics other butterflies. Although various forms of mimicry are prevalent in the animal kingdom, I find that anonymity is the part of the definition of privacy that extends least readily to primates. I will leave it to others to develop the appropriate concept. In this post, privacy will be defined narrowly as a combination of secrecy and autonomy.

Evolutionary sociobiology

The next several paragraphs explain my assumptions and perspective. The impatient reader may jump ahead to the examples of primate privacy. This post adopts the perspective of evolutionary sociobiology as described by biologists such as E. O. Wilson and Goldsmith [8], [21], [22].

Evolution can produce what is called behavioral scaling, which is a flexible behavioral capacity. This fits nicely with privacy as a behavioral capacity. In this view, behavior is not a deterministic response to a simple identifiable stimulus in a fixed action pattern. Rather, behavior flexibly depends on a large number of deterministic and random input variables from the environment.

The idea that evolution has produced behavioral capacities that can accommodate themselves to a range of environmental contingencies is commonly accepted by contemporary ethologists. — Goldsmith [8,p.110]

Goldsmith goes on to call the behavioral flexibility of the vertebrate brain “the complete antithesis of genetic determinism” [8,p.123]. Yet, this antithesis of determinism is not an absolute freedom. “For a biologist the idea of infinite cultural flexibility is unacceptable” [4,p.259]. The behavioral scaling itself (that is, the contingent range of possible outputs) is determined by our evolutionary history and cannot be renegotiated by culture. Herein lies human nature.

Evolutionary continuity between humans and great apes

A basic assumption is our evolutionary continuity with great apes.

We live in a time of increasing acceptance of our kinship with the apes. True, humanity never runs out of claims of what sets us apart, but it is a rare uniqueness claim that holds up for over a decade. If we consider our species without letting ourselves be blinded by the technological advances of the last few millennia, we see a creature of flesh and blood with a brain that, albeit three times larger than a chimpanzee’s, doesn’t contain any new parts. Even our vaunted prefrontal cortex turns out to be of rather typical size compared with that of other primates. No one doubts the superiority of our intellect, but we have no basic wants or needs that are not also present in our close relatives. Just like us, monkeys and apes strive for power, enjoy sex, want security and affection, kill over territory, and value trust and cooperation. Yes, we have computers and airplanes, but our psychological makeup remains that of a social primate. — de Waal [6,p.16]

We share with other great apes a long evolutionary history; a large fraction of our DNA; brain structure; sense organs; and a range of emotions including happiness, sadness, anger, and fear that are all evoked in similar contexts. Staying on guard against false anthropomorphisms, many researchers nonetheless accept a principle of evolutionary continuity. When, for example, a chimpanzee responds in the same way that a human responds in a given context, we may infer a probable underlying agreement of cognitive processes. I recommend Tomasello’s recent book for a particularly careful comparative analysis of how human thinking agrees with and differs from that of other great apes [17].

Deception unravelled

There is one other potential misconception to clear up before turning to the biological evidence of privacy.

Research on animal secrecy is generally classified as a subfield of research on Machiavellian intelligence and deception. I was surprised to find so much privacy research buried here. By this classification, it might seem that biologists condemn the practice of secrecy. But in fact, no moral judgment is intended by this classification, and the terms deception and Machiavellian intelligence must be read in a scientifically neutral way. In the evolutionary literature, Machiavellian intelligence is used counterintuitively in a broad technical sense for many forms of social intelligence, even those encompassing cooperative behavior [19,p.12].

From an evolutionary, comparative perspective, ‘Machiavellian’ intelligence is an expression used instead for contrast with simpler social repertoires widespread in the animal kingdom (as well as with non-social uses of intelligence). It is the complex, indirect nature of the strategies we wish to highlight, and this usage seems to have become commonplace in our discipline. — Byrne and Whiten [19,p.13]

Mitchell, editor of the book Deception, describes the use of the word “deceit” as an umbrella term “to include both deception and hiding,” notably even when hiding involves no deception [14]. Deception may get higher billing than secrecy in the academic literature, but following our interest in privacy, we extract the secrecy research and set deception aside.

This post makes no judgments about the morality of the practice of privacy among humans or other primates. We do not touch theories of the evolutionary basis of human morality. However, I recommend Greene’s Moral Tribes [10].

Also, by making privacy the exclusive focus of this post, we in no way deny the broader context of primate behavior. We and our primate cousins can be both selfish and altruistic, both solitary and social, and both competitive and cooperative. Privacy can be used to promote both selfish and altruistic goals (the sage who through seclusion finds wisdom to benefit humanity, or the mother who withholds developmentally harmful messages from her child). However, this is not our focus here.

Finally, I acknowledge the enormous variations that exist in the practice of privacy among different species, and within our own species, as anthropologists document. That is material for another essay. With all this background out of the way, let us turn to examples of privacy-related behaviors among primates.

Sexual privacy among primates

Secret sexual liaisons are not solely a human invention. De Waal describes secret matings between low-ranking male chimpanzees and females in oestrus.

For a long time we thought that sexual preferences of oestrus females for particular males could be inferred from the frequency with which they followed, groomed or presented to males. The limited value of such measurements were discovered by watching the colony enter the night cages. Oestrus females who had not paid any attention, not even by means of looking up, to a male who had repeatedly made sexual advances during the daytime, would sometimes rush up to the same male’s cage to quickly mate through the bars. After observing many of these sequences, it became clear that these hurried matings only occurred with low-ranking males in the absence of high-ranking ones. The observations implied that females are capable of fully concealing sexual interests, presumably in order to forestall aggressive interference by dominant males. For the same reason, females learn to suppress screams during the climax of copulation. They may still utter a high-pitched scream during a “public” mating with the alpha male, but by the end of adolescence females do not scream anymore during furtive matings with subordinate males. — De Waal [15,p.229]

A male chimpanzee, named Dandy, was not allowed by the alpha male to mate with adult females. Nevertheless, Dandy successfully mated with females by making dates with them on the sly, arranged through surreptitious glances. He and his partner would independently walk away from the group, as if unaware of the other, then meet in rendezvous behind tree trunks [2,p.124].

Male chimpanzees sometimes make sexual advances to females by displaying their erections. At the same time, they try to hide their erections from dominant males, “who watch over the sexual contacts of other males.” The chimpanzee penis is a “bright pink color, which strongly contrasts with the dark fur.” It is not easily hidden. A male chimpanzee will sometimes try to expose his erection to a female while simultaneously hiding it from a nearby dominant male by positioning his body and arms in the right way. When a dominant male approaches, a subordinate might turn his back until his erection is lost [15,p.233].

Secretive mating also occurs among rhesus monkeys and among baboons. Observations at the Wisconsin Primate Center indicate that the alpha rhesus monkey, Spickles, “does all the ‘public’ mating. He copulates in full view of everybody, uttering a series of loud barks at the climax.” By contrast, the number two male does all his mating on the sly, out of view of Spickles [4,p.131]. With baboons, “the alpha male accounts for the vast majority of matings, particularly those that occur around the time of ovulation.” Yet, there is “also often a considerable amount of `sneaky’ mating” that gives non-alpha males some reproductive success [3,p.56], [1].

Concealing injury and fear

Primates have a general tendency to hide injuries, especially from aggressors. A chimp named Amos had an enlarged liver and several cancerous growths that he kept a secret until the end.

Amos must have felt miserable for months, but any sign of vulnerability would have meant loss of status. Chimps seem to realize this. A limping male in the wild was seen to isolate himself for weeks to nurse his injuries, yet would show up now and then in the midst of the community to give a charging display full of vigor and strength, after which he’d withdraw again. That way, no one would get any ideas. Amos didn’t betray his condition until the day before his death. — De Waal [6,p.26]

Selectivity is part of our definition of privacy. In fact, primates are selective in hiding injuries. After describing the gentle care that a female named Daisy gave to Amos the day before his death, De Waal comments,

The incident illustrated two contrasting sides of primate social life. First, primates live in a cut-throat world, which forces a male to conceal physical impairment for as long as possible in order to keep up a tough facade. But second, they are part of a tight community, in which they can count on affection and assistance from others, including nonrelatives. Such dualities are hard to grasp. Popular writers tend to simplify things. — De Waal [6,p.27]

For chimpanzees, teeth-baring grins and yelping sounds are signs of fear and nervousness. Males attempt to avoid revealing these signs to opponents during bluff displays. They may largely suppress their yelps. However, they lack the facial motor control to suppress their teeth-baring directly. Instead they resort to other means such as turning their back to hide signs of fear. De Waal writes of one male who turned his back to hide the grin that betrayed his fear, then used his fingers to force his lips back in place. “The manipulation occurred three times before the grin ceased to appear. After that, the male turned around to bluff back at his rival” [15,p.233].

Hiding resources

We extend our discussion for a moment beyond primates. Food caching is practiced by many species. For example, crows and ravens hide food, but they do so selectively. They have the intelligence to distinguish between their social partners and potential thieves. Ravens adopt intricate strategies to mislead pilferers. They create false stashes, going through the motions of creating a stash, but without depositing the goods. Or they shuttle the goods from one cache to another, to further confuse competitors [12,pp.45,227,230].

We now return to chimpanzees. They selectively hide food resources. We have already met Dandy when we discussed his secret dates. Here he is again.

Not drawing attention to oneself or not paying attention to another are everyday tactics of the chimpanzee. These tactics are performed with… great sophistication…. A young male, Dandy, was seen walking over a place where some grapefruits had been hidden under the sand. Since he did not react at all, not even by stopping or looking around, we assumed that he had not noticed anything. More than three hours later, however, when most other chimpanzees (including the dominant males) were asleep, Dandy made a bee-line to the spot, where he calmly and without hesitation dug up and devoured the grapefruits. The size of the enclosure, and his controlled deliberate manners, exclude the possibility of his accidentally finding the fruits. — De Waal [15,p.228]

De Waal tells the following story about an adolescent female chimp, named Zwart, who saw someone throw an apple into her enclosure at the zoo. Rather than attract the attention of other chimps by rushing to pick it up, she kept her composure, and casually approached it, after only briefly glancing at it. She sat near the apple and inconspicuously groped for it without looking. After taking hold of the apple, she walked off using a stride that is generally not used when food is grasped. (“Chimpanzees usually walk on two or three extremities when carrying food.”) Safely out of sight behind a pile of tires, she finally ate the apple [15,p.228].

These examples show that some chimpanzees go to great lengths to hide food. On the other hand, sharing can be a way for chimpanzees to boost their influence within the group. Males can be “surprisingly generous [except with rivals] when it comes to material things” [5,p.204]. They even cooperate with a special scream that signals to others the capture of fresh meat [6,p.226].

Kanzi is a pygmy chimpanzee at the Language Research Center who regularly hides objects. The purpose of the secrets often seems to be to increase autonomy from his human caretaker. Once, suspecting foul play, Kanzi’s caretaker made a thorough search of the enclosure, but found nothing. Kanzi even (deceptively) joined in the search. The suspicions were confirmed when the caretaker looked away, and Kanzi quickly produced a door key “from nowhere” and escaped the enclosure [2,p.226].

Kanzi loves to eat wild mushrooms, but his caretakers do not allow it. He “has devised many ways to hide them. Often he will secretively conceal them between his thumb and forefinger as he walks, never breaking stride.” When his human companion looks away, “Kanzi will quickly pop the mushroom into his mouth…. The only tell-tale sign is the distinct mushroom odour which lingers upon his breath” [2,p.231].

Chimps sometimes conceal rocks that are hurled as weapons. During charging displays, a chimpanzee Nikkie “almost always carries a weapon in his hand, often holding it behind his back” [15,p.235]. Santino is a “male chimp at a Swedish zoo. Every morning, before the visitors arrived, he’d leisurely collect rocks from the moat around his enclosure, stacking them up into neat piles hidden from view. This way he’d have an arsenal of weapons when the zoo opened its gates” [6,p.205].

The safety of silence

Chimpanzee mothers are known to silence their children by holding a finger or hand over their mouths in situations where noise might provoke the aggression of a nearby adult male chimpanzee.

Most of the examples I have quoted are concerned with privacy of an individual within a community. Privacy also matters between different groups. Goodall describes how male chimpanzees form border patrols to safeguard their territory from intruders from different groups. When on patrol, they maintain silence to conceal their presence, but when they return to the safety of home base, they may let loose with hoots and drumming [15,p.234], [9].

Challenges to autonomy

The pursuit of autonomy can be a desperate undertaking in a hierarchical community. Here are some examples of dominant animals forcibly searching subordinates for contraband.

When a low-ranking female baboon discovers a clutch of bird eggs, she looks furtively around her and then stuffs as many eggs into her mouth as fast as she can. If she is detected she runs away, averting her face from any onlookers. If another baboon catches up to her, she resolutely keeps her mouth firmly closed, even when the other attempts to pry it open. But if a high-ranking female stumbles onto a similar trove, she calmly and deliberately eats the eggs in plain view of others. The inescapable impression is that the low-ranking female is trying to conceal her discovery from others in order to avoid theft. For high-ranking animals, such subterfuge is unnecessary. — Cheney and Seyfarth [3,p.153]

In a related example, De Waal shows a picture of a dominant rhesus monkey inspecting the cheek pouches of a juvenile for hidden food [4,p.128].

Summary and Conclusions

These examples suggest that privacy can be exercised by low-ranking primates to avoid direct conflict with dominant group members. Privacy allows individuals to pursue autonomous activities within a community. Selectively withholding and revealing secrets allows the individual to adjust to the complex, changing dynamics of the group.

Private matings break the near monopoly of the alpha male over access to oestrus females, without upsetting the social hierarchy. Food is sometimes shared, and sometimes consumed alone. Injuries are generally kept hidden when possible, but sometimes revealed when it seems safe to do so.

We can form general categories of privacy-seeking behavior among primates: privacy in sexual relations, in protecting valuable resources, and in hiding injury from others. De Waal reminds us that we too “are a hierarchical primate,” and it is hard not to draw the obvious parallels with humans, who seek privacy in their sexual relations, resources, and medical histories. It is striking that we seek privacy in domains closely tied to reproduction, access to resources, and survival, all of which are basic to evolution.

I do not expect this post to be the final word. Anecdotal evidence is merely an early step toward systematic data collection and scientific analysis. Many of my examples come from animals in confined spaces quite different from their natural habitats. More evidence is needed to determine whether the examples I give are representative of the various species I mention. Nevertheless, these anecdotes suggest that privacy seeking is an enduring part of primate behavior.


I thank Kerri Smith for her comments and acknowledge the significant influence of De Waal’s writings on this post.


[1] Jeanne Altmann, Susan C Alberts, Susan A Haines, Jean Dubach, Philip Muruthi, Trevor Coote, Eli Geffen, David J Cheesman, Raphael S Mututua, Serah N Saiyalel, et al. Behavior predicts genetic structure in a wild primate group. Proceedings of the National Academy of Sciences, 93(12):5797–5801, 1996.

[2] Richard Byrne and Andrew Whiten, editors. Machiavellian intelligence: social expertise and the evolution of intellect in monkeys, apes, and humans. Oxford University Press, 1989.

[3] Dorothy L Cheney and Robert M Seyfarth. Baboon metaphysics: the evolution of a social mind. University of Chicago Press, 2008.

[4] Frans De Waal. Peacemaking among primates. Harvard University Press, 1990.

[5] Frans De Waal. Chimpanzee politics: Power and sex among apes. JHU Press, 2007.

[6] Frans De Waal. The bonobo and the atheist: In search of humanism among the primates. WW Norton & Company, 2013.

[7] Peter Forbes. Dazzled and deceived: mimicry and camouflage. Yale University Press, 2011.

[8] Timothy H Goldsmith. The biological roots of human nature. Oxford University Press, 1994.

[9] Jane Goodall, Adriano Bandora, Emilie Bergmann, Curt Busse, Hilali Matama, Esilom Mpongo, Ann Pierce, David Riss, et al. Intercommunity interactions in the chimpanzee population of the Gombe National Park. The great apes, 5:13–53, 1979.

[10] Joshua Greene. Moral tribes: emotion, reason and the gap between us and them. Atlantic Books Ltd, 2014.

[11] Jacob Kastrenakes. Google’s chief internet evangelist says ‘privacy may actually be an anomaly’. The Verge, November 20, 2013.

[12] John M Marzluff, Tony Angell, and Paul R Ehrlich. In the company of crows and ravens. Yale University Press, 2007.

[13] Margaret Mead. Coming of age in Samoa. Penguin, 1973.

[14] Robert W Mitchell. Deception and hiding in captive lowland gorillas (Gorilla gorilla gorilla). Primates, 32(4):523–527, 1991.

[15] Robert W Mitchell and Nicholas S Thompson, editors. Deception: Perspectives on human and nonhuman deceit. SUNY Press, 1986.

[16] Eben Moglen. Oh, freedom, Oct 30, 2013.

[17] Michael Tomasello. A natural history of human thinking. Harvard University Press, 2014.

[18] Alan F Westin. Privacy and freedom. Atheneum, 1967.

[19] Andrew Whiten and Richard W Byrne, editors. Machiavellian intelligence II: Extensions and evaluations, volume 2. Cambridge University Press, 1997.

[20] Norbert Wiener. Extrapolation, interpolation, and smoothing of stationary time series, volume 2. MIT press Cambridge, MA, 1949.

[21] Edward O Wilson. Sociobiology. Harvard University Press, 2000.

[22] Edward O Wilson. On human nature. Harvard University Press, 2012.

27 Formal Proof Projects

Freek Wiedijk maintains a list of 100 well-known math theorems and tracks which of them have been formalized. As of May 2014, all but 11 had been formalized. There is a wiki page of 27 notoriously long mathematical proofs. Only three of them have been formalized: the Feit-Thompson odd order theorem, the four-color theorem, and the Kepler conjecture. Merging the two lists, we obtain the following suggestions for formal proof projects.

  1. Insolvability of General Higher Degree Equations (Abel-Ruffini)
  2. Green’s theorem
  3. The undecidability of the Continuum Hypothesis
  4. Fermat’s Last theorem
  5. Puiseux’s Theorem
  6. The Isoperimetric Inequality
  7. The Hermite-Lindemann Transcendence Theorem
  8. The Law of Large Numbers
  9. Dissection of Cubes (Littlewood)
  10. Classification of simple complex Lie algebras
  11. Constructibility of the 65537-gon
  12. Lasker-Noether theorem
  13. Resolution of singularities in characteristic zero
  14. Resolution of singularities for 3-folds in characteristic at least 7 (Abhyankar)
  15. Discrete series representations of Lie groups (Harish-Chandra)
  16. Burnside’s bounded exponent problem (negative solution by Novikov-Adian)
  17. Grothendieck’s FGA and SGA
  18. Ramanujan conjecture and the Weil conjectures
  19. Eisenstein series (Langlands)
  20. Arthur-Selberg trace formula
  21. Almgren’s regularity theorem
  22. Lafforgue’s theorem
  23. Poincaré conjecture
  24. Thurston’s geometrization conjecture
  25. Classification of finite simple groups
  26. Robertson-Seymour theorem
  27. Strong perfect graph theorem

Formalizing NIST standards

Thomas C. Hales
University of Pittsburgh

Based on Snowden documents, the New York Times reported that the NSA worked to subvert NIST cryptographic standards. This article discusses general weaknesses in the NIST standard 800-90A for pseudo-random number generation. Among other defects, this NIST standard is deficient because of a pervasive sloppiness in the use of mathematics. This, in turn, prevents serious mathematical analysis of the standard and promotes careless implementation in code. We propose formal verification methods as a remedy.

Levels of Mathematical Rigor

We may categorize mathematical argument as informal, rigorous, or formal.

Informal mathematics is the vulgar product of the popular press. Informally, a function is continuous if it can be sketched without lifting a pencil from a sheet of paper; chaos is the unpredictable effect of a butterfly flapping its wings in Brazil; and a pseudo-random number is one that is paradoxically deterministic yet effectively random. Informal mathematics is not wrong so much as it is unsuitable for careful science and engineering.

A more rigorous approach to mathematics became necessary in the final decades of the nineteenth century to resolve paradoxes in Cantor’s set theory and other troubles. For example, disputes about continuity were resolved by clarifying definitions, eventually refining a single intuitive notion of continuity into a family of related notions: continuity, uniform continuity, and so forth. Most professional mathematical publications now adhere to widely accepted standards of mathematical rigor, enforced through the diligence of human referees.

Formal mathematics is yet a higher standard. English and other natural languages are abandoned and replaced with languages whose syntax and semantics are designed with mathematical precision. The system specifies every admissible rule of logic and every mathematical axiom. Quality is enforced by a computer, which exhaustively checks every logical step of a proof.

Formal verification becomes appropriate in proofs whose complexity surpasses the capabilities of checking by hand. (A wiki page catalogues numerous long mathematical theorems that might benefit from formalization.) Formal methods are well-suited for many computer-assisted mathematical proofs. In fact, at the formal level the line is erased between algorithms implemented as computer code and mathematical reasoning. A single software system handles the formal verification of both.

Formal methods have been under development for decades, and in recent years the verification of complex software systems, hardware, and intricate theorems has become a reality. Already in 1989, it was possible to formally specify and verify a simple computer system from high-level language to microprocessor. As recent examples, we mention the full verification of a C compiler and complex mathematical theorems such as the Feit-Thompson theorem and the Four-Color theorem.

Formal Verification in Cryptography

Formal verification of computer code is advisable when human life or large economic interests are at stake: aircraft control systems, widely adopted cryptographic standards, or nuclear reactor controllers. Formal verification reduces the software defect rate to a level that is scarcely attainable by other means.

For several reasons, cryptography calls out for formal verification. The field is highly mathematical. Many key algorithms can be implemented as small blocks of code. A tiny defect can potentially defeat the entire algorithm. Adversaries actively seek out bugs to exploit. Cryptography safeguards large financial interests and fundamental human freedoms.

Various formal tools have been constructed especially for application to cryptography. The pi-calculus has been adapted to cryptographic protocols. Projects in the Isabelle proof assistant include protocol verification through inductive definitions and game analysis. In the Coq proof assistant, there have been successful formal verifications of cryptographic primitives and code-based cryptographic proofs. Significantly, formal methods have started to enter the standardization process.

The working group on the Theoretical Foundations of Security Analysis and Design and the Computer Security Foundations Symposium of the IEEE (CSF 2013) promote formal methods in cryptography.

In truth, our imperfect knowledge prevents the comprehensive verification of cryptographic systems. We are stymied by notorious problems like P versus NP and the existence of one-way functions. We lack definitive lower bounds on the computational complexity of concrete problems such as factoring of integers. Research into security reductions is ongoing. There is no comprehensive security model. For example, the Dolev-Yao model works at a high level of abstraction, assuming that cryptographic primitives function perfectly, while other models operate at various levels of detail.

Nevertheless, we can work with these limitations, implementing a collection of interrelated formal proofs grounded in current technological capabilities, and move forward from there.

The informality of NIST standards

Earlier critiques of the NIST standard 800-90A for pseudo-random number generation have focused on specific defects. Here, we argue that mathematical weaknesses run throughout the standard. Amid the accusations that the NSA has undermined NIST cryptographic standards, we remind NIST that one of the most effective ways to subvert a cryptographic standard is to muddle the math.

The first requirement of a standard is to set the tone and level of discourse that reflects the current technological capabilities of the matter at hand. By choosing to present an informal standard, avoiding both rigor and formal mathematics, NIST has produced a standard that is out of step with the technology of the time.

Some definitions in the NIST standard are merely informal. For example, the NIST standard defines¹ pseudo-randomness as “deterministic yet also effectively random” (NIST 800-90A, p.7). A mathematically rigorous definition of pseudo-random generators requires much more care, referencing rigorous notions of measure, probability, and complexity theory. Properly formulated definitions are given in Luby, Yao, and Blum-Micali. As it is manifestly impossible to base rigorous or formal mathematical proofs on something so vague as “deterministic yet effectively random,” the early pages of the NIST standard effectively ward off careful mathematical analysis.

The lack of rigor continues throughout the document. Algorithms are described with English-text pseudo-code. With more care, NIST might have provided a formal specification and a reference implementation in executable code in a language with precise semantics. Overall, the standard gives few mathematical arguments, and these do not inspire confidence. The document slips into convenient inaccuracies: heuristics rather than proofs, fixed-point arithmetic, and Shannon entropy rather than min-entropy. (See NIST 800-90A, Appendix C.2.) In fact, the standard is imprecise to such a degree that competing definitions of entropy are largely irrelevant.
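The distinction between Shannon entropy and min-entropy that the standard glosses over is not pedantic. A minimal Python sketch (the helper names are ours, not NIST's) shows a source whose Shannon entropy looks respectable while its min-entropy, the quantity that bounds an attacker's single best guess, is one bit:

```python
import math

# Shannon entropy: the average surprise over the whole distribution.
def shannon_entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Min-entropy: determined solely by the most likely outcome, which is
# what bounds an attacker's chance of guessing a secret on the first try.
def min_entropy(probs):
    return -math.log2(max(probs))

# A skewed source: probability 1/2 on a single value, with the remaining
# mass spread uniformly over 2**8 values.  (The effect is more dramatic
# with 2**64 values; this keeps the list small.)
probs = [0.5] + [0.5 / 2**8] * 2**8

H = shannon_entropy(probs)   # 0.5*1 + 0.5*9 = 5.0 bits: looks comfortable
H_min = min_entropy(probs)   # -log2(1/2)    = 1.0 bit: one coin flip of security
print(H, H_min)
```

A generator with five bits of Shannon entropy per sample can still be guessed on the first try half the time, which is why cryptographic analyses work with min-entropy.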

An example of NIST reasoning

This section goes into detail about a particular mathematical argument that appears in the NIST standard.² For our purposes, the topic of discussion matters less than the nature of the NIST committee’s mathematical thought. Do they reason as a mathematician would, in an unbroken chain of logic, or is the committee satisfied by a lesser standard?

The context is the following. Let E be an elliptic curve defined over a finite field Fp, given in affine coordinates by a polynomial equation y2 = f(x). The pseudo-random generator extracts bits from the x coordinates of a sequence of points P1, P2,… on the elliptic curve. The construction of the sequence of points does not concern us here. The issue is this: if points are sampled uniformly at random from E(Fp), then their x coordinates are not uniformly distributed in the finite field; in fact, the x coordinates obviously belong to the subset of the finite field on which f(x) is a square. Research estimates are needed to determine how big an issue this is. Aggressive truncation of bits from the binary representation of x might improve pseudo-randomness but would make the algorithm less efficient.³

NIST quotes the research of El Mahassni and Shparlinski as “an additional reason that argues against increasing truncation.” There are numerous gaps in NIST reasoning.

  • A bound on discrepancy is not the same as uniform distribution.
  • Uniform distribution is not the same as cryptographically secure pseudo-randomness.
  • The sets {Pi} of points used in real-world implementations have cardinalities far too small to be relevant to the given asymptotic estimates.
  • The research does not support the specific NIST rule that “the recommended number of bits discarded from each x-coordinate will be 16 or 17” and does not convincingly “argue against increasing truncation.”

Nevertheless, NIST uses the research to make the inflated claim that “certain guarantees can be made about the uniform distribution of the resulting truncated quantities” (NIST 800-90A). This is proof by intimidation.


The NIST standard 800-90A states that “a user of a DRBG for cryptographic purposes requires assurance that the generator actually produces (pseudo) random and unpredictable bits. The user needs assurance that the design of the generator, its implementation and its use to support cryptographic services are adequate to protect the user’s information” (NIST 800-90A). We agree.

What assurances does NIST actually provide about the generator? The document contains no mathematical proofs of pseudo-randomness and no supporting citations. Indeed, careful proofs would be impossible, because as we have noted, definitions are more informal than rigorous. Instead, the user of the standard must rely on NIST’s authoritative claim that “the design of each DRBG mechanism in this Recommendation has received an evaluation of its security properties prior to its selection for inclusion in this Recommendation.” That one sentence is the extent of NIST assurance of design. That’s it! It seems that for NIST, assurance means to comfort with soothing words. To a mathematician, this attitude is exasperating. There is no mathematical statement of what those security properties are, and no discussion of the methods that were used to reach the conclusion. We are not told who made the evaluation or what the results of the evaluation were.

Based on the Memorandum of Understanding between NIST and NSA from 1989, quoted in Schneier (p. 601), we might wonder whether NIST’s part in the evaluation was limited. According to the memo, “The NIST will … recognize the NSA-certified rating of evaluated trusted systems under the Trusted Computer Security Evaluation Criteria Program without requiring additional evaluation” (emphasis added).

Here is the NIST assurance of HMAC_DRBG (deterministic random bit generation based on hash message authentication codes). It states, “In general, even relatively weak hash functions seem to be quite strong when used in the HMAC construction. On the other hand, there is not a reduction proof from the hash function’s collision resistance properties to the security of the DRBG” (NIST 800-90A Appendix E.2). Note the informal tone of the discussion, the reassurance that weakness is strength, the brevity, and the absence of mathematical theorems.

Cryptographic standards derive their security through the underlying mathematics. We can place our trust in mathematics but not in assurances such as these.


According to NIST aspirations, “NIST works to publish the strongest cryptographic standards possible.” Our analysis shows that judged by professional mathematical standards, NIST is very far from its goal. Indeed, the current NIST standard was written in a pre-Snowden era of unverified assurances.

NIST sets the standard both by its choice of algorithms and by its attitude towards rigor. Overall, its generally careless tone will facilitate vulnerable implementations of the standard.

Better approaches to standardization are available. In fact, a number of formal verification projects have been completed (such as a formal verification of a C compiler mentioned above) that dwarf what we specifically ask NIST to do. Please adopt verification technologies in widespread use! Improvement in the formal specification of NIST standards is the first critical step in a larger process of formal verification along the entire chain, including the underlying mathematical concepts, cryptographic primitives, protocols and algorithms, and end implementations in computer code.

End Notes

1. Here is the full definition from NIST: “A process (or data produced by a process) is said to be pseudorandom when the outcome is deterministic, yet also effectively random, as long as the internal action of the process is hidden from observation. For cryptographic purposes, ‘effectively’ means ‘within the limits of intended cryptographic strength.’” Speaking of the data, we may ask with Knuth, “is 2 a random number?”
2. We pick the most extensive mathematical argument in the document. It is telling that this argument is used to justify weakening the standard for the sake of efficiency.
3. In light of the much discussed back door to the elliptic curve algorithm, the NSA had a secret interest in persuading users not to discard many bits from x; aggressive truncation would make the back door more difficult to use.


M. Abadi and C. Fournet, Mobile values, new names, and secure communication, In POPL ’01: Proceedings of the 28th ACM SIGPLAN-SIGACT symposium on the principles of programming languages (2001), 104–115.

M. Abadi and A. D. Gordon, A calculus for cryptographic protocols: The spi calculus, Information and Computation 148 (January 1999), 1–70.

NIST 800-90A, E. Barker and J. Kelsey, Recommendation for random number generation using deterministic random bit generators, NIST Special Publication 800-90A (2012).

W. R. Bevier, W. A. Hunt Jr., J Strother Moore, and W. D. Young, An approach to systems verification, Journal of Automated Reasoning 5 (1989), 411–428.

M. Blum and S. Micali, How to generate cryptographically strong sequences of pseudo-random bits, SIAM Journal on Computing 13 (1984), 850–864.

D. Dolev and A. C. Yao, On the security of public-key protocols, IEEE Transactions on Information Theory 29 (March 1983), 198–208.

M. Drmota and R. F. Tichy, Sequences, discrepancies and applications, Lecture Notes in Mathematics, Springer, 1997.

G. Gonthier et al., A machine-checked proof of the odd order theorem, Lecture Notes in Computer Science 7998 (2013), 163–179.

G. Gonthier, Formal proof — the four colour theorem, Notices of the AMS 55 (December 2008), no. 11, 1382–1393.

J. Harrison, Formal verification at Intel, Proceedings. 18th Annual IEEE Symposium on Logic in Computer Science (2003), 45–54.

M. Luby, Pseudorandomness and cryptographic applications, Princeton University Press, 1996.

E. El Mahassni and I. Shparlinski, On the uniformity of distribution of congruential generators over elliptic curves, Discrete Mathematics and Theoretical Computer Science (2002), 257–264, Sequences and their Applications.

R. Milner, Communicating and mobile systems: the pi-calculus, Cambridge University Press, 1999.

A. Yao, Theory and applications of trapdoor functions, FOCS (1982), 384–400.

The NSA back door to NIST

Thomas C. Hales (University of Pittsburgh)

(This article will be published in the Notices of the American Mathematical Society.)

Use once. Die once. — activist saying about insecure communication

This article gives a brief mathematical description of the NIST standard for cryptographically secure pseudo-random number generation by elliptic curves, the back door to the algorithm discovered by Ferguson and Shumow, and finally the design of the back door based on the Diffie-Hellman key exchange algorithm.

NIST (the National Institute of Standards and Technology) of the U.S. Department of Commerce derives its mandate from the U.S. Constitution, through the congressional power to “fix the standard of weights and measures.” In brief, NIST establishes the basic standards of science and commerce. Whatever NIST says about cryptography becomes implemented in cryptographic applications throughout U.S. government agencies. Its influence leads to the widespread use of its standards in industry and the broad adoption of its standards internationally.

Through the Snowden disclosures, the NIST standard for pseudo-random number generation has fallen into disrepute. Here I describe the back door to the NIST standard for pseudo-random number generation in elementary and mathematically precise terms. The NIST standard offers three methods for pseudo-random number generation [NIST]. My remarks are limited to the third of the three methods, which is based on elliptic curves.

Random number generators can either be truly random (obtaining their values from randomness in the physical world, such as a quantum mechanical process) or pseudo-random (obtaining their values from a deterministic algorithm, yet displaying a semblance of randomness). The significance of random number generation within the theory of algorithms can be gauged by Knuth’s multivolume book, The Art of Computer Programming, which devotes a massive 193 pages (half of volume two) to the subject! Cryptographically secure pseudo-random number generators form a subclass intended for use in cryptographic applications such as key generation, one-way hash functions, signature schemes, private-key cryptosystems, and zero-knowledge interactive proofs [Luby].

Elliptic curves as pseudo-random number generators

The NIST standard gives a list of explicit mathematical data (E,p,n,f,P,Q) to be used for pseudo-random number generation [NIST]. Here E is an elliptic curve defined over a finite field \mathbb{F}_p of prime order p. The group E(\mathbb{F}_p) has order n, which is prime for all of the curves that occur in the NIST standard. The elements of the group E(\mathbb{F}_p) consist of the set of points on an affine curve, together with a point at infinity which serves as the identity element of the group. The affine curve is defined by an equation y^2 = f(x) for some explicit cubic polynomial f in \mathbb{F}_p[x]. Finally, P and Q are given points on the affine curve.

NIST gives a few sets of data and in each case the prime number p is large. (The smallest is greater than 10^{77}.) No explanation is given of the particular choices (E,p,n,f,P,Q). We are told to use these data and not to question why. The standard stipulates that “one of the following NIST approved curves with associated points shall be used in applications requiring certification under FIPS-140 [U.S. government computer security accreditation].”

When A is any point other than the identity in E(\mathbb{F}_p), we may evaluate the coordinate function x at A to obtain x(A)\in \mathbb{F}_p. By further lifting \mathbb{F}_p to a set of representatives in \mathbb{Z}, we obtain a function by composition x1 : E(\mathbb{F}_p)\setminus\{0\}~ \to~ \mathbb{F}_p~\to~ \mathbb{Z}. Write (n,A)\mapsto n * A for the \mathbb{Z}-module action of \mathbb{Z} on E. (We write this action multiplicatively as n * A rather than exponentially as A^n.)

The pseudo-random bit generator is initialized with a random integer seed s, obtained by some different process such as a separate random number generator. What is important for us is that the number s represents the hidden internal state of the algorithm. The hidden state must be kept secret for the pseudo-randomness to be effective. (Once the state is disclosed, a pseudo-random sequence becomes predictable and useless for many cryptographic purposes.)

The essence of the pseudo-random bit generator can be written in the Objective Caml language as follows. In the syntax of this language, each phrase (let x = a in …) defines the value of x to be a. The last line of the block of code gives the output of the function.

let pseudo_random s =
  let r = x1 (s * P) in
  let s' = x1 (r * P) in
  let t = x1 (r * Q) in
  let b = extract_bits t in
  (s', b)

That is, we successively apply the integer s or r to the point P or the point Q and take the x1 coordinate of the resulting point, then extract some bits from the number t. The integer s’ becomes the new secret internal state to be fed into the next iteration of the function. The output b is passed to the consumer of pseudo-random bits. This output may become publicly known. The function extract_bits operates by converting t to a list of bits, discarding the 16 most significant bits (for reasons that do not matter to this discussion), and giving the remaining bits as output. According to NIST standards, by iterating this function, updating the internal state at each iteration, a cryptographically secure stream b … of pseudo-random bits is obtained.

The back door

This algorithm is fatally flawed, as Ferguson and Shumow pointed out [Shumow-Ferguson]. Since P and Q are non-identity elements of a cyclic group of prime order, each is a multiple of the other. Write P = e * Q, for some integer e. We show that once we have e in hand, it is a simple matter to determine the secret internal state s of the pseudo-random bit generator by observing the output b, and thus to compromise the entire system.

The function extract_bits discards 16 bits. Given the output b, we take the 2^{16} possible preimages t of b under extract_bits (a small number). Each such t is a candidate value of the coordinate x; solving a quadratic, there are at most two possibilities for the coordinate y of a point A on the elliptic curve such that t = x1 (A). One such A is r * Q. For each A, we compute e * A. One of the small number of possibilities for e * A is

e * (r * Q) = r * (e * Q) = r * P.     (1)

Finally s’ = x1 (r * P). In short, the internal state s’ can be narrowed down to a small number of possibilities by an examination of the pseudo-random output bitstream. Shumow and Ferguson state that in experiments, “32 bytes of output was sufficient to uniquely identify the internal state of the PRNG [pseudo-random number generator].”

The back door to the algorithm is the number e such that P = e * Q. To use the back door, one must know the value e. The NIST standard does not disclose e (of course!), and extensive cryptographic experience suggests that it is hard to compute e from the coordinates of P and Q (unless you happen to own a quantum computer). This is the problem of discrete logarithms. But starting with e, there is no difficulty in creating a pair P and Q. The back door is universal: a single number e gives back door access to the internal state of the algorithm of all users worldwide.
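The attack is concrete enough to run end to end. Here is a minimal Python sketch on an invented toy curve; every parameter below is illustrative, and the only thing separating the toy from the real standard is scale: a genuine NIST curve has p > 10^77, so recovering e from P and Q is an infeasible discrete logarithm problem.

```python
# Toy model of the Shumow-Ferguson back door.  All parameters are
# invented and absurdly small, for illustration only.
p, a = 1009, 2                      # field size and curve coefficient a

def inv(u):                         # modular inverse via Fermat's little theorem
    return pow(u, p - 2, p)

def add(A, B):                      # affine group law on y^2 = x^3 + a*x + b
    if A is None: return B          # None is the point at infinity (identity)
    if B is None: return A
    (x1, y1), (x2, y2) = A, B
    if x1 == x2 and (y1 + y2) % p == 0:
        return None                 # inverse points sum to the identity
    lam = ((3 * x1 * x1 + a) * inv(2 * y1) if A == B
           else (y2 - y1) * inv(x2 - x1)) % p
    x3 = (lam * lam - x1 - x2) % p
    return (x3, (lam * (x1 - x3) - y1) % p)

def mul(k, A):                      # double-and-add scalar multiplication k * A
    R = None
    while k:
        if k & 1: R = add(R, A)
        A = add(A, A); k >>= 1
    return R

def is_prime(m):
    return m > 1 and all(m % d for d in range(2, int(m ** 0.5) + 1))

# Search for a coefficient b giving a curve of prime group order n > p,
# with b a non-square so that no point has x = 0 (avoids toy edge cases).
for b in range(1, p):
    n = 1 + sum(1 if (x**3 + a*x + b) % p == 0
                else 2 if pow((x**3 + a*x + b) % p, (p - 1) // 2, p) == 1
                else 0 for x in range(p))
    if is_prime(n) and n > p and pow(b, (p - 1) // 2, p) != 1:
        break

def sqrt_mod(f):                    # brute-force square root; fine for tiny p
    return next((y for y in range(p) if y * y % p == f), None)

# A base point Q, a secret back-door exponent e, and the planted P = e * Q.
x0 = next(x for x in range(p) if sqrt_mod((x**3 + a*x + b) % p) is not None)
Q = (x0, sqrt_mod((x0**3 + a*x0 + b) % p))
e = 123
P = mul(e, Q)

# One round of the generator: the next state s2 is secret, the output public.
s = 777                             # hidden internal state
r = mul(s, P)[0]
s2 = mul(r, P)[0]                   # the secret we will recover
out = mul(r, Q)[0] % 128            # published bits, high bits discarded

# The attack: enumerate preimages t of the output, lift each to a curve
# point A = (t, y), and compute x(e * A).  One of the few candidates is
# x(e * (r * Q)) = x(r * P) = s2, the generator's next internal state.
candidates = set()
for t in range(out, p, 128):
    y = sqrt_mod((t**3 + a*t + b) % p)
    if y is not None:
        candidates.add(mul(e, (t, y))[0])
assert s2 in candidates
```

Both square roots of f(t) lead to the same candidate, since x(e * (-A)) = x(-(e * A)) = x(e * A); this is why one lift per preimage suffices.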

It is a matter of public fact that the NSA was tightly involved in the writing of the standard. Indeed, NIST is required by law to consult with NSA in creating its standard. According to the New York Times, “classified NSA memos appear to confirm that the fatal weakness, discovered by two Microsoft cryptographers in 2007, was engineered by the agency” [NYT]. The news article goes on to say that “eventually, NSA became the sole editor” and then pushed aggressively to make this the standard for the 163 member countries of the International Organization for Standardization. Further historical and social context appears in [Wired]. NSA had facile access to the crown jewel e and motive to seize it. Draw your own conclusions.


1. The back door to this algorithm is extremely elementary from a mathematical perspective. We wrote the essential algorithm in six lines of computer code, even if more supporting code is needed to make it industrial strength. The algorithm could be explained to undergraduate math majors, or sufficiently advanced high-school students. The story also has the spy-agency intrigue to make a good math club talk or a special lecture in an elementary abstract algebra course. We essentially just need to understand that an elliptic curve is an abelian group whose elements (other than the identity element) are determined by two numbers x and y, that y is the root of a quadratic when x is given, and that every non-identity element of a cyclic group of prime order is a generator. Easy stuff.

2. Without prior knowledge of the back door, how difficult would it be to rediscover the possible existence of a back door? An analysis of the argument shows that the required level of creativity is that of an undergraduate homework problem. We must think to write the element P as a multiple of the generator Q in a cyclic group of prime order. This is something a student learns in the first weeks of undergraduate algebra.

The rest of the process of inverting the pseudo-random number generator is determined by the definition of the function itself: simply take the steps defining the function and reverse them, asking for the preimage of the function at each step of its definition, working from the output back to the secret state s’. Once the question of inverting the function is asked, it is easy to do the group theory, even if it is computationally difficult to write e explicitly.

One-way functions are a standard tool in the cryptographer’s bag. Every professional who has been trained to analyze cryptographic algorithms knows to ask the question of invertibility. It is unsettling that NIST and others do not seem to have asked this basic question.

Diffie-Hellman key exchange

In what follows, let us assume that someone, whom we will call the Spy, has access to the back door e. How is it possible for the Spy and the end user (the User) of the NIST algorithm to come into possession of the same shared secret (the internal state of the pseudo-random number generator), when all communication between them is public? Information flows from the Spy to the User through the published NIST standard, and from the User back to the Spy through the public output of the pseudo-random generator. The back door must have a remarkable cryptographic design to permit a secret to pass across these public channels, yet prevent the secret from becoming known to a third party.

As we now explain, the design of the back door to NIST is based on a well-known algorithm in cryptography called the Diffie-Hellman key exchange [Diffie-Hellman]. This is an algorithm to share a secret between two parties, when there is a possibility that the channel of communication is being monitored. In the current context, the Spy recognizes the Diffie-Hellman key exchange for what it is. However, the User participates in the exchange innocently and unwittingly, by blindly following the rules of the NIST protocol.

The Diffie-Hellman key exchange requires a group, which we will take to be a cyclic group E of order n (to preserve notation). The group E, its order n, and a generator Q are made public. To share a secret, the first party (the Spy) picks a random number e, which is kept secret, and publishes P = e * Q to the world. The second party (the User) picks a random number r, which is kept secret, and publishes r * Q. Then by Equation (1), the Spy, who knows e and r * Q, and the User, who knows r and e * Q, can both compute (r e) * Q = r * P, which is the shared secret. (In our context, the shared secret determines the internal state s’ of the pseudo-random number generator.) If E is a group in which the public knowledge E, n, Q, P = e * Q, r * Q does not allow the easy computation of (r e) * Q, then the shared secret is protected from public disclosure by the difficulty of the computation. In this way, the only two parties who learn the internal state of the pseudo-random number generator are the Spy and the User.
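The exchange just described can be sketched in a few lines. As before, this is a toy: the multiplicative group (Z/p)* stands in for the elliptic curve E, exponentiation stands in for scalar multiplication, and the base element and exponents are invented numbers, not values from any standard.

```python
# Minimal Diffie-Hellman sketch in (Z/p)*, standing in for the curve E.
# g plays the role of the generator Q; the exponents play e and r.

p = 2**61 - 1            # a Mersenne prime; (Z/p)* is cyclic
g = 5                    # public base element (invented for this sketch)

e = 123456789            # Spy's secret; only P = g^e is published
P = pow(g, e, p)

r = 987654321            # User's secret; only g^r is published
R = pow(g, r, p)

# Each side combines its own secret with the other's public value:
spy_view  = pow(R, e, p)   # (g^r)^e
user_view = pow(P, r, p)   # (g^e)^r
assert spy_view == user_view   # both equal g^(e*r): the shared secret
```

An eavesdropper sees p, g, P, and R, but computing g^(e*r) from those four values alone is the (assumed hard) Diffie-Hellman problem; only the two holders of a secret exponent can take the final step.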

What we have described here is not an imaginary scenario: NIST documents do in fact publish the data E, n, Q, and P, needed to initiate the Diffie-Hellman exchange. A user, when making public the output from the pseudo-random number generator, does in fact complete the exchange. Diffie-Hellman is Diffie-Hellman, whether it has been advertised as such or not.

To say that the Diffie-Hellman key exchange algorithm is well known is a vast understatement. The algorithm is a standard lesson in virtually every first course in cryptography everywhere in the world. Building on work of Merkle, the Diffie-Hellman paper launched the entire field of public key cryptography and is one of the most influential papers in cryptography ever written.

What is the significance of all this? It is no secret that the NSA employs some of the world’s keenest cryptographic minds. They all know Diffie-Hellman. In my opinion, an algorithm that has been designed by NSA with a clear mathematical structure giving them exclusive back door access is no accident, particularly in light of the Snowden documents. This is a work of experts.


[NIST] E. Barker and J. Kelsey, Recommendation for random number generation using deterministic random bit generators. NIST Special Publication 800-90A (2012).

[Diffie-Hellman] W. Diffie and M. Hellman, New directions in cryptography. IEEE Transactions on Information Theory 22 (1976), 644-654.

[Luby] M. Luby, Pseudorandomness and cryptographic applications, Princeton University Press, 1996.

[NYT] N. Perlroth, J. Larson, and S. Shane, N.S.A. able to foil basic safeguards of privacy on web. New York Times, September 5, 2013.

[Shumow-Ferguson] D. Shumow and N. Ferguson, On the possibility of a back door in the NIST SP800-90 dual EC PRNG, 2007.

[Wired] K. Zetter, How a crypto ‘backdoor’ pitted the tech world against the NSA. Wired, September 24, 2013.