Thomas C. Hales (University of Pittsburgh)
(This article will be published in the Notices of the American Mathematical Society.)
Use once. Die once. — activist saying about insecure communication
This article gives a brief mathematical description of the NIST standard for cryptographically secure pseudo-random number generation by elliptic curves, the back door to the algorithm discovered by Ferguson and Shumow, and finally the design of the back door based on the Diffie-Hellman key exchange algorithm.
NIST (the National Institute for Standards and Technology) of the U.S. Department of Commerce derives its mandate from the U.S. Constitution, through the congressional power to “fix the standard of weights and measures.” In brief, NIST establishes the basic standards of science and commerce. Whatever NIST says about cryptography becomes implemented in cryptographic applications throughout U.S. government agencies. Its influence leads to the widespread use of its standards in industry and the broad adoption of its standards internationally.
Through the Snowden disclosures, the NIST standard for pseudo-random number generation has fallen into disrepute. Here I describe the back door to the NIST standard for pseudo-random number generation in elementary and mathematically precise terms. The NIST standard offers three methods for pseudo-random number generation [NIST]. My remarks are limited to the third of the three methods, which is based on elliptic curves.
Random number generators can either be truly random (obtaining their values from randomness in the physical world such as a quantum mechanical process) or pseudo-random (obtaining their values from a deterministic algorithm, yet displaying a semblance of randomness). The significance of random number generation within the theory of algorithms can be gauged by Knuth’s multivolume book, The Art of Computer Programming. It devotes a massive 193 pages (half of volume two) to the subject! A subclass of pseudo-random number generators are cryptographically secure, intended for use in cryptographic applications such as key generation, one-way hash functions, signature schemes, private key cryptosystems, and zero knowledge interactive proofs [Luby].
Elliptic curves as pseudo-random number generators
The NIST standard gives a list of explicit mathematical data (E,p,n,f,P,Q) to be used for pseudo-random number generation [NIST]. Here E is an elliptic curve defined over a finite field of prime order p. The group has order n, which is prime for all of the curves that occur in the NIST standard. The elements of the group consist of the set of points on an affine curve, together with a point at infinity which serves as the identity element of the group. The affine curve is defined by an equation for some explicit cubic polynomial f in . Finally, P and Q are given points on the affine curve.
NIST gives a few sets of data and in each case the prime number p is large. (The smallest is greater than .) No explanation is given of the particular choices (E,p,n,f,P,Q). We are told to use these data and not to question why. The standard stipulates that “one of the following NIST approved curves with associated points shall be used in applications requiring certification under FIPS-140 [U.S. government computer security accreditation].”
When A is any point other than the identity in , we may evaluate the coordinate function x at A, to obtain . By further lifting to a set of representatives in , we obtain a function by composition Write for the -module action of on E. (We write powers of the group element A using multiplicative rather than exponential notation.)
The pseudo-random bit generator is initialized with a random integer seed s, obtained by some different process such as a separate random number generator. What is important for us is that the number s represents the hidden internal state of the algorithm. The hidden state must be kept secret for the pseudo-randomness to be effective. (Once the state is disclosed, a pseudo-random sequence becomes predictable and useless for many cryptographic purposes.)
The essence of the pseudo-random bit generator can be written in the Objective Caml language as follows. In the syntax of this language, each phrase (let x = a in …) defines the value of x to be a. The last line of the block of code gives the output of the function.
let pseudo_random s = let r = x1 (s * P) in let s' = x1 (r * P) in let t = x1 (r * Q) in let b = extract_bits t in (s',b);
That is, we successively apply the integer s or r to the point P or the point Q and take the x1 coordinate of the resulting point, then extract some bits from the number t. The integer s’ becomes the new secret internal state to be fed into the next iteration of the function. The output b is passed to the consumer of pseudo-random bits. This output may become publicly known. The function extract_bits operates by converting t to a list of bits, discarding the 16 most significant bits (for reasons that do not matter to this discussion), and giving the remaining bits as output. According to NIST standards, by iterating this function, updating the internal state at each iteration, a cryptographically secure stream b … of pseudo-random bits is obtained.
The back door
This algorithm is fatally flawed, as Ferguson and Shumow pointed out [Shumow-Ferguson]. Since P and Q are non-identity elements of a cyclic group of prime order, each is a multiple of the other. Write P = e * Q, for some integer e. We show that once we have e in hand, it is a simple matter to determine the secret internal state s of the pseudo-random bit generator by observing the output b, and thus to compromise the entire system.
The function extract_bits discards 16 bits. Given the output b, we take the (a small number of) possible preimages t of b under extract_bits. For each t, the coordinate x is known, and solving a quadratic, there are at most two possibilities for the coordinate y of a point A on the elliptic curve such that t = x1 (A). One such A is r * Q. For each A, we compute e * A. One of the small number of possibilities for e * A is
e * (r * Q) = r * (e * Q) = r * P. (1)
Finally s’ = x1 (r * P). In short, the internal state s’ can be be narrowed down to a small number of possibilities by an examination of the pseudo-random output bitstream. Shumow and Ferguson state that in experiments, “32 bytes of output was sufficient to uniquely identify the internal state of the PRNG [pseudo-random number generator].”
The back door to the algorithm is the number e such that P = e * Q. To use the back door, one must know of the value e. The NIST standard does not disclose e (of course!), and extensive cryptographic experience suggests that it is hard to compute e from the coordinates of P and Q (unless you happen to own a quantum computer). This is the problem of discrete logarithms. But starting with e, there is no difficulty in creating a pair P and Q. The back door is universal: a single number e gives back door access to the internal state of the algorithm of all users worldwide.
It is a matter of public fact that the NSA was tightly involved in the writing of the standard. Indeed, NIST is required by law to consult with NSA in creating its standard. According to the New York Times, “classified NSA memos appear to confirm that the fatal weakness, discovered by two Microsoft cryptographers in 2007, was engineered by the agency” [NYT]. The news article goes on to say that “eventually, NSA became the sole editor” and then pushed aggressively to make this the standard for the 163 member countries of the International Organization for Standardization. Further historical and social context appears in [Wired]. NSA had facile access to the crown jewel e and motive to seize it. Draw your own conclusions.
1. This back door to this algorithm is extremely elementary from a mathematical perspective. We wrote the essential algorithm in six lines of computer code, even if more supporting code is needed to make it industrial strength. The algorithm could be explained to undergraduate math majors, or sufficiently advanced high-school students. The story also has the spy-agency intrigue to make a good math club talk or a special lecture in an elementary abstract algebra course. We essentially just need to understand that an elliptic curve is an abelian group whose elements (other than the identity element) are determined by two numbers x and y, that y is the root of a quadratic when x is given, and that every non-identity element of a cyclic group of prime order is a generator. Easy stuff.
2. Without prior knowledge of the back door, how difficult would it be to rediscover the possible existence of a back door? An analysis of the argument shows the required level of creativity is that of an undergraduate homework problem. We must think to write the element P as a multiple of the generator Q in a cyclic group of prime order. This a student learns in the first weeks of undergraduate algebra.
The rest of the process of inverting the pseudo-random number generator is determined by the definition of the function itself: simply take each step defining the function and reverse the steps, asking for the preimage of the function at each step of its definition, working from the output back to the secret state s’. Once the question of inverting the function is asked, it is easy to do the group theory, even if it is computationally difficult to write e explicitly.
One-way functions are a standard tool in the cryptographer’s bag. Every professional who has been trained to analyze cryptographic algorithms knows to ask the question of invertibility. It is unsettling that NIST and others do not seem to have asked this basic question.
Diffie-Hellman key exchange
In what follows, let us assume that someone, whom we will call the Spy, has access to the back door e. How is it possible for the Spy and the end user (the User) of the NIST algorithm to come into possession of the same shared secret (the internal state of the pseudo-random number generator), when all communication between them is public? Information flows from the Spy to the User through the published NIST standard, and from the User back to the Spy through the public output of the pseudo-random generator. The back door must have a remarkable cryptographic design to permit a secret to pass across these public channels, yet prevent the secret from becoming known to a third party.
As we now explain, the design of the back door to NIST is based on a well-known algorithm in cryptography called the Diffie-Hellman key exchange [Diffie-Hellman]. This is an algorithm to share a secret between two parties, when there is a possibility that the channel of communication is being monitored. In the current context, the Spy has full knowledge of the Diffie-Hellman key exchange for what it is. However, the User participates in the exchange innocently and unwittingly, by blindly following the rules of the NIST protocol.
The Diffie-Hellman key exchange requires a group, which we will take to be a cyclic group E of order n (to preserve notation). The group E, its order n, and a generator Q are made public. To share a secret, the first party (the Spy), picks a random number e, which is kept secret, and publishes P = e * Q to the world. The second party (the User) picks a random number r, which is kept secret, and publishes r * Q. Then by Equation (1), the Spy who knows e and r *Q, and the User who knows r and e * Q can both compute (r e) * Q = r * P, which is the shared secret. (In our context, the shared secret determines the internal state s’ of the pseudo-random number generator.) If E is a group in which the public knowledge E, n, Q, P = e * Q, r * Q does not allow the easy computation of (r e) * Q, then the shared secret is protected from public disclosure by the difficulty of the computation. In this way, the only two who learn the internal state of the pseudo-random number generator are the Spy and the User.
What we have described here is not an imaginary scenario: NIST documents do in fact publish the data E, n, Q, and P, needed to initiate the Diffie-Hellman exchange. A user, when making public the output from the pseudo-random number generator, does in fact complete the exchange. Diffie-Hellman is Diffie-Hellman, whether it has been advertised as such or not.
To say that the Diffie-Hellman key exchange algorithm is well-known is a vast understatement. This algorithm is a significant lesson in virtually every first course in cryptography everywhere in the world. Building on Merkle, the Diffie-Hellman paper, by starting the entire field of public key cryptography, is one of the most influential papers in cryptography ever written.
What is the significance of all this? It is no secret that the NSA employs some of the world’s keenest cryptographic minds. They all know Diffie-Hellman. In my opinion, an algorithm that has been designed by NSA with a clear mathematical structure giving them exclusive back door access is no accident, particularly in light of the Snowden documents. This is a work of experts.
[NIST] E. Barker and J. Kelsey, Recommendation for random number generation using deterministic random bit generators. NIST Special Publication 800-90A (2012), http://csrc.nist.gov/publications/nistpubs/800-90A/SP800-90A.pdf.
[Diffie-Hellman] W. Diffie and M. Hellman, New directions in cryptography. IEEE Transactions on Information Theory 22 (1976), 644-654.
[Luby] M. Luby, Pseudorandomness and cryptographic applications, Princeton University Press, 1996.
[NYT] N. Perloth, J. Larson, and S. Shane. N.S.A. able to foil basic safeguards of privacy on web, September 5, 2013, New York Times, http://www.nytimes.com/2013/09/06/us/nsa-foils-much-internet-encryption.html.
[Shumow-Ferguson] D. Shumow and N. Ferguson, On the possibility of a back door in the NIST SP800-90 dual EC PRNG, http://rump2007.cr.yp.to/15-shumow.pdf, 2007.
[Wired] K. Zetter, How a crypto `backdoor’ pitted the tech world against the NSA, Wired (Sept 24, 2013), http://www.wired.com/threatlevel/2013/09/nsa-backdoor/