Modular inversion is a common mathematical operation that is given within cryptographic algorithms based on finite groups generated by a prime number. Mainly, these algorithms are related to public key cryptographic, specially, to Elliptic Curve Cryptography (ECC). The way to compute a modular inverse is always hard; at least, it is roughly 100 times harder than the opposite operation, the modular product.

# Algorithms & performance

Basically there are two main algorithms to calculate modular inverses: the one based on Euler’s totient theorem, and the other based on the Euclides GCD algorithm.

## Algorithm based on Euler’s totient

Euler’s totient theorem states, for any $a$ coprime of $p$ (so, if $p$ is prime, for all $a > 1$):

$a^{\phi(p)} = 1 \;(\mathrm{mod}\ p)$

where $\phi(p)$ is the Euler’s totient function, which for $p$ prime is $\phi(p) = p -1$; so, we can obtain the inverse by computing a modular exponentiation:

$a^{-1} \,\mathrm{mod}\ p = a^{\phi(p) - 1} \;\mathrm{mod}\ p = a^{p-2} \;\mathrm{mod}\ p$

But modular exponentiation is quite costfull. This solution implies at least $\log_2(p)$ modular squarings and about half of products (or less using some tricks).

## Algorithm based on Euclides’ GCD algorithm

I write down one of the possibles Euclidean algorithms for modular inverse in pseudocode (taken from Elliptic Curve Cryptography by D. Hanckerson, A. Menezes & S. Vanstone):

INPUT : Prime p and a ∈ [1, p − 1].
OUTPUT : a^(−1) mod p.
1. u ← a, v ← p.
2. x1 ← 1, x2 ← 0.
3. While (u != 1 and v != 1) do
3.1 While u is even do
u ← u/2.
If x1 is even then x1 ← x1 /2; else x1 ← (x1 + p)/2.
3.2 While v is even do
v ← v/2.
If x2 is even then x2 ← x2 / 2; else x2 ← (x2 + p)/2.
3.3 If u ≥ v then: u ← u − v, x1 ← x1 − x2 ;
Else: v ← v − u, x2 ← x2 − x1 .
4. If u = 1 then return(x1 mod p); else return(x2 mod p).


This is quite faster than the other (but still very slow compared to product).

# Algorithms & side-channel attacks

The Euclidean algorithm has the drawback that its execution time depends on the input $a$. See, for example, the lines 3.1, 3.2, 3.3 where the code can go quicker or slower depending on the “if” result. Then, if $a$ is sensible data —say a private key—, the algorithm can leak information about it through timing or power analysis.

On the other side, Euler algorithm, despite its bad performance, it is constant time and does not depend on the input. This is why Daniel Bernstein elected it for his Curve25519 and Ed25519 projects.

## A simple way to use Euclidean algorith safely

I know that there are efforts to modify Euclidean algorithm in a way it computes the modular inverse at constant time, without depending on the input $a$.

But there is a trick that allows using Euclidean algorithm without leaking information about the inverted number:

A secret number $0 < k < p$ is obtained randomly. We can force $k$ to have set the highest possible bit. So, we perform the following operations to obtain the modular inverse of $a$:

$s = ka \,\mathrm{mod}\ p$ $s' = \mathrm{EuclideanInverse}(s, p)$ $a^{-1} = ks' \,\mathrm{mod}\ p$

It works because

$ks' \,\mathrm{mod}\ p = k/(ka) \,\mathrm{mod}\ p = 1/a \,\mathrm{mod}\ p$

The algorithm may leak information about $s$, which is usesless for an attacker because $k$ is secret. Indeed, if an inversion of same parameter is done, a different $k$ is obtained, and then it is impossible to harvest any information about $a$.

So by the price of two multiplications, we avoid the algorithm to leak information about sensible data. It can be even cheaper, because the Euclidean algorithm can perform $k/s \pmod p$ rightly replacing x1 ← 1 by x1 ← k in the line 2. of the code shown above. So,

$a^{-1} = \mathrm{EuclideanDivision}(k, s, p)$

P.S. 2017-07-24 Of course this is not a new trick. It is very well known. Even dbj knows it, as he write in the first article that describes Curve25519:

This (Euler inversion) is about 7% of the Curve25519 computation. An extended-Euclid inversion of $z$, randomized to protect against timing attacks, might be faster, but the maximum potential speedup is very small, while the cost in code complexity is large.

Well, I simply do not agree. With the The GNU Multiple Precision Arithmetic Library the code is quite simple!