Modular Arithmetic I - Montgomery

The series of Modular Arithmetic will introduce readers to basic modular arithmetic knowledge, and some classical algorithms. I plan to divide it into 3 parts: Montgomery, Barrett, and Look-Up Table. Since I don't have an ECE background, all of them will mainly focus on mathematical perspectives. This is the first part of the series - Montgomery algorithms.


The article will introduce the following things:

  • Why do we need modular arithmetic algorithms?
  • Montgomery reduction
  • Montgomery multiplication
    • Different ways to use MontREDC due to different purposes
  • Montgomery modulo
  • Montgomery exponentiation
  • A NTT-friendly Montgomery REDC
    • How do we merge Mont algorithms into NTT

Why do we need modular arithmetic algorithms?

How do you calculate \(r = a \bmod n \in [0,n)\)?

Intuitively, we calculate it by calculating \(\frac{a}{n}\) and find the quotient and remainder by hand. But how does the computer calculate it?

There are 2 native ways to calculate \(r = a \bmod n\).

  1. \(r = a - n - n - \cdots - n\), such that \(r \in [0, n)\);
  2. \(r = a - pq\), where \(p = \lfloor \frac{a}{n} \rfloor\) is the quotient and \(q\) is the greatest integer to let \(r \in [0, n)\).

The problem of 1) is that \(a\) can be relatively too large compared to \(n\), thus the round of subtraction will make the algorithm very inefficient. For 2), the issue is brought up by \(\lfloor \frac{a}{n} \rfloor\), which needs division that is hard for computer hardware to perform.

You may also wonder: but I can actually code division in Python/C++/etc, and it's not that slow? That is because the compiler does the optimisation for you. However, if your work is more on low-level design / hardware design etc, i.e. those works that the compiler cannot do for you, you will need to work with modular arithmetic for computer and hardware.

Montgomery's method is one of the approaches for fast modular arithmetic. The algorithms include: reduction, multiplication, big number modulo, and exponentiation.

Montgomery Reduction

We first look at Montgomery reduction. It does the following thing: let \(N\in [0,2^{k})\) be the modulus fitting in \(k\) bit. There is a function taking input \(z \in [0, N\cdot 2^k)\) and calculating \(F(z)=zR^{-1}\bmod N\). Here, to be precise, we use "\(=\)" to represent exact equality and "\(\equiv\)" to represent congruence.

Thus, the function will have the following: \[ \begin{aligned} F(z) &= zR^{-1} \bmod N\\ F(zR) &= z \bmod N \end{aligned} \] Now we want to know how to set \(R\). In fact, to make the further calculation easier, we want \(R\) to be a power of \(2\) and slightly greater than \(N\), so one good choice for \(R\) will be \(2^k\). And later we will require \(gcd(R,N) = 1\), thus \(N\) is odd.

By extended Euclidean algorithm \(ax+by = gcd(a,b)\), we can make \(RR'-NN' = gcd(R,N) = 1\). Then the property of \(R'\) and \(N'\) are: \[ \begin{aligned} RR' &\equiv 1 \bmod N,\\ -NN' &\equiv 1 \bmod R. \end{aligned} \] So we can first calculate: \[ \begin{aligned} R' \in [0,N),\\ N' \in [0,N). \end{aligned} \] \(R'\) is just the inverse of \(R \bmod N\) and \(N'\) is the minus inverse of \(N \bmod R\). Now we derive the equation: \[z = z \cdot 1 = z \cdot (RR'-NN') \longrightarrow z + zNN' = zRR'\] Remember we first want to obtain the reduction function, which is calculating \(zR^{-1} \bmod N \in [0,N)\). Thus, we divide the right-hand side by \(R\) to obtain: \[ \begin{aligned} zR' \mod N &= \frac{zRR'}{R} \mod N,\\ \frac{zRR'}{R} = \frac{z+zNN'}{R} &\equiv \frac{z+zNN'+kRN}{R} \mod N. \end{aligned} \] Let \(m = zN'+kR\), then \(\frac{zRR'}{R} \equiv \frac{z+mN}{R} \mod N\). We wish to calculate and constrain \(\frac{z+mN}{R}\) in the range \([0,2N)\).

So, the range of \(m\) should be: \[ \begin{aligned} &\frac{z+mN}{R} \in [0,2N)\\ &\Longrightarrow z+mN \in [0,2NR)\\ &\Longrightarrow mN \in [0,NR)\\ &\Longrightarrow m \in [0,R) \end{aligned} \] Apparently, we can use \(\bmod R\) to make it in the range. There are 2 ways to make this happen: 1. \(m = ((z \bmod R)N') \bmod R\) 2. \(m = (zN')\bmod R\)

We usually adapt the first one, since the second one will require a \(2k \cdot k\) multiplication. It sort of relates to how your DSP is designed. Therefore, we finally get \(zR^{-1} \equiv \frac{z+mN}{R}\in [0,2N)\), where \(m = ((z \bmod R)N') \bmod R\). The final exact answer will be \(\frac{z+mN}{R}\) or \(\frac{z+mN}{R} - N\).

Code

1
2
3
4
5
6
7
8
9
def mont_redc(z, N):
r_bit = N.bit_length()
m = ((z & ((1<<r_bit)-1)) * n_neg_inv) & ((1<<r_bit)-1)
res = (z + m * N) >> r_bit
count = 0
while(res >= N):
count+=1
res-= modulus
return res

Montgomery Multiplication

We aim to calculate \(a\cdot b \bmod N\) by only using REDC function.

\[ \begin{aligned} a\cdot b \bmod N &= abRR' \bmod N = REDC(abR)\\ &= REDC(abRRR') = REDC(REDC(abRR))\\ &= REDC(REDC((aR\bmod N)(bR\bmod N))) \\ &= REDC(REDC((aRRR'\bmod N)(bRRR'\bmod N)))\\ &= REDC(REDC(REDC(a(R^2\bmod N))\cdot REDC(b(R^2\bmod N)))) \end{aligned} \] Thus, if we define: \[ \begin{aligned} \hat{a} &= aR \bmod N = REDC(a(R^2\bmod N)),\\ \hat{b} &= bR \bmod N = REDC(b(R^2\bmod N)), \end{aligned} \]

We can obtain: \[ a\cdot b \bmod N = REDC(REDC(\hat{a}\hat{b})) \] ### Code

1
2
3
4
5
6
7
8
precal_r2 = ((1<<N.bit_length()) ** 2) % N
def mont_mult(a, b, N, precal_r2):
a_mont = mont_redc(a * precal_r2)
b_mont = mont_redc(b * precal_r2)
abrr = a_mont * b_mont
abr = mont_redc(abrr, N)
ab = mont_redc(abr, N)
return ab

Montgomery Mod

Montgomery Exponentiation

Montgomery Variants: for NTT friendly primes

But How?

I'll finish this last section after I finish writing Fast and Fourier series, because some contents are related.