Forward FFT (or NTT)

A-16.2 Forward FFT (or NTT)

We assume a polynomial ring of either $ℝ [X] ∕ X^{n} + 1$ for FFT, or $ℤ_{p} [X] ∕ X^{n} + 1$ for NTT (where $X^{n} + 1$ is a cyclotomic polynomial). The $x$ coordinates at which we evaluate the target polynomial are the $n$ distinct $n$ -th roots of unity that satisfy $X^{n} = 1$ , which are: $\{1, ω, ω^{2}, \dots, ω^{n - 1}\}$ . In the case of FFT (i.e., $ℝ [X] ∕ X^{n} + 1$ ), the primitive $n$ -th root of unity is $ω = e^{\frac{2 𝑖𝜋}{n}}$ . In the case of NTT (i.e., $ℤ_{p} [X] ∕ X^{n} + 1$ where $p$ is a prime), the primitive $n$ -th root of unity is $ω = g^{\frac{p - 1}{n}}$ , where $g$ is a generator of $ℤ_{p}^{\times}$ and $n$ divides $p - 1$ (a variation of Summary A-10.5 in §A-10.5).
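The two formulas for $ω$ above can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, and the toy parameters $p = 17$ , $g = 3$ , $n = 8$ are assumptions chosen so that $n$ divides $p - 1$ and $g$ generates $ℤ_{17}^{\times}$ .

```python
import cmath

def complex_root_of_unity(n):
    """Primitive n-th root of unity e^(2*pi*i/n) over the complex numbers."""
    return cmath.exp(2j * cmath.pi / n)

def ntt_root_of_unity(g, p, n):
    """Primitive n-th root of unity g^((p-1)/n) mod p.

    Assumes p is prime, g is a generator of Z_p^x, and n divides p - 1.
    """
    if (p - 1) % n != 0:
        raise ValueError("n must divide p - 1")
    return pow(g, (p - 1) // n, p)

# Toy parameters (assumed for illustration only).
p, g, n = 17, 3, 8
omega = ntt_root_of_unity(g, p, n)

# The n evaluation points omega^0, omega^1, ..., omega^(n-1).
roots = [pow(omega, i, p) for i in range(n)]
```

With these toy parameters, `roots` contains $n = 8$ distinct values and `pow(omega, n, p)` equals 1, as the definition of a primitive $n$ -th root of unity requires.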

Then, the point-value representation of polynomial $A (X)$ is $((x_{0}, y_{0}^{⟨ a ⟩}), (x_{1}, y_{1}^{⟨ a ⟩}), \dots, (x_{n - 1}, y_{n - 1}^{⟨ a ⟩}))$ , where:

$y_{i}^{⟨ a ⟩} = A (ω^{i}) = \sum_{j = 0}^{n - 1} a_{j} \cdot {(ω^{i})}^{j} = \sum_{j = 0}^{n - 1} a_{j} \cdot ω^{𝑖𝑗}$ .

We call the vector ${\vec{y}}^{⟨ a ⟩} = (y_{0}^{⟨ a ⟩}, y_{1}^{⟨ a ⟩}, \dots, y_{n - 1}^{⟨ a ⟩})$ the Discrete Fourier Transform (DFT) of the coefficient vector $\vec{a} = (a_{0}, a_{1}, \dots, a_{n - 1})$ . We write this as ${\vec{y}}^{⟨ a ⟩} = 𝖣𝖥𝖳 (\vec{a})$ . As explained in §A-16.1, computing the DFT directly takes $O (n^{2})$ time, because we have to evaluate a polynomial with $n$ terms at $n$ distinct $X$ values.
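The direct $O (n^{2})$ computation can be sketched as a double loop over $i$ and $j$ , following the sum $y_{i}^{⟨ a ⟩} = \sum_{j} a_{j} \cdot ω^{𝑖𝑗}$ term by term. The function name and the toy parameters ( $p = 17$ , $ω = 9$ , which is a primitive 8th root of unity modulo 17) are assumptions for illustration.

```python
def naive_dft_mod(a, omega, p):
    """Evaluate A(X) at omega^0, ..., omega^(n-1) modulo p using the defining sum.

    Each of the n outputs sums n terms, so the cost is O(n^2).
    """
    n = len(a)
    return [
        sum(a[j] * pow(omega, i * j, p) for j in range(n)) % p
        for i in range(n)
    ]

# Toy input: A(X) = 1 + 2X + 3X^2 + 4X^3, padded to n = 8 coefficients.
a = [1, 2, 3, 4, 0, 0, 0, 0]
y = naive_dft_mod(a, omega=9, p=17)
```

The same structure applies over the complex numbers by replacing the modular power with a power of $e^{\frac{2 𝑖𝜋}{n}}$ and dropping the reduction modulo $p$ .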

A-16.2.1 High-level Idea

FFT (or NTT) is an improved way of computing DFT which reduces the time complexity from $O (n^{2})$ to $O (n \log n)$ . The high-level idea of FFT is to split the target polynomial $A (X)$ , whose degree is at most $n - 1$ , into 2 half-degree polynomials $A_{0} (X)$ and $A_{1} (X)$ as follows:

$A (X) = a_{0} + a_{1} X + a_{2} X^{2} + \dots + a_{n - 1} X^{n - 1}$

$A (X) = A_{0} (X^{2}) + X \cdot A_{1} (X^{2})$

$A_{0} (X) = a_{0} + a_{2} X + a_{4} X^{2} + \dots + a_{n - 2} X^{\frac{n}{2} - 1}$

$A_{1} (X) = a_{1} + a_{3} X + a_{5} X^{2} + \dots + a_{n - 1} X^{\frac{n}{2} - 1}$

The above way of splitting a polynomial into two half-degree polynomials is called the Cooley-Tukey step. As we split $A (X)$ into two smaller-degree polynomials $A_{0} (X)$ and $A_{1} (X)$ , evaluating $A (X)$ at the $n$ distinct $n$ -th roots of unity $\{ω^{0}, ω^{1}, ω^{2}, \dots, ω^{n - 1}\}$ is equivalent to evaluating $A_{0} (X)$ and $A_{1} (X)$ at the $n$ squared $n$ -th roots of unity ${(ω^{2})}^{0}, {(ω^{2})}^{1}, {(ω^{2})}^{2}, \dots, {(ω^{2})}^{n - 1}$ and computing $A_{0} (X^{2}) + X \cdot A_{1} (X^{2})$ . However, remember that the primitive $n$ -th root of unity $ω$ has order $n$ (i.e., $ω^{n} = 1$ and $ω^{m} \neq 1$ for all $0 < m < n$ ). Therefore, ${(ω^{2})}^{i + \frac{n}{2}} = ω^{2 i} \cdot ω^{n} = {(ω^{2})}^{i}$ , which means the second half of the list ${(ω^{2})}^{0}, {(ω^{2})}^{1}, {(ω^{2})}^{2}, \dots, {(ω^{2})}^{n - 1}$ is a repetition of the first half. This implies that we only need to evaluate $A_{0} (X)$ and $A_{1} (X)$ at $\frac{n}{2}$ distinct $x$ coordinates each instead of $n$ coordinates, because the polynomial evaluation results for the other half are the same as those of the first half (as their input $x$ to the polynomial is the same).
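The repetition among the squared roots can be checked numerically. The following is a small sketch with assumed toy parameters ( $p = 17$ , $n = 8$ , $ω = 9$ , a primitive 8th root of unity modulo 17); it is not tied to any particular library.

```python
# Toy parameters (assumed for illustration only).
p, n, omega = 17, 8, 9

# The n squared roots (omega^2)^0, (omega^2)^1, ..., (omega^2)^(n-1) modulo p.
squares = [pow(omega, 2 * i, p) for i in range(n)]

# Because omega^n = 1, the second half of the list repeats the first half.
first_half = squares[: n // 2]
second_half = squares[n // 2 :]
halves_match = first_half == second_half
distinct_count = len(set(squares))
```

Here `halves_match` is true and `distinct_count` is $\frac{n}{2} = 4$ , so $A_{0}$ and $A_{1}$ only need to be evaluated at 4 points each.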

We recursively split $A_{0} (X)$ and $A_{1} (X)$ into half-degree polynomials and evaluate them at half as many roots of unity. The total number of splitting rounds is $\log_{2} n$ , and each round requires at most $n$ root-to-coefficient multiplications, which aggregates to $O (n \log n)$ .
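The recursion described above can be sketched as follows for the NTT case. This is a minimal illustration, not an optimized implementation: the function name is hypothetical, and it assumes that the input length is a power of 2 and that `omega` is a primitive root of unity of that order modulo the prime `p`.

```python
def ntt_recursive(a, omega, p):
    """Evaluate the polynomial with coefficients a at omega^0, ..., omega^(n-1) mod p.

    Assumes len(a) is a power of 2 and omega is a primitive len(a)-th
    root of unity modulo the prime p.
    """
    n = len(a)
    if n == 1:
        return [a[0] % p]  # a constant polynomial has the same value everywhere

    # Cooley-Tukey step: even-indexed coefficients form A_0, odd-indexed form A_1.
    omega_sq = omega * omega % p
    y0 = ntt_recursive(a[0::2], omega_sq, p)  # A_0 at the n/2 distinct squared roots
    y1 = ntt_recursive(a[1::2], omega_sq, p)  # A_1 at the n/2 distinct squared roots

    half = n // 2
    y = []
    x = 1  # running power omega^i
    for i in range(n):
        # A(omega^i) = A_0(omega^(2i)) + omega^i * A_1(omega^(2i)),
        # where omega^(2i) repeats with period n/2.
        y.append((y0[i % half] + x * y1[i % half]) % p)
        x = x * omega % p
    return y

# Toy parameters (assumed): p = 17 and omega = 9, a primitive 8th root of unity mod 17.
result = ntt_recursive([1, 2, 3, 4, 5, 6, 7, 8], 9, 17)
```

Each call performs $n$ multiply-and-add steps and makes two recursive calls on inputs of half the size, which matches the $O (n \log n)$ count above.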

A-16.2.2 Details

Suppose we have a polynomial ring which is either $ℤ_{p} [X] ∕ X^{8} + 1$ (i.e., over a finite field with prime $p$ ) or $ℝ [X] ∕ X^{8} + 1$ (over complex numbers). We denote the primitive $(n = 8)$ -th root of unity as $ω$ , and the $n$ distinct $(n = 8)$ -th roots of unity are: $\{ω^{0}, ω^{1}, ω^{2}, ω^{3}, ω^{4}, ω^{5}, ω^{6}, ω^{7}\}$ .

Now, we define our target polynomial to evaluate as follows:

$A (X) = a_{0} + a_{1} X + a_{2} X^{2} + a_{3} X^{3} + a_{4} X^{4} + a_{5} X^{5} + a_{6} X^{6} + a_{7} X^{7}$

We split this 7-degree polynomial into the following two 3-degree polynomials (using the Cooley-Tukey step):

$A_{0} (X) = a_{0} + a_{2} X + a_{4} X^{2} + a_{6} X^{3}$

$A_{1} (X) = a_{1} + a_{3} X + a_{5} X^{2} + a_{7} X^{3}$

$A (X) = A_{0} (X^{2}) + X \cdot A_{1} (X^{2})$

We recursively split the above two 3-degree polynomials into 1-degree polynomials as follows:

$A_{0, 0} (X) = a_{0} + a_{4} X$ , $A_{0, 1} (X) = a_{2} + a_{6} X$

$A_{0} (X) = A_{0, 0} (X^{2}) + X \cdot A_{0, 1} (X^{2})$

$A_{1, 0} (X) = a_{1} + a_{5} X$ , $A_{1, 1} (X) = a_{3} + a_{7} X$

$A_{1} (X) = A_{1, 0} (X^{2}) + X \cdot A_{1, 1} (X^{2})$

$A (X) = A_{0} (X^{2}) + X \cdot A_{1} (X^{2})$

$A (X) = \underbrace{\underbrace{(\underbrace{A_{0, 0} (X^{4})}_{\text{FFT Level 1}} + X^{2} \cdot \underbrace{A_{0, 1} (X^{4})}_{\text{FFT Level 1}})}_{\text{FFT Level 2}} + X \cdot \underbrace{(\underbrace{A_{1, 0} (X^{4})}_{\text{FFT Level 1}} + X^{2} \cdot \underbrace{A_{1, 1} (X^{4})}_{\text{FFT Level 1}})}_{\text{FFT Level 2}}}_{\text{FFT Level 3}}$

To evaluate $A (X)$ at the $n$ distinct roots of unity $X \in \{ω^{0}, ω^{1}, \dots, ω^{7}\}$ , we evaluate each FFT level of the above formula at $X \in \{ω^{0}, ω^{1}, \dots, ω^{7}\}$ , proceeding from level $l = 1$ to level $l = 3$ .

FFT Level $𝐥 = 1$ : We evaluate $A_{0, 0} (X^{4})$ , $A_{0, 1} (X^{4})$ , $A_{1, 0} (X^{4})$ , and $A_{1, 1} (X^{4})$ at $X \in \{ω^{0}, ω^{1}, \dots, ω^{7}\}$ . However, notice that plugging $X \in \{ω^{0}, ω^{1}, \dots, ω^{7}\}$ into $X^{4}$ results in only 2 distinct values: $ω^{0}$ and $ω^{4}$ . This is because the order of $ω$ is $n$ (i.e., $ω^{n} = 1$ ), and thus ${(ω^{0})}^{4} = {(ω^{2})}^{4} = {(ω^{4})}^{4} = {(ω^{6})}^{4} = ω^{0}$ , and ${(ω^{1})}^{4} = {(ω^{3})}^{4} = {(ω^{5})}^{4} = {(ω^{7})}^{4} = ω^{4}$ . Therefore, we only need to evaluate $A_{0, 0} (X^{4})$ , $A_{0, 1} (X^{4})$ , $A_{1, 0} (X^{4})$ , and $A_{1, 1} (X^{4})$ at 2 distinct $x$ values instead of 8, where each evaluation requires a constant number of arithmetic operations: 1 multiplication and 1 addition. As there are a total of 4 polynomials to evaluate (i.e., $A_{0, 0} (X^{4}), A_{0, 1} (X^{4}), A_{1, 0} (X^{4}), A_{1, 1} (X^{4})$ ), we perform a total of $4 \cdot 2 = 8$ evaluations.

FFT Level $𝐥 = 2$ : Using the evaluation results from FFT Level 1 as building blocks, we evaluate $A_{0} (X^{2})$ and $A_{1} (X^{2})$ at $X \in \{ω^{0}, ω^{1}, \dots, ω^{7}\}$ . However, notice that plugging $X \in \{ω^{0}, ω^{1}, \dots, ω^{7}\}$ into $X^{2}$ results in only 4 distinct values: $ω^{0}$ , $ω^{2}$ , $ω^{4}$ , and $ω^{6}$ . This is because the order of $ω$ is $n$ (i.e., $ω^{n} = 1$ ), and thus ${(ω^{0})}^{2} = {(ω^{4})}^{2}$ , ${(ω^{1})}^{2} = {(ω^{5})}^{2}$ , ${(ω^{2})}^{2} = {(ω^{6})}^{2}$ , and ${(ω^{3})}^{2} = {(ω^{7})}^{2}$ . Therefore, we only need to evaluate $A_{0} (X^{2})$ and $A_{1} (X^{2})$ at 4 distinct $x$ values instead of 8, where each evaluation requires a constant number of arithmetic operations: 1 multiplication and 1 addition (the results from FFT Level 1 serve as building blocks, and the computation structure of FFT Level 2 is the same as that of FFT Level 1). There are a total of 2 polynomials to evaluate (i.e., $A_{0} (X^{2}), A_{1} (X^{2})$ ), thus we perform a total of $2 \cdot 4 = 8$ evaluations.

FFT Level $𝐥 = 3$ : Using the evaluation results from FFT Level 2 as building blocks, we evaluate $A (X)$ at $X \in \{ω^{0}, ω^{1}, \dots, ω^{7}\}$ . For this last level of computation, we need to evaluate at all 8 $X$ values, since they are all distinct, and each evaluation requires a constant number of arithmetic operations: 1 multiplication and 1 addition. There is a total of 1 polynomial to evaluate (i.e., $A (X)$ ), thus we perform a total of $1 \cdot 8 = 8$ evaluations.

Generalization: Suppose that the degree of the target polynomial to evaluate is at most $n - 1$ (where $n$ is a power of 2) and we define $L = \log_{2} n$ (i.e., the total number of FFT levels). Then, the forward FFT operation requires a total of $L$ FFT levels, where each level $l$ (for $1 \leq l \leq L$ ) requires the evaluation of $2^{L - l}$ polynomials at $2^{l}$ distinct $X$ values. Therefore, the total number of evaluations for forward FFT is: $\sum_{l = 1}^{L} 2^{L - l} \cdot 2^{l} = L \cdot 2^{L} = n \log n$ . Therefore, the time complexity of forward FFT is $O (n \log n)$ .
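The level-by-level count can be checked with a bottom-up sketch that also records how many evaluations each level performs. The function name is hypothetical, and the code assumes that the input length $n$ is a power of 2 and that `omega` is a primitive $n$ -th root of unity modulo the prime `p`. It starts from constant polynomials (single coefficients), so that level 1 produces the values of the 1-degree polynomials.

```python
def ntt_by_levels(a, omega, p):
    """Bottom-up forward NTT that also counts evaluations per FFT level.

    Assumes len(a) is a power of 2 and omega is a primitive len(a)-th
    root of unity modulo the prime p.
    """
    n = len(a)
    # Apply the Cooley-Tukey split repeatedly until every block is one coefficient.
    blocks = [list(a)]
    while len(blocks[0]) > 1:
        blocks = [part for b in blocks for part in (b[0::2], b[1::2])]
    evals = [[b[0] % p] for b in blocks]  # constant polynomials: one value each

    counts = []
    m = 2  # number of distinct X values at the current level (2^l)
    while m <= n:
        w = pow(omega, n // m, p)  # the m distinct points are w^0, ..., w^(m-1)
        merged = []
        for even, odd in zip(evals[0::2], evals[1::2]):
            # Combine two lower-level polynomials: 1 multiplication and 1 addition per point.
            merged.append([
                (even[k % (m // 2)] + pow(w, k, p) * odd[k % (m // 2)]) % p
                for k in range(m)
            ])
        counts.append(len(merged) * m)  # 2^(L-l) polynomials times 2^l points
        evals = merged
        m *= 2
    return evals[0], counts

# Toy parameters (assumed): p = 17 and omega = 9, a primitive 8th root of unity mod 17.
values, counts = ntt_by_levels([1, 2, 3, 4, 5, 6, 7, 8], 9, 17)
```

For $n = 8$ , `counts` holds one entry per level, each equal to $2^{L - l} \cdot 2^{l} = 8$ , in line with the three levels walked through above.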

Using the FFT technique, the number of $x$ points to evaluate halves each time the level goes down (while the number of polynomials to evaluate doubles), and this growth and reduction cancel each other, resulting in $O (n)$ work for each level. Since there are $\log n$ such levels, the total time complexity is $O (n \log n)$ . The core enabler of this optimization is the special property of the $x$ evaluation coordinates: their powers (i.e., $ω^{i}$ ) are cyclic. To obtain this cyclic property, FFT requires the evaluation points to be the $n$ -th roots of unity.
