A-16.2 Forward FFT (or NTT)

We assume a polynomial ring of either [X]Xn + 1 for FTT, and [X]pXn + 1 for NTT (where Xn + 1 is a cyclotomic polynomial). The x coordinates to evaluate the target polynomial are n distinct n-th roots of unity that satisfy Xn = 1, which are: {1,ω,ω2,,ωn1}. In the case of FFT (i.e., [X]Xn + 1), the primitive n-th root of unity is ω = e2𝑖𝜋 n . In the case of NTT (i.e., p[X]Xn + 1 where p is a prime), the primitive n-th root of unity ω = gp1 n , where g is the generator of p× and n divides p 1 (a variation of Summary A-10.5 in §A-10.5).

Then, the point-value representation of polynomial A(X) is ((x0,y0a),(x1,y1a),(xn1,yn1a)), where:

yia = A(ωi) = j=0n1aj (ωi)j = j=0n1aj ω𝑖𝑗.

We call the vector ya = (y0a,y1a,,yn1a) the Discrete Fourier Transform (DFT) of the coefficient vector a = (a0,a1,,an1). We write this as ya = 𝖣𝖥𝖳(a). As explained in §A-16.1, the computation of DFT takes O(n2), because we have to evaluate n distinct X values for a polynomial which has n terms.

A-16.2.1 High-level Idea

FFT (or NTT) is an improved way of computing DFT which reduces the time complexity from O(n2) to O(nlogn). The high-level idea of FFT is to split the (n 1)-degree (or lesser degree) target polynomial A(X) to evaluate into 2 half-degree polynomials A0(X) and A1(X) as follows:

A(X) = a0 + a1X + a2X2 + + an1Xn1

A(X) = A0(X2) + X A1(X2) A0(X) = a0 + a2X + a4X2 + + an2Xn 2 1

A1(X) = a1 + a3X + a5X2 + + an2Xn 1 1

The above way of splitting a polynomial into two half-degree polynomials is called the Cooley-Tukey step. As we split A(X) into two smaller-degree polynomials A0(X) and A1(X), evaluating A(X) at n distinct n-th roots of unity {ω0,ω1,ω2,,ωn1} is equivalent to evaluating A0(X) and A1(X) at n distinct squared n-th roots of unity {(ω2)0,(ω2)1,(ω2)2,,(ω2)n1} and computing A0(X2) + X A1(X2). However, remember that the primitive n-th root of unity ω has order n (i.e., ωn = 1 and ωm1 for all m < n). Therefore, the second half of {(ω2)0,(ω2)1,(ω2)2,,(ω2)n1} is a repetition of the first half. This implies that we only need to evaluate A0(X) and A1(X) at n 2 distinct x coordinates each instead of n distinct coordinates, because the polynomial evaluation results for the other half are the same as those of the first half (as their input x to the polynomial is the same).

We recursively split A0(X) and A1(X) into half-degree polynomials and evaluate them at half-counted n-th roots of unity. Then, the total rounds of splitting are logn, and each round’s maximum number of root-to-coefficient multiplications is n, which aggregates to O(nlogn).

A-16.2.2 Details

Suppose we have a polynomial ring which is either p[X]X8 + 1 (i.e., over a finite field with prime p) or [X]X8 + 1 (over complex numbers). We denote the primitive (n = 8)-th roots of unity as ω, and the n distinct (n = 8)-th roots of unity are: {ω0,ω1,ω2,ω3,ω4,ω5,ω6,ω7}.

Now, we define our target polynomial to evaluate as follows:

A(X) = a0 + a1X + a2X2 + a3X3 + a4X4 + a5X5 + a6X6 + a7X7

We split this 7-degree polynomial into the following two 3-degree polynomials (using the Cooley-Tukey step):

A0(X) = a0 + a2X + a4X2 + a6X3

A1(X) = a1 + a3X + a5X2 + a7X3

A(X) = A0(X2) + X A1(X2)

We recursively split the above two 3-degree polynomials into 1-degree polynomials as follows:

A0,0(X) = a0 + a4X, ... A0,1(X) = a2 + a6X

A0(X) = A0,0(X2) + X A0,1(X2)

A1,0(X) = a1 + a5X, ... A1,1(X) = a3 + a7X

A1(X) = A1,0(X2) + X A1,1(X2)

A(X) = A0(X2) + X A1(X2)

A(X) = (A0,0(X4) FFT Level 1 + X2 A 0,1(X4) FFT Level 1)FFT Level 2 + X (A1,0(X4) FFT Level 1 + X2 A 1,1(X4) FFT Level 1)FFT Level 2FFT Level 3

To evaluate A(X) at n distinct roots of unity X = {ω0,ω1,,ω7}, we evaluate the above formula’s each FFT level at X = {ω0,ω1,,ω7}, from level 1 l 3.

FFT Level 𝐥 = 1: We evaluate A0,0(X4), A0,1(X4), A1,0(X4), and A1,1(X4) at X = {ω0,ω1,,ω7}. However, notice that plugging in X = {ω0,ω1,,ω7} to X4 results in only 2 distinct values: ω0 and ω4. This is because the order of ω is n (i.e., ωn = 1), and thus (ω0)4 = (ω2)4 = (ω4)4 = (ω6)4, and (ω1)4 = (ω3)4 = (ω5)4 = (ω7)4. Therefore, we only need to evaluate A0,0(X4), A0,1(X4), A1,0(X4), and A1,1(X4) at 2 distinct x values instead of 8, where each evaluation requires a constant number of arithmetic operations: computing 1 multiplication and 1 addition. As there are a total of 4 polynomials to evaluate (i.e., A0,1(X4),A0,1(X4),A1,0(X4),A1,1(X4)), we compute FFT a total of 4 2 = 8 times.

FFT Level 𝐥 = 2: Based on the evaluation results from FFT Level 1 as building blocks, we evaluate A0(X2) and A1(X2) at X = {ω0,ω1,,ω7}. However, notice that plugging in X = {ω0,ω1,,ω7} to X2 results in only 4 distinct values: ω0, ω2, ω4, and ω6. This is because the order of ω is n (i.e., ωn = 1), and thus (ω0)2 = (ω4)2, (ω1)2 = (ω5)2, (ω2)2 = (ω6)2, and (ω3)2 = (ω7)2. Therefore, we only need to evaluate A0(X2) and A1(X2) at 4 distinct x values instead of 8, where each evaluation requires a constant number of arithmetic operations: computing 1 multiplication and 1 addition (where we use the results from FFT Level 1 as building blocks and FFT Level 2’s computation structure is the same as that of FFT Level 1). There are a total of 2 polynomials to evaluate (i.e., A0(X2),A1(X2)), thus we compute FFT a total of 2 4 = 8 times.

FFT Level 𝐥 = 3: Based on the evaluation results from FFT Level 2 as building blocks, we evaluate A(X) at X = {ω0,ω1,,ω7}. For this last level of computation, we need to evaluate all 8 distinct X values, since they are all unique values, and each evaluation requires a constant number of arithmetic operations: computing 1 multiplication and 1 addition. There is a total of 1 polynomial to evaluate (i.e., A(X)), thus we compute FFT a total of 1 8 = 8 times.

Generalization: Suppose that the degree of the target polynomial to evaluate is bound by n degree and we define L = logn (i.e., the total number of FFT levels). Then, the forward FFT operation requires a total of L FFT levels, where each l-th level requires the evaluation of 2Ll polynomials at 2l distinct X values. Therefore, the total number of FFT computations for forward FFT is: log(n) (2Ll 2l) = 2L logn = nlogn. Therefore, the time complexity of forward FFT is O(nlogn).

Using the FFT technique, we reduce the number of x points to evaluate into half as the level goes down (while the number of polynomials to evaluate doubles), and their growth and reduction cancel each other, resulting in O(n) for each level. Since there are logn such levels, the total time complexity is O(nlogn). The core enabler of this optimization is the special property of the x evaluation coordinates: its power (i.e., ωi) is cyclic. To enforce this cyclic property, FFT requires the evaluation points of x to be the n-th roots of unity.