We assume a polynomial ring of either for FTT, and for NTT (where is a cyclotomic polynomial). The coordinates to evaluate the target polynomial are distinct -th roots of unity that satisfy , which are: . In the case of FFT (i.e., ), the primitive -th root of unity is . In the case of NTT (i.e., where is a prime), the primitive -th root of unity , where is the generator of and divides (a variation of Summary A-10.5 in §A-10.5).
Then, the point-value representation of polynomial is , where:
.
We call the vector the Discrete Fourier Transform (DFT) of the coefficient vector . We write this as . As explained in §A-16.1, the computation of DFT takes , because we have to evaluate distinct values for a polynomial which has terms.
FFT (or NTT) is an improved way of computing DFT which reduces the time complexity from to . The high-level idea of FFT is to split the -degree (or lesser degree) target polynomial to evaluate into 2 half-degree polynomials and as follows:
The above way of splitting a polynomial into two half-degree polynomials is called the Cooley-Tukey step. As we split into two smaller-degree polynomials and , evaluating at distinct -th roots of unity is equivalent to evaluating and at distinct squared -th roots of unity and computing . However, remember that the primitive -th root of unity has order (i.e., and for all ). Therefore, the second half of is a repetition of the first half. This implies that we only need to evaluate and at distinct coordinates each instead of distinct coordinates, because the polynomial evaluation results for the other half are the same as those of the first half (as their input to the polynomial is the same).
We recursively split and into half-degree polynomials and evaluate them at half-counted -th roots of unity. Then, the total rounds of splitting are , and each round’s maximum number of root-to-coefficient multiplications is , which aggregates to .
Suppose we have a polynomial ring which is either (i.e., over a finite field with prime ) or (over complex numbers). We denote the primitive -th roots of unity as , and the distinct -th roots of unity are: .
Now, we define our target polynomial to evaluate as follows:
We split this 7-degree polynomial into the following two 3-degree polynomials (using the Cooley-Tukey step):
We recursively split the above two 3-degree polynomials into 1-degree polynomials as follows:
, ...
, ...
To evaluate at distinct roots of unity , we evaluate the above formula’s each FFT level at , from level .
FFT Level : We evaluate , , , and at . However, notice that plugging in to results in only 2 distinct values: and . This is because the order of is (i.e., ), and thus , and . Therefore, we only need to evaluate , , , and at 2 distinct values instead of 8, where each evaluation requires a constant number of arithmetic operations: computing 1 multiplication and 1 addition. As there are a total of 4 polynomials to evaluate (i.e., ), we compute FFT a total of times.
FFT Level : Based on the evaluation results from FFT Level 1 as building blocks, we evaluate and at . However, notice that plugging in to results in only 4 distinct values: , , , and . This is because the order of is (i.e., ), and thus , , , and . Therefore, we only need to evaluate and at 4 distinct values instead of 8, where each evaluation requires a constant number of arithmetic operations: computing 1 multiplication and 1 addition (where we use the results from FFT Level 1 as building blocks and FFT Level 2’s computation structure is the same as that of FFT Level 1). There are a total of 2 polynomials to evaluate (i.e., ), thus we compute FFT a total of times.
FFT Level : Based on the evaluation results from FFT Level 2 as building blocks, we evaluate at . For this last level of computation, we need to evaluate all 8 distinct values, since they are all unique values, and each evaluation requires a constant number of arithmetic operations: computing 1 multiplication and 1 addition. There is a total of 1 polynomial to evaluate (i.e., ), thus we compute FFT a total of times.
Generalization: Suppose that the degree of the target polynomial to evaluate is bound by degree and we define (i.e., the total number of FFT levels). Then, the forward FFT operation requires a total of FFT levels, where each -th level requires the evaluation of polynomials at distinct values. Therefore, the total number of FFT computations for forward FFT is: . Therefore, the time complexity of forward FFT is .
Using the FFT technique, we reduce the number of points to evaluate into half as the level goes down (while the number of polynomials to evaluate doubles), and their growth and reduction cancel each other, resulting in for each level. Since there are such levels, the total time complexity is . The core enabler of this optimization is the special property of the evaluation coordinates: its power (i.e., ) is cyclic. To enforce this cyclic property, FFT requires the evaluation points of to be the -th roots of unity.