We assume a polynomial ring of for FFT, and for NTT (where is a cyclotomic polynomial). The coordinates to evaluate the target polynomial are the distinct roots of , which are , , where is the primitive -th root of unity. Then, the point-value representation of the polynomial is , where:
.
We call the vector the Discrete Fourier Transform (DFT) of the coefficient vector . We write this as . As explained in §A-16.1, the computation of the DFT takes , because we have to evaluate distinct values for a polynomial that has terms.
FFT (or NTT) is an improved method for computing the DFT, which reduces the time complexity from to . The high-level idea of FFT is to split the -degree (or lesser degree) target polynomial to evaluate into 2 half-degree polynomials and as follows:
The above method of splitting a polynomial into two half-degree polynomials is called the Cooley-Tukey step. As we split into two smaller-degree polynomials and , evaluating at the odd-powered primitive -th roots of unity is equivalent to evaluating and at distinct squared -th roots of unity and computing . However, remember that the primitive -th root of unity has order (i.e., and for all ). Therefore, the second half of is a repetition of the first half. This implies that we only need to evaluate and at distinct coordinates each, instead of distinct coordinates, because the polynomial evaluation results for the other half are the same as those of the first half (as their input to the polynomial is the same).
We recursively split and into half-degree polynomials and evaluate them at half-counted (i.e., ) -th roots of unity. Then, the total number of rounds of splitting is , and the maximum number of root-to-coefficient multiplications in each round is , which aggregates to .
Suppose we have a polynomial ring that is either (i.e., over a finite field with prime ) or (over complex numbers). We denote by a primitive -th root of unity, and the 8 distinct roots of are: .
Now, we define our target polynomial to evaluate as follows:
We split this 7-degree polynomial into the following two 3-degree polynomials (using the Cooley-Tukey step):
We recursively split the two 3-degree polynomials above into 1-degree polynomials as follows:
, ...
, ...
To evaluate at the distinct roots , we evaluate each FFT level of the above formula at , starting from level .
FFT Level : We evaluate , , , and at . However, notice that plugging in to results in only 2 distinct values: and . This is because the order of is (i.e., ), and thus , and . Therefore, we only need to evaluate , , , and at 2 distinct values instead of 8, where each evaluation requires a constant number of arithmetic operations: computing 1 multiplication and 1 addition. As there are a total of 4 polynomials to evaluate (i.e., ), we compute the FFT a total of times.
FFT Level : Based on the evaluation results from FFT Level 1 as building blocks, we evaluate and at . However, notice that plugging in to results in only 4 distinct values: , , , and . This is because the order of is (i.e., ), and thus , , , and . Therefore, we only need to evaluate and at 4 distinct values instead of 8, where each evaluation requires a constant number of arithmetic operations: computing 1 multiplication and 1 addition (where we use the results from FFT Level 1 as building blocks, and the computational structure of FFT Level 2 is the same as that of FFT Level 1). There are a total of 2 polynomials to evaluate (i.e., ); thus, we compute the FFT a total of times.
FFT Level : Based on the evaluation results from FFT Level 2 as building blocks, we evaluate at . For this last level of computation, we need to evaluate all 8 distinct values, since they are all unique values, and each evaluation requires a constant number of arithmetic operations: computing 1 multiplication and 1 addition. There is a total of 1 polynomial to evaluate (i.e., ); thus, we compute the FFT a total of times.
Generalization: Suppose that the degree of the target polynomial to evaluate is at most (with terms), and we define (i.e., the total number of FFT levels). Then, the forward FFT operation requires a total of FFT levels, where each -th level requires the evaluation of polynomials at distinct values. Therefore, the total number of FFT computations for the forward FFT is: . Therefore, the time complexity of the forward FFT is .
Using the FFT technique, we reduce the number of points to evaluate to half as the level decreases (while the number of polynomials to evaluate doubles), and their growth and reduction cancel each other, resulting in for each level. Since there are such levels, the total time complexity is . The core enabler of this optimization is the special property of the evaluation coordinates: its power (i.e., ) is cyclic. To enforce this cyclic property, FFT requires the evaluation points of to be the odd-powered primitive -th roots of unity.