The Frobenius FFT

Let

be the finite field with

elements and let

be a primitive

-th root of unity in an extension field

. Given a polynomial

of degree less than

, we will show that its discrete Fourier transform

can be computed essentially

times faster than the discrete Fourier transform of a polynomial

of degree less than

, in many cases. This result is achieved by exploiting the symmetries provided by the Frobenius automorphism of

over

Keywords

A.C.M. subject classification: G.1.0 Computer-arithmetic, F.2.1 Number-theoretic computations

1.Introduction

Let

and let

be a primitive

-th root of unity in

. The traditional algorithm for computing discrete Fourier transforms [7] takes a polynomial

of degree

on input and returns the vector

. If

actually admits real coefficients, then we have

for all

, so roughly

of the output values are superfluous. Because of this symmetry, it is well known that such “real FFTs” can be computed roughly twice as fast as their complex counterparts [3, 25].

More generally, there exists an abundant literature on the computation of FFTs of polynomials with various types of symmetry. Crystallographic FFT algorithms date back to [26], with contributions as recent as [20], but are dedicated to crystallographic symmetries. So called lattice FFTs have been studied in a series of papers initiated by [11] and continued in [27, 4]. A more general framework due to [1] was recently revisited from the point of view of high-performance code generation [18]. Further symmetric FFT algorithms and their truncated versions were developed in [16].

In this paper, we focus on discrete Fourier transforms of polynomials

with coefficients in the finite field with

elements. In general, primitive

-th roots of unity

only exist in extension fields of

, say

. This puts us in a similar situation as in the case of real FFTs: our primitive root of unity

lies in an extension field of the coefficient field of the polynomial. This time, the degree of the extension is

instead of

. As in the case of real FFTs, it is natural to search for algorithms that allow us to compute the DFT of a polynomial in

approximately

times faster than the DFT of a polynomial in

The main purpose of this paper is to show that such a speed-up by an approximate factor of

can indeed be achieved. The idea is to use the Frobenius automorphism

as the analogue of complex conjugation. If

, then we will exploit the fact that

for all

and

. This means that it suffices to evaluate

at only one element of each orbit for the action of the Frobenius automorphism on the cyclic group

generated by

. We will give an efficient algorithm for doing this, called the Frobenius FFT.

Our main motivation behind the Frobenius FFT is the development of faster practical algorithms for polynomial multiplication over finite fields of small characteristic. In [12], we have shown how to reduce multiplication in

to multiplication in

, while exploiting the fact that efficient DFTs are available over

. Unfortunately, this reduction involves an overhead of a factor

for zero padding. The Frobenius FFT can be thought of as a more direct way to use evaluation-interpolation strategies over

for polynomial multiplication over

. We have not yet implemented the algorithms in the present paper, but the ultimate hope is to make such multiplications twice as efficient as in [12].

Let us briefly detail the structure of this paper. In section 2, we first recall some standard complexity results for finite field arithmetic. We pursue with a quick survey of the fast Fourier transform (FFT) in section 3. The modern formulation of this algorithm is due to Cooley and Tukey [7], but the algorithm was already known to Gauß [9].

In section 4, we introduce the discrete Frobenius Fourier transform (DFFT) and describe several basic algorithms to compute DFFTs of prime order. There are essentially two efficient ways to deal with composite orders. Both approaches are reminiscent of analogue strategies for real FFTs [25], but the technical details are more complicated.

The first strategy is to reduce the computation of

discrete Fourier transforms of polynomials

of degree

to a single discrete Fourier transform of a polynomial

of degree

. This elegant reduction is the subject of section 5, but it is only applicable if we need to compute many transforms. Furthermore, at best, it only leads to the gain of a factor

instead of

, due to numerous applications of the Frobenius automorphism

The second strategy, to be detailed in section 6, is to carefully exploit the symmetries in the usual Cooley–Tukey FFT; we call the resulting algorithm the Frobenius FFT (FFFT). The complexity analysis is more intricate than for the algorithm from section 5 and the complete factor

is only regained under certain conditions. Fortunately, these conditions are satisfied for the most interesting applications to polynomial multiplication in

or when

becomes very large with respect to

In this paper, we focus on direct DFFTs and FFFTs. Inverse FFFTs can be computed in a straightforward way by reversing the main algorithm from section 6. Inverting the various algorithms from section 4 for DFFTs of prime orders requires a bit more effort, but should not raise any serious difficulties. The same holds for the algorithms to compute multiple DFFTs from section 5.

Besides inverse transforms, it is possible to consider truncated Frobenius FFTs, following [14, 15, 21], but this goes beyond the scope of this paper. It might also be worth investigating the use of Good's algorithm [10] as an alternative to the Cooley–Tukey FFT, since the order

usually admits many distinct prime divisors in our setting.

2.Finite field arithmetic

Throughout this paper and unless stated otherwise, we will assume the bit complexity model for Turing machines with a finite number of tapes [23]. Given a field

, we will write

for the set of polynomials in

of degree

Let us write

for some prime number

and

. Elements in the finite field

are usually represented as elements of

, where

is a monic irreducible polynomial of degree

. We will denote by

the cost to multiply two polynomials in

. Multiplications in

can be reduced to a constant number of multiplications of polynomials in

via the precomputation of a “pre-inverse” of

. The multiplication of two polynomials in

can be reduced to the multiplication of two polynomials in

using Kronecker substitution [8, Corollary 8.27]. In other words, we have

The best currently known bound for

was established in [13]. It was shown there that

Sometimes, we rather need to perform

multiplications

with

. The complexity of this operation will be denoted by

. Throughout this paper, it will be convenient to assume that

is increasing in both

and

. In other words,

is increasing in both

and

The computation of the Frobenius automorphism

requires

multiplications in

when using binary powering, so this operation has complexity

. This cost can be lowered by representing elements in

with respect to so-called normal bases [5]. However, multiplication in

becomes more expensive for this representation.

The discrete Frobenius Fourier transform potentially involves computations in all intermediate fields

where

divides

. The necessary minimal polynomials

for these fields and the corresponding pre-inverses can be kept in a table. The bitsize of this table is bounded by

. It is known [22] that

. Consequently, the bitsize of the table requires

storage.

Another important operation in finite fields is modular composition: given

and a monic polynomial

of degree

, the aim is to compute the remainder

of the euclidean division of

. If

is the minimal polynomial of the finite field

, then this operation also corresponds to the evaluation of the polynomial

at a point in

. If

and

are two distinct monic irreducible polynomials in

of degree

, then the conversion of an element in

to an element in

also boils down to one modular composition of degree

over

We will denote by

the cost of a modular composition as above. Theoretically speaking, Kedlaya and Umans proved that

[19]. From a practical point of view, one rather has

. If

is smooth and one needs to compute several modular compositions for the same modulus, then algorithms of quasi-linear complexity do exist [17].

Given primitive elements

and

with

, the paper [17] also contains efficient algorithms for conversions between

and

. Using modular composition, we notice that this problem reduces in time

to the case where we are free to choose primitive elements

and

that suit us. We let

be a cost function for conversions between

and

and denote

3.Fast Fourier transforms

3.1The discrete Fourier transform

Let

be any field and let

be a primitive

-th root of unity for some

. Now consider a polynomial

, where

is a small number, then

can be computed by evaluating

directly for

, using Horner's method for instance. If

is a prime number, then Rader and Bluestein have given efficient methods for the computation of discrete Fourier transforms of order

; see [24, 6] and subsection 3.2 below.

In subsection 3.4, we are interested in the case when

is composite. In that case, one may compute

using the fast Fourier transform (or FFT). The modern formulation of this algorithm is due to Cooley and Tukey [7], but the algorithm was already known to Gauß [9]. The output

of the FFT uses a mirrored indexation

that we will introduce in subsection 3.3 below.

3.2Rader reduction

Assume that we wish to compute a DFT of large prime length

. The multiplicative group

is cyclic, of order

. Let

be such that

is a generator of this cyclic group. Given a polynomial

and

with

, we have

In other words, we have essentially reduced the computation of a discrete Fourier transform of length

to a so-called cyclic convolution of length

3.3Generalized bitwise mirrors

Let

be a positive integer with a factorization

such that

for each

. Then any index

can uniquely be written in the form

where

for all

. We call

the digits of

when written in the mixed base

. We also define the -mirror of

Finally, given

, let

and

, so that

and

. Then it is not hard to see that for all

and

, we have

3.4The Cooley–Tukey FFT

We are now in a position to recall how the FFT works. Let

and

be as at the end of the previous subsection. We denote

and notice that

is a primitive

-th root of unity. Similarly,

is a primitive

-th root of unity. Recall that

for

. For any

and

, we have

In this formula, one recognizes two polynomial evaluations of orders

and

. More precisely, we may decompose the polynomial

This leads to the following recursive algorithm for the computation of discrete Fourier transforms.

if then compute the answer using a direct algorithm and return it

If the

are all bounded, then this algorithm requires

operations in

. For practical purposes it is best though to pick

such that

and

are as close to

as possible. This approach is most “cache friendly” in the sense that it keeps data as close as possible in memory [2].

4.Frobenius Fourier transforms

4.1The discrete Frobenius Fourier transform

Let

be an extension of finite fields. Let

denote the Frobenius automorphism of

over

. We recall that the group

generated by

has size

and that it coincides with the Galois group of

over

Let

be a primitive

-th root of unity with

. Recall that

and

. We will write

for the cyclic group of size

generated by

. The Frobenius automorphism

naturally acts on

and we denote by

the order of an element

under this action. Notice that

. A subset

is called a cross section if for every

, there exists exactly one

such that

for some

. We will denote this element

. We will also denote the corresponding set of indices by

Now consider a polynomial

. Recall that its discrete Fourier transform with respect to

is defined to be the vector

for all

. For any

, this means that we may retrieve

from

. Indeed, setting

with

, we obtain

. We call

Let

and

. Then

leaves

invariant, whence

. Since

is the subfield of

that is fixed under the action of

, this yields

In particular, the output and input size

of the discrete Frobenius Fourier transform coincide.

4.2Twiddled transforms

In the Cooley–Tukey algorithm from section 3.4, the second wave of FFTs operates on “twiddled” polynomials

instead of the polynomials

. An alternative point of view is to directly consider the evaluation of

at the points

with

and where

. We will call this operation a twiddled DFT.

The twiddled DFFT is defined in a similar manner. More precisely, assume that we are given an

-th root of unity

such that the set

is stable under the action of

. Assume also that we have a cross section

under this action and the corresponding set of indices

. Then the twiddled DFFT computes the family

4.3The naive strategy

is a small number, then we recall from section 3.1 that it is most efficient to compute ordinary DFTs of order

in a naive fashion, by evaluating

for

using Horner's method. From an implementation point of view, one may do this using specialized codelets for various small

The same naive strategy can also be used for the computation of DFFTs and twiddled DFFTs. Given a cross section

and the corresponding set of indices

, it suffices to evaluate

separately for each

. Let

be the order of

under the action of

. Then

actually belongs to

, so the evaluation of

can be done in time

. With the customary assumption that

is increasing as a function of

, it follows using (4.2) that the complete twiddled DFFT can be computed in time

Assuming that full DFTs of order

over

are also computed using the naive strategy, this yields

4.4Full Frobenius action

The next interesting case for study is when

is prime and the action of

is either transitive, or has only two orbits and one of them is trivial. If

, then

always forms an orbit of its own in

, but we can have

. If

, then it can happen that

is a singleton. This happens for instance for a

-th primitive root of unity

, for

, and

is a singleton, say

, then we necessarily have

and the DFFT reduces to a single evaluation

with

and

. This is precisely the problem of modular composition that we encountered in section 2, whence

. The speed-up

with respect to a full DFT is comprised between

and

, depending on the efficiency of our algorithm for modular composition. Theoretically speaking, we recall that

, which allows us to gain a factor

for any

In a similar way, if

and

, then the computation of one DFFT reduces to

additions (in order to compute

) and one composition modulo a polynomial of degree

(instead of

Remark 4.1. Assume that

is prime,

and

. If we are free to choose the representation of elements in

, then the DFFT becomes particularly efficient if we take

. In other words, if

is the monic minimal polynomial of

over

(so that

), then we represent elements of

as polynomials in

modulo

. Given

, just writing down

then yields the evaluation of

, whence

4.5Rader reduction

For large prime orders

and

, let us now show how to adapt Rader reduction to the computation of DFFTs in favourable cases. With the notations from section 3.2, we have

Let us assume for simplicity that the order

of the additive subgroup generated by

is comprime with

. Then there exists an element

of order

and

. Consequently,

We may thus regard the product

as a bivariate polynomial in

and

. Let

be the constant term in

of this bivariate polynomial. From

, we can recover

for every

. By construction, the

with

actually form a section of

under the action of

. This reduces the computation of a DFFT for the section

to the computation of

As to the computation of

, assume for simplicity that

contains a primitive

-th root of unity. Recall that

has coefficients in

whereas

admits coefficients in

. Notice also that

does not depend on the input

, whence it can be precomputed. We compute

as follows in the FFT-model with respect to

. We first transform

and

into two vectors

and

. The computation of

may again be regarded as a precomputation. The computation of the Fourier transform

now comes down to

modular compositions of degree

over

. We finally transform the result back into

. Altogether, this yields

It is instructive to compare this with the cost of a full DFT of length

over

using Rader's algorithm:

Depending on the efficiency of modular composition, the speed-up for DFFTs therefore varies between

and

. Notice that the algorithm from the previous subsection becomes a special case with parameters

and

5.Multiple Fourier transforms

5.1Parallel lifting

Let

be an extension of

and assume that its elements are represented as polynomials in a primitive element

of degree

. Given

polynomials

, we may then form the polynomial

In other words, the discrete Fourier transform operates coefficientwise with respect to the basis

Recall that

stands for the cost of computing

Fourier transforms of length

over

. The maps

and its inverse boil down to matrix transpositions in memory. On a Turing machine, they can therefore be computed in time

. From (5.1), it follows that

This relation does not give us anything really new, since we made the assumption in section 2 that

is increasing in both

and

. Nevertheless, it provides us with an elegant and explicit algorithm in a special case.

5.2Symmetric multiplexing

Let us next consider an

-th root of unity

and a primitive

-th root of unity

. Given

, we have for all

Now given the twiddled discrete Fourier transform

, we in particular know the values

. Letting the Frobenius automorphism

act on these values, we obtain the vector

. Using one matrix-vector product

, we may then retrieve the values of the individual twiddled transforms

. Doing this for each

in a cross section of

under the action of

, this yields a way to retrieve the twiddled DFFTs of

from the twiddled DFT of

Recall that

denotes the complexity of computing the DFFTs of

elements of

over

. Since

is a Vandermonde matrix, the matrix-vector product

can be computed in time

using polynomial interpolation [8, Chapter 10]. Given

, we may compute each individual vector

in time

using the Frobenius automorphism

. Since there are

orbits in

under the action of

, we may thus retrieve the

from

in time

. In other words,

5.3Multiplexing over an intermediate field

In the extreme case when

, every

has order one under the action of

and

. In that case, the bound (5.3) is not very sharp, but (5.2) already provides us with a good alternative bound for this special case. For

with

, let us now consider the case when

has order at most

under the action of

for all

, so that

. Let

be a primitive element in

over

, so that

and

. Given our

polynomials

, we may form

Moreover, we may compute

from

for each

, using the algorithm from the previous subsection, but with the additional property that at least one

has maximal order

under the action of

is prime and

, then the above discussion shows that, without loss of generality, we may assume that

has order

under the action of

. This means that

has maximal order

for every

, whereas

has order one. Hence,

. Similarly, if

is prime and

, then we obtain a reduction to the case when

has maximal order

for all

, whence

. In both cases, we obtain the following:

Remark 5.2. We recall that

, whence

. When using the best known bound (2.1) for

and

, it follows for some constant

that

In other words, we cannot hope to gain more than an asymptotic factor of

with respect to a full DFT.

is not necessarily prime, then the technique from this section can still be used for every individual

. However, the rewritings of elements in

as elements in

and

have to be done individually for each

using modular compositions. Denoting by

the order of

under

, it follows that

Notice that conversions of the same type correspond to modular compositions with the same modulus. If

is smooth, then it follows that we may use the algorithms from [17] and keep the cost of the conversions quasi-linear.

6.The Frobenius FFT

Let

and

be as in section 3.3 and let

be a primitive

-th root of unity. In this section we present an analogue of the FFT from section 3.4: the Frobenius FFT (or FFFT). This algorithm uses the same mirrored indexation as the Cooley–Tukey FFT: given

, it thus returns the family

. We start with the description of the index set

that corresponds to the so-called “privileged cross section”

6.1Privileged cross sections

Then

is a cross section of

under the action of

; we call it the privileged cross section for

(and

Now let

and

be as above. Then

is a primitive

-th root of unity. Let

be the corresponding privileged cross section of

Inversely, given

with

for some

, we claim that there exists a

such that

for

. Indeed,

implies that

, whence

is an

-th root of unity. This means that there exists a

with

, i.e.

Now assume that

is minimal with

for some

. Given

such that

, the above claim shows that there exists a

such that

for

. But then

, whence

. This shows that

implies

Inversely, assume that

is minimal such that

. Then the above claim implies that there exists a

such that

for

. Without loss of generality we may assume that

was chosen minimal while satisfying this property. Given

and

with

, we have

. Consequently,

and

whenever

. In other words,

. This shows that

implies the existence of a

with

and

6.2The main algorithm

We are now in a position to adapt the Cooley–Tukey FFT. In the case when

is prime, we will use one of the algorithms from section 4 for the computation of twiddled DFFTs.

if then return

where is the order of

For the usual FFT, it was possible to choose

in such a way that

, and we recall that this improves the cache efficiency of practical implementations. However, this optimization is more problematic in our setting since it would require the development of an efficient recursive algorithm for twiddled FFFTs. This is an interesting topic, but beyond the scope of the present paper.

6.3Complexity analysis

Let us now perform the complexity analysis of

. For

, we first focus on all FFTs and twiddled DFFTs that are computed at “stage

” using a fallback algorithm. These FFTs and twiddled DFFTs are all of length

. Given

, let

be the number of transforms of a polynomial in

over

. From (4.2), it follows that

Now a naive bound for the cost of an FFT or twiddled DFFT of a polynomial in

over

The total cost of all FFTs and twiddled DFFTs of prime length is therefore bounded by

Strictly speaking, if

is a proper divisor of

, then the output of a twiddled DFFT of a polynomial in

over

does not use the “standard” representation for

, so the result must be converted. The cost of all such conversions at stage

is bounded by

For a number

with prime factorization

, let

. We also denote

and notice that

. Let us show by induction over

that

This is clear for

, since

and

for

. Assuming that the relation holds for a given

, we get

This completes the induction and we conclude that the total conversion cost is bounded by

Remark 6.3. If the

are very small, then we have shown in section 4.3 that (6.1) can be further sharpened into

Remark 6.4. It is also possible to treat the case when

apart in the bound for

, thereby avoiding the factor

whenever possible. A similar analysis as for the cost of the conversions then yields

Whenever we can manage to keep

and

small with respect to

, this means that we gain a factor

with respect to a full DFT.

6.4Frobenius FFTs over

Let us now study the special case when

and

. This case is useful for the multiplication of polynomials

with

. The idea is to compute the Frobenius FFTs

and

, to perform the pointwise multiplications

and to retrieve

by doing one inverse Frobenius FFT.

Modulo zero padding, we have some flexibility in the choice of

, which should divide

. In particular, for polynomials

and

of large degree, we may choose

to be a multiple of

and take

and

(instead of

) for the top call of the algorithm FFFT from section 6.2. By selecting the representation of

as explained in Remark 4.1, this means that

for a single

and

for the remaining

indices

. In other words, one Frobenius FFT of length

reduces to one Frobenius FFT of length

and one full FFT over

of length

We choose the remaining

to be small prime numbers, so as to minimize the cost of the full FFT over

of length

. For the remaining FFFT of length

, we may then apply Theorem 6.2 and Remark 6.3. From a practical point of view, the computation of this remaining FFFT should take about

times the time the computation of a full FFT of length

for some small constant

We conclude that one entire FFFT of length

can be performed approximately

times faster as a full FFT of length

. We hope that this will lead to practical implementations for polynomial multiplication in

that are approximately twice as efficient as the implementation from [12].

7.References

[1]: L. Auslander, J. R. Johnson, and R. W. Johnson. An equivariant Fast Fourier Transform algorithm. Drexel University Technical Report DU-MCS-96-02, 1996.
[2]: David H Bailey. Ffts in external of hierarchical memory. In Proceedings of the 1989 ACM/IEEE conference on Supercomputing, pages 234–242. ACM, 1989.
[3]: G.D. Bergland. A fast Fourier transform algorithm for real-valued series. Communications of the ACM, 11(10):703–710, 1968.
[4]: R. Bergmann. The fast Fourier transform and fast wavelet transform for patterns on the torus. Applied and Computational Harmonic Analysis, 2012. In Press.
[5]: Dieter Blessenohl. On the normal basis theorem. Note di Matematica, 27(1):5–10, 2007.
[6]: Leo I. Bluestein. A linear filtering approach to the computation of discrete Fourier transform. IEEE Transactions on Audio and Electroacoustics, 18(4):451–455, 1970.
[7]: J.W. Cooley and J.W. Tukey. An algorithm for the machine calculation of complex Fourier series. Math. Computat., 19:297–301, 1965.
[8]: J. von zur Gathen and J. Gerhard. Modern Computer Algebra. Cambridge University Press, New York, NY, USA, 3rd edition, 2013.
[9]: C. F. Gauss. Nachlass: Theoria interpolationis methodo nova tractata. In Werke, volume 3, pages 265–330. Königliche Gesellschaft der Wissenschaften, Göttingen, 1866.
[10]: I. J. Good. The interaction algorithm and practical Fourier analysis. Journal of the Royal Statistical Society, Series B. 20(2):361–372, 1958.
[11]: A. Guessoum and R. Mersereau. Fast algorithms for the multidimensional discrete Fourier transform. IEEE Transactions on Acoustics, Speech and Signal Processing, 34(4):937–943, 1986.
[12]: D. Harvey, J. van der Hoeven, and G. Lecerf. Fast polynomial multiplication over . In Proc. ISSAC '16, pages 255–262, New York, NY, USA, 2016. ACM.
[13]: D. Harvey, J. van der Hoeven, and G. Lecerf. Faster polynomial multiplication over finite fields. J. ACM, 63(6), 2017. Article 52.
[14]: J. van der Hoeven. The truncated Fourier transform and applications. In J. Gutierrez, editor, Proc. ISSAC 2004, pages 290–296, Univ. of Cantabria, Santander, Spain, July 4–7 2004.
[15]: J. van der Hoeven. Notes on the Truncated Fourier Transform. Technical Report 2005-5, Université Paris-Sud, Orsay, France, 2005.
[16]: J. van der Hoeven, R. Lebreton, and É. Schost. Structured FFT and TFT: symmetric and lattice polynomials. In Proc. ISSAC '13, pages 355–362, Boston, USA, June 2013.
[17]: J. van der Hoeven and G. Lecerf. Modular composition via factorization. Technical report, HAL, 2017. http://hal.archives-ouvertes.fr/hal-01457074.
[18]: J. Johnson and X. Xu. Generating symmetric DFTs and equivariant FFT algorithms. In ISSAC'07, pages 195–202. ACM, 2007.
[19]: K. S. Kedlaya and C. Umans. Fast polynomial factorization and modular composition. SIAM J. Comput., 40(6):1767–1802, 2011.
[20]: A Kudlicki, M. Rowicka, and Z. Otwinowski. The crystallographic Fast Fourier Transform. Recursive symmetry reduction. Acta Cryst., A63:465–480, 2007.
[21]: R. Larrieu. The truncated fourier transform for mixed radices. Technical report, HAL, 2016. http://hal.archives-ouvertes.fr/hal-01419442.
[22]: Jean-Louis Nicolas and Guy Robin. Highly composite numbers by Srinivasa Ramanujan. The Ramanujan Journal, 1(2):119–153, 1997.
[23]: C.H. Papadimitriou. Computational Complexity. Addison-Wesley, 1994.
[24]: C.M. Rader. Discrete Fourier transforms when the number of data samples is prime. Proc. IEEE, 56:1107–1108, 1968.
[25]: H.V. Sorensen, D.L. Jones, M.T. Heideman, and C.S. Burrus. Real-valued fast Fourier transform algorithm. IEEE Transactions on Acoustics, Speech and Signal Processing, 35(6):849–863, 1987.
[26]: L. F. Ten Eyck. Crystallographic Fast Fourier Transform. Acta Cryst., A29:183–191, 1973.
[27]: A. Vince and X. Zheng. Computing the Discrete Fourier Transform on a hexagonal lattice. Journal of Mathematical Imaging and Vision, 28:125–133, 2007.

			(2.2)

Abstract

Keywords

1.Introduction

2.Finite field arithmetic

3.Fast Fourier transforms

3.1The discrete Fourier transform

3.2Rader reduction

3.3Generalized bitwise mirrors

3.4The Cooley–Tukey FFT

4.Frobenius Fourier transforms

4.1The discrete Frobenius Fourier transform

4.2Twiddled transforms

4.3The naive strategy

4.4Full Frobenius action

4.5Rader reduction

5.Multiple Fourier transforms

5.1Parallel lifting

5.2Symmetric multiplexing

5.3Multiplexing over an intermediate field

6.The Frobenius FFT

6.1Privileged cross sections

6.2The main algorithm

6.3Complexity analysis

6.4Frobenius FFTs over

7.References