Modular composition via factorization

Let

be an effective field. This means that elements of

can be represented on a computer and that we have algorithms for performing the field operations. Given polynomials

, the problem of modular composition is to compute

modulo

. Modular composition is an important problem in complexity theory because of its applications to polynomial factorization [29-31]. It also occurs very naturally whenever one wishes to perform polynomial computations over

inside an algebraic extension of

. Given two different representations

of an algebraic extension of

, the implementation of an explicit isomorphism boils down to modular composition as well.

In this paper, we study the problem of composition modulo a fixed polynomial

mostly in the case when

is a finite field. We assume that

is separable; the case of moduli of the form

is studied in a separate paper [24]. Our results are based on the following simple observation: if a factorization

is known, then composition modulo

reduces to

composition modulo the

with

. Curiously, this observation does not seem to be exploited in the standard literature on modular composition. In the case when

is irreducible over

, but

admits a non trivial divisor

, then the second crucial observation is that

factors over

. We may then apply the first observation in order to obtain an efficient algorithm for composition modulo

. Finding the factorizations of

over

can be quite expensive in general, but such computations can be regarded as pre-computations if the modulus

is fixed.

Besides modular composition, we also study the related problem of computing the characteristic polynomial

modulo

. More precisely, we understand

to be the characteristic polynomial of the multiplication endomorphism by

. In particular, we have

modulo

. Theoretically speaking, the fastest current method relies on the computation of so-called power projections (see for instance [16, section 2.5]); this task turns out to be of the same complexity as modular composition in all known cases.

1.1.Previous work

Denote by

the number of operations in

required to multiply two polynomials of

. Let

and

be polynomials in

of degrees

and

. The naive modular composition algorithm takes

operations in

. In 1978, Brent and Kung gave an algorithm with cost

: their algorithm from [5, section 2.1] actually concerns the case when

, but it naturally extends to general

. It uses the baby-step giant-step technique due to Paterson and Stockmeyer [40], and even yields a sub-quadratic cost

when using fast linear algebra (see [28, p. 185]). The constant

is such that a

matrix over

may be multiplied with another

rectangular matrix in time

. The best current bound

is due to Huang and Pan [26, Theorem 10.1].

When linear algebra benefits from very fast implementations, its contribution is expected to be smaller than the other polynomial operations, on a certain bounded range for

. In fact, for fixed values of

and

, the sizes of the “baby” and “giant” steps may be optimized in order to balance cost contributions of matrix and polynomial operations (this was studied for finite fields in an unpublished preprint of Shoup and Smolensky in 1992). Further improvements have been proposed in [27], based on the Lagrange inversion formula for the reversion of formal power series.

A major breakthrough has been achieved by Kedlaya and Umans [30, 31] in the case when

is the finite field

. For any positive

, they have shown that the composition

modulo

can be computed using

bit operations. Unfortunately, it remains a major open problem to turn this theoretical bit complexity bound into practically useful implementations.

In the special case of power series composition (i.e. when

), the best known complexity bound is again due to Brent and Kung: in [5], they showed that this requires

operations in

, under the condition that

is invertible and that the characteristic is at least

, where

. The variant proposed by van der Hoeven [20, section 3.4.3] removes the condition on

. For fields of small characteristic, Bernstein [1] proposed an algorithm that is softly linear in the precision

but linear in the characteristic. These algorithms are generalized to moduli

of the form

in [24]; we show there that the composition reduces to one power series composition at order

, plus

compositions modulo

, and one characteristic polynomial computation modulo

. Let us finally mention that series with integer, rational or floating point coefficients can often be composed in quasi-linear time as well in suitable bit complexity models, as shown by Ritzmann [43]; see also [21].

The expression

is linear in

. It is well known that the transposition of the application

corresponds to the power projection task (see section 2.7), which is an important ingredient for computing minimal and characteristic polynomials. In [46], Shoup studied the computation of minimal polynomials in algebraic extensions of the form

, explicitly given by defining polynomials. He designed fast practical algorithms with low memory consumption, based on a smart combination of “baby-step giant-step” and transposed algorithms. However his method does not improve upon Brent and Kung's one from the asymptotic complexity point of view.

The characteristic polynomial of

modulo

may be obtained from suitable power projections of

modulo

thanks to the well-known Newton–Girard identities, which involve solving a first order differential equation to precision

. This is rather elementary when the characteristic is zero or sufficiently large. Otherwise,

-adic arithmetic is needed. A complete solution is described in [16]. More generally, a framework for using

-adic arithmetic to solve ordinary differential equations in positive characteristic may be found in [34].

1.2.Contributions and outline of the article

The aim of this article is to show how suitable factorizations of the modulus

can be exploited to speed up compositions modulo

, as well as the computation of characteristic polynomials modulo

. Most of the new results are derived from the following simple observation: if

splits into linear factors in

, and if its roots are given, then modular composition basically reduces to evaluating

at the roots of

, evaluating

at these values of

, and interpolate

. It is well-known that each of these steps can be done in softly linear time using multiple point evaluation and interpolation. More generally, whenever

can be factored into

, the computation of

reduces to the computations of

for

Of course, the existence of factorizations of

heavily depends on

itself and on fields over which we allow ourselves to factor

. For instance, if

, then we might consider computing the roots of

using a sufficient precision, or factoring

over the

-adic numbers

for some well chosen prime number

. If

is a finite field, and

is an irreducible polynomial of composite degree

, then we may consider factorizations over the intermediate field

. In a separate paper, we study the case when

is the field of computable complex numbers [25]. In this paper, we mainly focus on the finite field case.

Concerning the organization of this paper, we first revisit several known techniques in section 2. This includes the introduction of cost functions for modular composition, power projection, and the computation of characteristic polynomials. In section 3, we examine how modular composition can be accelerated when a factorization of

is given. More precisely, if the irreducible factorization

of the modulus is available, then our method reduces the composition modulo

to several compositions modulo

in softly linear time. A key ingredient, reused several times in the article, is the simultaneous computation of characteristic polynomials and modular compositions.

In section 4, we turn our attention to the specific situation of an irreducible modulus

of composite degree

over a finite field

. We show how to exploit the existence of factorizations of

over the intermediate fields

into factors of degree

The natural generalization to degrees

with

will be the subject of sections 5, 6 and 7. One important question is how to represent the elements of the intermediate fields

and it is convenient to introduce the special concept of an effective algebraic tower for this purpose. We also introduce the notion of a composition tower for

, which formalizes the requirement that we are given factorizations of

over each of the intermediate fields

. In section 5, we generalize the algorithm from section 4 to arbitrary composition towers. In section 6 we also give a detailed complexity analysis in the case of triangular towers when each intermediate field

admits the form

for suitable

Section 7 is dedicated to primitive towers, in which case each

is generated by a single primitive element

over

. If the

are pairwise coprime (see section 7.4), then the field

can be taken to be the composed product of

, and computations in the tower become particularly efficient: in the case when

is “super-smooth” (in the sense that the largest prime-power divisor

satisfies

), we will show that composition modulo

can be done in quasi-linear time (modulo precomputations). In this very particular situation, our method is asymptotically faster than Kedlaya–Umans' algorithm. If

admits a small characteristic

and

, then a similar result holds when using so called Artin–Schreier towers (see section 7.5). For general smooth

, one may also consider nested towers (see section 7.3) for which the primitive elements

are compositions of polynomials of degrees

over

. The existence of such towers for given

and

is an interesting open problem, with a generally positive answer in practice.

Our main complexity bounds for modular composition are summarized in Table 1.1. In this table,

is a fixed irreducible polynomial of degree

and

. The entries correspond to the various types of towers that are considered in sections 6 and 7, while assuming that all necessary precomputations that depend on

have been done.

This leaves us with the issue of how to conduct the precomputations. This is the subject of section 8, where we analyze the cost of building composition towers of various types. We will see that the construction of composition towers for prescribed composite extension degrees

can usually be done fast. On the other hand, building composition towers for a prescribed modulus

is of the same order of difficulty as factoring

over the intermediate fields

, or finding a root of

for a given representation of the elements of

. Practical algorithms for this task are well known but their cost is quadratic in

; see [38] for recent theoretical complexity bounds.

Whether our approach leads to competitive algorithms for modular composition thus depends on the question whether we require the factorization of

to be part of the complexity or not. Indeed, if we are doing a large polynomial computation over

in the algebraic extension

, then we typically need to perform many modular compositions

for different

and

, but for a fixed modulus

. In such cases, it is reasonable to regard the factorization of

as a precomputation. Furthermore, if we want to perform computations in a large finite field extension

and if we are free to select a suitable representation for elements of this finite field, then we may build a modulus

with

using dedicated algorithms; from the asymptotic complexity point of view, these algorithms are usually faster than finding an irreducible modulus at random.

In fact, we recall that testing the irreducibility of

reduces to

modular compositions in degree

over

, plus

bit operations (see [31, section 8.2], based on Rabin's algorithm [41]). In practice, for the fast construction of an irreducible polynomial of degree

over

, it is recommended to use Shoup's algorithm [45], which uses an expected number of

arithmetic operations in

. If

is much smaller than

, then one may also resort to Couveignes and Lercier's algorithm [9]. Theoretically speaking, this algorithm admits an expected bit complexity

, but it relies on Kedlaya and Umans' algorithm for modular composition.

2.Preliminaries

2.1.Complexity models

Recall that an effective ring is a ring

with unity whose elements can be represented on a computer and such that we have algorithms for performing the ring operations. Effective fields

and effective algebras over an effective ring are defined similarly.

Given an effective ring

, algebraic complexity models express running times in terms of the number of operations in

. Unless otherwise stated, we will analyze the costs of the algorithms in this paper in this way. More precisely, our results both apply for the straight-line program and computation tree models [6, chapter 4].

For randomized algorithms over a finite effective ring

, we assume a special instruction that uniformly generates random elements in

. For simplicity we assume that this instruction has constant cost. For a given input, the cost of an algorithm is thus a random variable. The expected cost of an algorithm for input size

is defined as the maximum of the averages of these random variables over all possible inputs of size

When working over the finite field

with

elements, we may also analyze the costs of algorithms in the bit complexity model, which relies on Turing machines with a sufficient number of tapes [39]. We will not explicitly consider this model in our paper, but most of our complexity bounds can easily be converted to this setting.

2.2.Polynomial multiplication

We write

for a cost function such that two polynomials in

can be multiplied using

operations in

. The schoolbook algorithm allows us to take

. The fastest currently known algorithm [7] yields

. Here, the soft-Oh notation

means that

; see [13, section 25.7] for technical details. If

is a field of finite characteristic, then it has been shown [18, 19] that

, where

denotes the iterated logarithm function. In what follows, we will always assume that

is an increasing function in

. This customary assumption implies the super-additivity of

, namely

for all

and

More generally, if

is an effective

-algebra, then it is sometimes convient to denote by

a cost function such that two polynomials in

can be multiplied using

operations in

2.3.Univariate arithmetic

Let

be an effective field. The remainder (resp. quotient) of the euclidean division of

is denoted by

(resp. by

). For a fixed modulus of degree

, euclidean divisions by

are usually performed by computing a pre-inverse

. More precisely,

is the inverse of

, computed at precision

. Given

, one obtains the quotient

by multiplying

with

and the remainder as

. Given

we may thus compute the modular product

using

operations in

We recall that the greatest common divisor of two polynomials of degrees at most

over

can be computed using

operations in

[13, Algorithm 11.4].

Let

and consider

points

. Then the evaluations

can be computed using

operations in

[13, chapter 10]. This operation is also called multipoint evaluation. The inverse operation is the interpolation, which consists of recovering

from

; it can be performed with a similar cost. If the

are fixed, then it is often possible to gain a factor

using FFT trading [22].

More generally, if

are polynomials with

, then all the remainders

can be computed using

operations in

. The inverse problem, called Chinese remaindering, again admits the same complexity

2.4.Bivariate arithmetic

Let

be an effective ring. Given a bivariate polynomial

, we define its bidegree to be the pair

with

and

. Using Kronecker substitution [13, section 8.4], two polynomials of bidegree

may be multiplied using

ring operations in

is a monic polynomial of degree

, and if

and

are two polynomials in

then their canonical preimages in

may be multiplied with

operations in

before being projected into

with

additional operations. Consequently each ring operation in

in degree

reduces to

operations in

. If

is monic in

then

also takes

operations in

The computation of bivariate subresultants usually relies on fast evaluation/interpolation, as in the following well known proposition.

Proof. The subresultant polynomial

of degree

and

has degree

. It can be computed by evaluating

and

values for

, computing

subresultants in

of degree

, and interpolating the coefficients of

. In total this costs

Proof. This result corresponds to [36, Corollary 26]. The number of inversions comes from the fact that the underlying algorithm only needs to perform exact divisions by subresultant coefficients in

. Each division requires to invert the leading coefficient of the divisor. There exist at most

such leading coefficients.

2.5.Finite field arithmetic

Let

be the finite field with

elements. One way to represent elements of a finite field extension

is as remainder classes of polynomials in

modulo a monic reducible polynomial

of degree

. We write

in order to emphasize that we use this representation. Multiplication in

can be done using

operations in

. Given an element

of degree

over

, we write

for the subfield of

generated by

over

, where we understand that elements in

are represented as evaluations of polynomials in

For the bulk of the algorithms in this paper, we will work over the finite field

. In that case, it can be shown that two polynomials in

can be multiplied in time

on a Turing machine with a sufficient number of tapes [19, section 8.1]. The algebraic complexity bounds in this paper are easy to adapt to this model: it mainly suffices to replace

in all bounds.[update references and bounds]

2.6.Matrix multiplication

The constant

represents a feasible exponent for the multiplication cost of matrices: two square matrices of size

can be multiplied using

operations in their coefficient ring. The constant

is defined similarly but for multiplying a

matrix by a

rectangular one. At present time the best known bound

is due to Le Gall [35]. This naturally yields

. However the latter bound does not improve upon the earlier bound

due to Huang and Pan [26, Theorem 10.1].

2.7.Modular composition and related operations

Let

be an effective ring. Let

an effective

-algebra of dimension

whose elements are represented by vectors of size

in a given basis. We introduce the following cost functions:

Proof. The first bound is immediate. The proof of the second bound is detailed in [13, section 12.2].

For a fixed monic polynomial

of degree

, the modular composition

is a linear operation in

. For

and

of degrees

, the corresponding transposed application is precisely the operation of modular power projections. If a modular composition algorithm with cost

can be transposed in the sense of [3], then this leads to a power projection algorithm with cost

3.Modular composition via factorization

3.1.Separable moduli over algebraically closed fields

Let

be an effective field. A monic polynomial

is said to be separable if

. We may use the following algorithm for composition modulo

whenever

is separable and its

pairwise distinct roots

are given:

Input
Output: , where .

Compute using fast multi-point evaluation.
Compute using fast multi-point evaluation.
Retrieve with using fast interpolation.
Return .

Proof. By construction,

for

. Since

and the

are pairwise distinct, it follows that

. This proves the correctness of the algorithm. The complexity bound follows from the fact that each of the steps 1, 2 and 3 can be performed in time

3.2.Pairwise coprime moduli

Let

again be a general effective field. The algorithm from the previous section may be generalized to composition modulo a polynomial

that can be factored partially as

, where the polynomials

are pairwise coprime (although not necessarily irreducible).

Input
Output: , and the characteristic polynomial of modulo .

Use a multi-remainder algorithm to compute , for all .
For all , compute the characteristic polynomial of modulo .
Use a multi-remainder algorithm to compute , for all .
For all , perform the modular composition .
Use Chinese remaindering to compute in of degree such that for all .
Return and .

Proof. For all

, the Cayley–Hamilton theorem gives us

, which implies

, whence the correctness of

. The correctness of the characteristic polynomial of

follows from the usual isomorphism of

-algebras

The costs of steps 1, 3, 5, and 6 are

. Step 2 costs

, and step 4 takes

operations in

Example 3.3. For some families of polynomials the irreducible factorization is explicitly known. For instance, the following result is due to Serret [44, section III, chapitre III, pp. 158–162] (see also [11, pp. 24–27], [37, Theorem 3.2.18]):

For example, with

, the polynomial

factors into irreducible polynomials of degree

. Taking

instead leads to

and

4.Exploiting factorizations over algebraic extensions

4.1.Degree reduction

Let

still be an effective field and assume that we wish to compute a modular composition

, where

and

is monic. Let us study what happens if the polynomials

and

to be composed have degrees larger than

. We clearly have

and we may compute

using

operations in

[13, Exercise 9.16]. Without loss of generality we may therefore assume that

Alternatively, given a polynomial

with

, the following formula provides us with a more efficient way to reduce the degree of

Taking

to be the characteristic polynomial

modulo

, its computation usually admits a similar cost as modular composition. Therefore it is worth using this method unless

. This is actually one key ingredient for the upcoming algorithms for modular composition: in order to reduce a composition modulo

to compositions modulo a factor

, we in particular need to compute the characteristic polynomial of

modulo

. At the end of the recursive calls, one should nevertheless keep in mind that we only need annihilating polynomials, so that we may also use minimal polynomials. Shoup has given a probabilistic

algorithm for computing minimal polynomials [46], which is useful for actual implementations.

4.2.Normal factorizations

In the case when we wish to compute a composition modulo an irreducible polynomial

, we cannot apply the algorithms from sections 3.1 and 3.2. Nevertheless, it might happen that

admits a non trivial factorization over an algebraic extension of

. This generically happens when

is a finite field and

is composite. Indeed, we recall the following well known result.

Proof. Since

is irreducible,

is isomorphic to

, which contains

. We may thus take a generator

of the image of

, and write

its minimal polynomial over

. Let

be the class of

and let

be its monic minimal polynomial over

. Then

divides

and so do its conjugates

for

. On the other hand, since

, we have

, so any root of

is a root of one of the

, which proves the equality

We call factorizations as in this proposition “normal factorizations”. This concept can actually be defined over arbitrary fields, as follows. Let

be a monic separable polynomial in

, let

be a divisor of

, and let

be a monic separable irreducible polynomial of degree

. We set

, and write

for the class of

. We say that

admits a normal factorization over

if there exists a bivariate polynomial

that is monic in

, of bidegree

, and such that

factors into

over

, with

and

coprime whenever

. We call

the normal factor of

over

. Notice that the polynomials

and

are not required to be irreducible here.

Example 4.2. With

is irreducible, and we have

. For

we take

and present

. Then

factors over

where

are the two roots of

. More precisely for

we may take the class of

, whereas

. The normal factor of

is thus

Example 4.3. When

the situation is different from the case of finite fields. For instance

is irreducible, but it remains irreducible over

, etc. Nevertheless, for a prescribed extension degree

, we may randomly pick an irreducible

of degree

and a polynomial

of bidegree

that is irreducible as a polynomial in

, and build

. If

is separable, then it is irreducible in

, and

is a normal factor of

over

4.3.Single extensions

Assume that

admits a normal factorization as above. Then the Chinese remainder theorem yields a natural isomorphism

and we may define

as the unique polynomial in

that satisfies

and

. We may now adapt the algorithm from section 3.2 as follows:

Input
Output: , and the characteristic polynomial of modulo .

Compute the remainder in .
Compute the characteristic polynomial of modulo in .
Compute in .
Perform the modular composition in .
Return and .

It follows that

, whence

is computed correctly. As to the characteristic polynomial of

, the argument is the same as for Algorithm 3.2, thanks to the Poisson formula

Now the multiplication of two polynomials in

of degree

using Kronecker substitution requires

operations in

. This way, steps 1 and 3 take

operations in

. Steps 2 and 4 respectively cost

and

operations in

. The computation of

in the last step may be done naively using

operations in

. The computation of

requires

further operations by Proposition 2.2.

Proof. We simply apply Algorithm 4.1. For

we use

, which takes

operations in

by Proposition 2.2. We perform the computations in step 4 naively, which yields

5.Composition towers

5.1.Effective towers

Corollary 4.5 already illustrates the potential of our ability to factor

non trivially over an extension field

. This idea can be pushed even farther whenever the factors

with

can be factored recursively over a tower of extension fields of

. In order to carry out this generalization, we first need to decide how to compute with elements in the successive fields of such a tower. Instead of privileging particular representations, we rely on the abstract concept of an effective tower.

We denote by

the cost of multiplying two polynomials in

in terms of the number of required operations in

. Similarly, we write

for the cost of inverting an element in

in terms of the number of operations in

. We always assume that additions and subtractions can be done in linear time. We also let

upper bound the costs of both the upward and downward conversions at level

5.2.Composition towers

Let us now return to our particular modulus

and assume that its degree

admits the factorization

with

for

. If

, then Algorithm 4.1 shows how to reduce modular composition modulo

to composition modulo a polynomial over

of degree

, provided a normal factor of

over

exists and is given. In order to generalize this idea to the case when

, it is useful to introduce the concept of a composition tower.

5.3.Modular composition using composition towers

Given

monic and separable along with a composition tower

, we may now apply Algorithm 4.1 recursively. Unrolling the recursive calls yields the following algorithm for modular composition.

Input
Output: , and the characteristic polynomial of modulo .

Set .
Let be the characteristic polynomial of modulo over .
For do
For , compute in .
Let .
For do
Return and .

Proof. Let

, and for

, let

. The polynomial

in step 3 is the characteristic polynomial of

modulo

over

and we notice that

in step 6. The correctness of the algorithm results from these observations.

The computation of the resultant

in step 3 can be performed in time

by Proposition 2.2. It follows that the complete step 3 requires

The computation of

in step 4 can be done in time

. The computation of the remainder of its division by

can be done in time

. Therefore step 4 amounts to

In step 6, the naive evaluation of

modulo

using Horner's method requires

operations in

. Consequently, step 6 requires

operations in

. The conclusion follows by adding up the above bounds for the costs of the individual steps.

Remark 5.4. For certain applications, it might be useful to generalize the algorithm to the case when

is not necessarily irreducible. In that case, we assume that the tower

is only a “partial composition tower”, meaning that we no longer require that

. In Algorithm 5.1, we then need to make two adjustments:

These computations lead to an additional term

in the complexity bound of Theorem 5.3.

6.Triangular towers

of finite fields such that

for each

. One obvious way to represent an element of

is to write it as

, where

is a polynomial with

. An effective tower that uses this representation for the elements of the fields

is called a triangular tower. For such towers, the costs of the upward and downward conversions are zero. Throughout this section it will be convenient to make the relatively harmless assumption that

for all effective rings

. We always assume available the following precomputed data:

6.1.Complexity analysis for triangular towers

Proof. Let us first show how to reduce polynomial multiplication over

to polynomial multiplication over

. So consider two polynomials

and

, represented as polynomials in

evaluated at

. We may compute their product in

as follows: we first substitute

and

, which yields two polynomials

. We next compute their product in

. Now

for each

. Since each remainder can be computed using two multiplications of elements in

using the pre-inverse of

, we obtain

for a sufficiently large constant

independent of

. On the other hand, applying Karatsuba's trick, there exists a constant

such that

holds for all

Remark 6.2. We do not claim the constant

to be optimal in the latter lemma, but it is sufficient for our purposes.

Proof. By Proposition 2.2 the inverse of an element in

takes

operations in

. Combined with the previous lemma, we obtain

6.2.Smooth degrees over finite fields

Recall that an integer

is said to be -smooth whenever all its prime factors are at most

Proof. It suffices to gather prime factors, counted with multiplicities, into

products in the range

. Then

implies

Proof. We appeal to the previous lemma to construct the integer sequence

for which we precompute a triangular composition tower for

. The cost of Proposition 6.4 simplifies to

Notice that

minimizes the latter exponent, which leads to the cost

provided that

-smooth.

Remark 6.7. It is well known that the number of

-smooth integers below an integer

is asymptotically equal to

, where

is the Dickman–de Bruijn function defined by

with initial condition

for all

(see [42] for the original proof). For

, we have

. Then

decreases rapidly with

, etc. Another important consequence of Proposition 6.4 is the following: for more than 30% of large values of

, modular compositions and characteristic polynomials for irreducible

of degree

over

may be computed using

operations in

6.3.Cyclic modulus of prime degree over a finite field

An interesting application of Proposition 6.4 concerns cyclic moduli

, where

is a prime number different from the characteristic

Proof. Assume that

divides

, and write

. It is well known that the polynomial

is the product of the monic irreducible polynomials of

whose degrees divide

. We obtain

, by using Fermat's little theorem, which asserts that

Proof. The modulus

is separable, and the precomputations first involve the irreducible factorization of

into

, whose respective degrees

divide

Since

-smooth, each

-smooth, where

. Consequently, for the modulus

, the cost of Corollary 6.6 simplifies to

The total cost for all

is thus

. Applying Proposition 3.2, this leads to the cost

for one composition or characteristic polynomial modulo

7.Primitive towers

Proposition 6.4 shows that the overhead of triangular set arithmetic rapidly grows with the height

of the tower. In this section we consider an alternative representation for elements in the fields

. This representation allows for faster multiplication inside the fields

, but the upward and downward conversions may become more expensive. One major goal of this section is to provide a more precise analysis of the cost of these conversions and to isolate particular situations in which they can be computed fast.

7.1.Primitive towers

An effective tower

with

is said to be primitive if

for each

. In that case, we assume that we precomputed the minimal polynomial

of each

over

. It follows that

for

. On the other hand, the upward and downward conversions are more expensive than in the case of triangular towers. The following consequence of (7.1), (7.2) and Theorem 5.3 will be of frequent use.

7.2.Arbitrary primitive elements

For computing with arbitrary primitive towers, we recall that we always assume available the following precomputed data:

Assume that

for some arbitrary primitive element

. For a natural morphism

for

-algebras

and

, let

denote the cost of applying the morphism once in terms of the number of required operations in

Proof. Let

with

. We may convert

into

with

using

operations in

. Then we use the precomputed polynomial

with

, and also the minimal polynomial

over

, which has degree

. We now compute

using Horner's method. This requires

operations in

. The evaluation

is the natural image of

. This proves (7.3).

For the opposite direction, consider

and reinterpret

as an element of

with

and

. We use the precomputed minimal polynomial

over

, which has degree

. We next compute

. This requires

operations in

. We finally convert

coefficientwise into an element

. This requires

operations in

and yields the natural image of

. This completes the proof of (7.4).

For

, we have

, so all possible conversions are trivial and (7.7) holds. Now assume that the result holds until

. Let us first consider the case when

. Then (7.6) yields

in a similar way. This also deals with the case when

. We conclude by induction.

Comparing to Proposition 6.4, using primitive towers thus turns out to be more efficient than using triangular towers, although it requires more precomputations. Therefore the costs for the two particular cases studied in sections 6.2 and 6.3 may be revisited and slightly improved.

7.3.Nested towers

We say that a primitive tower

with

is a nested tower, if there exist

with

and

coprime, and

so that

has degree

. For this specific type of towers we require the following precomputations:

Proof. Consider an element

with

. Let

be the smallest power of two above

. Then we may compute

so that

. This computation follows the natural “divide and conquer” strategy: given

, we split it into

with

, we compute recursively

and

, and deduce

We may end the recurrence when

, which incurs a cost

that is repeated

times in total. For each depth of the recursive calls the total cost remains

operations in

. The overall cost of the method is

. The computation of the inverse of

requires

additional operations in

Conversely, let

with

, and let

be the smallest power of two above

. We compute

and look for an expansion of the form

At the end we have

, as required. Thanks to the precomputations this expansion requires

operations in

. Now we observe that

, where

and

. A final reduction by

takes

operations in

The above corollary makes nested composition towers extremely attractive from a complexity point of view. The existence of such towers only depends on

and the degrees

, but not on

. A practical way to construct nested towers is to pick random monic

of degrees

and to check that

is irreducible for each

. We repeat this process for random choices of

until we find a suitable tower. From a heuristic point of view, we will show below that the probability that we eventually obtain a nested tower is nonzero in most cases of interest. From a theoretical point of view, the existence problem of nested towers remains an interesting problem.

In order to analyze the probability that random choices of

provide us with a nested tower, we rely on

possible choices for the tuple

. For a fixed value of

, we maximize

by taking

. Setting

, we then have

The existence of a nested tower over

with extension degrees

is likely whenever

. Taking logarithms, this happens as soon as

The algorithm for finding

needs

runs before finding a suitable tower. An obvious optimization is to make a better use of successful guesses of

for which

is irreducible for all

: instead of starting everything over after one unsuccessful guess of

, we try at least

times for some fixed constant

. The expected number of guesses then drops to

It is an interesting question whether there exist finite fields

and sequences

for which it is possible to construct monic

of degrees

such that

is irreducible for each

. The literature contains specific constructions of nested towers over finite fields based on [8, Lemma 1] which relates the irreducibility of

where

. We refer the reader to [37, section 3.2] for a nice survey. Let us exemplify two useful constructions.

Example 7.8. Following [33, Theorem 4], if

is a prime power, and

, where

, and

is a nonzero square and

is a non-square in

. Then we may build a nested tower with

. For instance we may take

, and

Example 7.9. This following construction is also due to Kyuregyan [32, Theorem 7]. Let

be a monic irreducible polynomial of degree

over

, where

is even if

. Let

such that

is a non-square in

. Then we may take

and

for all

. In this way we have

. For instance with

, we may take

. Notice that computations with nested towers simplify a bit in this situation where

for all

Example 7.10. One may wonder whether the above examples admit generalizations for which the

are of odd degree

. A non trivial candidate example of this kind is the sequence

. We verified

to be irreducible over

for

, but does this hold for all

7.4.Composed towers

Over finite fields, the situation when the

are pairwise coprime may be exploited to construct special towers, relying on the following well-known lemma:

Remark 7.12. Given primitive elements

and

of coprime degrees

and

over

, an alternative way to state the lemma is that

is again a primitive element of degree

over

Remark 7.13. Composed products can be computed in softly linear time [2], by means of the Newton–Girard identities (see also [16] for handling these identities in small characteristic).

Let

be a primitive tower and let

and

be as usual. We say that

is a composed tower if the

are pairwise coprime and if there exist monic irreducible polynomials

of degrees

such that

and

for

In that case, the minimal polynomial

over

is given by

and

for

: if

were reducible over

then

would have a proper gcd with

which is impossible (see Proposition 4.1).

By construction, we thus have

and

. For each

, let

be a root of

and

, so that

. For such composed towers, we assume that the following precomputations have been done for

Proof. Let

with

. We wish to compute

with

. For this purpose we first calculate

Conversely, let

with

. The direct image

with

is obtained via Chinese remaindering:

which takes

operations in

, when using our assumption that

and its inverse modulo

and

have been precomputed.

Given a number

, we say that an integer

is -super-smooth if for every prime power

that divides

, we have

. For instance, the product of the first

prime numbers grows as

(see for instance [17, chapter 22]) and is therefore

-super-smooth for sufficiently large

. Similarly, the number

-super-smooth for every

. For a fixed modulus of super-smooth degree, the following corollary shows that modular composition can be done in softly linear time in this specific situation:

7.5.Artin–Schreier towers

Using the composed tower approach, we are left with the question how to deal with algebraic extensions of prime power degree

. In the case when

coincides with the characteristic

of the field

, one may use Artin–Schreier towers instead, as outlined below.

An Artin–Schreier polynomial over a field

of characteristic

, is an irreducible polynomial of

of the form

. An Artin–Schreier tower of height

over

is a tower of field extensions

where each extension

is explicitly constructed from an Artin–Schreier polynomial over

In order to simplify the presentation, we restrict ourselves to the case when

. For the minimal polynomials of the successive extensions, we take

In [10, Theorem 2], it is shown that this defines a primitive tower. The polynomials

may be computed using

operations in

, according to [10, Theorems 12]. We assume that the following precomputations have been done for

(see [10, end of section 4]):

The costs of these precomputations are

and

, respectively. We now have the following complexity bound for the upward and downward conversions.

From Lemma 7.1, it follows that

can be computed using

operations in

. Using the fact that

, the result follows.

8.Building composition towers

Let

be a given finite field and let

be a given monic irreducible polynomial of degree

over

. In this section we consider the task of constructing composition towers for

. In fact, we may distinguish three different problems of increasing complexity:

Moreover, one may study these problems for each type of towers that we have encountered so far. From now on, we drop the study of triangular towers, for simplicity and because primitive towers are more efficient anyway. Given an effective tower, we notice that the construction of a primitive tower still requires computing the minimal polynomials

over

of the

. In general, it is not known how to do this in quasilinear time without using the power projection algorithm by Kedlaya and Umans.

The traditional solution to the first problem involves computing irreducible polynomials

of “small” degrees

over “large” field extensions

. For instance, when using Shoup's algorithm [45], the number of operations in

grows with

(in this case, Couveignes and Lercier's algorithm [9] is less competitive). In [14, Proposition 4.6] von zur Gathen and Seroussi proved the lower bound

for factoring polynomials of degree

over

in the arithmetic circuit model. Consequently the quadratic cost in

might be difficult to decrease in general. Theorem 8.2 below shows how to achieve a cost that is quasi-linear in

in the case when

In this section we mainly focus on the efficient construction of primitive composition towers for prescribed

. We first describe the general algorithm and then explain possible speed-ups for nested, composed, and Artin-Schreier towers. Altogether, this yields an efficient answer to the second problem. Notice that composition towers with no prescribed modulus are sufficient if we merely need a representation for the finite field

such that polynomials in

can be evaluated efficiently at points in

The third problem is more difficult. So far we have not been able to apply the techniques of this paper to obtain more efficient solutions, even when

is very smooth or in the extremely favourable case of Artin–Schreier towers. In practice, a straightforward strategy for the construction of composition towers for

is to factor

over all intermediate fields

using standard available algorithms. This is discussed at the end of the section.

8.1.Building primitive composition towers

The naive construction of a primitive composition tower with prescribed extension degrees

proceeds by induction. Assume the tower is built up to height

. We first construct an irreducible polynomial

and set

. If

is “sufficiently random”, then the class

is a primitive element over

. Its minimal polynomial

over

may simply be obtained as

where

is the natural preimage with bidegree

that satisfies

. The latter resultant requires roughly

operations in

when using the best known algorithms. To complete the composition tower of height

, it remains to compute the sequence of normal factors of

over

. Factorization algorithms based on modular composition [31] can achieve this in time

, roughly speaking.

In fact, we will show how to build primitive composition towers far more efficiently. We still proceed by induction. The composition towers naturally share the same underlying effective subtowers of

. More precisely, the subtower of height

consists of the fields

and forms a composition tower for

; the successive normal factors are written

, and the auxiliary polynomials

of Definition 5.2 are written

We begin with constructing an irreducible polynomial

of degree

, so the first primitive tower is made of

, and it is a composition tower for

with normal factor

. The auxiliary polynomial

satisfies

For the second composition tower we construct an irreducible polynomial

of degree

, which defines

as the class of

. If

is “sufficiently random”, then

is a primitive element of

over

. We obtain the minimal polynomial of

over

, where

and

. The normal factor of

over

, and the one of

over

. We clearly have

and we obtain

from the subresultant of degree

and

(it necessarily exists because

is a primitive element of

over

, which implies that

is separable; this will be detailed below in the general situation). In this way we obtain a composition tower of height

The third tower again requires building an irreducible polynomial

. This defines

and

so we have

. We assume that

generates

over

and we wish to obtain the normal factorizations of its minimal polynomial

. Over

and

the normal factors are respectively

and

. For

, we use the second composition tower: we compute

such that

and

, then

. Then we obtain

, where

. The auxiliary polynomials

and

are obtained from the corresponding first subresultants.

For general heights, we use the following algorithm for the construction of composition towers.

Input
Output: The primitive composition tower for with , and where is the minimal polynomial of over .
Notation: The normal factors of over are written .

Set , , .
For from down to do
1. Compute over , where ;
2. Compute the subresultant of degree in of and written , and then set .
Set to the class of in , and let .

Notice that the precomputations PRE-P1 correspond to ; concerning PRE-P2, we have already computed , and is simply ; the precomputations PRE-P3 correspond to .
Return the composition tower for made from , , , , and the other precomputed auxiliary data.

Proof. By decreasing induction on

we prove that

is a composition tower for

which has degree

. This is clear for

and

. Assume the induction hypothesis holds for

. At the end of step 2.a the polynomial

has degree

and is the minimal polynomial of

over

. Since

is a normal factor of

over

, the degree of the ideal generated by

Since

is separable it belongs to the Gröbner basis of

for the lexicographic order induced by

. In particular a polynomial with leading monomial

belongs to

. This proves that the subresultant of degree

and

is nonzero, and that

is well defined, which implies the requested conditions:

As to the complexity analysis, we first notice that

. The conversions in step 2.a therefore take

operations in

. The resultant in step 2.a and the subresultant in step 2.b require

further operations, by Proposition 2.2, where

. The inversion of

modulo

takes

further operations.

Proof. The tower is constructed by natural successive applications of Algorithm 8.1. By Equation (7.8) the bound from Proposition 8.1 simplifies to

It remains to take the construction of the

into account for

. By Theorem A.4 a random polynomial

can be computed using an expected number of

compositions modulo

plus

operations in

. Thanks to Corollary 7.5 each composition modulo

may be done using

operations in

. On the other hand we simply take

. The sum over

yields

We claim that, with a small uniformly bounded probability, the roots of

do not have maximal degree

over

. Such a bad

is easily detected since the corresponding

is not separable. In that case we build another

at random and the average number of failures is bounded.

To prove the claim, we notice that the roots of

do not have maximal degree

over

if, and only if, the

conjugates of

over

are not pairwise distinct. Equivalently this means that the coefficients of

belong to a proper subfield of

. For each prime factor

the field

is a maximal proper subfield of

. All maximal proper subfields of

are obtained is this way, so there exist at most

many of them. The number of monic irreducible polynomials of degree

with coefficients in such a subfield is at most

; the number of monic irreducible polynomials of degree

over

is at least

(see for instance [13, Lemma 14.38]). Consequently the probability that the roots of

do not have maximal degree

over

is at most

8.2.Particular cases

Let us now investigate the consequences of Proposition 8.1 in the particular cases of nested, composed and Artin–Schreier towers. In our analysis, we actually construct all intermediate subtowers along with the composition tower itself. If one just needs the highest tower, then there are cases that can be optimized by exploiting the specificities of the types of towers under consideration.

Nested towers

Proof. The tower is again constructed by natural successive applications of Algorithm 8.1, except that we replace the precomputations in step 3 by those specified in PRE-N1. These precomputations amount to

. By Equation (7.9), the cost of Proposition 8.1 simplifies to

Composed towers

Proof. The tower is again constructed by natural successive applications of Algorithm 8.1, except that we replace the precomputations in step 3 by those specified in PRE-C1, PRE-C2, and PRE-C3. These precomputations amount to

. By Equation (7.10), the cost of Proposition 8.1 simplifies to

Artin–Schreier towers

Proof. The tower is again constructed by natural successive applications of Algorithm 8.1, except that we replace the precomputations in step 3 by those specified in PRE-A1 and PRE-A2, which amount to

. By Equation (7.11), the cost of Proposition 8.1 becomes

8.3.Building composition towers from roots

Let

be an algebraic tower. So far we have shown how to build composition towers for

. Let us now explain how to deduce composition towers for a given

monic and irreducible of degree

. By standard algorithms, we first compute a root

(for example with Rabin's algorithm; see [13, chapter 14]). Unfortunately we do not know how to exploit a composition tower to compute

more efficiently. We are interested in finding a monic normal factor

over each field

in the tower. In the extreme case when

, we take

Proof. We have

. We compute

and

by descending induction on

from

down to

. First we have

where

. The first subresultant

and

is well defined and

is invertible modulo

. Therefore we have

According to Proposition 2.2, the resultant and the first subresultant can be computed using

operations in

. Then

is deduced with

operations in

. Summing over

, the result follows.

Remark 8.7. Inversely, given a composition tower

, we notice that the polynomial

is of the form

with

. In other words, the computation of a composition tower for

is as least as hard as finding a root of

9.Applications and perspectives

We have shown that modular composition and characteristic polynomials for a fixed irreducible modulus

can be computed fast if

is smooth, when allowing for precomputations that depend solely on

. From a practical point of view, we expect that efficient implementations of our algorithms will lead to speed-ups with respect to state of the art methods as soon as

is composite and sufficiently large. From a theoretical point of view, the algorithms by Kedlaya and Umans generally outperform our new algorithms. One notable exception occurs when

is very smooth, in which case we were able to prove a quasi-linear complexity bound: see Corollary 7.16.

Our new algorithms are only more efficient for fixed moduli

. Nevertheless, the cost of the required precomputations as a function of

is of a similar order of magnitude as one composition modulo

without our methods. In other words, if we need to compute

compositions modulo the same modulus

, then our new methods are already of interest for small values of

. This problem of multiple modular compositions occurs in various applications:

Factorization over finite fields

Roots over large finite fields

Conversions

Given a non trivial divisor

, yet another representation of elements of

was needed in [23]. Let

be a primitive element of

and

be an element whose minimal polynomial has degree

. Then

and

are both isomorphic to

and correspond to two different representations. If

and

can be chosen “nicely”, then conversions between these representations can be computed fast, using similar methods as in sections 7.3, 7.4 and 7.5. Using modular composition, we may reduce the general case to this special case in which we are allowed to choose

and

Frobenius maps

Many intriguing questions remain to be answered. One major open problem is whether our new techniques can be used to build composition towers for a prescribed irreducible modulus

of smooth degree

in quasi-linear time. This would give us an unconditional algorithm for composition modulo

of quasi-linear time complexity. Another natural question is whether there exists an efficient way to reduce general modular composition to composition modulo irreducible polynomials of smooth degree. This question may be related to generalizations of our algorithms to modular composition for multivariate polynomials. Besides these fundamental issues, we finally think that there remains a lot of room for more minor improvements of the techniques in this paper and for working out various applications in more detail. It would also be interesting to have efficient implementations of the multiple algorithms that we have presented and investigate under which conditions they outperform previous algorithms in practice.

Appendix A.Pseudo-trace over finite fields

This appendix is devoted to Kaltofen and Shoup's fast algorithm for pseudo-traces over finite fields [28], along with its application to building irreducible polynomials (required in our section 8). We recall the algorithm for completeness, but also in order to make it work more generally over any finite field

Input
Output: such that ; of bidegree such that ; of bidegree such that .

If then return .
Let , and recursively compute .
Compute .
Compute over .
Compute over .
If is even, then return .
Compute and return , over , and over .

Proof. The proof proceeds by induction on

. The algorithm is clearly correct if

. Assume that it is correct for

. By linearity of the

-th power we have

This already proves the correctness by induction when

is even. Otherwise

and similar computations yield

The cost for obtaining

from

is essentially one or two compositions modulo

. If we write

with

, then we are led to compute

for all

and then

over

. Overall

requires

compositions modulo

plus

operations in

. The same bound holds for

. If

is odd, then step 7 takes again

compositions modulo

plus

operations in

. Finally, the depth of the recursion is

Proof. If suffices to compute

for Algorithm A.1 using

operations in

, in order to apply the preceding proposition.

Proof. The polynomial

is irreducible if, and only if,

and

for all prime divisor

. We may thus apply the above corollary.

Proof. We appeal to Ben-Or's algorithm [13, Algorithm 14.40], but use modular composition instead of the usual modular powering. Let

be a monic random polynomial of degree

over

. For

from

we compute

and stop as soon as

. In this case

is reducible, since it admits an irreducible factor of degree

(that divides

). If all the

are

then

is necessarily irreducible.

With the notations from Algorithm A.1, the

modulo

are computed as follows. First, we compute

and

using

operations in

. Second, we obtain

and

with

compositions modulo

and

with

, so we are led to compute

for all

and then

over

. The overall computation of

requires

compositions modulo

plus

operations in

. If the smallest degree of the irreducible factors of

, then testing the irreducibility of

takes

compositions modulo

plus

In average,

random trials for

are necessary to find an irreducible one, and the average value of the smallest degree

. Therefore the total expected cost is

Taking

, the expected cost in Theorem A.4 rewrites into

compositions modulo

and

field operations in

(thus, without using the Kedlaya–Umans algorithm). In favorable cases studied in the present paper, we may assume that one composition modulo

amounts to say

operations in

for a fixed, small number

. The expected cost then reduces to

operations in

As a first comparison, Shoup's algorithm [45] needs

expected operations in

(still without using the Kedlaya–Umans algorithm). For bounded values of

and large

, the availability of a fast algorithm for composition modulo

makes it possible to improve on Shoup's algorithm, as was already observed in [28]. In this specific case, Theorem A.4 indeed leads to a bit cost

instead of

with Shoup's algorithm.

As a second comparison, let

be a function such that

in the neighborhood of infinity. Based on Kedlaya and Umans' algorithm, Couveignes and Lercier designed an algorithm [9] that builds an irreducible polynomial of degree

over

in time

, when ignoring the cost of conversions to put elements of

into their standard representation in

. To be fair, we may assume fast modular composition for

, so the expected bit cost of Theorem A.4 further reduces to

. So roughly speaking, if

, then the Couveignes–Lercier algorithm is asymptotically faster; if

, then Theorem A.4 is competitive. A comparison with Shoup's algorithm [45] would not be fair here unless enhancing it with the Kedlaya–Umans algorithm; for large

, Shoup's algorithm [45] also remains competitive, of course.

Similar observations hold concerning the complexity of the factorization of polynomials of degree

. In fact, Kaltofen and Shoup, still in [28], applied their technique (namely Algorithm A.1) to reduce the cost of the Cantor–Zassenhaus factorization algorithm mostly to compositions modulo

and to

. For bounded

and large

, fast composition modulo a fixed modulus

is therefore particularly relevant to factorization. We intend to work out the details of this application in a forthcoming paper.

Acknowledgments

Bibliography

[1]: D. J. Bernstein. Composing power series over a finite ring in essentially linear time. J. Symbolic Comput., 26(3):339–341, 1998.
[2]: A. Bostan, Ph. Flajolet, B. Salvy, and É. Schost. Fast computation of special resultants. J. Symbolic Comput., 41(1):1–29, 2006.
[3]: A. Bostan, G. Lecerf, and É. Schost. Tellegen's principle into practice. In Hoon Hong, editor, Proceedings of the 2003 International Symposium on Symbolic and Algebraic Computation, ISSAC '03, pages 37–44, New York, NY, USA, 2003. ACM.
[4]: J. V. Brawley and L. Carlitz. Irreducibles and the composed product for polynomials over a finite field. Discrete Math., 65(2):115–139, 1987.
[5]: R. P. Brent and H. T. Kung. Fast algorithms for manipulating formal power series. J. ACM, 25(4):581–595, 1978.
[6]: P. Bürgisser, M. Clausen, and M. A. Shokrollahi. Algebraic complexity theory, volume 315 of Grundlehren der Mathematischen Wissenschaften. Springer-Verlag, 1997.
[7]: D. G. Cantor and E. Kaltofen. On fast multiplication of polynomials over arbitrary algebras. Acta Infor., 28(7):693–701, 1991.
[8]: S. D. Cohen. On irreducible polynomials of certain types in finite fields. Proc. Camb. Phil. Soc., 66(2):335–344, 1969.
[9]: J.-M. Couveignes and R. Lercier. Fast construction of irreducible polynomials over finite fields. Israel J. Math., 194(1):77–105, 2013.
[10]: L. De Feo and É. Schost. Fast arithmetics in Artin-Schreier towers over finite fields. J. Symbolic Comput., 47(7):771–792, 2012.
[11]: L. E. Dickson. Linear Groups: With an Exposition of the Galois Field Theory. Dover Publication Inc., 1958.
[12]: J. Doliskani and É. Schost. Taking roots over high extensions of finite fields. Math. Comp., 83(285):435–446, 2014.
[13]: J. von zur Gathen and J. Gerhard. Modern Computer Algebra. Cambridge University Press, 3rd edition, 2013.
[14]: J. von zur Gathen and G. Seroussi. Boolean circuits versus arithmetic circuits. Inform. and Comput., 91(1):142–154, 1991.
[15]: J. von zur Gathen and V. Shoup. Computing Frobenius maps and factoring polynomials. Comput. Complexity, 2(3):187–224, 1992.
[16]: B. Grenet, J. van der Hoeven, and G. Lecerf. Deterministic root finding over finite fields using Graeffe transforms. Appl. Alg. Eng. Comm. Comp., 27(3):237–257, 2016.
[17]: G. H. Hardy and E. M. Wright. An introduction to the theory of numbers. Oxford University Press, Oxford, 6th edition, 2008.
[18]: D. Harvey and J. van der Hoeven. Faster integer and polynomial multiplication using cyclotomic coefficient rings. Technical report, ArXiv, 2017. http://arxiv.org/abs/1712.03693.
[19]: D. Harvey, J. van der Hoeven, and G. Lecerf. Faster polynomial multiplication over finite fields. J. ACM, 63(6), 2017. Article 52.
[20]: J. van der Hoeven. Relax, but don't be too lazy. J. Symbolic Comput., 34(6):479–542, 2002.
[21]: J. van der Hoeven. Fast composition of numeric power series. Technical Report 2008-09, Université Paris-Sud, Orsay, France, 2008.
[22]: J. van der Hoeven. Faster Chinese remaindering. Technical report, CNRS & École polytechnique, 2016. http://hal.archives-ouvertes.fr/hal-01403810.
[23]: J. van der Hoeven and R. Larrieu. The Frobenius FFT. In M. Burr, editor, Proceedings of the 2017 ACM on International Symposium on Symbolic and Algebraic Computation, ISSAC '17, pages 437–444, New York, NY, USA, 2017. ACM.
[24]: J. van der Hoeven and G. Lecerf. Composition modulo powers of polynomials. In M. Burr, editor, Proceedings of the 2017 ACM on International Symposium on Symbolic and Algebraic Computation, ISSAC '17, pages 445–452, New York, NY, USA, 2017. ACM.
[25]: J. van der Hoeven and G. Lecerf. Modular composition via complex roots. Technical report, CNRS & École polytechnique, 2017. http://hal.archives-ouvertes.fr/hal-01455731.
[26]: Xiaohan Huang and V. Y. Pan. Fast rectangular matrix multiplication and applications. J. Complexity, 14(2):257–299, 1998.
[27]: F. Johansson. A fast algorithm for reversion of power series. Math. Comp., 84:475–484, 2015.
[28]: E. Kaltofen and V. Shoup. Fast polynomial factorization over high algebraic extensions of finite fields. In Proceedings of the 1997 International Symposium on Symbolic and Algebraic Computation, ISSAC '97, pages 184–188, New York, NY, USA, 1997. ACM.
[29]: E. Kaltofen and V. Shoup. Subquadratic-time factoring of polynomials over finite fields. Math. Comp., 67(223):1179–1197, 1998.
[30]: K. S. Kedlaya and C. Umans. Fast modular composition in any characteristic. In FOCS'08: IEEE Conference on Foundations of Computer Science, pages 146–155, Washington, DC, USA, 2008. IEEE Computer Society.
[31]: K. S. Kedlaya and C. Umans. Fast polynomial factorization and modular composition. SIAM J. Comput., 40(6):1767–1802, 2011.
[32]: M. K. Kyuregyan. Recurrent methods for constructing irreducible polynomials over of odd characteristics. Finite Fields Appl., 9(1):39–58, 2003.
[33]: M. K. Kyuregyan. Recurrent methods for constructing irreducible polynomials over of odd characteristics, II. Finite Fields Appl., 12(3):357–378, 2006.
[34]: P. Lairez and T. Vaccon. On p-adic differential equations with separation of variables. In Proceedings of the ACM on International Symposium on Symbolic and Algebraic Computation, ISSAC '16, pages 319–323, New York, NY, USA, 2016. ACM.
[35]: F. Le Gall. Powers of tensors and fast matrix multiplication. In Proceedings of the 39th International Symposium on Symbolic and Algebraic Computation, ISSAC '14, pages 296–303, New York, NY, USA, 2014. ACM.
[36]: G. Lecerf. On the complexity of the Lickteig–Roy subresultant algorithm. J. Symbolic Comput., 2018. https://doi.org/10.1016/j.jsc.2018.04.017.
[37]: G. L. Mullen and D. Panario. Handbook of Finite Fields. Discrete Mathematics and Its Applications. Chapman and Hall/CRC, 2013.
[38]: A. K. Narayanan. Fast computation of isomorphisms between finite fields using elliptic curves. Technical report, arXiv:1604.03072, 2016. https://arxiv.org/abs/1604.03072.
[39]: C. H. Papadimitriou. Computational Complexity. Addison-Wesley, 1994.
[40]: M. S. Paterson and L. J. Stockmeyer. On the number of nonscalar multiplications necessary to evaluate polynomials. SIAM J.Comput., 2(1):60–66, 1973.
[41]: M. O. Rabin. Probabilistic algorithms in finite fields. SIAM J. Comput., 9(2):273–280, 1980.
[42]: V. Ramaswami. On the number of positive integers less than and free of prime divisors greater than . Bull. Amer. Math. Soc., 55:1122–1127, 1949.
[43]: P. Ritzmann. A fast numerical algorithm for the composition of power series with complex coefficients. Theoret. Comput. Sci., 44:1–16, 1986.
[44]: J.-A. Serret. Cours d'algèbre supérieure, volume 2 of Les grands classiques Gauthier-Villars. Jacques Gabay, 4th edition, 1992. Réimpression du tome second de la 4 édition du Cours d'algèbre supérieure de Joseph-Alfred Serret, publiée par Gauthier-Villars en 1879.
[45]: V. Shoup. Fast construction of irreducible polynomials over finite fields. J. Symbolic Comput., 17(5):371–391, 1994.
[46]: V. Shoup. Efficient computation of minimal polynomials in algebraic extensions of finite fields. In Proceedings of the 1999 International Symposium on Symbolic and Algebraic Computation, ISSAC '99, pages 53–58, New York, NY, USA, 1999. ACM.

			(7.1)
			(7.2)

			(7.3)
			(7.4)

			(7.5)
			(7.6)


			(i=2,p=2)
			(all other cases).

1.Introduction

1.1.Previous work

1.2.Contributions and outline of the article

2.Preliminaries

2.1.Complexity models

2.2.Polynomial multiplication

2.3.Univariate arithmetic

2.4.Bivariate arithmetic

2.5.Finite field arithmetic

2.6.Matrix multiplication

2.7.Modular composition and related operations

3.Modular composition via factorization

3.1.Separable moduli over algebraically closed fields

3.2.Pairwise coprime moduli

4.Exploiting factorizations over algebraic extensions

4.1.Degree reduction

4.2.Normal factorizations

4.3.Single extensions

5.Composition towers

5.1.Effective towers

5.2.Composition towers

5.3.Modular composition using composition towers

6.Triangular towers

6.1.Complexity analysis for triangular towers

6.2.Smooth degrees over finite fields

6.3.Cyclic modulus of prime degree over a finite field

7.Primitive towers

7.1.Primitive towers

7.2.Arbitrary primitive elements

7.3.Nested towers

7.4.Composed towers

7.5.Artin–Schreier towers

8.Building composition towers

8.1.Building primitive composition towers

8.2.Particular cases

Nested towers

Composed towers

Artin–Schreier towers

8.3.Building composition towers from roots

9.Applications and perspectives

Factorization over finite fields

Roots over large finite fields

Conversions

Frobenius maps

Appendix A.Pseudo-trace over finite fields

Acknowledgments

Bibliography