Modular composition via factorization

 Joris van der Hoevena, Grégoire Lecerfb

 CNRS (UMR 7161, LIX) Laboratoire d'informatique de l'École polytechnique Campus de l'École polytechnique 1, rue Honoré d'Estienne d'Orves Bâtiment Alan Turing, CS35003 91120 Palaiseau, France

 a. Email: vdhoeven@lix.polytechnique.fr

 b. Email: lecerf@lix.polytechnique.fr

 Revised version, October 21, 2020
 Modular composition is the problem to compute the composition of two univariate polynomials modulo a third one. For polynomials with coefficients in a finite field, Kedlaya and Umans proved in 2008 that the theoretical bit complexity for performing this task could be made arbitrarily close to linear. Unfortunately, beyond its major theoretical impact, this result has not led to practically faster implementations yet. In this article, we explore how polynomial factorization may help modular composition. Most of our algorithms assume that the modulus is fixed, which allows us to precompute factorizations of the modulus over suitable extension fields. This significant restriction with respect to Kedlaya and Umans' algorithm is acceptable whenever we need to perform many compositions modulo the same modulus. For polynomials over a finite field, we show that compositions modulo a fixed modulus of composite degree are cheaper than general modular compositions. In some very favourable cases of fixed moduli of very smooth degree, we show that the complexity even becomes quasi-linear.

## 1.Introduction

Let be an effective field. This means that elements of can be represented on a computer and that we have algorithms for performing the field operations. Given polynomials , the problem of modular composition is to compute modulo . Modular composition is an important problem in complexity theory because of its applications to polynomial factorization [29, 30, 31]. It also occurs very naturally whenever one wishes to perform polynomial computations over inside an algebraic extension of . Given two different representations of an algebraic extension of , the implementation of an explicit isomorphism boils down to modular composition as well.

In this paper, we study the problem of composition modulo a fixed polynomial mostly in the case when is a finite field. We assume that is separable; the case of moduli of the form is studied in a separate paper [24]. Our results are based on the following simple observation: if a factorization is known, then composition modulo reduces to composition modulo the with . Curiously, this observation does not seem to be exploited in the standard literature on modular composition. In the case when is irreducible over , but admits a non trivial divisor , then the second crucial observation is that factors over . We may then apply the first observation in order to obtain an efficient algorithm for composition modulo . Finding the factorizations of over can be quite expensive in general, but such computations can be regarded as pre-computations if the modulus is fixed.

Besides modular composition, we also study the related problem of computing the characteristic polynomial of modulo . More precisely, we understand to be the characteristic polynomial of the multiplication endomorphism by in . In particular, we have modulo . Theoretically speaking, the fastest current method relies on the computation of so-called power projections (see for instance [16, section 2.5]); this task turns out to be of the same complexity as modular composition in all known cases.

### 1.1.Previous work

Denote by the number of operations in required to multiply two polynomials of . Let and be polynomials in of degrees , and . The naive modular composition algorithm takes operations in . In 1978, Brent and Kung gave an algorithm with cost : their algorithm from [5, section 2.1] actually concerns the case when , but it naturally extends to general . It uses the baby-step giant-step technique due to Paterson and Stockmeyer [40], and even yields a sub-quadratic cost when using fast linear algebra (see [28, p. 185]). The constant is such that a matrix over may be multiplied with another rectangular matrix in time . The best current bound is due to Huang and Pan [26, Theorem 10.1].

When linear algebra benefits from very fast implementations, its contribution is expected to be smaller than the other polynomial operations, on a certain bounded range for . In fact, for fixed values of and , the sizes of the “baby” and “giant” steps may be optimized in order to balance cost contributions of matrix and polynomial operations (this was studied for finite fields in an unpublished preprint of Shoup and Smolensky in 1992). Further improvements have been proposed in [27], based on the Lagrange inversion formula for the reversion of formal power series.

A major breakthrough has been achieved by Kedlaya and Umans [30, 31] in the case when is the finite field . For any positive , they have shown that the composition modulo can be computed using bit operations. Unfortunately, it remains a major open problem to turn this theoretical bit complexity bound into practically useful implementations.

In the special case of power series composition (i.e. when ), the best known complexity bound is again due to Brent and Kung: in [5], they showed that this requires operations in , under the condition that is invertible and that the characteristic is at least , where . The variant proposed by van der Hoeven [20, section 3.4.3] removes the condition on . For fields of small characteristic, Bernstein [1] proposed an algorithm that is softly linear in the precision but linear in the characteristic. These algorithms are generalized to moduli of the form in [24]; we show there that the composition reduces to one power series composition at order in , plus compositions modulo , and one characteristic polynomial computation modulo . Let us finally mention that series with integer, rational or floating point coefficients can often be composed in quasi-linear time as well in suitable bit complexity models, as shown by Ritzmann [43]; see also [21].

The expression is linear in . It is well known that the transposition of the application corresponds to the power projection task (see section 2.7), which is an important ingredient for computing minimal and characteristic polynomials. In [46], Shoup studied the computation of minimal polynomials in algebraic extensions of the form or , explicitly given by defining polynomials. He designed fast practical algorithms with low memory consumption, based on a smart combination of “baby-step giant-step” and transposed algorithms. However his method does not improve upon Brent and Kung's one from the asymptotic complexity point of view.

The characteristic polynomial of modulo may be obtained from suitable power projections of modulo thanks to the well-known Newton–Girard identities, which involve solving a first order differential equation to precision . This is rather elementary when the characteristic is zero or sufficiently large. Otherwise, -adic arithmetic is needed. A complete solution is described in [16]. More generally, a framework for using -adic arithmetic to solve ordinary differential equations in positive characteristic may be found in [34].

### 1.2.Contributions and outline of the article

The aim of this article is to show how suitable factorizations of the modulus can be exploited to speed up compositions modulo , as well as the computation of characteristic polynomials modulo . Most of the new results are derived from the following simple observation: if splits into linear factors in , and if its roots are given, then modular composition basically reduces to evaluating at the roots of , evaluating at these values of , and interpolate . It is well-known that each of these steps can be done in softly linear time using multiple point evaluation and interpolation. More generally, whenever can be factored into , the computation of reduces to the computations of for .

Of course, the existence of factorizations of heavily depends on itself and on fields over which we allow ourselves to factor . For instance, if , then we might consider computing the roots of in using a sufficient precision, or factoring over the -adic numbers for some well chosen prime number . If is a finite field, and is an irreducible polynomial of composite degree , then we may consider factorizations over the intermediate field . In a separate paper, we study the case when is the field of computable complex numbers [25]. In this paper, we mainly focus on the finite field case.

Concerning the organization of this paper, we first revisit several known techniques in section 2. This includes the introduction of cost functions for modular composition, power projection, and the computation of characteristic polynomials. In section 3, we examine how modular composition can be accelerated when a factorization of in is given. More precisely, if the irreducible factorization of the modulus is available, then our method reduces the composition modulo to several compositions modulo in softly linear time. A key ingredient, reused several times in the article, is the simultaneous computation of characteristic polynomials and modular compositions.

In section 4, we turn our attention to the specific situation of an irreducible modulus of composite degree over a finite field . We show how to exploit the existence of factorizations of over the intermediate fields into factors of degree .

The natural generalization to degrees with will be the subject of sections 5, 6 and 7. One important question is how to represent the elements of the intermediate fields and it is convenient to introduce the special concept of an effective algebraic tower for this purpose. We also introduce the notion of a composition tower for , which formalizes the requirement that we are given factorizations of over each of the intermediate fields . In section 5, we generalize the algorithm from section 4 to arbitrary composition towers. In section 6 we also give a detailed complexity analysis in the case of triangular towers when each intermediate field admits the form for suitable .

Section 7 is dedicated to primitive towers, in which case each is generated by a single primitive element over . If the are pairwise coprime (see section 7.4), then the field can be taken to be the composed product of , and computations in the tower become particularly efficient: in the case when is “super-smooth” (in the sense that the largest prime-power divisor of satisfies ), we will show that composition modulo can be done in quasi-linear time (modulo precomputations). In this very particular situation, our method is asymptotically faster than Kedlaya–Umans' algorithm. If admits a small characteristic and , then a similar result holds when using so called Artin–Schreier towers (see section 7.5). For general smooth , one may also consider nested towers (see section 7.3) for which the primitive elements are compositions of polynomials of degrees over . The existence of such towers for given and is an interesting open problem, with a generally positive answer in practice.

Our main complexity bounds for modular composition are summarized in Table 1.1. In this table, is a fixed irreducible polynomial of degree and . The entries correspond to the various types of towers that are considered in sections 6 and 7, while assuming that all necessary precomputations that depend on have been done.

This leaves us with the issue of how to conduct the precomputations. This is the subject of section 8, where we analyze the cost of building composition towers of various types. We will see that the construction of composition towers for prescribed composite extension degrees can usually be done fast. On the other hand, building composition towers for a prescribed modulus is of the same order of difficulty as factoring over the intermediate fields , or finding a root of in for a given representation of the elements of . Practical algorithms for this task are well known but their cost is quadratic in ; see [38] for recent theoretical complexity bounds.

 Tower type Expected number of operations in Triangular Proposition 6.4 Primitive Corollary 7.5 Nested Corollary 7.7 Composed Corollary 7.15 Artin–Schreier when Corollary 7.18

Table 1.1. Complexity bound for modular composition for various types of towers.

Whether our approach leads to competitive algorithms for modular composition thus depends on the question whether we require the factorization of to be part of the complexity or not. Indeed, if we are doing a large polynomial computation over in the algebraic extension , then we typically need to perform many modular compositions for different and , but for a fixed modulus . In such cases, it is reasonable to regard the factorization of as a precomputation. Furthermore, if we want to perform computations in a large finite field extension and if we are free to select a suitable representation for elements of this finite field, then we may build a modulus with using dedicated algorithms; from the asymptotic complexity point of view, these algorithms are usually faster than finding an irreducible modulus at random.

In fact, we recall that testing the irreducibility of in reduces to modular compositions in degree over , plus bit operations (see [31, section 8.2], based on Rabin's algorithm [41]). In practice, for the fast construction of an irreducible polynomial of degree over , it is recommended to use Shoup's algorithm [45], which uses an expected number of arithmetic operations in . If is much smaller than , then one may also resort to Couveignes and Lercier's algorithm [9]. Theoretically speaking, this algorithm admits an expected bit complexity , but it relies on Kedlaya and Umans' algorithm for modular composition.

## 2.Preliminaries

### 2.1.Complexity models

Recall that an effective ring is a ring with unity whose elements can be represented on a computer and such that we have algorithms for performing the ring operations. Effective fields and effective algebras over an effective ring are defined similarly.

Given an effective ring , algebraic complexity models express running times in terms of the number of operations in . Unless otherwise stated, we will analyze the costs of the algorithms in this paper in this way. More precisely, our results both apply for the straight-line program and computation tree models [6, chapter 4].

For randomized algorithms over a finite effective ring , we assume a special instruction that uniformly generates random elements in . For simplicity we assume that this instruction has constant cost. For a given input, the cost of an algorithm is thus a random variable. The expected cost of an algorithm for input size is defined as the maximum of the averages of these random variables over all possible inputs of size .

When working over the finite field with elements, we may also analyze the costs of algorithms in the bit complexity model, which relies on Turing machines with a sufficient number of tapes [39]. We will not explicitly consider this model in our paper, but most of our complexity bounds can easily be converted to this setting.

### 2.2.Polynomial multiplication

Let be an effective ring with unity, let , and denote

Given a polynomial or power series and , it is convenient to write

We write for a cost function such that two polynomials in can be multiplied using operations in . The schoolbook algorithm allows us to take . The fastest currently known algorithm [7] yields . Here, the soft-Oh notation means that ; see [13, section 25.7] for technical details. If is a field of finite characteristic, then it has been shown [18, 19] that , where denotes the iterated logarithm function. In what follows, we will always assume that is an increasing function in . This customary assumption implies the super-additivity of , namely for all and .

More generally, if is an effective -algebra, then it is sometimes convient to denote by a cost function such that two polynomials in can be multiplied using operations in .

### 2.3.Univariate arithmetic

Let be an effective field. The remainder (resp. quotient) of the euclidean division of by in is denoted by (resp. by ). For a fixed modulus of degree , euclidean divisions by are usually performed by computing a pre-inverse of . More precisely, is the inverse of in , computed at precision . Given , one obtains the quotient by multiplying with and the remainder as . Given we may thus compute the modular product using operations in .

We recall that the greatest common divisor of two polynomials of degrees at most over can be computed using operations in [13, Algorithm 11.4].

Let and consider points . Then the evaluations can be computed using operations in [13, chapter 10]. This operation is also called multipoint evaluation. The inverse operation is the interpolation, which consists of recovering from ; it can be performed with a similar cost. If the are fixed, then it is often possible to gain a factor using FFT trading [22].

More generally, if are polynomials with , then all the remainders can be computed using operations in . The inverse problem, called Chinese remaindering, again admits the same complexity .

### 2.4.Bivariate arithmetic

Let be an effective ring. Given a bivariate polynomial , we define its bidegree to be the pair with and . Using Kronecker substitution [13, section 8.4], two polynomials of bidegree may be multiplied using ring operations in .

If is a monic polynomial of degree in , and if and are two polynomials in then their canonical preimages in may be multiplied with operations in before being projected into with additional operations. Consequently each ring operation in in degree reduces to operations in . If is monic in then also takes operations in .

The computation of bivariate subresultants usually relies on fast evaluation/interpolation, as in the following well known proposition.

Proposition 2.1. [13, Corollary 11.18] Let be an effective field with elements. Any polynomial subresultant in of two polynomials and in of bidegrees can be computed using operations in .

Proof. The subresultant polynomial of degree of and in has degree in . It can be computed by evaluating and at values for in , computing subresultants in of degree , and interpolating the coefficients of . In total this costs .

However for the sake of generality we will rely on the following result.

Proposition 2.2. Let be an effective field. Any polynomial subresultant in of two polynomials and in of bidegrees , with the corresponding Bézout relation, can be computed using operations in that comprise at most inversions in .

Proof. This result corresponds to [36, Corollary 26]. The number of inversions comes from the fact that the underlying algorithm only needs to perform exact divisions by subresultant coefficients in . Each division requires to invert the leading coefficient of the divisor. There exist at most such leading coefficients.

### 2.5.Finite field arithmetic

Let be the finite field with elements. One way to represent elements of a finite field extension is as remainder classes of polynomials in modulo a monic reducible polynomial of degree . We write in order to emphasize that we use this representation. Multiplication in can be done using operations in . Given an element of degree over , we write for the subfield of generated by over , where we understand that elements in are represented as evaluations of polynomials in at .

For the bulk of the algorithms in this paper, we will work over the finite field . In that case, it can be shown that two polynomials in can be multiplied in time on a Turing machine with a sufficient number of tapes [19, section 8.1]. The algebraic complexity bounds in this paper are easy to adapt to this model: it mainly suffices to replace by in all bounds.[update references and bounds]

### 2.6.Matrix multiplication

The constant represents a feasible exponent for the multiplication cost of matrices: two square matrices of size can be multiplied using operations in their coefficient ring. The constant is defined similarly but for multiplying a matrix by a rectangular one. At present time the best known bound is due to Le Gall [35]. This naturally yields . However the latter bound does not improve upon the earlier bound due to Huang and Pan [26, Theorem 10.1].

### 2.7.Modular composition and related operations

Let be an effective ring. Let an effective -algebra of dimension whose elements are represented by vectors of size in a given basis. We introduce the following cost functions:

• : the cost of computing the modular composition , where is a monic polynomial of degree , and .

• : the cost of computing the modular composition in terms of operations in , where is a monic polynomial of degree , and .

• : the cost of computing the characteristic polynomial of modulo a monic polynomial of degree . This is the characteristic polynomial of the multiplication endomorphism by in .

• : the cost of modular power projections, i.e. the cost to compute , where is monic of degree and is an -linear form on .

Let us recall a few known results about these functions.

Theorem 2.3. Let be a monic polynomial of degree over a ring , and let . The composed polynomial may be computed with

1. operations in , or

2. or operations in .

Proof. The first bound is immediate. The proof of the second bound is detailed in [13, section 12.2].

For a fixed monic polynomial in of degree , the modular composition is a linear operation in . For and of degrees , the corresponding transposed application is precisely the operation of modular power projections. If a modular composition algorithm with cost can be transposed in the sense of [3], then this leads to a power projection algorithm with cost .

Theorem 2.4. Let be a monic polynomial of degree over a ring , and let . The characteristic polynomial of modulo can be computed using

1. operations in , including divisions in (the partial division in is supposed to be implemented), or

2. operations in , if is a field, or

3. operations in , if is a field with elements, or

4. operations in , if there exist given inverses of in .

Proof. See [24, section 2.2].

## 3.Modular composition via factorization

### 3.1.Separable moduli over algebraically closed fields

Let be an effective field. A monic polynomial is said to be separable if . We may use the following algorithm for composition modulo whenever is separable and its pairwise distinct roots in are given:

Algorithm 3.1

Input
Polynomials and pairwise distinct .

Output

, where .

1. Compute using fast multi-point evaluation.

2. Compute using fast multi-point evaluation.

3. Retrieve with using fast interpolation.

4. Return .

Theorem 3.1. Algorithm 3.1 is correct and requires operations in .

Proof. By construction, for . Since and the are pairwise distinct, it follows that . This proves the correctness of the algorithm. The complexity bound follows from the fact that each of the steps 1, 2 and 3 can be performed in time .

### 3.2.Pairwise coprime moduli

Let again be a general effective field. The algorithm from the previous section may be generalized to composition modulo a polynomial that can be factored partially as in , where the polynomials are pairwise coprime (although not necessarily irreducible).

Algorithm 3.2

Input
Pairwise coprime polynomials in such that has degree ; Polynomials in .

Output

, and the characteristic polynomial of modulo .

1. Use a multi-remainder algorithm to compute , for all .

2. For all , compute the characteristic polynomial of modulo .

3. Use a multi-remainder algorithm to compute , for all .

4. For all , perform the modular composition .

5. Use Chinese remaindering to compute in of degree such that for all .

6. Return and .

Proposition 3.2. Algorithm 3.2 is correct and takes operations in , where .

Proof. For all , the Cayley–Hamilton theorem gives us , which implies , whence the correctness of . The correctness of the characteristic polynomial of follows from the usual isomorphism of -algebras .

The costs of steps 1, 3, 5, and 6 are . Step 2 costs , and step 4 takes operations in .

Example 3.3. For some families of polynomials the irreducible factorization is explicitly known. For instance, the following result is due to Serret [44, section III, chapitre III, pp. 158–162] (see also [11, pp. 24–27], [37, Theorem 3.2.18]):

Let be a finite field of characteristic such that with and odd. Let be an element of order . Let be a multiple of having all its prime factors dividing but not . Then the polynomial factors into irreducible polynomials of degrees . The irreducible factors may be described explicitly.

For example, with , , , , , , the polynomial factors into irreducible polynomials of degree . Taking instead leads to and .

## 4.Exploiting factorizations over algebraic extensions

### 4.1.Degree reduction

Let still be an effective field and assume that we wish to compute a modular composition , where and is monic. Let us study what happens if the polynomials and to be composed have degrees larger than . We clearly have

and we may compute using operations in [13, Exercise 9.16]. Without loss of generality we may therefore assume that .

If exceeds , then it suffices to perform modular compositions:

This requires additional compositions modulo in size , plus operations in .

Alternatively, given a polynomial with , the following formula provides us with a more efficient way to reduce the degree of :

Taking to be the characteristic polynomial of modulo , its computation usually admits a similar cost as modular composition. Therefore it is worth using this method unless . This is actually one key ingredient for the upcoming algorithms for modular composition: in order to reduce a composition modulo to compositions modulo a factor of , we in particular need to compute the characteristic polynomial of modulo . At the end of the recursive calls, one should nevertheless keep in mind that we only need annihilating polynomials, so that we may also use minimal polynomials. Shoup has given a probabilistic algorithm for computing minimal polynomials [46], which is useful for actual implementations.

### 4.2.Normal factorizations

In the case when we wish to compute a composition modulo an irreducible polynomial , we cannot apply the algorithms from sections 3.1 and 3.2. Nevertheless, it might happen that admits a non trivial factorization over an algebraic extension of . This generically happens when is a finite field and is composite. Indeed, we recall the following well known result.

Proposition 4.1. Let be a monic irreducible polynomial in of degree , and let be an integer dividing . Then there exist an irreducible polynomial of degree , and a polynomial of bidegree , monic in , such that the irreducible factorization of over is exactly .

Proof. Since is irreducible, is isomorphic to , which contains . We may thus take a generator of the image of in , and write its minimal polynomial over . Let be the class of in and let be its monic minimal polynomial over . Then divides and so do its conjugates for . On the other hand, since , we have , so any root of is a root of one of the , which proves the equality .

We call factorizations as in this proposition “normal factorizations”. This concept can actually be defined over arbitrary fields, as follows. Let be a monic separable polynomial in , let be a divisor of , and let be a monic separable irreducible polynomial of degree . We set , and write for the class of in . We say that admits a normal factorization over if there exists a bivariate polynomial that is monic in , of bidegree , and such that factors into over , with and coprime whenever . We call the normal factor of over . Notice that the polynomials and are not required to be irreducible here.

Example 4.2. With , is irreducible, and we have . For we take and present as . Then factors over as

where are the two roots of in . More precisely for we may take the class of in , whereas . The normal factor of is thus .

Example 4.3. When the situation is different from the case of finite fields. For instance is irreducible, but it remains irreducible over , , etc. Nevertheless, for a prescribed extension degree , we may randomly pick an irreducible of degree and a polynomial of bidegree that is irreducible as a polynomial in , and build as . If is separable, then it is irreducible in , and is a normal factor of over .

### 4.3.Single extensions

Assume that admits a normal factorization as above. Then the Chinese remainder theorem yields a natural isomorphism

and we may define as the unique polynomial in that satisfies and . We may now adapt the algorithm from section 3.2 as follows:

Algorithm 4.1

Input
Polynomials , , , as above, and in .

Output

, and the characteristic polynomial of modulo .

1. Compute the remainder in .

2. Compute the characteristic polynomial of modulo in .

3. Compute in .

4. Perform the modular composition in .

5. Return and .

Proposition 4.4. Algorithm 4.1 is correct, and takes

operations in .

Proof. We first observe that

It follows that , whence is computed correctly. As to the characteristic polynomial of , the argument is the same as for Algorithm 3.2, thanks to the Poisson formula .

Now the multiplication of two polynomials in of degree using Kronecker substitution requires operations in . This way, steps 1 and 3 take operations in . Steps 2 and 4 respectively cost and operations in . The computation of in the last step may be done naively using operations in . The computation of requires further operations by Proposition 2.2.

Corollary 4.5. With the above notations, and given a normal factorization of with irreducible, for and , the modular composition can be computed using operations in .

Proof. We simply apply Algorithm 4.1. For we use , which takes operations in by Proposition 2.2. We perform the computations in step 4 naively, which yields .

## 5.Composition towers

### 5.1.Effective towers

Corollary 4.5 already illustrates the potential of our ability to factor non trivially over an extension field . This idea can be pushed even farther whenever the factors with can be factored recursively over a tower of extension fields of . In order to carry out this generalization, we first need to decide how to compute with elements in the successive fields of such a tower. Instead of privileging particular representations, we rely on the abstract concept of an effective tower.

Definition 5.1. An effective tower over is a tower of fields

with the following properties:

• Each field comes with a specific way to represent its elements and algorithms for the field operations.

• For , the field is a finite separable algebraic extension of , and we have precomputed an element along with its minimal polynomial over , such that and . We set .

• For , we have algorithms for computing the natural bijection , given by

and its inverse . We call and the upward and downward conversions at level . The coefficientwise extensions of and yield mappings and that we still denote by and .

We denote by the cost of multiplying two polynomials in in terms of the number of required operations in . Similarly, we write for the cost of inverting an element in in terms of the number of operations in . We always assume that additions and subtractions can be done in linear time. We also let upper bound the costs of both the upward and downward conversions at level .

### 5.2.Composition towers

Let us now return to our particular modulus and assume that its degree admits the factorization with for . If , then Algorithm 4.1 shows how to reduce modular composition modulo to composition modulo a polynomial over of degree , provided a normal factor of over exists and is given. In order to generalize this idea to the case when , it is useful to introduce the concept of a composition tower.

Definition 5.2. Let be an effective tower. Let be a monic separable polynomial of degree . We say that is a composition tower for over if the following properties are satisfied:

• We let , and for each , we have precomputed a monic polynomial such that is a normal factor of over . We let . For convenience, is also called a normal factor of .

• For , we have the isomorphism

and we assume we have precomputed a polynomial such that

### 5.3.Modular composition using composition towers

Given monic and separable along with a composition tower , we may now apply Algorithm 4.1 recursively. Unrolling the recursive calls yields the following algorithm for modular composition.

Algorithm 5.1

Input
of degrees and a composition tower for .

Output

, and the characteristic polynomial of modulo .

1. Set .

2. Let be the characteristic polynomial of modulo over .

3. For do

Compute .

4. For , compute in .

5. Let .

6. For do

Compute .

7. Return and .

Theorem 5.3. Algorithm 5.1 is correct and takes

operations in .

Proof. Let , and for , let in . The polynomial in step 3 is the characteristic polynomial of modulo over and we notice that in step 6. The correctness of the algorithm results from these observations.

The computation of the resultant in step 3 can be performed in time by Proposition 2.2. It follows that the complete step 3 requires

operations in .

The computation of in step 4 can be done in time . The computation of the remainder of its division by can be done in time . Therefore step 4 amounts to

operations in .

In step 6, the naive evaluation of at modulo using Horner's method requires operations in . Consequently, step 6 requires

operations in . The conclusion follows by adding up the above bounds for the costs of the individual steps.

Remark 5.4. For certain applications, it might be useful to generalize the algorithm to the case when is not necessarily irreducible. In that case, we assume that the tower is only a “partial composition tower”, meaning that we no longer require that . In Algorithm 5.1, we then need to make two adjustments:

• In step 2, we compute to be the characteristic polynomial of modulo over .

• In step 5, we compute in .

These computations lead to an additional term in the complexity bound of Theorem 5.3.

## 6.Triangular towers

Assume that we are given a tower

of finite fields such that for each . One obvious way to represent an element of is to write it as , where is a polynomial with . An effective tower that uses this representation for the elements of the fields is called a triangular tower. For such towers, the costs of the upward and downward conversions are zero. Throughout this section it will be convenient to make the relatively harmless assumption that for all effective rings . We always assume available the following precomputed data:

PRE-T1

For all , the pre-inverse of .

### 6.1.Complexity analysis for triangular towers

Lemma 6.1. Let be a triangular tower. Then

for all and .

Proof. Let us first show how to reduce polynomial multiplication over to polynomial multiplication over . So consider two polynomials and in , represented as polynomials in evaluated at . We may compute their product in as follows: we first substitute in and , which yields two polynomials . We next compute their product in . Now for each . Since each remainder can be computed using two multiplications of elements in using the pre-inverse of , we obtain

for a sufficiently large constant independent of . On the other hand, applying Karatsuba's trick, there exists a constant such that holds for all .

If then we havewhich yields

We claim that

The proof is done by induction assuming the inequality holds for :

Remark 6.2. We do not claim the constant to be optimal in the latter lemma, but it is sufficient for our purposes.

Lemma 6.3. Let be a triangular tower. Then inverting an element in may be done with operations in .

Proof. By Proposition 2.2 the inverse of an element in takes operations in . Combined with the previous lemma, we obtain

for some sufficiently large constant . This yields the claimed bound.

Proposition 6.4. Let be a triangular composition tower for with . Given , we may then compute and the characteristic polynomial of modulo using

operations in .

Proof. By Lemma 6.1 we have

Then Lemma 6.3 gives

The conclusion follows from Theorem 5.3.

### 6.2.Smooth degrees over finite fields

Recall that an integer is said to be -smooth whenever all its prime factors are at most .

Lemma 6.5. Let . If is -smooth and sufficiently large, then there exist integers such that , for all , and , where .

Proof. It suffices to gather prime factors, counted with multiplicities, into products in the range . Then implies .

Corollary 6.6. Let . If is -smooth, and given a suitable triangular composition tower for of degree , then one composition or one characteristic polynomial modulo may be computed using operations in .

Proof. We appeal to the previous lemma to construct the integer sequence for which we precompute a triangular composition tower for . The cost of Proposition 6.4 simplifies to

Notice that minimizes the latter exponent, which leads to the cost provided that is -smooth.

Remark 6.7. It is well known that the number of -smooth integers below an integer is asymptotically equal to , where is the Dickman–de Bruijn function defined by with initial condition for all (see [42] for the original proof). For , we have . Then decreases rapidly with , , etc. Another important consequence of Proposition 6.4 is the following: for more than 30% of large values of , modular compositions and characteristic polynomials for irreducible of degree over may be computed using operations in .

### 6.3.Cyclic modulus of prime degree over a finite field

An interesting application of Proposition 6.4 concerns cyclic moduli in , where is a prime number different from the characteristic .

Lemma 6.8. If is a prime number different from , and if divides , then the degrees of the irreducible factors of in divide .

Proof. Assume that divides , and write . It is well known that the polynomial is the product of the monic irreducible polynomials of whose degrees divide . We obtain , by using Fermat's little theorem, which asserts that .

Corollary 6.9. Let . If is a prime number different from , and if is -smooth, then we may precompute suitable triangular composition towers for all irreducible factors of , so one composition or characteristic polynomial modulo may be obtained using operations in .

Proof. The modulus is separable, and the precomputations first involve the irreducible factorization of into , whose respective degrees divide .

Since is -smooth, each is -smooth, where . Consequently, for the modulus , the cost of Corollary 6.6 simplifies to

The total cost for all is thus . Applying Proposition 3.2, this leads to the cost for one composition or characteristic polynomial modulo .

## 7.Primitive towers

Proposition 6.4 shows that the overhead of triangular set arithmetic rapidly grows with the height of the tower. In this section we consider an alternative representation for elements in the fields . This representation allows for faster multiplication inside the fields , but the upward and downward conversions may become more expensive. One major goal of this section is to provide a more precise analysis of the cost of these conversions and to isolate particular situations in which they can be computed fast.

### 7.1.Primitive towers

An effective tower with is said to be primitive if for each . In that case, we assume that we precomputed the minimal polynomial of each over . It follows that

 (7.1) (7.2)

for . On the other hand, the upward and downward conversions are more expensive than in the case of triangular towers. The following consequence of (7.1), (7.2) and Theorem 5.3 will be of frequent use.

Lemma 7.1. For a primitive composition tower for with , Algorithm 5.1 takes

operations in .

Proof. We have

so the conclusion follows from Theorem 5.3.

### 7.2.Arbitrary primitive elements

For computing with arbitrary primitive towers, we recall that we always assume available the following precomputed data:

PRE-P1

For all , the minimal polynomial of over ,

PRE-P2

For all , the minimal polynomial of over ,

PRE-P3

For all , the polynomial expression of in terms of over .

Assume that for some arbitrary primitive element . For a natural morphism for -algebras and , let denote the cost of applying the morphism once in terms of the number of required operations in .

Lemma 7.2. Modulo the above precomputations, we have for all ,

 (7.3) (7.4)

Proof. Let with . We may convert into with using operations in . Then we use the precomputed polynomial with , and also the minimal polynomial of over , which has degree . We now compute using Horner's method. This requires operations in . The evaluation is the natural image of in . This proves (7.3).

For the opposite direction, consider and reinterpret as an element of with and . We use the precomputed minimal polynomial of over , which has degree . We next compute . This requires operations in . We finally convert coefficientwise into an element of . This requires operations in and yields the natural image of in . This completes the proof of (7.4).

Lemma 7.3. Modulo precomputations, we have for all and ,

Proof. It will be convenient to use the following abbreviations:

Let be a constant such that

 (7.5) (7.6)

in the previous lemma and let us show by induction over that

 (7.7)

For , we have , so all possible conversions are trivial and (7.7) holds. Now assume that the result holds until . Let us first consider the case when . Then (7.6) yields

Since , this also deals with the case when . If , then (7.5) yields

in a similar way. This also deals with the case when . We conclude by induction.

Corollary 7.4. Modulo precomputations, we have for all ,

Corollary 7.5. Let be a primitive composition tower for with . Given , we may then compute one composition or characteristic polynomial modulo using

operations in .

Proof. From Corollary 7.4 we deduce

 (7.8)

so the conclusion follows from Lemma 7.1.

Comparing to Proposition 6.4, using primitive towers thus turns out to be more efficient than using triangular towers, although it requires more precomputations. Therefore the costs for the two particular cases studied in sections 6.2 and 6.3 may be revisited and slightly improved.

### 7.3.Nested towers

We say that a primitive tower with is a nested tower, if there exist with , , , and coprime, and

Setting and , this also means that

so that has degree . For this specific type of towers we require the following precomputations:

PRE-N1

and for and , where is the smallest power of two above .

Lemma 7.6. Given a nested tower , we have

for all .

Proof. Consider an element with . Let be the smallest power of two above . Then we may compute so that . This computation follows the natural “divide and conquer” strategy: given , we split it into with , we compute recursively and , and deduce

We may end the recurrence when , which incurs a cost that is repeated times in total. For each depth of the recursive calls the total cost remains operations in . The overall cost of the method is . The computation of the inverse of requires additional operations in .

Conversely, let with , and let be the smallest power of two above . We compute and look for an expansion of the form

where the . We again use the “divide and conquer” strategy:

• Compute , and recursively compute the expansion

• Compute , and recursively compute the expansion

At the end we have , as required. Thanks to the precomputations this expansion requires operations in . Now we observe that , where and . A final reduction by takes operations in .

Corollary 7.7. Let be a nested composition tower for with . Then we may compute one composition or characteristic polynomial modulo using

operations in .

Proof. Lemma 7.6 implies

 (7.9)

The result now follows from Lemma 7.1.

The above corollary makes nested composition towers extremely attractive from a complexity point of view. The existence of such towers only depends on and the degrees , but not on . A practical way to construct nested towers is to pick random monic of degrees and to check that is irreducible for each . We repeat this process for random choices of until we find a suitable tower. From a heuristic point of view, we will show below that the probability that we eventually obtain a nested tower is nonzero in most cases of interest. From a theoretical point of view, the existence problem of nested towers remains an interesting problem.

In order to analyze the probability that random choices of provide us with a nested tower, we rely on

• the fact that a random polynomial over of degree is irreducible with probability ;

• the heuristic assumption that is again random for .

In this framework, the probability that is irreducible for each is given by

On the other hand, if has cardinality , then we have

possible choices for the tuple . For a fixed value of , we maximize by taking . Setting , we then have

The existence of a nested tower over with extension degrees is likely whenever . Taking logarithms, this happens as soon as

The algorithm for finding needs runs before finding a suitable tower. An obvious optimization is to make a better use of successful guesses of for which is irreducible for all : instead of starting everything over after one unsuccessful guess of , we try at least times for some fixed constant . The expected number of guesses then drops to .

It is an interesting question whether there exist finite fields and sequences for which it is possible to construct monic of degrees such that is irreducible for each . The literature contains specific constructions of nested towers over finite fields based on [8, Lemma 1] which relates the irreducibility of to where . We refer the reader to [37, section 3.2] for a nice survey. Let us exemplify two useful constructions.

Example 7.8. Following [33, Theorem 4], if is a prime power, and , where , and is a nonzero square and is a non-square in . Then we may build a nested tower with , , . For instance we may take , , and .

Example 7.9. This following construction is also due to Kyuregyan [32, Theorem 7]. Let be a monic irreducible polynomial of degree over , where is even if . Let such that is a non-square in . Then we may take and for all . In this way we have . For instance with , we may take , , . Notice that computations with nested towers simplify a bit in this situation where for all .

Example 7.10. One may wonder whether the above examples admit generalizations for which the are of odd degree . A non trivial candidate example of this kind is the sequence . We verified to be irreducible over for , but does this hold for all ?

### 7.4.Composed towers

Over finite fields, the situation when the are pairwise coprime may be exploited to construct special towers, relying on the following well-known lemma:

Lemma 7.11. Let be a finite field and consider two monic irreducible polynomials and in whose degrees are coprime. Then the composed product , defined by

is irreducible in and of degree .

Proof. See for instance [4, Theorem 2].

Remark 7.12. Given primitive elements and of coprime degrees and over , an alternative way to state the lemma is that is again a primitive element of degree over .

Remark 7.13. Composed products can be computed in softly linear time [2], by means of the Newton–Girard identities (see also [16] for handling these identities in small characteristic).

Let be a primitive tower and let , and be as usual. We say that is a composed tower if the are pairwise coprime and if there exist monic irreducible polynomials of degrees such that and for .

In that case, the minimal polynomial of over is given by and for : if were reducible over then would have a proper gcd with which is impossible (see Proposition 4.1).

By construction, we thus have and . For each , let be a root of and , so that . For such composed towers, we assume that the following precomputations have been done for :

PRE-C1

,

PRE-C2

the trace map of over (as a vector of ),

PRE-C3

and its inverse modulo and .

Lemma 7.14. Let be a composed tower. Then we have

for all .

Proof. Let with . We wish to compute with . For this purpose we first calculate

using operations in . Then

can be computed using additional operations.

Conversely, let with . The direct image with is obtained via Chinese remaindering:

In fact we just verify that . So if we let , then we may calculate

which takes operations in , when using our assumption that and its inverse modulo and have been precomputed.

Corollary 7.15. Let be a composed composition tower for with . Then, given , we may compute using

operations in .

Proof. Lemma 7.14 implies

 (7.10)

We conclude by Lemma 7.1.

Given a number , we say that an integer is -super-smooth if for every prime power that divides , we have . For instance, the product of the first prime numbers grows as (see for instance [17, chapter 22]) and is therefore -super-smooth for sufficiently large . Similarly, the number is -super-smooth for every . For a fixed modulus of super-smooth degree, the following corollary shows that modular composition can be done in softly linear time in this specific situation:

Corollary 7.16. Let be a fixed irreducible polynomial of -super-smooth degree over a finite field , and assume a composed composition tower has been precomputed for . Then, given any , we may compute using operations in .

Proof. We apply the previous corollary with and .

### 7.5.Artin–Schreier towers

Using the composed tower approach, we are left with the question how to deal with algebraic extensions of prime power degree . In the case when coincides with the characteristic of the field , one may use Artin–Schreier towers instead, as outlined below.

An Artin–Schreier polynomial over a field of characteristic , is an irreducible polynomial of of the form . An Artin–Schreier tower of height over is a tower of field extensions where each extension is explicitly constructed from an Artin–Schreier polynomial over .

In order to simplify the presentation, we restrict ourselves to the case when . For the minimal polynomials of the successive extensions, we take

 (i=2,p=2) (all other cases).

In [10, Theorem 2], it is shown that this defines a primitive tower. The polynomials may be computed using operations in , according to [10, Theorems 12]. We assume that the following precomputations have been done for (see [10, end of section 4]):

PRE-A1

the trace map on over in the canonical basis,

PRE-A2

.

The costs of these precomputations are and , respectively. We now have the following complexity bound for the upward and downward conversions.

Lemma 7.17. [10, Theorem 13] Let be an Artin–Schreier tower. Then, modulo precomputations, we have

for all .

Corollary 7.18. Let be an Artin–Schreier composition tower for with . Then, given , we may compute using

operations in .

Proof. Lemma 7.17 implies

 (7.11)

From Lemma 7.1, it follows that can be computed using operations in . Using the fact that , the result follows.

## 8.Building composition towers

Let be a given finite field and let be a given monic irreducible polynomial of degree over . In this section we consider the task of constructing composition towers for . In fact, we may distinguish three different problems of increasing complexity:

1. Building an algebraic tower with the prescribed extension degrees ;

2. Building a composition tower for some monic irreducible of degree ;

3. Building a composition tower for the prescribed modulus .

Moreover, one may study these problems for each type of towers that we have encountered so far. From now on, we drop the study of triangular towers, for simplicity and because primitive towers are more efficient anyway. Given an effective tower, we notice that the construction of a primitive tower still requires computing the minimal polynomials over of the . In general, it is not known how to do this in quasilinear time without using the power projection algorithm by Kedlaya and Umans.

The traditional solution to the first problem involves computing irreducible polynomials of “small” degrees over “large” field extensions . For instance, when using Shoup's algorithm [45], the number of operations in grows with (in this case, Couveignes and Lercier's algorithm [9] is less competitive). In [14, Proposition 4.6] von zur Gathen and Seroussi proved the lower bound for factoring polynomials of degree over in the arithmetic circuit model. Consequently the quadratic cost in might be difficult to decrease in general. Theorem 8.2 below shows how to achieve a cost that is quasi-linear in in the case when .

In this section we mainly focus on the efficient construction of primitive composition towers for prescribed . We first describe the general algorithm and then explain possible speed-ups for nested, composed, and Artin-Schreier towers. Altogether, this yields an efficient answer to the second problem. Notice that composition towers with no prescribed modulus are sufficient if we merely need a representation for the finite field such that polynomials in can be evaluated efficiently at points in .

The third problem is more difficult. So far we have not been able to apply the techniques of this paper to obtain more efficient solutions, even when is very smooth or in the extremely favourable case of Artin–Schreier towers. In practice, a straightforward strategy for the construction of composition towers for is to factor over all intermediate fields using standard available algorithms. This is discussed at the end of the section.

### 8.1.Building primitive composition towers

The naive construction of a primitive composition tower with prescribed extension degrees proceeds by induction. Assume the tower is built up to height . We first construct an irreducible polynomial in and set . If is “sufficiently random”, then the class of in is a primitive element over . Its minimal polynomial over may simply be obtained as where is the natural preimage with bidegree that satisfies . The latter resultant requires roughly operations in when using the best known algorithms. To complete the composition tower of height , it remains to compute the sequence of normal factors of over . Factorization algorithms based on modular composition [31] can achieve this in time , roughly speaking.

In fact, we will show how to build primitive composition towers far more efficiently. We still proceed by induction. The composition towers naturally share the same underlying effective subtowers of . More precisely, the subtower of height consists of the fields and forms a composition tower for ; the successive normal factors are written , and the auxiliary polynomials of Definition 5.2 are written .

We begin with constructing an irreducible polynomial of degree , so the first primitive tower is made of , and it is a composition tower for with normal factor . The auxiliary polynomial satisfies .

For the second composition tower we construct an irreducible polynomial of degree , which defines as the class of in . If is “sufficiently random”, then is a primitive element of over . We obtain the minimal polynomial of over as , where and . The normal factor of over is , and the one of over is . We clearly have and we obtain from the subresultant of degree in of and (it necessarily exists because is a primitive element of over , which implies that is separable; this will be detailed below in the general situation). In this way we obtain a composition tower of height .

The third tower again requires building an irreducible polynomial . This defines and so we have . We assume that generates over and we wish to obtain the normal factorizations of its minimal polynomial . Over and the normal factors are respectively and . For , we use the second composition tower: we compute such that and , then . Then we obtain , where . The auxiliary polynomials and are obtained from the corresponding first subresultants.

For general heights, we use the following algorithm for the construction of composition towers.

Algorithm 8.1

Input
Primitive composition towers for for ; irreducible of degree such that any root of has degree over .

Output

The primitive composition tower for with , and where is the minimal polynomial of over .

Notation

The normal factors of over are written .

1. Set , , .

2. For from down to do

1. Compute over , where ;

2. Compute the subresultant of degree in of and written , and then set .

3. Set to the class of in , and let .

Notice that the precomputations PRE-P1 correspond to ; concerning PRE-P2, we have already computed , and is simply ; the precomputations PRE-P3 correspond to .

4. Return the composition tower for made from , , , , and the other precomputed auxiliary data.

Proposition 8.1. Algorithm 8.1 is correct and takes

operations in .

Proof. By decreasing induction on we prove that is a composition tower for which has degree . This is clear for and . Assume the induction hypothesis holds for . At the end of step 2.a the polynomial has degree and is the minimal polynomial of over . Since is a normal factor of over , the degree of the ideal generated by is .

Since is separable it belongs to the Gröbner basis of for the lexicographic order induced by . In particular a polynomial with leading monomial belongs to . This proves that the subresultant of degree of and is nonzero, and that is well defined, which implies the requested conditions:

The induction hypothesis is thus satisfied for .

As to the complexity analysis, we first notice that . The conversions in step 2.a therefore take

operations in . The resultant in step 2.a and the subresultant in step 2.b require

further operations, by Proposition 2.2, where . The inversion of modulo takes further operations.

Theorem 8.2. Let . A primitive composition tower of degrees may be built using

expected operations in .

Proof. The tower is constructed by natural successive applications of Algorithm 8.1. By Equation (7.8) the bound from Proposition 8.1 simplifies to

It remains to take the construction of the into account for . By Theorem A.4 a random polynomial can be computed using an expected number of compositions modulo plus

operations in . Thanks to Corollary 7.5 each composition modulo may be done using operations in . On the other hand we simply take . The sum over yields

We claim that, with a small uniformly bounded probability, the roots of do not have maximal degree over . Such a bad is easily detected since the corresponding is not separable. In that case we build another at random and the average number of failures is bounded.

To prove the claim, we notice that the roots of do not have maximal degree over if, and only if, the conjugates of over are not pairwise distinct. Equivalently this means that the coefficients of belong to a proper subfield of . For each prime factor of the field is a maximal proper subfield of . All maximal proper subfields of are obtained is this way, so there exist at most many of them. The number of monic irreducible polynomials of degree with coefficients in such a subfield is at most ; the number of monic irreducible polynomials of degree over is at least (see for instance [13, Lemma 14.38]). Consequently the probability that the roots of do not have maximal degree over is at most

which is uniformly bounded by as soon as or is sufficiently large.

### 8.2.Particular cases

Let us now investigate the consequences of Proposition 8.1 in the particular cases of nested, composed and Artin–Schreier towers. In our analysis, we actually construct all intermediate subtowers along with the composition tower itself. If one just needs the highest tower, then there are cases that can be optimized by exploiting the specificities of the types of towers under consideration.

##### Nested towers

Corollary 8.3. Let and assume that we are given a nested tower as in section 7.3, defined by and . Then the nested composition tower for may be built using

operations in .

Proof. The tower is again constructed by natural successive applications of Algorithm 8.1, except that we replace the precomputations in step 3 by those specified in PRE-N1. These precomputations amount to . By Equation (7.9), the cost of Proposition 8.1 simplifies to .

##### Composed towers

Corollary 8.4. Let and assume that we are given a composed tower as in section 7.4, defined by . Then the composed composition tower for may be built using

operations in .

Proof. The tower is again constructed by natural successive applications of Algorithm 8.1, except that we replace the precomputations in step 3 by those specified in PRE-C1, PRE-C2, and PRE-C3. These precomputations amount to . By Equation (7.10), the cost of Proposition 8.1 simplifies to .

##### Artin–Schreier towers

Corollary 8.5. Let and assume that we are given an Artin–Schreier tower as in section 7.5, defined by . Then the Artin–Schreier composition tower for may be built using

operations in .

Proof. The tower is again constructed by natural successive applications of Algorithm 8.1, except that we replace the precomputations in step 3 by those specified in PRE-A1 and PRE-A2, which amount to . By Equation (7.11), the cost of Proposition 8.1 becomes

### 8.3.Building composition towers from roots

Let be an algebraic tower. So far we have shown how to build composition towers for . Let us now explain how to deduce composition towers for a given monic and irreducible of degree . By standard algorithms, we first compute a root of in (for example with Rabin's algorithm; see [13, chapter 14]). Unfortunately we do not know how to exploit a composition tower to compute more efficiently. We are interested in finding a monic normal factor of over each field in the tower. In the extreme case when , we take .

Proposition 8.6. Given a primitive composition tower for , given an irreducible , together with a root of , we may compute the minimal polynomials of over along with the from Definition 5.2 for , with cost

Proof. We have . We compute and by descending induction on from down to . First we have

where . The first subresultant of and is well defined and is invertible modulo . Therefore we have

According to Proposition 2.2, the resultant and the first subresultant can be computed using operations in . Then is deduced with operations in . Summing over , the result follows.

Remark 8.7. Inversely, given a composition tower , we notice that the polynomial is of the form with . In other words, the computation of a composition tower for is as least as hard as finding a root of in .

## 9.Applications and perspectives

We have shown that modular composition and characteristic polynomials for a fixed irreducible modulus can be computed fast if is smooth, when allowing for precomputations that depend solely on . From a practical point of view, we expect that efficient implementations of our algorithms will lead to speed-ups with respect to state of the art methods as soon as is composite and sufficiently large. From a theoretical point of view, the algorithms by Kedlaya and Umans generally outperform our new algorithms. One notable exception occurs when is very smooth, in which case we were able to prove a quasi-linear complexity bound: see Corollary 7.16.

Our new algorithms are only more efficient for fixed moduli . Nevertheless, the cost of the required precomputations as a function of is of a similar order of magnitude as one composition modulo without our methods. In other words, if we need to compute compositions modulo the same modulus , then our new methods are already of interest for small values of . This problem of multiple modular compositions occurs in various applications:

##### Factorization over finite fields
A standard application of modular composition over finite fields is the irreducible factorization of univariate polynomials [29, 31]. In general, the cost of the precomputations is too expensive for our algorithms to be interesting. Nevertheless, Kaltofen and Shoup showed in [28] how to reduce the cost of the Cantor–Zassenhaus factorization algorithm in mostly to compositions modulo the defining polynomial of over , and to . Our new algorithms should therefore be useful for factoring polynomials of small degree over . In our appendix we recall the main underlying technique and show how it concretely applies to the fast construction of irreducible polynomials in this context. We plan to work out the application to factorization in a future paper.

##### Roots over large finite fields
One particular instance of factorization over finite fields is the extraction of -th roots (i.e. finding the roots of polynomials of the form ). In that case, one may use [12, Theorem 1.1] in order to reduce the problem of root extraction to modular composition and the computation of minimal polynomials. The advantage of this method with respect to a direct use of [28] is that it typically remains fast for larger values of .

##### Conversions
It frequently happens that one has to work with different representations of elements of a finite field. For instance, given distinct irreducible polynomials and of degree over , elements of can both be represented as elements of and . Converting from the representation in to the one in boils down to composition modulo , assuming that the image of in is given. We emphasize that this conversion problem is distinct from finding the polynomial , which comes down to computing a root of in . Fast algorithms have recently been designed for the latter purpose by Narayanan [38], on top of the Kedlaya–Umans algorithm.

Given a non trivial divisor of , yet another representation of elements of was needed in [23]. Let be a primitive element of and be an element whose minimal polynomial has degree . Then and are both isomorphic to and correspond to two different representations. If and can be chosen “nicely”, then conversions between these representations can be computed fast, using similar methods as in sections 7.3, 7.4 and 7.5. Using modular composition, we may reduce the general case to this special case in which we are allowed to choose and .

##### Frobenius maps
Thanks to von zur Gathen and Shoup's algorithm [15], the Frobenius maps can be computed efficiently when fast modular composition is available. More precisely, assume we wish to compute in . If , then we use binary powering to get the preimage of . Then, by induction on , we may compute the canonical preimage of , so is the preimage of . If is even, then we are done. Otherwise, we compute to obtain the preimage of . Overall this method uses compositions modulo , plus operations in .

Many intriguing questions remain to be answered. One major open problem is whether our new techniques can be used to build composition towers for a prescribed irreducible modulus of smooth degree in quasi-linear time. This would give us an unconditional algorithm for composition modulo of quasi-linear time complexity. Another natural question is whether there exists an efficient way to reduce general modular composition to composition modulo irreducible polynomials of smooth degree. This question may be related to generalizations of our algorithms to modular composition for multivariate polynomials. Besides these fundamental issues, we finally think that there remains a lot of room for more minor improvements of the techniques in this paper and for working out various applications in more detail. It would also be interesting to have efficient implementations of the multiple algorithms that we have presented and investigate under which conditions they outperform previous algorithms in practice.

## Appendix A.Pseudo-trace over finite fields

This appendix is devoted to Kaltofen and Shoup's fast algorithm for pseudo-traces over finite fields [28], along with its application to building irreducible polynomials (required in our section 8). We recall the algorithm for completeness, but also in order to make it work more generally over any finite field .

Algorithm A.1

Input
irreducible of degree ; of bidegree monic in ; of bidegree ; an integer ; such that ; of bidegree such that ; of bidegree such that .

Output

such that ; of bidegree such that ; of bidegree such that .

1. If then return .

2. Let , and recursively compute .

3. Compute .

4. Compute over .

5. Compute over .

6. If is even, then return .

7. Compute and return , over , and over .

Proposition A.1. Let and . Algorithm A.1 is correct and takes compositions modulo and

Proof. The proof proceeds by induction on . The algorithm is clearly correct if . Assume that it is correct for . By linearity of the -th power we have

This already proves the correctness by induction when is even. Otherwise and similar computations yield ,

This proves the correctness.

The cost for obtaining from is essentially one or two compositions modulo . If we write with , then we are led to compute for all and then over . Overall requires compositions modulo plus operations in . The same bound holds for . If is odd, then step 7 takes again compositions modulo plus operations in . Finally, the depth of the recursion is .

Corollary A.2. Let , let be a monic irreducible polynomial in of degree , and let . For any monic polynomial of degree , and any , the pseudo-trace may be computed using compositions modulo plus

operations in .

Proof. If suffices to compute for Algorithm A.1 using operations in , in order to apply the preceding proposition.

Theorem A.3. Let , let be a monic irreducible polynomial in of degree , and let . Given and the set of its prime factors of cardinality , we may check whether a monic polynomial of degree is irreducible using compositions modulo plus

operations in .

Proof. The polynomial is irreducible if, and only if, and for all prime divisor of . We may thus apply the above corollary.

Theorem A.4. Let , let be a monic irreducible polynomial in of degree , and let . A random irreducible polynomial of degree over may be computed using an expected number of compositions modulo plus an expected number of

operations in .

Proof. We appeal to Ben-Or's algorithm [13, Algorithm 14.40], but use modular composition instead of the usual modular powering. Let be a monic random polynomial of degree over . For from to we compute and stop as soon as . In this case is reducible, since it admits an irreducible factor of degree (that divides ). If all the are then is necessarily irreducible.

With the notations from Algorithm A.1, the modulo are computed as follows. First, we compute and using operations in . Second, we obtain and with compositions modulo and

additional operations in , by Proposition A.1.

Then we compute of bi-degree such that

Notice that . In order to compute from we write

with , so we are led to compute for all and then over . The overall computation of requires compositions modulo plus operations in . If the smallest degree of the irreducible factors of is , then testing the irreducibility of takes compositions modulo plus

In average, random trials for are necessary to find an irreducible one, and the average value of the smallest degree is . Therefore the total expected cost is

Taking , the expected cost in Theorem A.4 rewrites into compositions modulo and field operations in (thus, without using the Kedlaya–Umans algorithm). In favorable cases studied in the present paper, we may assume that one composition modulo amounts to say operations in for a fixed, small number . The expected cost then reduces to operations in .

As a first comparison, Shoup's algorithm [45] needs expected operations in (still without using the Kedlaya–Umans algorithm). For bounded values of and large , the availability of a fast algorithm for composition modulo makes it possible to improve on Shoup's algorithm, as was already observed in [28]. In this specific case, Theorem A.4 indeed leads to a bit cost instead of with Shoup's algorithm.

As a second comparison, let be a function such that in the neighborhood of infinity. Based on Kedlaya and Umans' algorithm, Couveignes and Lercier designed an algorithm [9] that builds an irreducible polynomial of degree over in time , when ignoring the cost of conversions to put elements of into their standard representation in . To be fair, we may assume fast modular composition for , so the expected bit cost of Theorem A.4 further reduces to . So roughly speaking, if , then the Couveignes–Lercier algorithm is asymptotically faster; if , then Theorem A.4 is competitive. A comparison with Shoup's algorithm [45] would not be fair here unless enhancing it with the Kedlaya–Umans algorithm; for large , Shoup's algorithm [45] also remains competitive, of course.

Similar observations hold concerning the complexity of the factorization of polynomials of degree in . In fact, Kaltofen and Shoup, still in [28], applied their technique (namely Algorithm A.1) to reduce the cost of the Cantor–Zassenhaus factorization algorithm mostly to compositions modulo and to . For bounded and large , fast composition modulo a fixed modulus is therefore particularly relevant to factorization. We intend to work out the details of this application in a forthcoming paper.

## Bibliography

[1]

D. J. Bernstein. Composing power series over a finite ring in essentially linear time. J. Symbolic Comput., 26(3):339–341, 1998.

[2]

A. Bostan, Ph. Flajolet, B. Salvy, and É. Schost. Fast computation of special resultants. J. Symbolic Comput., 41(1):1–29, 2006.

[3]

A. Bostan, G. Lecerf, and É. Schost. Tellegen's principle into practice. In Hoon Hong, editor, Proceedings of the 2003 International Symposium on Symbolic and Algebraic Computation, ISSAC '03, pages 37–44, New York, NY, USA, 2003. ACM.

[4]

J. V. Brawley and L. Carlitz. Irreducibles and the composed product for polynomials over a finite field. Discrete Math., 65(2):115–139, 1987.

[5]

R. P. Brent and H. T. Kung. Fast algorithms for manipulating formal power series. J. ACM, 25(4):581–595, 1978.

[6]

P. Bürgisser, M. Clausen, and M. A. Shokrollahi. Algebraic complexity theory, volume 315 of Grundlehren der Mathematischen Wissenschaften. Springer-Verlag, 1997.

[7]

D. G. Cantor and E. Kaltofen. On fast multiplication of polynomials over arbitrary algebras. Acta Infor., 28(7):693–701, 1991.

[8]

S. D. Cohen. On irreducible polynomials of certain types in finite fields. Proc. Camb. Phil. Soc., 66(2):335–344, 1969.

[9]

J.-M. Couveignes and R. Lercier. Fast construction of irreducible polynomials over finite fields. Israel J. Math., 194(1):77–105, 2013.

[10]

L. De Feo and É. Schost. Fast arithmetics in Artin-Schreier towers over finite fields. J. Symbolic Comput., 47(7):771–792, 2012.

[11]

L. E. Dickson. Linear Groups: With an Exposition of the Galois Field Theory. Dover Publication Inc., 1958.

[12]

J. Doliskani and É. Schost. Taking roots over high extensions of finite fields. Math. Comp., 83(285):435–446, 2014.

[13]

J. von zur Gathen and J. Gerhard. Modern Computer Algebra. Cambridge University Press, 3rd edition, 2013.

[14]

J. von zur Gathen and G. Seroussi. Boolean circuits versus arithmetic circuits. Inform. and Comput., 91(1):142–154, 1991.

[15]

J. von zur Gathen and V. Shoup. Computing Frobenius maps and factoring polynomials. Comput. Complexity, 2(3):187–224, 1992.

[16]

B. Grenet, J. van der Hoeven, and G. Lecerf. Deterministic root finding over finite fields using Graeffe transforms. Appl. Alg. Eng. Comm. Comp., 27(3):237–257, 2016.

[17]

G. H. Hardy and E. M. Wright. An introduction to the theory of numbers. Oxford University Press, Oxford, 6th edition, 2008.

[18]

D. Harvey and J. van der Hoeven. Faster integer and polynomial multiplication using cyclotomic coefficient rings. Technical report, ArXiv, 2017. http://arxiv.org/abs/1712.03693.

[19]

D. Harvey, J. van der Hoeven, and G. Lecerf. Faster polynomial multiplication over finite fields. J. ACM, 63(6), 2017. Article 52.

[20]

J. van der Hoeven. Relax, but don't be too lazy. J. Symbolic Comput., 34(6):479–542, 2002.

[21]

J. van der Hoeven. Fast composition of numeric power series. Technical Report 2008-09, Université Paris-Sud, Orsay, France, 2008.

[22]

J. van der Hoeven. Faster Chinese remaindering. Technical report, CNRS & École polytechnique, 2016. http://hal.archives-ouvertes.fr/hal-01403810.

[23]

J. van der Hoeven and R. Larrieu. The Frobenius FFT. In M. Burr, editor, Proceedings of the 2017 ACM on International Symposium on Symbolic and Algebraic Computation, ISSAC '17, pages 437–444, New York, NY, USA, 2017. ACM.

[24]

J. van der Hoeven and G. Lecerf. Composition modulo powers of polynomials. In M. Burr, editor, Proceedings of the 2017 ACM on International Symposium on Symbolic and Algebraic Computation, ISSAC '17, pages 445–452, New York, NY, USA, 2017. ACM.

[25]

J. van der Hoeven and G. Lecerf. Modular composition via complex roots. Technical report, CNRS & École polytechnique, 2017. http://hal.archives-ouvertes.fr/hal-01455731.

[26]

Xiaohan Huang and V. Y. Pan. Fast rectangular matrix multiplication and applications. J. Complexity, 14(2):257–299, 1998.

[27]

F. Johansson. A fast algorithm for reversion of power series. Math. Comp., 84:475–484, 2015.

[28]

E. Kaltofen and V. Shoup. Fast polynomial factorization over high algebraic extensions of finite fields. In Proceedings of the 1997 International Symposium on Symbolic and Algebraic Computation, ISSAC '97, pages 184–188, New York, NY, USA, 1997. ACM.

[29]

E. Kaltofen and V. Shoup. Subquadratic-time factoring of polynomials over finite fields. Math. Comp., 67(223):1179–1197, 1998.

[30]

K. S. Kedlaya and C. Umans. Fast modular composition in any characteristic. In FOCS'08: IEEE Conference on Foundations of Computer Science, pages 146–155, Washington, DC, USA, 2008. IEEE Computer Society.

[31]

K. S. Kedlaya and C. Umans. Fast polynomial factorization and modular composition. SIAM J. Comput., 40(6):1767–1802, 2011.

[32]

M. K. Kyuregyan. Recurrent methods for constructing irreducible polynomials over of odd characteristics. Finite Fields Appl., 9(1):39–58, 2003.

[33]

M. K. Kyuregyan. Recurrent methods for constructing irreducible polynomials over of odd characteristics, II. Finite Fields Appl., 12(3):357–378, 2006.

[34]

P. Lairez and T. Vaccon. On p-adic differential equations with separation of variables. In Proceedings of the ACM on International Symposium on Symbolic and Algebraic Computation, ISSAC '16, pages 319–323, New York, NY, USA, 2016. ACM.

[35]

F. Le Gall. Powers of tensors and fast matrix multiplication. In Proceedings of the 39th International Symposium on Symbolic and Algebraic Computation, ISSAC '14, pages 296–303, New York, NY, USA, 2014. ACM.

[36]

G. Lecerf. On the complexity of the Lickteig–Roy subresultant algorithm. J. Symbolic Comput., 2018. https://doi.org/10.1016/j.jsc.2018.04.017.

[37]

G. L. Mullen and D. Panario. Handbook of Finite Fields. Discrete Mathematics and Its Applications. Chapman and Hall/CRC, 2013.

[38]

A. K. Narayanan. Fast computation of isomorphisms between finite fields using elliptic curves. Technical report, arXiv:1604.03072, 2016. https://arxiv.org/abs/1604.03072.

[39]

[40]

M. S. Paterson and L. J. Stockmeyer. On the number of nonscalar multiplications necessary to evaluate polynomials. SIAM J.Comput., 2(1):60–66, 1973.

[41]

M. O. Rabin. Probabilistic algorithms in finite fields. SIAM J. Comput., 9(2):273–280, 1980.

[42]

V. Ramaswami. On the number of positive integers less than and free of prime divisors greater than . Bull. Amer. Math. Soc., 55:1122–1127, 1949.

[43]

P. Ritzmann. A fast numerical algorithm for the composition of power series with complex coefficients. Theoret. Comput. Sci., 44:1–16, 1986.

[44]

J.-A. Serret. Cours d'algèbre supérieure, volume 2 of Les grands classiques Gauthier-Villars. Jacques Gabay, 4th edition, 1992. Réimpression du tome second de la 4 édition du Cours d'algèbre supérieure de Joseph-Alfred Serret, publiée par Gauthier-Villars en 1879.

[45]

V. Shoup. Fast construction of irreducible polynomials over finite fields. J. Symbolic Comput., 17(5):371–391, 1994.

[46]

V. Shoup. Efficient computation of minimal polynomials in algebraic extensions of finite fields. In Proceedings of the 1999 International Symposium on Symbolic and Algebraic Computation, ISSAC '99, pages 53–58, New York, NY, USA, 1999. ACM.