
Modular composition is the problem to compute the composition of two univariate polynomials modulo a third one. For polynomials with coefficients in a finite field, Kedlaya and Umans proved in 2008 that the theoretical bit complexity for performing this task could be made arbitrarily close to linear. Unfortunately, beyond its major theoretical impact, this result has not led to practically faster implementations yet. In this article, we explore particular cases of moduli over finite fields for which modular composition turns out to be cheaper than in the general case. In the most favourable cases, our algorithms achieve quasilinear costs. 
Let be an effective field, and let be polynomials in . The problem of modular composition is to compute modulo . Modular composition is an important problem in complexity theory because of its applications to polynomial factorization [27, 28, 29]. It also occurs very naturally whenever one wishes to perform polynomial computations over inside an algebraic extension of . Given two different representations of an algebraic extension of , the implementation of an explicit isomorphism also boils down to modular composition.
In this paper, we study the problem of composition modulo a fixed polynomial mostly in the case when is a finite field. We assume that is separable; the case of moduli of the form is studied in a separate paper [22]. Our results are based on the following simple observation: if a factorization is known, then composition modulo reduces to composition modulo the with . Curiously, this observation does not seem to be exploited in the standard literature on modular composition. In the case when is irreducible over , but admits a non trivial divisor , then the second crucial observation is that factors over . We may then apply the first observation in order to obtain an efficient algorithm for composition modulo . Finding the factorizations of over can be quite expensive in general, but such computations can be regarded as precomputations if the modulus is fixed.
Besides modular composition, we also study the related problem of computing the characteristic polynomial of modulo . More precisely, we understand to be the characteristic polynomial of the multiplication endomorphism by in . In particular, we have modulo . Theoretically speaking, asymptotically fast algorithms for these tasks are due to Kedlaya and Umans [28, 29]. The advantage of our new algorithms lies in their practical efficiency.
Denote by the number of operations in required to multiply two polynomials of degrees in . Let and be polynomials in of degrees , and . The naive modular composition algorithm takes operations in . In 1978, Brent and Kung [5] gave an algorithm with cost . It uses the babystep giantstep technique due to Paterson and Stockmeyer [37], and even yields a subquadratic cost when using fast linear algebra (see [26, p. 185]). The constant is such that a matrix over may be multiplied with another rectangular matrix in time . The best current bound is due to Huang and Pan [24, Theorem 10.1].
When linear algebra benefits from very fast implementations, its contribution is expected to be smaller than the other polynomial operations, on a certain bounded range for . In fact, for fixed values of and , the sizes of the “baby” and “giant” steps may be optimized in order to balance cost contributions of matrix and polynomial operations (this was studied for finite fields in an unpublished preprint of Shoup and Smolensky in 1992). Further improvements have been proposed in [25], based on the Lagrange inversion formula for the reversion of formal power series.
A major breakthrough has been achieved by Kedlaya and Umans [28, 29] in the case when is the finite field . For any positive , they have shown that the composition modulo can be computed using bit operations. Unfortunately, it remains a major open problem to turn this theoretical bit complexity bound into practically useful implementations.
In the special case of power series composition (i.e. when ), the best known complexity bound is again due to Brent and Kung: in [5], they showed that this requires operations in , under the condition that is invertible and that the characteristic is at least , where . The variant proposed by van der Hoeven [18, section 3.4.3] removes the condition on . For fields of small characteristic, Bernstein [1] proposed an algorithm that is softly linear in the precision but linear in the characteristic. These algorithms are generalized to moduli of the form in [22]; we show there that the composition reduces to one power series composition at order in , plus compositions modulo , and one characteristic polynomial computation modulo . Let us finally mention that series with integer, rational or floating point coefficients can often be composed in quasilinear time as well in suitable bit complexity models, as shown by Ritzmann [40]; see also [19].
The expression is linear in . It is well known that the transposition of the application corresponds to the power projection task (see section 2.7), which is an important ingredient for computing minimal and characteristic polynomials. In [43], Shoup studied the computation of minimal polynomials in algebraic extensions of the form or , explicitly given by defining polynomials. He designed fast practical algorithms with low memory consumption, built from the smart combination of “babystep giantstep” to transposed algorithms. However his method does not improve upon the one of Brent and Kung from the asymptotic complexity point of view.
The characteristic polynomial of modulo may be obtained from suitable power projections of modulo thanks to the wellknown Newton–Girard identities, which involve solving a first order differential equation to precision . This is rather elementary when the characteristic is zero or sufficiently large. Otherwise, adic arithmetic is needed. A complete solution is described in [15]. More generally, a framework for using adic arithmetic to solve ordinary differential equations in positive characteristic may be found in [32].
The aim of this article is to achieve practical speedups for modular composition and the computation of characteristic polynomials. Most of the new results are derived from the following simple observation: if splits into linear factors in , and if its roots are given, then modular composition basically reduces to evaluating at the roots of , evaluating at these values of , and interpolate . It is wellknown that each of these steps can be done in softly linear time using multiple point evaluation and interpolation. More generally, whenever can be factored into , the computation of reduces to the computations of for .
Of course, the existence of factorizations of heavily depends on itself and on fields over which we allow ourselves to factor . For instance, if , then we might consider computing the roots of in using a sufficient precision, or factoring over the adic numbers for some well chosen prime number . If is a finite field, and is an irreducible polynomial of composite degree , then we may consider factorizations over the intermediate field . In a separate paper, we study the case when is the field of computable complex numbers [23]. In this paper, we mainly focus on the finite field case.
Whether the approach leads to competitive algorithms for modular composition also depends on the question whether we require the factorization of to be part of the complexity or not. Indeed, if we are doing a large polynomial computation over in the algebraic extension , then we typically need to perform many modular compositions for different and , but for a fixed modulus . In such cases, it is reasonable to regard the factorization of as a precomputation. Furthermore, if we want to perform computations in a large finite field extension and if we are free to select a suitable representation for elements of this finite field, then we may build a modulus with using dedicated algorithms; these algorithms are much faster than finding an irreducible modulus at random. In fact, testing the irreducibility of in reduces to modular compositions in degree over , plus bit operations (see [29, section 8.2], based on Rabin's algorithm [38]).
In section 2, we begin with revisiting known techniques. We introduce cost functions for modular composition, power projection, and the computation of characteristic polynomials. In section 3, we examine the benefit for modular composition when a factorization of in is given. More precisely, if the irreducible factorization of the modulus is available, then our method reduces the composition modulo to several compositions modulo in softly linear time. A key ingredient, reused several times in the article, is the simultaneous computation of characteristic polynomials and modular compositions.
In section 4, we turn our attention to the specific situation of an irreducible modulus of composite degree over a finite field . We show how to exploit the existence of factorizations of over the intermediate fields into factors of degree .
The natural generalization to degrees with will be the subject of sections 5, 6 and 7. One important question is how to represent the elements of the intermediate fields and it is convenient to introduce the special concept of an effective algebraic tower for this purpose. We also introduce the notion of a composition tower for , which formalizes the requirement that we are given factorizations of over each of the intermediate fields . In section 5, we generalize the algorithm from section 4 to arbitrary composition towers. In section 6 we also give a detailed complexity analysis in the case of triangular towers when each intermediate field admits the form for suitable .
Section 7 is dedicated to primitive towers, in which case each is generated by a single primitive element over . If the are pairwise coprime (see section 7.4), then the field can be taken to be the composed product of , and computations in the tower become particularly efficient: in the case when is “supersmooth” (in the sense that the largest primepower divisor of satisfies ), we will show that composition modulo can be done in quasilinear time (modulo precomputations). In this very particular situation, we notice that our method outperforms Kedlaya–Umans' algorithm. If and is small, then a similar result holds when using so called Artin–Schreier towers (see section 7.5). For general smooth , one may also consider nested towers (see section 7.3) for which the primitive elements are compositions of polynomials of degrees over . The existence of such towers for given and is an interesting open problem, with a generally positive answer in practice.
Our main complexity bounds for modular composition are summarized in Table 1.1. In this table, is a fixed irreducible polynomial of degree and . The entries correspond to the various types of towers that are considered in sections 6 and 7, while assuming that all necessary precomputations that depend on have been done.
This leaves us with the issue of how to conduct the precomputations. This will be the subject of section 8, where we will analyze the cost of building composition towers of various types. We will see that the construction of composition towers for prescribed composite extension degrees can usually be done fast. On the other hand, building composition towers for a prescribed modulus is of the same order of difficulty as factoring over the intermediate fields , or finding a root of in for a given representation of the elements of . Practical algorithms for this task are then well known but of a cost that is quadratic in .


Table 1.1. Complexity bound for modular composition for various types of towers. 
Recall that an effective ring is a ring with unity whose elements can be represented on a computer and such that we have algorithms for performing the ring operations. Effective fields and effective algebras over an effective ring are defined similarly.
Given an effective ring , algebraic complexity models express running times in terms of the number of operations in . Unless otherwise stated, we will analyze the costs of the algorithms in this paper in this way. More precisely, our results both apply for the straightline program and computation tree models [6, chapter 4].
For randomized algorithms over a finite effective ring , we assume a special instruction that uniformly generates random elements in . For simplicity we assume that this instruction has constant cost. The expected cost of a randomized algorithm and a given input is the average cost taken over all the possible executions.
When working over the finite field with elements, we may also analyze the costs of algorithms in the bit complexity model, which relies on Turing machines with a sufficient number of tapes [36]. We will not explicitly consider this model in our paper, but most of our complexity bounds can easily be converted to this setting.
Let be an effective ring with unity, let , and denote
Given a polynomial or power series and , it is convenient to write
We write for a cost function such that two polynomials in can be multiplied using operations in . The schoolbook algorithm allows us to take . The fastest currently known algorithm [7] yields . Here, the softOh notation means that (we refer the reader to [12, chapter 25, section 7] for technical details). If is a field of finite characteristic, then it has been shown [17] that , where denotes the iterated logarithm function. In what follows, we will always assume that is an increasing function in . This customary assumption implies the superadditivity of , namely for all and .
More generally, if is an effective algebra, then it is sometimes convient to denote by a cost function such that two polynomials in can be multiplied using operations in .
Let be an effective field. The remainder (resp. quotient) of the euclidean division of by in is denoted by (resp. by ). For a fixed modulus of degree , euclidean divisions by are usually performed by computing a preinverse of . More precisely, is the inverse of in , computed at precision . Given , one obtains the quotient by multiplying with and the remainder as . Given we may thus compute the modular product using operations in .
We recall that the greatest common divisor of two polynomials of degrees at most over can be computed using operations in [12, Algorithm 11.4].
Let and consider points . Then the evaluations can be computed using operations in [12, chapter 10]. This operation is also called multipoint evaluation. The inverse operation is the interpolation, which consists of recovering from ; it can be performed with a similar cost. If the are fixed, then it is often possible to gain a factor using FFT trading [20].
More generally, if are polynomials with , then all the remainders can be computed using operations in . The inverse problem, called Chinese remaindering, again admits the same complexity .
Let be an effective ring. Given a bivariate polynomial , we define its bidegree to be the pair with and . Using Kronecker substitution [12, chapter 8, section 4], two polynomials of bidegree may be multiplied using ring operations in .
If is a monic polynomial of degree in , and if and are two polynomials in then their preimages may be multiplied in with operations in before being projected into with additional operations. Consequently each ring operation in in degree reduces to operations in . If is monic in then also takes operations in .
The computation of bivariate subresultants usually relies on fast evaluation/interpolation, as in the following well known proposition.
However for the sake of generality we will rely on the following result.
Let be the finite field with elements. One way to represent elements of a finite field extension is as remainder classes of polynomials in modulo a monic reducible polynomial of degree . We write in order to emphasize that we use this representation. Multiplication in can be done using operations in . Given an element of degree over , we write for the subfield of generated by over , where we understand that elements in are represented as evaluations of polynomials in at .
For the bulk of the algorithms in this paper, we will work over the finite field . In that case, it can be shown that two polynomials in can be multiplied in time on a Turing machine with a sufficient number of tapes [17]. The algebraic complexity bounds in this paper are easy to adapt to this model: it mainly suffices to replace by in all bounds.
The constant represents a feasible exponent for the multiplication cost of matrices: two square matrices of size can be multiplied using operations in their coefficient ring. The constant is defined similarly but for multiplying a matrix by a rectangular one. At present time the best known bound is due to Le Gall [33]. This naturally yields . However the latter bound does not improve upon the earlier bound due to Huang and Pan [24, Theorem 10.1].
Let be an effective ring. Let an effective algebra of dimension whose elements are represented by vectors of size in a given basis. We introduce the following cost functions:
: the cost of computing the modular composition , where is a monic polynomial of degree , and .
: the cost of computing the modular composition in terms of operations in , where is a monic polynomial of degree , and .
: the cost of computing the characteristic polynomial of modulo a monic polynomial of degree . This is the characteristic polynomial of the multiplication endomorphism by in .
: the cost of modular power projections, i.e. the cost to compute , where is monic of degree and is a linear form on .
Let us recall a few known results about these functions.
operations in , or
or operations in .
For a fixed monic polynomial in of degree , the modular composition is a linear operation in . For and of degrees , the corresponding transposed application is precisely the operation of modular power projections. If a modular composition algorithm with cost can be transposed in the sense of [3], then this leads to a power projection algorithm with cost .
operations in , including divisions in (the partial division in is supposed to be implemented), or
operations in , if is a field, or
operations in , if is a field with elements, or
operations in , if there exist given inverses of in .
Let be an effective algebraically closed field. A monic polynomial is said to be separable if . Since is algebraically closed, this implies that admits pairwise distinct roots in , and we may use the following algorithm for composition modulo :
Algorithm
Compute using fast multipoint evaluation.
Compute using fast multipoint evaluation.
Retrieve with using fast interpolation.
Return .
Let again be a general effective field. The algorithm from the previous section may be generalized to composition modulo a polynomial that can be factored partially as in , where the polynomials are pairwise coprime (although not necessarily irreducible).
Algorithm
Use a multiremainder algorithm to compute , for all .
For all , compute the characteristic polynomial of modulo .
Use a multiremainder algorithm to compute , for all .
For all , perform the modular composition .
Use Chinese remaindering to compute in of degree such that for all .
Return and .
Proof. For all , the Cayley–Hamilton theorem gives us , which implies , whence the correctness of . The correctness of the characteristic polynomial of follows from the usual isomorphism of algebras .
Example
Let be a finite field of characteristic such that with and odd. Let be an element of order . Let be a multiple of having all its prime factors dividing but not . Then the polynomial factors into irreducible polynomials of degrees . The irreducible factors may be described explicitly.
For example, with , , , , , , the polynomial factors into irreducible polynomials of degree . Taking instead leads to and .
Let still be an effective field and assume that we wish to compute a modular composition , where and is monic. Let us study what happens if the polynomials and to be composed have degrees larger than . We clearly have
and we may compute using operations in [12, Exercise 9.16]. Without loss of generality we may therefore assume that .
If exceeds , then it suffices to perform modular compositions:
This requires additional compositions modulo in size , plus operations in .
Alternatively, given a polynomial with , the following formula provides us with a more efficient way to reduce the degree of :
Taking to be the characteristic polynomial of modulo , its computation usually admits a similar cost as modular composition. Therefore it is worth using this method unless . This is actually one key ingredient for the upcoming algorithms for modular composition: in order to reduce a composition modulo to compositions modulo a factor of , we in particular need to compute the characteristic polynomial of modulo . At the end of the recursive calls, one should nevertheless keep in mind that we only need annihilating polynomials, so that we may also use minimal polynomials. Shoup has given a probabilistic algorithm for computing minimal polynomials [43], which is useful for actual implementations.
In the case when we wish to compute a composition modulo an irreducible polynomial , we cannot apply the algorithms from sections 3.1 and 3.2. Nevertheless, it might happen that admits a non trivial factorization over an algebraic extension of . This generically happens when is a finite field and is composite. Indeed, we recall the following well known result.
We call factorizations as in this proposition “normal factorizations”. This concept can actually be defined over arbitrary fields, as follows. Let be a monic separable polynomial in , let be a divisor of , and let be a monic separable irreducible polynomial of degree . We set , and write for the class of in . For all roots of in , we write for the map from to that sends to . We say that admits a normal factorization over if there exists a bivariate polynomial that is monic in , of bidegree , and such that factors into over , with and coprime whenever . We call the normal factor of over . Notice that the polynomials and are not required to be irreducible here.
Example
where are the two roots of in . More precisely for we may take the class of in , whereas . The normal factor of is thus .
Example
Assume that admits a normal factorization as above. Then the Chinese remainder theorem yields a natural isomorphism
and we may define as the unique polynomial in that satisfies and . We may now adapt the algorithm from section 3.2 as follows:
Algorithm
Compute the remainder in .
Compute the characteristic polynomial of modulo in .
Compute in .
Perform the modular composition in .
Return and .
operations in .
Proof. We first observe that
It follows that , whence is computed correctly. As to the characteristic polynomial of , the argument is the same as for Algorithm 3.2, thanks to the Poisson formula .
Corollary 4.5 already illustrates the potential of our ability to factor non trivially over an extension field . This idea can be pushed even farther whenever the factors with can be factored recursively over a tower of extension fields of . In order to carry out this generalization, we first need to decide how to compute with elements in the successive fields of such a tower. Instead of privileging particular representations, we rely on the abstract concept of an effective tower.
with the following properties:
Each field comes with a specific way to represent its elements and algorithms for the field operations.
For , the field is a finite separable algebraic extension of , and we have precomputed an element along with its minimal polynomial over , such that and . We set .
For , we have algorithms for computing the natural bijection , given by
and its inverse . We call and the upward and downward conversions at level . The coefficientwise extensions of and yield mappings and that we still denote by and .
We denote by the cost of multiplying two polynomials in in terms of the number of required operations in . Similarly, we write for the cost of inverting an element in in terms of the number of operations in . We always assume that additions and subtractions can be done in linear time. We also let upper bound the costs of both the upward and downward conversions at level .
Let us now return to our particular modulus and assume that its degree admits the factorization with for . If , then Algorithm 4.1 shows how to reduce modular composition modulo to composition modulo a polynomial over of degree , provided a normal factor of over exists and is given. In order to generalize this idea to the case when , it is useful to introduce the concept of a composition tower.
We let , and for each , we have precomputed a monic normal factor of . We let .
For , we have the isomorphism
where is a shorthand for and we assume we have precomputed a polynomial such that
Given monic and separable along with a composition tower , we may now apply Algorithm 4.1 recursively. Unrolling the recursive calls yields the following algorithm for modular composition.
Algorithm
operations in .
Proof. Since for , the computation of in step 2 can be done in time . The computation of the remainder of its division by can be done in time . Step 2 therefore amounts to
operations in , and similarly for step 5.
The computation of the resultant in step 4 can be performed in time by Proposition 2.2. It follows that the complete step 4 requires
operations in . In step 7, the naive evaluation of at modulo using Horner's method requires operations in . Consequently, step 7 requires
Remark
In step 3, we compute to be the characteristic polynomial of modulo over .
In step 6, we compute in .
These computations lead to an additional term in the complexity bound of Theorem 5.3.
Assume that we are given a tower
of finite fields such that for each . One obvious way to represent an element of is to write it as , where is a polynomial with . An effective tower that uses this representation for the elements of the fields is called a triangular tower. For such towers, the costs of the upward and downward conversions are zero. Throughout this section it will be convenient to make the relatively harmless assumption that for all effective rings . We always assume available the following precomputed data:
For all , the preinverse of .
for all and .
Proof. Let us first show how to reduce polynomial multiplication over to polynomial multiplication over . So consider two polynomials and in , represented as polynomials in evaluated at . We may compute their product in as follows: we first substitute in and , which yields two polynomials . We next compute their product in . Now for each . Since each remainder can be computed using two multiplications of elements in using the preinverse of , we obtain
for a sufficiently large constant independent of . On the other hand, applying Karatsuba's trick, there exists a constant such that holds for all .
If then we havewhich yields
We claim that
The proof is done by induction assuming the inequality holds for :
Remark
Proof. By Proposition 2.2 the inverse of an element in takes operations in . Combined with the previous lemma, we obtain
operations in .
Proof. By Lemma 6.1 we have
Then Lemma 6.3 gives
Recall that an integer is said to be smooth whenever all its prime factors are at most .
Proof. We appeal to the previous lemma to construct the integer sequence for which we precompute a triangular decomposition tower for . The cost of Proposition 6.4 simplifies to
Notice that minimizes the latter exponent, which leads to the cost provided that is smooth.
Remark
An interesting application of Proposition 6.4 concerns cyclic moduli in , where is a prime number different from the characteristic .
Proof. The modulus is separable, and the precomputations first involve the irreducible factorization of into , whose respective degrees divide .
Since is smooth, each is smooth, where . Consequently, for the modulus , the cost of Corollary 6.6 simplifies to
Proposition 6.4 shows that the overhead of triangular set arithmetic rapidly grows with the height of the tower. In this section we consider an alternative representation for elements in the fields . This representation allows for faster multiplication inside the fields , but the upward and downward conversions may become more expensive. One major goal of this section is to provide a more precise analysis of the cost of these conversions and to isolate particular situations in which they can be computed fast.
An effective tower with is said to be primitive if for each . In that case, we assume that we precomputed the minimal polynomial of each over . It follows that
for . On the other hand, the upward and downward conversions are more expensive than in the case of triangular towers. The following consequence of (7.1), (7.2) and Theorem 5.3 will be of frequent use.
operations in .
Proof. We have
For computing with arbitrary primitive towers, we recall that we always assume available the following precomputed data:
For all , the minimal polynomial of over ,
For all , the minimal polynomial of over ,
For all , the polynomial expression of in terms of over .
Assume that for some arbitrary primitive element . For a natural morphism for algebras and , let denote the cost of applying the morphism once in terms of the number of required operations in .
Proof. Let with . We may convert into with using operations in . Then we use the precomputed polynomial with , and also the minimal polynomial of over , which has degree . We now compute using Horner's method. This requires operations in . The evaluation is the natural image of in . This proves (7.3).
Proof. It will be convenient to use the following abbreviations:
Let be a constant such that
in the previous lemma and let us show by induction over that
For , we have , so all possible conversions are trivial and (7.7) holds. Now assume that the result holds until . Let us first consider the case when . Then (7.6) yields
Since , this also deals with the case when . If , then (7.5) yields
operations in .
Proof. From Corollary 7.4 we deduce
(7.8) 
Comparing to Proposition 6.4, using primitive towers thus turns out to be more efficient than using triangular towers, although it requires more precomputations. Therefore the costs for the two particular cases studied in sections 6.2 and 6.3 may be revisited and slightly improved.
We say that a primitive tower with is a nested tower, if there exist with , , , and coprime, and
Setting and , this also means that
so that has degree . For this specific type of towers we require the following precomputations:
and for and , where is the smallest power of two above .
for all .
Proof. Consider an element with . Let be the smallest power of two above . Then we may compute so that . This computation follows the natural “divide and conquer” strategy: given , we split it into with , we compute recursively and , and deduce
We may end the recurrence when , which incurs a cost that is repeated times in total. For each depth of the recursive calls the total cost remains operations in . The overall cost of the method is . The computation of the inverse of requires additional operations in .
Conversely, let with , and let be the smallest power of two above . We compute and look for an expansion of the form
where the . We again use the “divide and conquer” strategy:
Compute , and recursively compute the expansion
Compute , and recursively compute the expansion
operations in .
Proof. Lemma 7.6 implies
(7.9) 
The above corollary makes nested composition towers extremely attractive from a complexity point of view. The existence of such towers only depends on and the degrees , but not on . A practical way to construct nested towers is to pick random monic of degrees and to check that is irreducible for each . We repeat this process for random choices of until we find a suitable tower. From a heuristic point of view, we will show below that the probability that we eventually obtain a nested tower is nonzero in most cases of interest. From a theoretical point of view, the existence problem of nested towers remains an interesting problem.
In order to analyze the probability that random choices of provide us with a nested tower, we rely on
the fact that a random polynomial over of degree is irreducible with probability ;
the heuristic assumption that is again random for .
In this framework, the probability that is irreducible for each is given by
On the other hand, if has cardinality , then we have
possible choices for the tuple . For a fixed value of , we maximize by taking . Setting , we then have
The existence of a nested tower over with extension degrees is likely whenever . Taking logarithms, this happens as soon as
The algorithm for finding needs runs before finding a suitable tower. An obvious optimization is to make a better use of successful guesses of for which is irreducible for all : instead of starting everything over after one unsuccessful guess of , we try at least times for some fixed constant . The expected number of guesses then drops to .
Remark
It is an interesting question whether there exist finite fields and sequences for which it is possible to construct monic of degrees such that is irreducible for each . The literature contains specific constructions of nested towers over finite fields based on [8, Lemma 1] which relates the irreducibility of to where . We refer the reader to [35, chapter 3, section 2] for a nice survey. Let us exemplify two useful constructions.
Example
Example
Example
Over finite fields, the situation when the are pairwise coprime may be exploited to construct special towers, relying on the following wellknown lemma:
is irreducible in and of degree .
Remark
Remark
Let be a primitive tower and let , and be as usual. We say that is a composed tower if the are pairwise coprime and if there exist monic irreducible polynomials of degrees such that and for .
In that case, the minimal polynomial of over is given by and for : if were reducible over then would have a proper gcd with which is impossible (see Proposition 4.1).
By construction, we thus have and . For each , let be a root of and , so that . For such composed towers, we assume that the following precomputations have been done for :
,
the trace map of over (as a vector of ),
and its inverse modulo and .
for all .
Proof. Let with . We wish to compute with . For this purpose we first calculate
using operations in . Then
can be computed using additional operations.
Conversely, let with . The direct image with is obtained via Chinese remaindering:
In fact we just verify that . So if we let , then we may calculate
operations in .
Proof. Lemma 7.15 implies
(7.10) 
Given a number , we say that an integer is supersmooth if for every prime power that divides , we have . For instance, the product of the first prime numbers grows as (see for instance [16, chapter 22]) and is therefore supersmooth for sufficiently large . Similarly, the number is supersmooth for every . For a fixed modulus of supersmooth degree, the following corollary shows that modular composition can be done in softly linear time in this specific situation:
Using the composed tower approach, we are left with the question how to deal with algebraic extensions of prime power degree . In the case when coincides with the characteristic of the field , one may use Artin–Schreier towers instead, as outlined below.
An Artin–Schreier polynomial over a field of characteristic , is an irreducible polynomial of of the form . An Artin–Schreier tower of height over is a tower of field extensions where each extension is explicitly constructed from an Artin–Schreier polynomial over .
In order to simplify the presentation, we restrict ourselves to the case when . For the minimal polynomials of the successive extensions, we take
In [9, Theorem 2], it is shown that this defines a primitive tower. The polynomials may be computed using operations in , according to [9, Theorems 12]. We assume that the following precomputations have been done for (see [9, end of section 4]):
the trace map on over in the canonical basis,
.
The costs of these precomputations are and , respectively. We now have the following complexity bound for the upward and downward conversions.
for all .
operations in .
Proof. Lemma 7.18 implies
Let be a given finite field and let be a given monic irreducible polynomial of degree over . In this section we consider the task of constructing composition towers for . In fact, we may distinguish three different problems of increasing complexity:
Building an algebraic tower with the prescribed extension degrees ;
Building a composition tower for some monic irreducible of degree ;
Building a composition tower for the prescribed modulus .
Moreover, one may study these problems for each type of towers that we have encountered so far. From now on, we drop the study of triangular towers, for simplicity and because primitive towers are more efficient anyway. Given an effective tower, we notice that the construction of a primitive tower still requires computing the minimal polynomials over of the . In general, it is not known how to do this in quasilinear time without using the modular composition algorithm by Kedlaya and Umans.
The traditional solution to the first problem involves computing irreducible polynomials of “small” degrees over “large” field extensions . The number of operations in then grows with , even with the fastest known algorithm of Shoup [42]. In total this leads to at least a quadratic cost in to build an effective tower. In [13, Proposition 4.6] von zur Gathen and Seroussi proved the lower bound for factoring polynomials of degree over in the arithmetic circuit model. Consequently the quadratic cost in might be difficult to decrease in general. Theorem 8.2 below shows how to achieve a cost that is quasilinear in in the case when .
In this section we mainly focus on the efficient construction of primitive composition towers for prescribed . We first describe the general algorithm and then explain possible speedups for nested, composed, and ArtinSchreier towers. Altogether, this yields an efficient answer to the second problem. Notice that composition towers with no prescribed modulus are sufficient if we merely need a representation for the finite field such that polynomials in can be evaluated efficiently at points in .
The third problem is more difficult. So far we have not been able to apply the techniques of this paper to obtain more efficient solutions, even when is very smooth or in the extremely favourable case of Artin–Schreier towers. In practice, a straightforward strategy for the construction of composition towers for is to factor over all intermediate fields using standard available algorithms. This is discussed at the end of the section.
The naive construction of a primitive composition tower with prescribed extension degrees proceeds by induction. Assume the tower is built up to height . We first construct an irreducible polynomial in and set . If is “sufficiently random”, then the class of in is a primitive element over . Its minimal polynomial over may simply be obtained as where is the natural preimage with bidegree that satisfies . The latter resultant requires roughly operations in when using the best known algorithms. To complete the decomposition tower of height , it remains to compute the sequence of normal factors of over . Factorization algorithms based on modular composition [29] can achieve this in time , roughly speaking.
In fact, we will show how to build primitive composition towers far more efficiently. We still proceed by induction. The composition towers naturally share the same underlying effective subtowers of . More precisely, the subtower of height consists of the fields and forms a composition tower for ; the successive normal factors are written , and the auxiliary polynomials of Definition 5.2 are written .
We begin with constructing an irreducible polynomial of degree , so the first primitive tower is made of , and it is a composition tower for with normal factor . The auxiliary polynomial satisfies .
For the second composition tower we construct an irreducible polynomial of degree , which defines as the class of in . If is “sufficiently random”, then is a primitive element of over . We obtain the minimal polynomial of over as , where and . The normal factor of over is , and the one of over is . We clearly have and we obtain from the subresultant of degree in of and (it necessarily exists because is a primitive element of over implies that is separable; this will be detailed below in the general situation). In this way we obtain a decomposition tower of height .
The third tower again requires building an irreducible polynomial . This defines and so we have . We assume that generates over and we wish to obtain the normal factorizations of its minimal polynomial . Over and the normal factors are respectively and . For , we use the second composition tower: we compute such that and , then . Then we obtain , where . The auxiliary polynomials and are obtained from the corresponding first subresultants.
For general heights, we use the following algorithm for the construction of composition towers.
Algorithm
Set , , .
For from down to do
Compute over , where ;
Compute the subresultant of degree in of and written , and then set .
Set to the class of in , and let .
Notice that precomputations PREP1 correspond to polynomials ; For PREP2 we have already computed , and is simply ; PREP3 correspond to .
Return the composition tower for made from , , , , and the other precomputed auxiliary data.
operations in .
Proof. By decreasing induction on we prove that is a composition tower for which has degree . This is clear for and . Assume the induction hypothesis holds for . At the end of step 2.a has degree and is the minimal polynomial of over . Since is a normal factor of over , the degree of the ideal generated by is .
Since is separable it belongs to the Gröbner basis of for the lexicographic order induced by . In particular a polynomial with leading monomial belongs to . This proves that the subresultant of degree of and is nonzero, and that is well defined, which implies the requested conditions:
The induction hypothesis is thus satisfied for .
As to the complexity analysis, we first notice that . The conversions in step 2.a therefore take
operations in . The resultant in step 2.a and the subresultant in step 2.b require
expected operations in .
Proof. The tower is constructed by natural successive applications of the Algorithm 8.1. By Equation (7.8) the bound from Proposition 8.1 simplifies to
It remains to take the construction of the into account for . By Theorem A.4 a random polynomial can be computed using an expected number of compositions modulo plus
operations in . Thanks to Corollary 7.5 each composition with may be done using operations in . On the other hand we simply take . The sum over yields
We claim that, with a small uniformly bounded probability, the roots of do not have maximal degree over . If we run into such a bad then it is easily detected by the algorithm since is not separable. In such a case we build an other at random and the average number of failure is bounded.
To prove the claim, we notice that the roots of do not have maximal degree over if, and only if, the conjugates of over are not pairwise distinct. Equivalently this means that the coefficients of belong to a proper subfield of . For each prime factor of the field is a maximal proper subfield of . All maximal proper subfields of are obtained is this way, so there exist at most many of them. The number of monic irreducible factors of degree with coefficients in such a subfield is at most . On the other hand the number of monic irreducible polynomials of degree over is at least (see for instance [12, Lemma 14.38]). Consequently the probability that the roots of do not have maximal degree over is at most
Let us now investigate the consequences of Proposition 8.1 in the particular cases of nested, composed and Artin–Schreier towers. In our analysis, we actually construct all intermediate subtowers along with the composition tower itself. If one just needs the highest tower, then there are cases that can be optimized by exploiting the specificities of the types of towers under consideration.
operations in .
operations in .
operations in .
Proof. The tower is again constructed by natural successive applications of Algorithm 8.1, except that we replace the precomputations in step 3 by those specified in PREA1–2, which amount to . By Equation (7.11) the cost of Proposition 8.1 becomes
Let be an algebraic tower. So far we have shown how to built composition towers for . Let us now explain how to deduce composition towers for a given monic and irreducible of degree . By standard algorithms, we first compute a root of in (for example with Rabin's algorithm; see [12, chapter 14]). Unfortunately we do not know how to exploit a composition tower to compute more efficiently. We are interested in finding a monic normal factor of over each field in the tower. In the extreme case when , we take .
Proof. We have . We compute and by descending induction on from down to . First we have
where . The first subresultant of and is well defined and is invertible modulo . Therefore we have
Remark
We have shown that modular composition and characteristic polynomials for a fixed irreducible modulus can be computed fast if is smooth. From a practical point of view, our algorithms lead to speedups with respect to state of the art methods as soon as is composite and not reasonably large. From a theoretical point of view, the algorithms by Kedlaya and Umans generally outperform the new algorithms. One notable exception occurs when is very smooth, in which case we were able to prove a quasilinear complexity bound: see Corollary 7.17.
Our new algorithms are only more efficient for fixed moduli . Nevertheless, the cost of the required precomputations as a function of is of a similar order of magnitude as one composition modulo without our methods. In other words, if we need to compute compositions modulo the same modulus , then our new methods are already of interest for small values of . This problem of multiple modular compositions occurs in various applications:
Many intriguing questions remain to be answered. One major open problem is whether our new techniques can be used to build composition towers for a prescribed irreducible modulus of smooth degree in quasilinear time. This would give us an unconditional algorithm for composition modulo of quasilinear time complexity. Another natural question is whether there exists an efficient way to reduce general modular composition to composition modulo irreducible polynomials of smooth degree. This question may be related to generalizations of our algorithms to modular composition for multivariate polynomials. Besides these fundamental issues, we finally think that there remains a lot of room for more minor improvements of the techniques in this paper and for working out various applications in more detail.
This appendix is devoted to a fast algorithm for pseudotrace computations designed by Kaltofen and Shoup in [26], together with its application to polynomial factorization over finite fields. We recall the algorithm for completeness and in order to make it work more generally over any finite field .
Algorithm
If then return .
Let , and recursively compute .
Compute .
Compute over .
Compute over .
If is even, then return .
Compute and return , over , and over .
additional operations in .
Proof. The proof proceeds by induction on . The algorithm is clearly correct if . Assume that it is correct for . By linearity of the th power we have
This already proves the correctness by induction when is even. Otherwise and similar computations yield ,
This proves the correctness.
operations in .
operations in .
operations in .
Proof. We appeal to BenOr's algorithm [12, Algorithm 14.40], but use modular composition instead of modular powering. Let be a monic random polynomial of degree over . For from to we compute and stop as soon as . In this case is reducible, since it admits an irreducible factor of degree (that divides ). If all the are then is necessarily irreducible.
With the notations from Algorithm A.1, the modulo are computed as follows. We first compute and using operations in . Then we compute of bidegree such that . Notice that . In order to compute from we write with , so we are led to compute for all and then over . The overall computation of requires compositions modulo plus operations in . If the smallest degree of the irreducible factors of is , then testing the irreducibility of takes compositions modulo plus
In average, random trials for are necessary to find an irreducible one, and the average value of the smallest degree is . Therefore the total expected cost is
[1] D. J. Bernstein. Composing power series over a finite ring in essentially linear time. J. Symbolic Comput., 26(3):339–341, 1998.
[2] A. Bostan, Ph. Flajolet, B. Salvy, and É. Schost. Fast computation of special resultants. J. Symbolic Comput., 41(1):1–29, 2006.
[3] A. Bostan, G. Lecerf, and É. Schost. Tellegen's principle into practice. In Hoon Hong, editor, Proceedings of the 2003 International Symposium on Symbolic and Algebraic Computation, ISSAC '03, pages 37–44, New York, NY, USA, 2003. ACM.
[4] J. V. Brawley and L. Carlitz. Irreducibles and the composed product for polynomials over a finite field. Discrete Math., 65(2):115–139, 1987.
[5] R. P. Brent and H. T. Kung. Fast algorithms for manipulating formal power series. J. ACM, 25(4):581–595, 1978.
[6] P. Bürgisser, M. Clausen, and M. A. Shokrollahi. Algebraic complexity theory, volume 315 of Grundlehren der Mathematischen Wissenschaften. SpringerVerlag, 1997.
[7] D. G. Cantor and E. Kaltofen. On fast multiplication of polynomials over arbitrary algebras. Acta Infor., 28(7):693–701, 1991.
[8] S. D. Cohen. On irreducible polynomials of certain types in finite fields. Proc. Camb. Phil. Soc., 66(2):335–344, 1969.
[9] L. De Feo and É. Schost. Fast arithmetics in ArtinSchreier towers over finite fields. J. Symbolic Comput., 47(7):771–792, 2012.
[10] L. E. Dickson. Linear Groups: With an Exposition of the Galois Field Theory. Dover Publication Inc., 1958.
[11] J. Doliskani and É. Schost. Taking roots over high extensions of finite fields. Math. Comp., 83(285):435–446, 2014.
[12] J. von zur Gathen and J. Gerhard. Modern computer algebra. Cambridge University Press, New York, 2nd edition, 2003.
[13] J. von zur Gathen and G. Seroussi. Boolean circuits versus arithmetic circuits. Inform. and Comput., 91(1):142–154, 1991.
[14] J. von zur Gathen and V. Shoup. Computing Frobenius maps and factoring polynomials. Comput. Complexity, 2(3):187–224, 1992.
[15] B. Grenet, J. van der Hoeven, and G. Lecerf. Deterministic root finding over finite fields using Graeffe transforms. Appl. Alg. Eng. Comm. Comp., 27(3):237–257, 2016.
[16] G. H. Hardy and E. M. Wright. An introduction to the theory of numbers. Oxford University Press, Oxford, 6th edition, 2008.
[17] D. Harvey, J. van der Hoeven, and G. Lecerf. Faster polynomial multiplication over finite fields. J. ACM, 63(6), 2017. Article 52.
[18] J. van der Hoeven. Relax, but don't be too lazy. J. Symbolic Comput., 34(6):479–542, 2002.
[19] J. van der Hoeven. Fast composition of numeric power series. Technical Report 200809, Université ParisSud, Orsay, France, 2008.
[20] J. van der Hoeven. Faster Chinese remaindering. Technical report, CNRS & École polytechnique, 2016. http://hal.archivesouvertes.fr/hal01403810.
[21] J. van der Hoeven and R. Larrieu. The Frobenius FFT. Technical report, CNRS & École polytechnique, 2017. http://hal.archivesouvertes.fr/hal01453269.
[22] J. van der Hoeven and G. Lecerf. Composition modulo powers of polynomials. Technical report, CNRS & École polytechnique, 2017. http://hal.archivesouvertes.fr/hal01455722.
[23] J. van der Hoeven and G. Lecerf. Modular composition via complex roots. Technical report, CNRS & École polytechnique, 2017. http://hal.archivesouvertes.fr/hal01455731.
[24] Xiaohan Huang and V. Y. Pan. Fast rectangular matrix multiplication and applications. J. Complexity, 14(2):257–299, 1998.
[25] F. Johansson. A fast algorithm for reversion of power series. Math. Comp., 84:475–484, 2015.
[26] E. Kaltofen and V. Shoup. Fast polynomial factorization over high algebraic extensions of finite fields. In Proceedings of the 1997 International Symposium on Symbolic and Algebraic Computation, ISSAC '97, pages 184–188, New York, NY, USA, 1997. ACM.
[27] E. Kaltofen and V. Shoup. Subquadratictime factoring of polynomials over finite fields. Math. Comp., 67(223):1179–1197, 1998.
[28] K. S. Kedlaya and C. Umans. Fast modular composition in any characteristic. In FOCS'08: IEEE Conference on Foundations of Computer Science, pages 146–155, Washington, DC, USA, 2008. IEEE Computer Society.
[29] K. S. Kedlaya and C. Umans. Fast polynomial factorization and modular composition. SIAM J. Comput., 40(6):1767–1802, 2011.
[30] M. K. Kyuregyan. Recurrent methods for constructing irreducible polynomials over of odd characteristics. Finite Fields Appl., 9(1):39–58, 2003.
[31] M. K. Kyuregyan. Recurrent methods for constructing irreducible polynomials over of odd characteristics, II. Finite Fields Appl., 12(3):357–378, 2006.
[32] P. Lairez and T. Vaccon. On padic differential equations with separation of variables. In Proceedings of the ACM on International Symposium on Symbolic and Algebraic Computation, ISSAC '16, pages 319–323, New York, NY, USA, 2016. ACM.
[33] F. Le Gall. Powers of tensors and fast matrix multiplication. In Proceedings of the 39th International Symposium on Symbolic and Algebraic Computation, ISSAC '14, pages 296–303, New York, NY, USA, 2014. ACM.
[34] G. Lecerf. On the complexity of the LickteigRoy subresultant algorithm. Technical report, CNRS & École polytechnique, 2017. https://hal.archivesouvertes.fr/hal01450869.
[35] G. L. Mullen and D. Panario. Handbook of Finite Fields. Discrete Mathematics and Its Applications. Chapman and Hall/CRC, 2013.
[36] C. H. Papadimitriou. Computational Complexity. AddisonWesley, 1994.
[37] M. S. Paterson and L. J. Stockmeyer. On the number of nonscalar multiplications necessary to evaluate polynomials. SIAM J.Comput., 2(1):60–66, 1973.
[38] M. O. Rabin. Probabilistic algorithms in finite fields. SIAM J. Comput., 9(2):273–280, 1980.
[39] V. Ramaswami. On the number of positive integers less than and free of prime divisors greater than . Bull. Amer. Math. Soc., 55:1122–1127, 1949.
[40] P. Ritzmann. A fast numerical algorithm for the composition of power series with complex coefficients. Theoret. Comput. Sci., 44:1–16, 1986.
[41] J.A. Serret. Cours d'algèbre supérieure, volume 2 of Les grands classiques GauthierVillars. Jacques Gabay, 4th edition, 1992. Réimpression du tome second de la 4^{} édition du Cours d'algèbre supérieure de JosephAlfred Serret, publiée par GauthierVillars en 1879.
[42] V. Shoup. Fast construction of irreducible polynomials over finite fields. J. Symbolic Comput., 17(5):371–391, 1994.
[43] V. Shoup. Efficient computation of minimal polynomials in algebraic extensions of finite fields. In Proceedings of the 1999 International Symposium on Symbolic and Algebraic Computation, ISSAC '99, pages 53–58, New York, NY, USA, 1999. ACM.