Accelerated tower arithmetic

Joris van der Hoevena, Grégoire Lecerfb

CNRS (UMR 7161, LIX)

Laboratoire d'informatique de l'École polytechnique

Campus de l'École polytechnique

1, rue Honoré d'Estienne d'Orves

Bâtiment Alan Turing, CS35003

91120 Palaiseau, France

a. Email: vdhoeven@lix.polytechnique.fr

b. Email: lecerf@lix.polytechnique.fr

Preliminary version of December 14, 2018

Nowadays, asymptotically fast algorithms are widely used in computer algebra for computations in towers of algebraic field extensions of small height. Yet it is still unknown how to reach softly linear time for products and inversions in towers of arbitrary height. In this paper we design the first algorithm for general ground fields with a complexity exponent that can be made arbitrarily close to one from the asymptotic point of view. We deduce new faster algorithms for changes of tower representations, including the computation of primitive element representations in subquadratic time.

Keywords: complexity, algorithm, computer algebra, algebraic extension, algebraic tower, triangular set, accelerated tower.

1.Introduction

1.1.Statement of the problem

Let be an effective commutative ring with unity. Here effective means that elements of can be represented by concrete data structures and that we have algorithms for the ring operations, including the zero test. Effective fields are defined in a similar way.

Given a monic polynomial of degree , the quotient ring is an effective ring and fast algorithms exist for the arithmetic operations in . More precisely, counting in terms of operations in , additions in require additions in , whereas multiplications can be done [15] in almost linear time . Here is a standard notation for the cost of multiplying two univariate polynomials of degree : see section 2.2.
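
As a concrete illustration (not the asymptotically fast method of this paragraph), the following Python sketch implements school-book arithmetic in a quotient ring of the above kind with integer coefficients; residues are stored as coefficient lists of length equal to the degree of the modulus, lowest degree first. All names are ad hoc; fast variants would replace the naive product and remainder below by FFT-based multiplication and by division via Newton iteration.

def poly_mul(f, g):                        # naive product of coefficient lists
    h = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            h[i + j] += a * b
    return h

def poly_rem(f, mu):                       # remainder modulo a monic mu (mu[-1] == 1)
    f = list(f)
    n = len(mu) - 1
    for i in range(len(f) - 1, n - 1, -1):
        c = f[i]
        if c:
            for j in range(n + 1):
                f[i - n + j] -= c * mu[j]
    return (f + [0] * n)[:n]

def mul_mod(f, g, mu):                     # product of two residues modulo mu
    return poly_rem(poly_mul(f, g), mu)

# Example in Z[x]/(x^2 + 1): (1 + x) * (2 + 3x) = -1 + 5x, since x^2 = -1.
print(mul_mod([1, 1], [2, 3], [1, 0, 1]))  # -> [-1, 5]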

Given another monic polynomial , we may build the effective ring and fast arithmetic operations in are available for the same reason. Doing so inductively, we obtain a tower of extensions of , with for . Such a tower is written and we call its height. Throughout this paper, we write for the class of in , we let , and . The are called the defining polynomials of the tower and its degree. Elements of are naturally represented by univariate polynomials in over of degrees . If all are fields, then we write instead of and instead of , for convenience.

Towers of this type naturally arise in several contexts of applied algebra, including cryptography (for instance in [2, 3, 18]), error correcting codes (for instance in [26]), and the resolution of differentially algebraic systems in the more general setting of triangular sets (for instance in [1, 12]).

By induction over , basic arithmetic operations in can be naively performed in time , for some constant . But it is well known that, for a sufficiently large constant , these operations can actually be carried out in time by means of Kronecker substitution (see for instance [27, Chapter 8]). However, if many of the individual degrees are small, then can become as large as , which leads to an overall complexity bound of the form . The main aim of this paper is to prove the sharper bound in the case when is a field and the are irreducible and separable. The top level complexity result is stated in Corollary 4.11.

In order to simplify the presentation of complexity bounds, we often use the soft-Oh notation: means that ; see [27, Chapter 25, section 7] for technical details. The least integer larger than or equal to is written ; the largest one smaller than or equal to is written . The -module of polynomials of degree is written .

1.2.Modular composition and related problems

One essential tool for our new algorithms is modular composition. Before returning to our main problem, let us recall several useful existing results on this problem and various related topics.

Let be as above and assume that , whence . Given , the computation of the remainder of the Euclidean division of by is called the problem of modular composition. It is equivalent to the problem of evaluating at a point . Currently, the best known solution to this problem is due to Brent and Kung [13, 56] and requires operations in . Here the constant , with , denotes an exponent such that a rectangular matrix can be multiplied with a square matrix with operations in . Huang and Pan showed in [40] that one may take . Brent and Kung's algorithm is based on the baby-step giant-step technique [50]; we will recall and generalize it in section 3.

For a fixed point , the evaluation map

is linear. The dual or transposed map is given by

and sends an -linear functional to the vector . The computation of this transposed map is called the problem of modular power projection. Using the transposition principle (see [10] and historical references therein), the costs of power projection and modular composition are essentially the same. In particular, the computation of the traces corresponds to one modular power projection.

Given an element , the characteristic polynomial of is the characteristic polynomial of the multiplication endomorphism by in . If is a field, then this polynomial can be computed in softly quadratic time by means of a resultant (see for instance [45, Corollary 29]). Shoup has also designed a practical randomized algorithm to compute minimal polynomials for the case when has sufficiently many elements, with an expected complexity of .

The fastest known method to compute the characteristic polynomial of an element over proceeds as follows: first compute the traces of the powers of for all using modular power projection, and then recover by integrating the differential equation

(1.1)

where represents the reverse polynomial of . This integration requires the inverses of in to be at our disposal. Historically, this method goes back to Le Verrier [43], and formula (1.1), often called the Newton–Girard formula, expresses the relationship between the symmetric and power sum polynomials. The integration step takes softly linear time: the seminal contributions are due to Brent and Kung [13], generalizations can be found in [8], the first efficient extension to finite fields was designed in [9], and we refer the reader to [29, section 2] for a concise proof.
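
For concreteness, here is a quadratic-time Python sketch of the Newton–Girard recovery of a characteristic polynomial from its power sums, written over the rationals and assuming that 1, ..., d are invertible; it is only an illustration of identity (1.1), whereas the softly linear method described above integrates the logarithmic derivative as a power series instead.

from fractions import Fraction

def charpoly_from_power_sums(p, d):
    # p[k] = sum of the k-th powers of the roots, for k = 1..d (p[0] is unused);
    # returns [c_0, ..., c_{d-1}, 1] with chi(x) = x^d + c_{d-1} x^{d-1} + ... + c_0.
    c = [Fraction(0)] * (d + 1)
    c[d] = Fraction(1)
    for k in range(1, d + 1):
        s = p[k] + sum(c[d - i] * p[k - i] for i in range(1, k))
        c[d - k] = -s / k                  # requires k to be invertible
    return c

# Example with roots 1, 2, 3: chi(x) = x^3 - 6x^2 + 11x - 6.
p = [None] + [Fraction(1**k + 2**k + 3**k) for k in range(1, 4)]
print(charpoly_from_power_sums(p, 3))      # -> [-6, 11, -6, 1] (as exact fractions)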

1.3.Previous work on fast tower arithmetic

Recall that an element is said to be primitive over if . The primitive element theorem states that such an element always exists: if contains sufficiently many elements, then it suffices to take a sufficiently generic -linear combination of the . In this light, it may seem odd at first sight to work with towers of height , since an isomorphic tower of height one always exists. The problem is that both the computation of primitive elements and conversions between representations are usually expensive.

Concerning primitive elements, it is currently not known how to efficiently find one, together with its minimal polynomial and polynomials such that for . In fact, naive algorithms mostly boil down to linear algebra in dimension , and thus feature at least quadratic costs. One may for instance appeal to the FGLM algorithm to change orderings of Gröbner bases [21, 22].

If a modular composition algorithm (technically, for the multivariate version of section 3.2) of softly linear complexity were known over any field, then primitive element representations of towers would be computable in softly linear expected time by a randomized algorithm; conversions between towers and their primitive element representations would also become possible in softly linear time. These aspects have been studied by Poteaux and Schost in [51, 52], where subquadratic costs are achieved for multivariate modular composition and its transpose, as well as for computing primitive representations of towers.

Let represent a fixed positive rational value. If is the finite field with elements, then a major result by Kedlaya and Umans [42] states that modular composition and power projection are possible in time . Whenever , then this in particular implies that characteristic polynomials can be obtained with expected bit complexity : see [29, sections 2.3–2.5] for details. Based on these results, Poteaux and Schost have derived [51] fast algorithms for multiplication, division, traces and primitive element representations for separable towers over finite fields. Unfortunately, very large constant factors are hidden in the complexities of all these algorithms, which currently prevent them from being of any practical relevance.

For some other particular cases of interest, fast algorithms also exist for modular composition and related problems. For fixed irreducible moduli of sufficiently smooth degree , practically faster algorithms for modular composition have been proposed in [39]. When , we have also shown in [38] that modular composition can be achieved in quasi-linear time when computing with a sufficiently high numeric precision.

More direct approaches for algebraic tower arithmetic are usually based on computations modulo so-called “triangular sets”. We may see the preimage of over as a multivariate polynomial in such that is monic in , has degree in , degrees in for all , and . The sequence forms a special case of a triangular set. Triangular sets and decompositions have a long history that goes back to Ritt's contributions in differential algebra [53].

In this context of triangular sets, it was shown in [46] that two elements in can be multiplied in time . In [44, Proposition 2], Lebreton has proved the stronger bound . This result even applies if the ideal () is not radical, under the condition that form a regular chain (in the sense of Kalkbrener [41], see also [1] for generalities about triangular sets). For triangular sets of certain quite special types, we notice that even faster multiplication algorithms have been designed in [7]. When working over finite fields, practically efficient algorithms have also been proposed in [17], based on fast embeddings into towers of a specific type.

Another major occurrence of algebraic towers in the literature concerns “dynamic evaluation”. This technique was introduced by Della Dora, Dicrescenzo and Duval [19, 20] as a way to compute with algebraic numbers without requiring algorithms for polynomial factorization. Basically, the idea is to compute in as if it were a field, and cases are distinguished whenever a zero test of an element is requested: the “current tower” is then explicitly decomposed into a “direct product of two towers” such that the projections of in these two towers are respectively zero and invertible. The computations finally resume non-deterministically with each of the two new towers in the role of the current tower. For recent complexity results on this technique we refer the reader to [16].

1.4.Our contributions

Efficient computation with polynomials modulo zero-dimensional ideals defined by triangular sets is an important topic in applied algebra. Obtaining softly linear complexity bounds when working over a general field is an open problem. The previously best known bound, due to Lebreton [44, section 3.2], is recalled in section 2.4.

A first technical contribution of ours concerns multivariate modular composition: in section 3.2 we design a new baby-step giant-step algorithm to evaluate with the same kind of complexity as the Brent–Kung algorithm for the univariate case. It slightly improves upon the incremental method described in [52], which actually relies on the bivariate case. In fact, our method applies the baby-step giant-step paradigm over the whole representation of . Compared to [52, Lemma 4] used with , the complexity bound given in Proposition 3.3 does not have restrictions on the characteristic, and is asymptotically smaller by a factor of .

The main contribution of this paper is section 4, which is devoted to a new deterministic algorithm to multiply in towers of separable field extensions (all the are thus irreducible and separable). For the first time we achieve an asymptotic complexity with an exponent that becomes arbitrarily close to one. From the asymptotic complexity point of view, and in the specific case of towers of separable field extensions, this improves upon [44, 52], and also upon the randomized algorithms for finite fields from [51]. For specific towers over finite fields, some of the algorithms from [17] remain more efficient both in practice and in theory.

The main idea behind our algorithms is as follows: we use the primitive element theorem to replace a general tower of height by a new “accelerated” tower for which we can control the height . There is a tradeoff between the cost of tower arithmetic (more expensive for larger ) and primitive element computations (more expensive for small ). By making a suitable compromise, we obtain our main complexity bound.

One shortcoming of our main result is that it only applies to towers of fields. Now if an algorithm is available for factoring polynomials in into irreducibles, then our results generalize to towers of separable integral extensions over , by “factoring” them into towers of separable field extensions. This reduction and the corresponding conversion algorithms (that generalize Chinese remaindering) are detailed in section 5. Some particular cases when the cost of factorization is often affordable are discussed in section 5.5.

Our final section 6 contains a general subquadratic Las Vegas complexity bound for computing primitive element representations of towers of separable field extensions, along with the related data for conversions: our algorithms directly benefit from the fast product, and they rely on power projections as in [11, 42, 51]. Compared to [52, Lemma 4] used with , section 6 gains a factor of asymptotically, without restrictions on the characteristic. When working over a finite field, section 6 does not improve upon the asymptotic complexity bounds from [51]. Nevertheless, [51] relies on the Kedlaya–Umans algorithm for modular composition, so we expect our approach to behave better in practice.

Let us finally stress once more that we regard our contribution as a uniform way to prove almost optimal complexity bounds for tower arithmetic in the algebraic complexity model. It is well known that this model is reasonably close to bit complexity models when working over finite fields. Over other fields such as , coefficient growth needs to be taken into account, which leads us beyond the scope of this paper. From a practical perspective, we also recall that efficient tower arithmetic has been developed over various specific kinds of base fields such as finite fields and subfields of : see [17, 38, 39] and section 5.5. It is true however that actual implementations of our own algorithms have fallen somewhat behind. We intend to address such implementation issues in the near future.

2.Basic tower arithmetic

This section gathers basic prerequisites on complexity models and polynomial arithmetic. In particular we recall Lebreton's results on multiplication in towers [44]. Then we examine the costs of divisions.

2.1.Complexity model

Let be an effective ring. Our complexity analyses concern the algebraic complexity model [14, Chapter 4] over , and more precisely straight-line programs and computation trees. In other words, running times of algorithms are measured in terms of the required number of ring operations in , and constants are thought to be freely at our disposal.

It is convenient to introduce the notations and as abstractions for the respective costs of multiplication and division in (whenever defined). For simplicity we always assume that multiplication is at least as expensive as addition, subtraction, and zero testing.

For randomized algorithms over a finite effective ring , we assume a special instruction that uniformly generates random elements in , with constant cost. For a given input, the cost of a randomized algorithm is thus a random variable. The expected cost for input size is defined as the maximum of the averages of these random variables over all possible inputs of size .

Remark 2.1. For arithmetic operations in basic rings and fields, such as or , it is common to use Turing or RAM machines to count the total number of “bit operations”. The translation of our results in terms of bit complexity is mostly straightforward, although some care is needed for the manipulation of multivariate polynomials on Turing machines. We refer to [36, 37] for more technical details.

2.2.Univariate polynomial multiplication

Let be an effective commutative ring with unity. We denote by a cost function for multiplying two polynomials . For general , it has been shown in [15] that one may take

(2.1)

For rings of positive characteristic, one has

by [30]. We make the following assumptions:

These assumptions hold for as in (2.1). Notice that (2.2) is also equivalent to . Applying equation (2.2) a finite number of times, it follows that for any fixed positive integer .

If is an effective -algebra that is also a finite dimensional free -module, then we denote by a cost function for multiplying two polynomials in , in terms of the number of operations in , and we make the same assumptions as for . Notice that we may always take .

In particular, if is a monic polynomial in of degree then is such an -algebra of dimension ; elements in are represented by polynomials in . In this case, we have and by means of Kronecker substitution [27, Chapter 8].
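
The following Python sketch illustrates the Kronecker substitution used here for bivariate products: both operands are packed into univariate polynomials with blocks large enough to avoid carries, multiplied once, and unpacked. The naive univariate product is only a placeholder; with a fast univariate product this packing yields the stated bound.

def poly_mul(f, g):                        # placeholder for a fast univariate product
    h = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            h[i + j] += a * b
    return h

def pack(f, dx):                           # f: list (in y) of coefficient lists (in x)
    s = 2 * dx - 1                         # block size, large enough to avoid carries
    out = [0] * (s * len(f))
    for j, cj in enumerate(f):
        for i, a in enumerate(cj):
            out[j * s + i] += a
    return out

def unpack(h, dx, dy):
    s = 2 * dx - 1
    return [h[j * s : j * s + s] for j in range(dy)]

def bivariate_mul(f, g, dx):               # all coefficients have degree < dx in x
    dy = len(f) + len(g) - 1
    return unpack(poly_mul(pack(f, dx), pack(g, dx)), dx, dy)

# (1 + x*y) * (x + y), with dx = 2:
print(bivariate_mul([[1, 0], [0, 1]], [[0, 1], [1, 0]], 2))
# -> [[0, 1, 0], [1, 0, 1], [0, 1, 0]], i.e. x + (1 + x^2)*y + x*y^2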

2.3.Primitive towers

If is a finitely generated -algebra, an element is said to be primitive if . In this case, we write the monic minimal polynomial of , so the following isomorphism holds:

Notice that such a primitive element representation does not necessarily exist: consider and .

Definition 2.2. A primitive tower representation of consists of the following data:

  • A primitive element of over , for .

  • The minimal polynomial of over , for .

  • such that , for and .

These data induce the following isomorphisms for :

The polynomials are called parametrizations of the in terms of the .

With the triangular set as defined in the introduction, an element is represented by a polynomial with defined modulo . So the conversion of into an element of boils down to evaluating modulo . The backward conversion consists in evaluating a univariate polynomial at . Such conversions are the topic of section 3 below.

Remark 2.3. Special kinds of primitive tower representations have been used before; see [18, 39], for instance. Algorithms therein use special routines for conversions between and . In the present paper, we only rely on conversions between and , the natural identity isomorphism , and combinations of these conversions.

2.4.Multiplication in towers

Under the assumptions of section 2.2, recall that there exists a constant such that for , and where represents a cost function for multiplying two polynomials in , in terms of the number of operations in . When using this bound in an iterated fashion, we obtain

By taking care of the cost of polynomial divisions in terms of multiplications, one may prove the following sharper bounds:

Proposition 2.4. If for , then there exists a constant such that

Proof. Taking , Lebreton proved that ; see [44, section 3.2]. His proof was done for the particular cost function . For completeness, we briefly repeat this proof, for a general cost function such that is non-decreasing. We recall that two polynomials of partial degrees in the can be multiplied in time using Kronecker substitution recursively; see [27, Chapter 8] and [44, section 3.2].

We let represent the triangular set associated to the tower, as defined in section 1.3. We identify elements in the tower with their canonical representatives in . Let represent the reverse polynomial of in . Assume for now that we precomputed polynomials in , of partial degrees in the , such that

Let represent a cost function for reducing polynomials in of degrees in for modulo . The reduction of such a polynomial modulo is done recursively, as follows:

  1. Let . If , then we reduce modulo and we are done.

  2. Otherwise, let be the reverse of .

    1. Reduce modulo , in time .

    2. Compute , in time .

    3. Reduce modulo , in time .

  3. Let , compute , and then reduce modulo . It turns out that equals modulo ; see for instance [27, Chapter 9, section 1].

Altogether, the cost of this algorithm is therefore bounded by

for some universal constant . We deduce that .

Now let and be the respective representatives of two elements and to be multiplied in the tower. The product takes , and is then reduced modulo , whence

For the second bound of the proposition, we may multiply two polynomials in as in with operations in , and then reduce their product coefficientwise modulo with cost , whence .

It remains to observe that once are known, the precomputation of using Newton's method takes operations in ; see [27, section 9.1], for instance. Therefore, the complexity bounds obtained so far remain valid up to a constant factor when taking the cost for obtaining the into account. Finally, our assumption for implies , so (2.2) leads to the claimed bounds.
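
The following Python sketch shows the school-book counterpart of this reduction process for a triangular set of height two with integer coefficients; unlike the fast algorithm of the proof, it performs naive divisions instead of multiplying by precomputed power series inverses of the reversed defining polynomials. All names are ours.

def rem1(f, mu1):                          # reduce f(x1) modulo the monic mu1
    f = list(f) + [0] * max(0, len(mu1) - 1 - len(f))
    n = len(mu1) - 1
    for i in range(len(f) - 1, n - 1, -1):
        c = f[i]
        for j in range(n + 1):
            f[i - n + j] -= c * mu1[j]
    return f[:n]

def mul1(f, g, mu1):                       # product in A[x1]/(mu1)
    h = [0] * (2 * (len(mu1) - 1))
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            h[i + j] += a * b
    return rem1(h, mu1)

def rem2(F, mu1, mu2):
    # Reduce F(x1, x2) modulo (mu1(x1), mu2(x1, x2)); F and mu2 are lists (in x2)
    # of coefficient lists (in x1), and the leading coefficient of mu2 in x2 is 1.
    F = [rem1(c, mu1) for c in F]
    d2 = len(mu2) - 1
    for i in range(len(F) - 1, d2 - 1, -1):
        c = F[i]
        for j in range(d2 + 1):
            t = mul1(c, mu2[j], mu1)
            F[i - d2 + j] = [a - b for a, b in zip(F[i - d2 + j], t)]
    return (F + [[0] * (len(mu1) - 1)] * d2)[:d2]

# Example: reduce (x1 + x2)^2 modulo (x1^2 - 2, x2^2 - x1).
mu1 = [-2, 0, 1]
mu2 = [[0, -1], [0, 0], [1, 0]]
F   = [[0, 0, 1], [0, 2], [1]]             # x1^2 + 2*x1*x2 + x2^2
print(rem2(F, mu1, mu2))                   # -> [[2, 1], [0, 2]], i.e. (2 + x1) + 2*x1*x2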

Remark 2.5. The constant does not play a critical role in the design of the forthcoming algorithms. Of course any improvement of remains relevant; the dependence on will be made apparent in Corollary 4.11.

Remark 2.6. The assumption that for all is convenient for proving several complexity bounds. For with , we notice that does not occur in the representation of elements in , so extensions of degree one can be suppressed from the tower without loss of generality. The cost of the corresponding rewritings does not intervene in the algebraic complexity model. On a Turing or RAM machine, this cost would actually depend on how multivariate polynomials are represented, but it is typically at most linear in the size of the representation.

2.5.Division in towers

In the remainder of this section, we assume that we work in a tower of fields over . Given such that is monic, it is well known [27, Chapter 9] that the quotient and the remainder of the Euclidean division of by can be computed in time .

It is important for us that the gcd algorithm with input returns the monic polynomial along with such that . In this way, if then is the inverse of modulo and is the inverse of modulo . It is well known [27, Chapter 11, Algorithm 11.6] that this extended gcd can be performed in time .

In fact, when replacing all polynomial divisions by pseudo-divisions in [27, Chapter 11], we avoid all divisions in , but the computed gcd is not necessarily monic. Multiplying , and by the inverse of the leading coefficient of , we do obtain a monic gcd, at the expense of a single division in . Summarizing, this shows that the extended monic gcd can actually be computed in time .
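
As an illustration, the following Python sketch inverts a residue modulo a univariate modulus via the plain extended Euclidean algorithm, over the rationals with exact fractions; the quasi-optimal bound quoted above is obtained with the fast extended Euclidean algorithm of [27, Chapter 11] instead. All names are ours.

from fractions import Fraction

def poly_divmod(a, b):                     # b must have a nonzero leading coefficient
    a = [Fraction(c) for c in a]
    b = [Fraction(c) for c in b]
    q = [Fraction(0)] * max(1, len(a) - len(b) + 1)
    while len(a) >= len(b) and any(a):
        shift = len(a) - len(b)
        c = a[-1] / b[-1]
        q[shift] = c
        a = [a[i] - c * b[i - shift] if i >= shift else a[i] for i in range(len(a))]
        while a and a[-1] == 0:
            a.pop()
    return q, a

def poly_sub_scaled(u0, q, u1):            # u0 - q*u1
    out = list(u0) + [Fraction(0)] * max(0, len(q) + len(u1) - 1 - len(u0))
    for i, a in enumerate(q):
        for j, b in enumerate(u1):
            out[i + j] -= a * b
    return out

def inverse_mod(f, mu):                    # inverse of f in K[x]/(mu), assuming gcd(f, mu) = 1
    r0, r1 = [Fraction(c) for c in mu], [Fraction(c) for c in f]
    u0, u1 = [Fraction(0)], [Fraction(1)]  # invariant: r_i = s_i*mu + u_i*f
    while any(r1):
        q, r = poly_divmod(r0, r1)
        r0, r1, u0, u1 = r1, r, u1, poly_sub_scaled(u0, q, u1)
    assert len(r0) == 1 and r0[0] != 0     # the gcd is a nonzero constant
    return [c / r0[0] for c in u0]

# The inverse of x modulo x^2 - 2 is x/2, since x * (x/2) = 1 + (x^2 - 2)/2.
print(inverse_mod([0, 1], [-2, 0, 1]))     # -> [0, 1/2]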

Now consider an extension of , where is a monic irreducible polynomial of degree . Then the above bounds for division and gcd computations yield

When using the univariate algorithm for inverting non-zero elements in in a recursive manner, we obtain the following complexity bound:

Proposition 2.7. Let and assume that for . With the above assumptions and notations, and with as in Proposition 2.4, we have

Proof. Let and be universal constants with for all and for all extensions of as previously considered. Then we have

and we conclude by induction.

Let us mention that inversion modulo a general triangular set (that does not determine a prime ideal) is more subtle; see various approaches in [16, 49].

3.Evaluating polynomials at points in algebras

Throughout this section represents an effective ring and is an effective -algebra with a given basis . Given a polynomial and a point , we study how to compute efficiently. We first recall the well known baby-step giant-step algorithm in the case when . In section 4 below, these evaluation algorithms will be used for conversions between triangular and primitive representations of the same algebra.

3.1.Univariate baby-step giant-step method

Given a polynomial and , how can we evaluate efficiently at ? Horner's method takes time . For the convenience of the reader, we now recall a well-known faster algorithm from [50] that relies on the baby-step giant-step technique.

Algorithm 3.1

Input: and .

Output: .

  1. Let and .

  2. For do:

    1. Compute and decompose .

      (This yields an matrix .)

    2. For , let .

      (This yields a matrix .)

  3. Compute the matrix product .

  4. For , let .

  5. Return .

Proposition 3.1. Algorithm 3.1 is correct. If , then it runs in time

Proof. By construction, we have for , whence . This proves the correctness of the algorithm. Step 2a requires multiplications in , when computing the powers using . Step 2b takes assignments in , which come for free in our complexity model. The matrix multiplication in step 3 can be done in time . Step 5 involves multiplications and additions in , when using Horner's method. Altogether, this leads to the claimed running time.
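
The following Python sketch illustrates the baby-step giant-step principle behind Algorithm 3.1, taking for the algebra the 2x2 integer matrices (via numpy) for concreteness: it uses about 2*sqrt(n) products in the algebra instead of n for Horner's rule. It is only an illustration; in particular, it recombines the scalar coefficients by direct summation rather than by the matrix product of step 3.

import math
import numpy as np

def bsgs_eval(f, alpha):
    # f: scalar coefficients f[0], ..., f[n]; alpha: a square matrix playing the role of the point.
    n = len(f) - 1
    m = math.isqrt(n) + 1
    baby = [np.eye(alpha.shape[0], dtype=alpha.dtype)]     # alpha^0, ..., alpha^(m-1)
    for _ in range(m - 1):
        baby.append(baby[-1] @ alpha)
    giant = baby[-1] @ alpha                               # alpha^m
    result = np.zeros_like(baby[0])
    for k in range(n // m, -1, -1):                        # Horner's rule in alpha^m
        chunk = sum(c * baby[i] for i, c in enumerate(f[k * m : (k + 1) * m]))
        result = result @ giant + chunk
    return result

alpha = np.array([[0, 1], [2, 0]])                         # alpha^2 = 2*Id
f = [1, 2, 3, 4, 5]                                        # f(x) = 1 + 2x + 3x^2 + 4x^3 + 5x^4
print(bsgs_eval(f, alpha))                                 # -> [[27 10] [20 27]]
print(sum(c * np.linalg.matrix_power(alpha, i) for i, c in enumerate(f)))   # direct check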

Remark 3.2. If and if a monic annihilator with is known, then it is possible to compute with operations in . Indeed, we have , and is computed in time .

3.2.Multivariate baby-step giant-step generalization

Let us now study the evaluation of a multivariate polynomial of partial degree in each at a point . Setting and writing for the coefficient of the monomial in , we have the following generalization of Algorithm 3.1 to multivariate polynomials.

Algorithm 3.2

Input: and with for .

Output: .

  1. Let be maximal with .

  2. If then let else let .

    Then let , , and .

  3. For do:

    1. Let .

    2. Compute and decompose .

      (This yields an matrix .)

    3. For do:

      1. Let .

      2. Let .

        (This yields a matrix .)

  4. Compute the matrix product .

  5. For , let .

  6. Return .

Proposition 3.3. Algorithm 3.2 is correct. If and for , then it runs in time

Proof. This time, for all , we have

Plugging this expression into the formula of step 6 shows that the algorithm indeed returns the desired evaluation.

From we always have . If , then we obtain

and then

The lower bound on also implies

whence

Otherwise straightforwardly implies and . In all cases and are in . Consequently the complexity analysis is the same as for Algorithm 3.1. Of course, one has to carefully evaluate the power products in step 3b and the sum in step 6, in order to use only multiplications in .

Remark 3.4. The constant in step 2 of Algorithm 3.2 may be replaced by any other positive value strictly less than .

4.Separable towers of fields

Let be a tower of separable algebraic extensions of an effective base field . Throughout this section, we assume that the extension degrees satisfy for all (which is usually not restrictive as explained in Remark 2.6), and that is as in Proposition 2.4. Our main objective is to design a fast algorithm for multiplying elements in the tower.

The basic idea is as follows. Let be an arbitrarily small positive constant. If at least half of the are “sufficiently large”, then automatically becomes “small enough” to ensure that . Otherwise, there exist consecutive small degrees , and we replace the corresponding subtower of extensions by a primitive one . We keep repeating these replacements until the height of the tower becomes “small enough”. Some careful balancing is required here, since the computation of a primitive element for over can become expensive if gets “too large”. The precise tradeoff will be made explicit at the end of the section.

In this section, given two effective rings and along with a natural way to convert elements in to elements in , we denote by the cost of such conversions. If we also have an algorithm for conversions in the opposite direction, then we write .

4.1.Primitive tower representations

Assuming that contains sufficiently many elements, the aim of this subsection is to construct a primitive element representation in the sense of Definition 2.2. We thus have to compute primitive elements over such that

together with the minimal polynomials of over for . We also show how to convert efficiently between each of the two representations of .

Proposition 4.1. Assume that we are given a primitive tower representation of such that and for some , for . Then we have

Proof. The bound on the cost of conversions from to is a direct consequence of Proposition 3.3 by taking the assumption into account. For , the minimal polynomial of over may thus be converted into a minimal polynomial over in time . The computation of requires extra operations since for all .

For conversions from to , we consider a polynomial . We evaluate at in seen as with cost by Proposition 3.1. We thus obtain a polynomial such that .

We next need to convert the coefficients of from to . Using a straightforward induction this leads to the following cost:

Here we again use the assumptions that and for all .

Remark 4.2. In [39, Corollary 7.3], we have introduced an alternative approach to conversions that is more efficient when is “sufficiently small”. More precisely, modulo suitable precomputations, we have shown that conversions between elements of and can be done in time . In various special cases, the factor can be further reduced.

The next proposition provides us with complexity bounds for computing primitive tower representations by induction. We denote the cardinality of by .

Proposition 4.3. Assume that and that a primitive tower representation of has already been computed. Then

  1. We can compute of the form with , and in time

  2. Given and , we can compute with for in time

In addition, if then the computations in part a can be achieved with expected cost by means of a randomized Las Vegas algorithm.

Proof. Thanks to Proposition 4.1, modulo conversions of cost between and , we rewrite into such that , , . We thus have

Now let be a parameter that will be specified later, and consider . The characteristic polynomial of over equals . Consequently, the characteristic polynomial of over is -proportional to

The polynomial can be obtained as the preimage of whose computation costs by a standard “divide and conquer” approach; see [5, Lemma 7], for instance. Then the resultant may be computed in time by [45, Corollary 26].

Regarding as an indeterminate, the polynomial has degree . Therefore may be interpolated from values of . The total cost to obtain is thus

Geometrically speaking, the system admits the pairwise distinct solutions in , where denotes the algebraic closure of . The polynomial is separable if, and only if, its roots are pairwise distinct. There are at most values of for which this is not the case, so we simply need to try distinct values of until becomes separable. Testing the separability of for values of is achieved as follows.

We first evaluate for . Using fast multi-point evaluation, this takes time . We next test whether the discriminant of in vanishes, for ; this can be done in time . Doing this for packets of values, the overall computation can be done in time . This completes the proof of part a.

As to part b, for any root of the gcd of and has degree , so the specialization property of subresultants ensures that the subresultant in of degree of and has the form with coprime to . This subresultant can be computed in time again by [45, Corollary 26]. Since and have degrees , we obtain the polynomial modulo in with additional cost . Then from we deduce that . For , we then take . Proposition 3.1 implies that the cost of these modular compositions is bounded by . This completes the proof of part b.

If , then we use a Las Vegas algorithm for part a: we pick at random in a set of size until is separable. Each pick is successful with probability at least , so the expected cost is .
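
The following sympy sketch illustrates the resultant-based computation of the characteristic polynomial of a candidate primitive element in the simplest bivariate situation over the rationals. It only illustrates the principle: the actual algorithm of the proof interpolates with respect to the parameter, works inside the given primitive tower representation, and chooses the parameter generically, whereas here the value 1 happens to yield a separable, hence primitive, element.

from sympy import symbols, resultant, discriminant, expand

x, y, z = symbols('x y z')
p = z**2 - 2                       # defines alpha = sqrt(2) over Q
q = y**2 - 3                       # defines beta = sqrt(3)
lam = 1                            # the deformation parameter, chosen generically in the algorithm

# Characteristic polynomial of gamma = alpha + lam*beta, eliminating z and then y:
chi = resultant(resultant(p, x - lam*y - z, z), q, y)
print(expand(chi))                 # -> x**4 - 10*x**2 + 1
print(discriminant(chi, x) != 0)   # separable, hence gamma is a primitive element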

Corollary 4.4. Assume that . Then a primitive tower representation of can be computed in time . Using a randomized Las Vegas algorithm, the computation can even be done with expected cost .

Proof. This directly follows from the previous proposition, using that for all .

Remark 4.5. In a similar way as for the computation of gcds in section 2.5, it is possible to avoid divisions during the computation of resultants and subresultants. However, the required adaptations of the algorithms from [45] are a bit more technical than the mere replacement of Euclidean divisions by pseudo-divisions. For this reason, we have not further optimized the number of divisions in our complexity bounds.

4.2.Accelerated tower arithmetic

The main problem with the complexity bounds of Proposition 2.4 is that the height of the tower may get as large as . Now if indeed gets large, then many of the are necessarily small. Furthermore, as long as a product of the form is reasonably small, the results from the previous subsection allow us to change the given representation of into a more efficient primitive element representation . Repeating this operation as many times as necessary, we reduce the height of the tower and guarantee that all defining polynomials have sufficiently large degrees. This process is detailed in the next paragraphs.

Definition 4.6. Let be a positive integer. We define a -accelerated tower representation of to consist of the following data:

  • A sequence of integers .

  • A tower such that for , represented by a sequence of defining polynomials , where and .

  • For , a primitive tower representation of seen as a tower over and ending with .

  • Denote . If and , then . If , then .

Lemma 4.7. Let the notations be as in Definition 4.6 and assume that . There exists a -accelerated tower representation of height .

Proof. The construction of the sequence is done by induction. For with , assume that we have completed the construction up to height . We distinguish the following cases:

Whenever we must have either or . Consequently the number of such indices is constrained by . It follows that . On the other hand the number of indices with is necessarily . It follows that . Once the indices are known, the existence of a -accelerated tower representation follows from Corollary 4.4.

Algorithm 4.1

Input: a separable tower of fields over and with .

Output: a -accelerated tower representation of .

  1. Determine the integer sequence as described in the proof of Lemma 4.7. Set and .

  2. For do:

    1. Compute a primitive tower representation of over .

    2. Let and respectively represent the top level primitive element found for over and its minimal polynomial, and set .

  3. Return , , together with the primitive tower representations from step 2a.

In order to bound the execution time of Algorithm 4.1, we need to carefully analyze the cost of conversions between and for .

Lemma 4.8. With the notations of Algorithm 4.1, there exists a universal constant such that for , we have

Proof. By Proposition 2.4, there exists a universal constant with

By Proposition 4.1, there also exists a universal constant such that conversions between and can be performed in time

for , and where we used the assumption that is non-decreasing in . If then such conversions are actually for free. Otherwise we have , whence

Now take and let us prove the lemma by induction over . For we have nothing to do, so we assume that . Using the induction hypothesis, conversions between and are performed in time

Consequently,

The lemma follows by induction.

Proposition 4.9. Assume that . Then Algorithm 4.1 is correct and runs in time

In addition, if , then a randomized Las Vegas version of Algorithm 4.1 has expected cost .

Proof. The correctness of Algorithm 4.1 is ensured by Lemma 4.7. Let us analyze the cost of step 2a for a given . The necessary conversions for rewriting the defining polynomials of over can be done in time

by Lemma 4.8. If , then there is nothing to be done since . So assume that , whence . Then Proposition 2.7 gives

Consequently, in view of Corollary 4.4 and using assumption (2.2), the remainder of step 2a takes time

which dominates the cost of conversions since . The total cost for all is thus .

Finally, if , then the randomized variant from Corollary 4.4 leads to the claimed expected cost.

Theorem 4.10. If and , then

In addition, if a -accelerated tower representation of is known, then we have

Proof. The cost to obtain a -accelerated tower representation of is given in Proposition 4.9, namely .

Then, in order to multiply two elements in , we convert them into the accelerated representation, and multiply them with cost

by Proposition 2.4. We finally convert the product back into . By Lemma 4.8, each of these conversions can be done in time .

Corollary 4.11. If , then we have

where . In particular, .

Proof. It is important to notice that the constants hidden in the bounds of Theorem 4.10 are independent of the value chosen for , so we may freely set in terms of . Now taking balances the contributions of and . Indeed, since by Lemma 4.7, we have . We conclude by that theorem.

Remark 4.12. For the same reasons as in Remark 4.5, we have not attempted to further optimize the number of divisions in our complexity bounds. Again, we expect that most divisions in can actually be avoided.

5.Separable towers and factorization

Up to now we have focused on towers of field extensions. In this section, we turn our attention to more general towers of separable integral ring extensions. Each extension ring is still assumed to be a product of separable field extensions over a common ground field , which will allow us to apply the theory from the previous sections.

We start with a description of the precise types of towers of rings that we will be able to handle. We next explain how such towers of rings can be “factored” into towers of fields and we analyze the cost of the corresponding conversions. Finally, we give a complexity bound for the computation of “tower factorizations”. We do not claim that our approach is efficient in all cases. Nevertheless it is expected to be so whenever the factorization process behaves as a pretreatment with negligible cost (or when factoring polynomials over is reasonably affordable; see section 5.5). See the conclusion of this paper for a discussion of alternative approaches.

5.1.Separable towers of rings

Throughout this section, we still assume that the ground ring is a field, written , and we consider a tower of integral ring extensions of . For , we still let denote the defining polynomial of over , define , , etc. as before, and assume that . We say that is a tower of separable integral ring extensions, or just separable for short, if

By induction over , one may check that each is isomorphic to a direct product of separable algebraic extensions of . The projection of into is separable in the usual sense, which means that and are coprime for all and .

In terms of the triangular set defined in the introduction, a separable tower corresponds to the situation where the ideal is radical over the algebraic closure of . In particular, this means that and therefore are uniquely determined by the variety

For each , we notice that is the projection of onto the first coordinates.

From a geometric point of view, one may regard computations with elements in as computations with all zeros in in parallel. Alternatively, we may regard them as computations with parameters subject to the polynomial constraints .

5.2.Tree factorizations of towers

Consider two separable towers and of the same height and over the same base field . Let and denote the respective defining polynomials for and . We say that is a factor of if . This is the case if and only if there exist natural projections (that naturally extend to projections in a coefficientwise manner) such that:

We say that is irreducible if the are all fields or, equivalently, if the are all irreducible.

The notation “()” stands for the empty tuple. Let be index sets of -tuples of integers with and

for suitable integers and . Consider a family of factor towers of with the property that only depends on for all , and write . We say that such a family of factors forms a tree factorization of if the variety is partitioned into . We say that the factorization is irreducible, if all its factor towers are irreducible. If and , then we also write for the defining polynomial of over .

It is convenient to represent such a factorization by a labeled tree (see Figure 5.1): the nodes are identified with the index set , and each node is labeled with the algebraic extension . The parent of the node with is simply the node . Each individual factor corresponds to a path from the root to a leaf.

Figure 5.1. Example of a representation of a tree factorization of a tower of height .

Given and , let

Projecting the equality

on the first coordinates, we observe that the projection of , for given , is the same for all , and equals , whence . Consequently, forms a tree factorization of the subtower . From an algebraic point of view, this means that we have a natural isomorphism

(5.1)

For any , we denote by the natural projection of onto . Dually, the family forms a tree factorization of the tower , where

for . In particular, if , then this relation becomes

(5.2)

This corresponds to the factorization

(5.3)

where we recall that stands for the defining polynomial of for , whereas is the defining polynomial of .

5.3.Multi-modular representations

If , then (5.1) reduces to the well-known isomorphism

(5.4)

from the Chinese remainder theorem. We may thus view the isomorphism (5.1) as a generalized Chinese remainder theorem. The multi-modular representation of an element in is simply its image under this isomorphism. The aim of this section is to present efficient algorithms for conversions between the usual and the multi-modular representations.

For the usual Chinese remainder theorem, efficient algorithms are well known for carrying out the corresponding conversions [6, 23, 48]. These algorithms are based on the technique of so-called remainder trees, for which recent improvements can be found in [4, 10, 35]. In particular, if then the conversions from (5.4) can be carried out in time ; see for instance [27, Chapter 10]. These fast algorithms actually work in our context. This means that the isomorphism

from (5.2) and (5.3) can be computed with complexity , whenever modulo are precomputed for all . In fact, if are elements of with natural preimages for , then the natural preimage of can be computed as

(5.5)

using fast “linear combination for linear moduli” [27, Algorithm 10.9].

The idea is to combine these isomorphisms at the nodes of the tree in order to compute the global isomorphism from (5.1). For the complexity analysis, it is convenient to introduce the following maximal normalized cost of multiplication in any of the factors of the tree factorization:

(5.6)

For the time being, we may use Proposition 2.4 as a complexity bound for . The conversion of elements in into elements in is called multi-modular reduction and we may use the following algorithm for this operation:

Algorithm 5.1

Input: a tree factorization of a tower , and .

Output: .

  1. If , then return .

  2. Expand with .

  3. For , recursively compute .

  4. For each do:

    1. Set .

    2. Compute for , so that .

  5. Collect and return the family .

Proposition 5.1. Algorithm 5.1 is correct and runs in time .

Proof. The correctness is straightforward from the definitions. We perform step 4b using fast univariate multi-modular reduction. Let be a universal constant such that multi-modular reduction of a polynomial of degree over an arbitrary effective ring can be performed in time .

Let us show by induction over that the algorithm runs in time . For , the result is obvious, so assume that . By the induction hypothesis, step 3 runs in time . By the definition of , the contribution of step 4b to the complexity is bounded by

Adding up, it follows that the algorithm runs in time .
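
As an illustration of the remainder-tree technique used in step 4b, the following Python sketch performs multi-modular reduction over the integers, which here play the role of the polynomial moduli of the algorithm: a subproduct tree of the moduli is built once, and the input is then pushed down the tree, each node reducing modulo its product.

def product_tree(moduli):
    tree = [list(moduli)]
    while len(tree[-1]) > 1:
        level, prev = [], tree[-1]
        for i in range(0, len(prev), 2):
            level.append(prev[i] * prev[i + 1] if i + 1 < len(prev) else prev[i])
        tree.append(level)
    return tree

def multi_mod(a, moduli):
    tree = product_tree(moduli)
    values = [a % tree[-1][0]]                         # reduce once modulo the full product
    for level in reversed(tree[:-1]):
        values = [values[i // 2] % m for i, m in enumerate(level)]
    return values

print(multi_mod(10**6, [3, 5, 7, 11, 13]))             # -> [10**6 % m for m in (3, 5, 7, 11, 13)]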

The opposite conversion of elements in into elements in is called Chinese remaindering; we may use the following algorithm for this operation:

Algorithm 5.2

Input: a tree factorization of and a family with .

Output: such that for all .

Assumption: modulo are precomputed for all , , and .

  1. If , then is the singleton made of the empty tuple, so return .

  2. For each do:

      Compute such that for .

  3. For do:

      Recursively compute such that for all .

  4. Return .

Proposition 5.2. Algorithm 5.2 is correct and costs .

Proof. The proof is very similar to that of Proposition 5.1. The polynomials modulo are actually used in the Chinese remaindering of step 2 via formula (5.5).
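
In the opposite direction, the following Python sketch recombines residues by a Lagrange-style Chinese remainder formula, again over the integers as a stand-in for the polynomial setting. The precomputed inverse cofactors play a role analogous to the data assumed to be precomputed by Algorithm 5.2; fast variants reuse the subproduct tree from the previous sketch.

from math import prod

def crt(residues, moduli):                 # moduli are assumed pairwise coprime
    m = prod(moduli)
    x = 0
    for r, mi in zip(residues, moduli):
        ci = m // mi                       # product of the other moduli
        x += r * ci * pow(ci, -1, mi)      # pow(ci, -1, mi): inverse of ci modulo mi
    return x % m

print(crt([1, 0, 1, 1, 1], [3, 5, 7, 11, 13]))   # -> 9010 == 10**6 % 15015, cf. the previous sketch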

5.4.Factorization

Factoring univariate polynomials over an arbitrary effective field is generally expensive or even impossible [24, 25]. Still, if an algorithm for this task is given, then it is interesting to study how to exploit it for the computation of an irreducible tree factorization of a given separable tower . Once such an irreducible tree factorization is known, arithmetic and other operations can potentially be sped up by switching to the multi-modular representation thanks to Algorithms 5.1 and 5.2.

For the complexity analysis, the function represents the time necessary to factor a polynomial in , and it is again convenient to introduce the following maximal normalized cost of factoring univariate polynomials over any of the fields in the factor towers:

(5.7)

Here we notice that each of the irreducible factors is a tower of separable fields, so we may directly use the accelerated arithmetic from section 4 for computations over any of the .

Algorithm 5.3

Input: a separable tower .

Output: the irreducible tree factorization of .

  1. If , then return .

  2. Recursively compute the irreducible tree factorization of .

  3. Compute using Algorithm 5.1.

  4. For each do:

    1. Compute the irreducible factors of .

    2. Let for .

  5. Return , where .

Proposition 5.3. Algorithm 5.3 is correct and takes time at most whenever for .

Proof. The cost of all factorizations in step 4a is bounded by

Together with recursive calls, the total cost of all factorizations is bounded by

Step 3 takes time , by Proposition 5.1, and the same step for the recursive calls amounts to a similar complexity. We conclude by adding up these contributions.
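
The following sympy sketch illustrates the construction of an irreducible tree factorization on a tiny example, the tower of the rationals extended first by a square root of 2 and then by another square root of 2: the first defining polynomial is irreducible over the rationals, while the second one splits over the field defined by the first, so the tree has one node at height one and two leaves at height two. This is only an illustration; Algorithm 5.3 performs these factorizations level by level inside each factor tower.

from sympy import symbols, sqrt, factor_list

x2 = symbols('x2')

# Height 1: x1^2 - 2 is irreducible over Q, giving a single branch Q(sqrt(2)).
# Height 2: factor the second defining polynomial over that field.
print(factor_list(x2**2 - 2, extension=sqrt(2)))
# -> (1, [(x2 - sqrt(2), 1), (x2 + sqrt(2), 1)])  (up to ordering)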

We are now ready to present a major corollary of this result; in combination with Algorithms 5.1 and 5.2, it reduces arithmetic in general separable towers to arithmetic in accelerated towers of fields.

Corollary 5.4. The irreducible tree factorization of a separable tower can be computed in time .

Proof. This follows from Proposition 5.3 combined with Corollary 4.11, which yields .

5.5.Remarks about special coefficient fields

For certain specific fields of coefficients , factorization of univariate polynomials over can be reasonably cheap. Let us briefly study two particular such cases.

Complex numbers
Let us first consider the case when is a subfield of . In practice, this usually means that elements of are represented approximately by complex floating point numbers of fixed bit precision . The bit complexity of the field operations in is then bounded by , where bounds the cost of -bit integer multiplication. It has recently been shown that ; see [31–33].

Even if or if is an algebraic number field, it may be useful to temporarily convert coefficients in into bit complex floating point numbers and to convert the results of computations back using rational reconstruction techniques [27, Chapter 5].

A convenient framework for analyzing the “ultimate complexity” of numeric algorithms was introduced in [38]. Concerning the factorization of complex polynomials of degree , it was shown therein that all roots could be computed at precision in time for “sufficiently large precisions” . The actual precision from which this bound becomes relevant highly depends on the location of the roots. As a rule of thumb, a precision is often required and sufficient, but we refer the reader to the seminal works of Schönhage, Pan and others for details [47, 54].

Since is algebraically closed, towers of separable algebraic extensions can be factored into towers of degree one. From the “ultimate bit complexity” point of view from [38], it follows that the ultimate bit complexity of the conversions in Propositions 5.1 and 5.2 becomes and so does the cost of factoring itself in Proposition 5.3. Using our rule of thumb, we expect that a precision of the order of should be sufficient in practice in order to observe these complexity bounds. Small values of are thus expected to be favourable for this type of coefficients (notice that this was the least favourable case for general coefficients!). A refined comparison between the bit complexities of this approach and the building of accelerated towers would require efforts that are beyond the scope of the present paper.

Finite fields
Let us next consider the case when is a finite field of characteristic , with elements. The bit complexity of multiplying two polynomials in is bounded by , uniformly in ; see [30, 34]. Fast computations in towers of finite field extensions were treated quite extensively in [39]; see also [17] for alternative approaches. The case when is small is most favourable. The reason is that the defining polynomial of can be factored over each of the subfields efficiently. It is also worth mentioning that [39] contains various fast algorithms for the computation of primitive elements in towers of finite fields. Once again, these algorithms are most efficient whenever remains small. It would be interesting to exploit these techniques to further accelerate arithmetic in towers of finite fields.

6.Primitive element representations

This section revisits the computation of traces and characteristic polynomials in towers of field extensions over a general field by using fast products. Our algorithms are rather different from those of [52]: we compute traces faster, but for characteristic polynomials and primitive elements the bottleneck lies in the baby-step giant-step method, so the complexity exponents in terms of turn out to be the same as in [52] (notice that we gain a logarithmic factor ). Until the end of this section, is a tower of separable field extensions of degrees and is as in Proposition 2.4.

6.1.Trace computations in towers

Let us recall how to compute traces in a tower of separable field extensions. We let represent the reverse polynomial of . Its constant coefficient is and the trace function of over may be computed as the first terms of the power series

In other words, the trace map can be regarded as a vector in that can be computed in time : indeed this vector represents the matrix of with respect to the basis . Given such a vector representation, the individual evaluation of at an element in takes time .
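
The following Python sketch makes this power series explicit in the simplest univariate case and in one standard normalization: for a monic polynomial of degree n, the traces of the powers of the defining root can be read off from the coefficients of -rev(mu)'/rev(mu), where rev(mu) denotes the reversed polynomial. The series division below is naive; the softly linear bound of the text relies on Newton iteration instead.

from fractions import Fraction

def power_sums(mu, N):
    # mu = [c_0, ..., c_{n-1}, 1], monic of degree n; returns the power sums
    # p_1, ..., p_N of its roots, i.e. the traces of the first N powers.
    rev = [Fraction(c) for c in reversed(mu)]                     # rev(mu), constant term 1
    num = [-(k + 1) * rev[k + 1] for k in range(len(rev) - 1)]    # -rev(mu)'
    p = []
    for k in range(N):                                            # naive series division
        c = num[k] if k < len(num) else Fraction(0)
        c -= sum(p[j] * rev[k - j] for j in range(k) if k - j < len(rev))
        p.append(c / rev[0])
    return p

# mu = (x - 1)(x - 2) = x^2 - 3x + 2: the power sums are 3, 5, 9, 17, ...
print([int(v) for v in power_sums([2, -3, 1], 4)])                # -> [3, 5, 9, 17]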

More generally, is a -algebra for all , and the relative trace function satisfies

Lemma 6.1. Assume that products in are performed by linear algorithms over in the sense of [10] (and as is always the case in this paper). Then the vector representation of the trace map can be computed in time

Proof. First we compute , as a vector in , for . This computation takes time . We regard each as a -linear map from to , whose evaluation takes time . Regarding as a vector , we apply the transpose of to in order to obtain a new vector that represents . This computation takes time thanks to the transposition principle (as explained in [10]). We next apply the transpose of to and obtain a vector that represents . Continuing this way, we obtain as a vector in time .

6.2.Characteristic polynomials

We may take advantage of accelerated towers to compute a primitive element over , together with its minimal polynomial and the parametrizations with for . We first consider the computation of characteristic polynomials.

Theorem 6.2. Assume that a -accelerated tower for is given, with , and that the inverses of are available in . Then the characteristic polynomial of over can be computed in time .

Proof. Lemma 6.1 and Proposition 2.4 allow us to compute the representation of as a vector in in time

Using Lemma 4.8, we convert into an element of in time . By using the transposed product in the vector representation of the linear form can be computed in time . Then, as in section 1.2, the computation of

is achieved by transposing Algorithm 3.1, so Proposition 3.1 yields the cost

We finally use the Newton–Girard identity (1.1) to recover the characteristic polynomial of with further cost ; see for instance [29, section 2.4]. The conclusion follows.

By taking as in Corollary 4.11, the expected cost of Theorem 6.2 simplifies to , thanks to the assumption .

6.3.Primitive elements

We are now in a position to design a randomized algorithm with subquadratic expected cost for computing a primitive element representation of over .

Theorem 6.3. Assume that , that the inverses of are available in , and that a -accelerated tower for is given, with . Then a primitive element representation of over can be computed by a randomized Las Vegas algorithm with expected cost

Taking as in Corollary 4.11, this cost further simplifies into .

Proof. The characteristic polynomial of has total degree in . So its discriminant is a polynomial in of total degree . By the Schwartz–Zippel lemma [55, 57], picking at random with the in a set of size leads to a primitive element with probability . Computing the characteristic polynomial of takes time by Theorem 6.2. We can verify that is a primitive element by checking that its characteristic polynomial is separable with further cost . Consequently the expected time to find a primitive element is .

Now assume that a primitive element has been found, and let us consider the computation of polynomials such that for . We follow Kronecker's classical deformation technique of the primitive element; see for instance [28, section 3.3]. For this purpose, we first introduce a new formal indeterminate with , so that is the ring of tangent numbers over . We aim at computing the characteristic polynomial of over . Then

Notice that is already available at this point. The computation of the vector representation of the linear form can be done in time by the transposition principle. Therefore is obtained by transposing Algorithm 3.1. In view of Proposition 3.1, this computation takes time

At this point, we have computed the traces for all . We deduce the characteristic polynomial of using Newton–Girard's formula (1.1), so that

It follows that . The total cost to obtain all the for is

Now let be an element of . It can be converted into an element of in time O() by Lemma 4.8, where has partial degree in for . The expression of in terms of is obtained as in time by Proposition 3.3. By successively taking for , the polynomials can all be obtained in time

Finally, thanks to , the assumption implies the simplified complexity bound.

Conclusion

We have proved nearly optimal complexity bounds for basic arithmetic operations in towers of fields of arbitrary height. Our results are based on the newly introduced concept of “accelerated” towers. Two major problems remain open:

Another major challenge concerns efficient implementations. Asymptotically fast algorithms based on Proposition 2.4 obviously only become interesting when is in the range where evaluation-interpolation style algorithms are used for univariate arithmetic. For instance, FFT-based algorithms are typically used for .

For such sufficiently large , we expect our acceleration techniques to be useful. For instance, assume that and are both small, say . Then conversions between and an equivalent primitive representation can be done very fast. Conversions between and can be done coefficientwise, so they are also fast. Without much cost, this allows us to diminish the height by one and gain a constant factor when applying Proposition 2.4.

This discussion indicates that our techniques should be of practical interest for sufficiently large and generic base fields . For specific base fields and specific types of towers, various alternative approaches have been proposed [17, 38, 39], and it would require a more significant implementation effort to figure out which approaches are best; we intend to address this issue in the near future.

Acknowledgments
We thank the anonymous referees for their helpful comments.

Bibliography

[1]

Ph. Aubry, D. Lazard, and M. Moreno Maza. On the theories of triangular sets. J. Symbolic Comput., 28(1–2):105–124, 1999.

[2]

S. Baktir, J. Pelzl, T. Wollinger, B. Sunar, and C. Paar. Optimal tower fields for hyperelliptic curve cryptosystems. In Conference Record of the Thirty-Eighth Asilomar Conference on Signals, Systems and Computers, 2004, volume 1, pages 522–526. IEEE, 2004.

[3]

N. Benger and M. Scott. Constructing tower extensions of finite fields for implementation of pairing-based cryptography. In M. Anwar Hasan and T. Helleseth, editors, Arithmetic of Finite Fields. Third International Workshop, WAIFI 2010. Istanbul, Turkey, June 27-30, 2010. Proceedings, volume 6087 of Lecture Notes in Comput. Sci., pages 180–195. Springer-Verlag Berlin Heidelberg, 2010.

[4]

D. Bernstein. Scaled remainder trees. Available from https://cr.yp.to/arith/scaledmod-20040820.pdf, 2004.

[5]

J. Berthomieu, G. Lecerf, and G. Quintin. Polynomial root finding over local rings and application to error correcting codes. Appl. Algebra Engrg. Comm. Comput., 24(6):413–443, 2013.

[6]

A. Borodin and R. T. Moenck. Fast modular transforms. J. Comput. System Sci., 8:366–386, 1974.

[7]

A. Bostan, M. F. I. Chowdhury, J. van der Hoeven, and É. Schost. Homotopy methods for multiplication modulo triangular sets. J. Symbolic Comput., 46(12):1378–1402, 2011.

[8]

A. Bostan, F. Chyzak, F. Ollivier, B. Salvy, É. Schost, and A. Sedoglavic. Fast computation of power series solutions of systems of differential equations. In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA '07, pages 1012–1021, Philadelphia, PA, USA, 2007. Society for Industrial and Applied Mathematics.

[9]

A. Bostan, L. Gonzales-Vega, H. Perdry, and É. Schost. Complexity issues on Newton sums of polynomials. Distributed in the digital proceedings of MEGA'05, 2005.

[10]

A. Bostan, G. Lecerf, and É. Schost. Tellegen's principle into practice. In Hoon Hong, editor, Proceedings of the 2003 International Symposium on Symbolic and Algebraic Computation, ISSAC '03, pages 37–44, New York, NY, USA, 2003. ACM.

[11]

A. Bostan, B. Salvy, and É. Schost. Fast algorithms for zero-dimensional polynomial systems using duality. Appl. Algebra Engrg. Comm. Comput., 14(4):239–272, 2003.

[12]

F. Boulier, D. Lazard, F. Ollivier, and M. Petitot. Computing representations for radicals of finitely generated differential ideals. Appl. Algebra Engrg. Comm. Comput., 20(1):73–121, 2009.

[13]

R. P. Brent and H. T. Kung. Fast algorithms for manipulating formal power series. J. ACM, 25(4):581–595, 1978.

[14]

P. Bürgisser, M. Clausen, and M. A. Shokrollahi. Algebraic complexity theory, volume 315 of Grundlehren der Mathematischen Wissenschaften. Springer-Verlag, 1997.

[15]

D. G. Cantor and E. Kaltofen. On fast multiplication of polynomials over arbitrary algebras. Acta Inform., 28:693–701, 1991.

[16]

X. Dahan, M. Moreno Maza, É. Schost, and Yuzhen Xie. On the complexity of the D5 principle. In J.-G. Dumas, editor, Proceedings of Transgressive Computing 2006: a conference in honor of Jean Della Dora, pages 149–168. U. J. Fourier, Grenoble, France, 2006.

[17]

L. De Feo, J. Doliskani, and É. Schost. Fast algorithms for ℓ-adic towers over finite fields. In Proceedings of the 38th International Symposium on Symbolic and Algebraic Computation, ISSAC '13, pages 165–172, New York, NY, USA, 2013. ACM.

[18]

L. De Feo and É. Schost. Fast arithmetics in Artin–Schreier towers over finite fields. J. Symbolic Comput., 47(7):771–792, 2012.

[19]

J. Della Dora, C. Dicrescenzo, and D. Duval. About a new method for computing in algebraic number fields. In B. F. Caviness, editor, EUROCAL '85: European Conference on Computer Algebra Linz, Austria, April 1–3 1985 Proceedings Vol. 2: Research Contributions, pages 289–290. Springer Berlin Heidelberg, 1985.

[20]

D. Duval. Algebraic numbers: An example of dynamic evaluation. J. Symbolic Comput., 18(5):429–445, 1994.

[21]

J.-C. Faugère, P. Gaudry, L. Huot, and G. Renault. Sub-cubic change of ordering for Gröbner basis: a probabilistic approach. In K. Nabeshima, editor, Proceedings of the 39th International Symposium on Symbolic and Algebraic Computation, ISSAC '14, pages 170–177, New York, NY, USA, 2014. ACM.

[22]

J.-C. Faugère, P. Gianni, D. Lazard, and T. Mora. Efficient computation of zero-dimensional Gröbner bases by change of ordering. J. Symbolic Comput., 16(4):329–344, 1993.

[23]

C. M. Fiduccia. Polynomial evaluation via the division algorithm: the fast Fourier transform revisited. In A. L. Rosenberg, editor, Fourth annual ACM symposium on theory of computing, pages 88–93, 1972.

[24]

A. Fröhlich and J. C. Shepherdson. On the factorisation of polynomials in a finite number of steps. Math. Z., 62:331–334, 1955.

[25]

A. Fröhlich and J. C. Shepherdson. Effective procedures in field theory. Philos. Trans. Roy. Soc. London. Ser. A., 248:407–432, 1956.

[26]

A. Garcia and H. Stichtenoth. A tower of Artin–Schreier extensions of function fields attaining the Drinfeld–Vladut bound. Invent. Math., 121(1):211–222, 1995.

[27]

J. von zur Gathen and J. Gerhard. Modern computer algebra. Cambridge University Press, New York, 3rd edition, 2013.

[28]

M. Giusti, G. Lecerf, and B. Salvy. A Gröbner free alternative for polynomial system solving. J. Complexity, 17(1):154–211, 2001.

[29]

B. Grenet, J. van der Hoeven, and G. Lecerf. Deterministic root finding over finite fields using Graeffe transforms. Appl. Algebra Engrg. Comm. Comput., 27(3):237–257, 2016.

[30]

D. Harvey and J. van der Hoeven. Faster integer and polynomial multiplication using cyclotomic coefficient rings. Technical report, ArXiv, 2017. http://arxiv.org/abs/1712.03693.

[31]

D. Harvey and J. van der Hoeven. Faster integer multiplication using short lattice vectors. Technical report, ArXiv, 2018. http://arxiv.org/abs/1802.07932, to appear in Proc. ANTS 2018.

[32]

D. Harvey and J. van der Hoeven. Faster integer multiplication using plain vanilla FFT primes. Math. Comp., 88(315):501–514, 2019.

[33]

D. Harvey, J. van der Hoeven, and G. Lecerf. Even faster integer multiplication. J. Complexity, 36:1–30, 2016.

[34]

D. Harvey, J. van der Hoeven, and G. Lecerf. Faster polynomial multiplication over finite fields. J. ACM, 63(6), 2017. Article 52.

[35]

J. van der Hoeven. Faster Chinese remaindering. Technical report, CNRS & École polytechnique, 2016. http://hal.archives-ouvertes.fr/hal-01403810.

[36]

J. van der Hoeven and G. Lecerf. On the complexity of multivariate blockwise polynomial multiplication. In J. van der Hoeven and M. van Hoeij, editors, Proceedings of the 37th International Symposium on Symbolic and Algebraic Computation, ISSAC '12, pages 211–218. ACM, 2012.

[37]

J. van der Hoeven and G. Lecerf. On the bit-complexity of sparse polynomial and series multiplication. J. Symbolic Comput., 50:227–254, 2013.

[38]

J. van der Hoeven and G. Lecerf. Modular composition via complex roots. Technical report, CNRS & École polytechnique, 2017. http://hal.archives-ouvertes.fr/hal-01455731.

[39]

J. van der Hoeven and G. Lecerf. Modular composition via factorization. J. Complexity, 48:36–68, 2018.

[40]

Xiaohan Huang and V. Y. Pan. Fast rectangular matrix multiplication and applications. J. Complexity, 14(2):257–299, 1998.

[41]

M. Kalkbrener. A generalized Euclidean algorithm for computing triangular representations of algebraic varieties. J. Symbolic Comput., 15(2):143–167, 1993.

[42]

K. S. Kedlaya and C. Umans. Fast polynomial factorization and modular composition. SIAM J. Comput., 40(6):1767–1802, 2011.

[43]

U. J. J. Le Verrier. Sur les variations séculaires des éléments elliptiques des sept planètes principales : Mercure, Venus, La Terre, Mars, Jupiter, Saturne et Uranus. J. Math. Pures Appl., 1:220–254, 1840.

[44]

R. Lebreton. Relaxed Hensel lifting of triangular sets. J. Symbolic Comput., 68(2):230–258, 2015.

[45]

G. Lecerf. On the complexity of the Lickteig–Roy subresultant algorithm. J. Symbolic Comput., 92:243–268, 2019.

[46]

X. Li, M. Moreno Maza, and É. Schost. Fast arithmetic for triangular sets: from theory to practice. J. Symbolic Comput., 44(7):891–907, 2009.

[47]

J. M. McNamee and V. Y. Pan. Numerical Methods for Roots of Polynomials, Part II, volume 16 of Studies in Computational Mathematics. Elsevier, 2013.

[48]

R. T. Moenck and A. Borodin. Fast modular transforms via division. In Thirteenth annual IEEE symposium on switching and automata theory, pages 90–96, Univ. Maryland, College Park, Md., 1972.

[49]

M. Moreno Maza, É. Schost, and P. Vrbik. Inversion modulo zero-dimensional regular chains. In V. P. Gerdt, W. Koepf, E. W. Mayr, and E. V. Vorozhtsov, editors, Computer Algebra in Scientific Computing. 14th International Workshop, CASC 2012. Maribor, Slovenia, September 2012. Proceedings, volume 7442 of Lect. Notes Comput. Sci., pages 224–235. Springer Berlin Heidelberg, 2012.

[50]

M. S. Paterson and L. J. Stockmeyer. On the number of nonscalar multiplications necessary to evaluate polynomials. SIAM J. Comput., 2(1):60–66, 1973.

[51]

A. Poteaux and É. Schost. Modular composition modulo triangular sets and applications. Comput. Complex., 22(3):463–516, 2013.

[52]

A. Poteaux and É. Schost. On the complexity of computing with zero-dimensional triangular sets. J. Symbolic Comput., 50:110–138, 2013.

[53]

J. F. Ritt. Differential algebra. Amer. Math. Soc., New York, 1950.

[54]

A. Schönhage. The fundamental theorem of algebra in terms of computational complexity. Technical report, Math. Inst. Univ. of Tübingen, 1982.

[55]

J. T. Schwartz. Fast probabilistic algorithms for verification of polynomial identities. J. ACM, 27(4):701–717, 1980.

[56]

V. Shoup. Fast construction of irreducible polynomials over finite fields. J. Symbolic Comput., 17(5):371–391, 1994.

[57]

R. Zippel. Probabilistic algorithms for sparse polynomials. In Edward W. Ng, editor, Symbolic and Algebraic Computation. EUROSAM '79, An International Symposium on Symbolic and Algebraic Manipulation, Marseille, France, June 1979, number 72 in Lect. Notes Comput. Sci., pages 216–226. Springer Berlin Heidelberg, 1979.