Composition modulo powers of polynomials

 Joris van der Hoeven and Grégoire Lecerf

 Laboratoire d'informatique de l'École polytechnique, CNRS UMR 7161 École polytechnique 91128 Palaiseau Cedex, France

 {vdhoeven,lecerf}@lix.polytechnique.fr

## Abstract

Modular composition is the problem to compose two univariate polynomials modulo a third one. For polynomials with coefficients in a finite field, Kedlaya and Umans proved in 2008 that the theoretical bit complexity for performing this task could be made arbitrarily close to linear. Unfortunately, beyond its major theoretical impact, this result has not led to practically faster implementations yet. In this paper, we study the more specific case of composition modulo the power of a polynomial. First we extend previously known algorithms for power series composition to this context. We next present a fast direct reduction of our problem to power series composition.

Preprint version, April 30, 2017

### Keywords

Modular composition; Series Composition; Algorithm; Complexity

## 1.Introduction

Let be an effective field and let be polynomials in of degrees . The problem of modular composition is to compute modulo . This is a fundamental problem in complexity theory because of its applications to polynomial factorization [16, 17, 18].

From a theoretical point of view, Kedlaya and Umans achieved a major breakthrough by showing that modular compositions over a finite field can be computed in time . Unfortunately, the practical impact of these results has been limited so far. The best current implementations still admit nearly quadratic complexities in . More precisely, over a field , denoting by the cost to multiply two polynomials in of degrees , Brent and Kung [3] gave an algorithm with cost . It uses the baby-step giant-step technique due to Paterson–Stockmeyer [21] and even yields a sub-quadratic cost when using fast linear algebra (see [15, p. 185] and our Theorem 2 below).

In the special case when , composition modulo corresponds to the composition of truncated power series at order . Better practical algorithms are known for this particular situation. Brent and Kung gave an algorithm [3] with complexity under the conditions that is invertible and that the characteristic is at least . A variant proposed by van der Hoeven [11, Section 3.4.3] removes the condition on . For fields of small positive characteristic, Bernstein [1] proposed an algorithm that is softly linear in the precision but linear in the characteristic. Series with integer, rational or floating point coefficients can often be composed in quasi-linear time as well in suitable bit complexity models, as shown by Ritzmann [22]; see also [12].

In this paper, we focus on the intermediate case between power series and general modular compositions. More precisely, we assume to be a power of some polynomial . Composition modulo can also be regarded as a special “base” case of composition modulo a polynomial that can be factored into . Using Chinese remaindering, one may reduce composition modulo such to compositions modulo powers of irreducible polynomials. In a separate paper [13], we exploit this observation for the design of fast practical (although not fully general) algorithms for modular composition over finite fields.

We develop two main approaches for fast composition modulo . In section 3, we generalize Brent and Kung's and Bernstein's algorithms. We achieve a complexity exponent close to , but only under the condition that is much larger than . In section 4, we directly reduce composition modulo to truncated power series composition. Our main result is Theorem 11: one composition modulo is essentially reduced to compositions modulo and one power series composition in . An important ingredient of this reduction is a softly optimal algorithm for converting between and where is sent to . Potential applications of these conversions might also concern algebraic algorithms for plane curves such as integral basis computations and Puiseux expansions, which usually require several Taylor expansions at roots of discriminants.

Our paper is organized as follows. In section 2, we introduce basic notations and recall several well known complexity results. We also specify the complexity model that we work with. Further technical complexity results are gathered in the appendix. In sections 3 and 4, we present our main algorithms. For testing purposes, our algorithms are implemented in the Mathemagix language within the factorix library (www.mathemagix.org, revision 10538).

## 2.Preliminaries

### 2.1Costs of elementary operations

Recall that an effective ring is a ring with unity whose elements can be represented on a computer and such that we have algorithms for performing the ring operations. Effective fields and effective algebras over an effective ring are defined similarly.

Given an effective ring , algebraic complexity models express running times in terms of the number of ring operations in . Unless otherwise stated, we will analyze the costs of the algorithms in this paper in this way. More precisely, our results both apply for the straight-line program and computation tree models [4, Chapter 4].

Let be an effective ring with unity, let , and denote

We write for a cost function such that two polynomials in can be multiplied using operations in . It will be convenient to assume that is a nondecreasing function in . Notice that this assumption implies the super-additivity of , namely for all and .

The schoolbook algorithm allows us to take . The fastest currently known algorithm [5] yields . If is a field of finite characteristic, then we even have , where [10].

The constant represents a feasible exponent for the multiplication cost of matrices: two square matrices of size can be multiplied using operations in their coefficient ring. The constant is defined similarly but for multiplying a matrix by a rectangular one. The best known bound is due to Le Gall [19]. This naturally yields . However the latter bound does not improve upon the earlier bound due to Huang and Pan [14, Theorem 10.1].

### 2.2Problems related to modular composition

If and are polynomials in , then represents the remainder in the division of by . Let be an effective ring and be an effective -algebra. We introduce the following cost functions:

• : the cost of computing the power series composition , where .

• : the cost of computing the power series composition in terms of operations in , where .

• : the cost of computing the modular composition , where is a monic polynomial of degree and .

• : the cost of computing the characteristic polynomial of modulo a monic polynomial of degree . This is the characteristic polynomial of the multiplication endomorphism by in .

• : the cost of modular power projections, i.e. the cost to compute , where is monic of degree and is a linear form on .

Let us recall a few known results about these functions. For , we denote and .

Theorem 1. Let be a field of characteristic .

1. [3] If or , then we may take .

2. [1] If , then .

Theorem 2. Let be a monic polynomial of degree over a ring and let . The composed polynomial may be computed with

1. operations in , or

2. or operations in .

Proof. The first bound is immediate. The proof of the second bound is detailed in [8, Section 12.2]. The bound follows from by taking .

For a fixed monic polynomial in of degree , the modular composition is a linear operation in , for and in . The corresponding transposed application is precisely the operation of modular power projections: it corresponds to computing , where is a linear form on . If a modular composition algorithm with cost can be transposed in the sense of [2], then this leads to a power projection algorithm with cost .

Theorem 3. Let be a monic polynomial of degree over a ring and let . The characteristic polynomial of modulo can be computed using

1. operations in , including divisions in (the partial division in is supposed to be implemented), or

2. operations in , if is a field, or

3. operations in , if is a field with elements, or

4. operations in , if there exist given inverses of in .

Proof. The first two bounds directly rely on the formula by using [20, Corollary 29]. The third bound is obtained by computing for values of in , from which is interpolated using additional operations in . We refer to Appendix A.1 for the fourth bound.

For the particular kind of moduli that interest us in here, the problem of computing characteristic polynomials quite easily reduces to the case when :

Proposition 4. Let be of degree in , let with , and let be in . Then the characteristic polynomial of modulo can be computed using operations in .

Proof. The characteristic polynomial is simply the -th power of the characteristic polynomial of modulo .

## 3.Extending known algorithms

From now on, we assume the modulus to be the -th power of a polynomial of degree . In this section we extend the two fast known algorithms for power series composition from Theorem 1 to composition modulo . We directly state and prove the generalized algorithms. For more explanations we refer to the original papers [1, 3] and [11, Section 3.4.3].

### 3.1Extension of Bernstein's algorithm

If has positive characteristic , then Bernstein's algorithm [1] exploits the Frobenius map in order to recursively reduce the degrees of the polynomials to be composed by a factor .

Algorithm 1

Input
, , where .

Output

.

Assumptions

, , .

1. If , then return , computed by a general modular composition algorithm of cost .

2. Split into polynomials such that . Let and , be such that and .

3. Recursively compute for all .

4. Return .

Proposition 5. Algorithm 1 is correct and takes operations in .

Proof. Since , the degrees of and the are at most . Now implies

which ensures the correctness of the algorithm.

In step 2 we need: operations in to compute all the necessary -th powers, for , and for the division to obtain . In step 4, by [8, Exercise 9.16], the computation of for may be done in time

The remaining calculations of step 4 amount to operations in .

Let be the time complexity for precision and let be such that . By what precedes, we have

for all and a sufficiently large constant . Unrolling this inequality times, we obtain

Using and the super-additivity of , we conclude that

### 3.2Extension of Brent and Kung's algorithm

We now extend Brent and Kung's series composition algorithm to composition modulo . This method uses Taylor series expansion in order to reduce the degrees of the polynomials to be composed. For a given polynomial , we write for its -th derivative. Algorithm 2 is a sub-algorithm that is used when the arguments of the composition both have small degree.

Algorithm 2

Input
, where has degree .

Output

.

1. If then return .

2. Split with and .

3. Recursively compute for ,

4. Return .

Algorithm 3

Input
, , where .

Output

.

Assumption

has characteristic , where and ; , is separable and irreducible, .

1. Compute and .

2. If then compute the valuation of in and the modular inverse .

3. Compute using Algorithm 2.

4. If then compute for all from to . Otherwise, for from to , compute .

5. Return .

Proposition 6. Algorithm 3 is correct and takes operations in if .

Proof. The main ingredient of the proof is the following Taylor expansion:

which makes use of and . If then holds for all , so the algorithm is correct. Otherwise we may write with and of degree , which implies . Since and separable, it follows that , whence is well defined.

Let us show that , by induction on . This is clear for . Assume that the identity holds for some . From we see that the valuation of in is necessarily and that . The induction hypothesis is thus satisfied for because . We are done with the correctness.

The -adic expansion of takes operations and dominates the cost of steps 1 and 2. Steps 4 and 5 amount to operations. As to step 3, we enter Algorithm 2 with .

Let be the cost function of Algorithm 2. The degrees of , and are , so that

for some constant . Let be the largest integer such that . Unrolling times the inequality for and , we obtain

In a similar way we obtain . Consequently, . The conclusion follows from .

Corollary 7. Let be of degree in , let be of degree , and let be in . If , then the modular composition may be computed using operations in .

Proof. If , then we may use Algorithm 1, which requires operations in . Otherwise, we use Algorithm 3.

Notice that for fixed , this corollary gives a cost , independently of the characteristic.

## 4.Reduction to series composition

Recall that the modulus is the -th power of a polynomial of degree . From now on, we assume that is separable. So far, our algorithms for composition modulo relied on extensions of known algorithms dedicated to power series. In this section, we investigate the other natural approach: we design algorithms that reduce one composition modulo to one series composition of order over a suitable extension of the ground field. We set and write for the residue class of in . The associated trace function is denoted by . This function naturally extends to a map in a coefficientwise fashion.

### 4.1Case of small moduli

The first method we study is rather elementary. It works well for any characteristic, but it is only efficient when is much smaller than . We compute for all roots of and recover by Chinese remaindering. The simultaneous computation with all roots of is emulated via standard algebraic techniques.

Since is not assumed to be irreducible, is not necessarily a field. We will compute inverses in using the technique of dynamic evaluation, which is briefly recalled in Appendix A.2. For convenience of the reader, we use the auxiliary variable for power series and keep the variable for polynomials.

Algorithm 4

Input
, , where .

Output

.

Assumptions

, is separable, .

1. Compute in .

2. Compute in .

3. Perform the power series composition

in .

4. Compute in .

5. With the algorithm underlying Corollary 20, compute the modular inverse .

6. Compute

and then .

7. Return .

Proposition 8. Algorithm 4 is correct and takes

operations in .

Proof. For a root of in some algebraic closure of , we write for the homomorphism from to that sends to . This homomorphism extends coefficientwise into a map . One verifies that equals

so equals for all roots of . This proves the correctness of the algorithm thanks to the Chinese remainder theorem.

For the complexity analysis, step 3 amounts to operations in . In step 5, we use Corollary 20 to compute with operations in . Steps 1, 2, 4, and 6 require additional ring operations in . In step 7, the row matrix of may be computed with operations in by formula (20), so each trace takes operations in . Therefore step 7 amounts to operations in , which is negligible.

When is much smaller than , we may directly benefit from fast power series composition. But overall this does not improve much on Corollary 7. In the rest of this section, we aim at decreasing the dependency in in complexity bounds.

### 4.2Fast reduction to series composition

In order to improve on Algorithm 4, we shall use a better way to expand and modulo . For this purpose we introduce the -algebra homomorphism

We regard and as conversions that we respectively call the untangling and tangling mappings.

Lemma 9. For all separable polynomial and all , the map is a -algebra isomorphism.

Proof. is clearly a homomorphism between -algebras of equal dimensions, so we need to prove its injectivity when is separable. We consider the -adic expansion of modulo , where all the have degrees , and we assume that . It is immediate that . We may thus suppose, by induction on , that for all . From we obtain . Since is invertible, it follows that , whence . This completes the induction.

Remark. The separability assumption on is necessary: if and , then we have .

Algorithm 5

Input
, , where .

Output

.

Assumptions

, is separable, .

1. Compute .

2. Let and compute the characteristic polynomial of modulo .

3. Compute .

4. Write , where is the class of in , and the have degrees . For each , compute and let .

5. Perform the power series composition in .

6. Return .

From now on, in cost analyses, the expression (resp. ) represents a cost function for computing the direct image (resp. the preimage) of . If good algorithms are known for and , for characteristic polynomials and for compositions modulo , then we observe that Algorithm 5 boils down to one power series composition at precision over .

Proposition 10. Algorithm 5 is correct and it requires

operations in .

Proof. The homomorphism that sends to is well defined, so applying it to we obtain . Therefore we have , which ensures the correctness of the algorithm. The cost directly follows from the definitions.

The rest of this section is devoted to algorithms with softly linear costs for computing and its inverse. More precisely, we will prove that we may take and to be in the latter proposition. This leads to our main theorem:

Theorem 11. Let and , where , such that is separable of degree . Then, we can compute using

operations in .

### 4.3Untangling in characteristic zero

The following lemma is elementary, but we shall refer to it several times.

Lemma 12. Given integers and a polynomial , we have .

Proof. We have by Leibnitz formula, showing that is divisible by .

We begin with the easiest situation when the characteristic of is zero or sufficiently large and achieve softly linear time for .

Algorithm 6

Input
, .

Output

.

Assumptions

, is separable, .

1. If then return , otherwise let .

2. Recursively apply the algorithm to , giving .

3. Recursively apply the algorithm to , giving .

4. Return .

Proposition 13. Algorithm 6 is correct and takes operations in . If or , then, given , one may compute using operations in .

Proof. The correctness follows from Lemma 12 which ensures that for all and for all . The cost analysis is straightforward. Finally, to deduce , thanks to the assumption on the characteristic, we may simply use the Taylor expansion .

### 4.4Untangling in positive characteristic

In order to handle the case , we need to pay attention to vanishing coefficients when computing Taylor expansions using higher order derivatives. For this, we rely on the following adaptation of the Taylor expansion of :

The polynomial is called the -th order Hasse derivative of . The operator is linear and we may compute it as follows: if is a monomial and , then ; otherwise

Lemma 14. For all integers and , we have .

Proof. Let . A straightforward calculation in characteristic zero yields

Now if the characteristic divides the integer , then it also divides , and the lemma remains correct.

It is convenient to also introduce the Pochhammer symbol

The next lemma expresses conditions to compute Hasse derivatives from ones of lower orders, in positive characteristic.

Lemma 15. Let be an integer power of , let be an integer such that , and let be two integers in . Then, for any polynomial , we may compute from using the formula

 (1)

Proof. The previous lemma implies that . Now in characteristic zero, we have

for all . In positive characteristic, we claim that the ratios have non-negative valuations in .

Let us first consider the case when . If , with , then . If , then . This shows that is well defined and may be computed in whenever is a multiple of .

At a second stage, we make the induction hypothesis that has been computed in for such that . The value for may then be obtained via

since . This deals with the case when .

Finally, for any , the ratio is also well defined since it rewrites into

which concludes the proof.

We are now in a position to adapt the “divide and conquer” Algorithm 6 to the case when has small positive characteristic .

Algorithm 7

Input
of degree ; integers ; .

Output

, for all .

Assumption

has positive characteristic ; is a power of , ; ; .

1. If , then return .

2. Let .

3. Compute .

4. By using formula (1), compute from .

5. Recursively apply the algorithm to , , , , , and . This yields for all .

6. Recursively apply the algorithm to , , , , , , and . This yields , for all .

7. Return , for all .

Proposition 16. Algorithm 7 is correct and takes operations in .

Proof. The correctness follows from Lemmas 12 and 15; the cost analysis is straightforward.

Algorithm 8

Input
with , ; an integer ; .

Output

.

Assumptions

has positive characteristic ; ; ; .

1. If , then use Algorithm 6 to compute and return .

2. Let be minimal with , let , and .

3. Compute for all using Algorithm 7 (called with , , and ).

4. If , then compute from with formula (1).

5. For all , use the algorithm recursively with in order to obtain .

6. If then use the algorithm recursively with in order to obtain .

7. Return .

Proposition 17. Algorithm 8 is correct and takes

operations in .

Proof. Let us first examine the case when . We need to show that coincides with for all . From Lemma 14, we have . Since , it follows that .

Now we suppose that . We have , so is a power of , we have , and the assumption is preserved through recursive calls. The conditions of Algorithm 7 are satisfied, whence the correctness of step 3 by Proposition 16. Step 4 is a consequence of Lemmas 12 and 15. We are done with the correctness.

Step 1 costs by Proposition 13. Step 3 costs by Proposition 16 and step 4 takes operations in . The total cost function of the algorithm satisfies

where is a sufficiently large constant and when . This leads to .

### 4.5Tangling

The next algorithm is independent of the characteristic. It relies on the untangling algorithms from the previous subsections, by using the “divide and conquer” strategy. For a polynomial or series in and , we denote and . For convenience, we assume that is a nondecreasing function of .

Algorithm 9

Input
of degree ; an integer ; , where ; for where .

Output

.

1. If , then let be the preimage of in and return .

2. Let and compute , .

3. Return .

Lemma 18. Algorithm 9 is correct and takes time . In particular, for any , we have

Proof. The computation is clear when . Otherwise we verify that

Kronecker substitution [8, Chapter 8, Section 4] allows us to multiply two series in at precision using operations in .

The computation of requires one inversion in that takes operations in , together with operations for the series inversion. The other amount to more operations in . The cost function satisfies

for some constant , which leads to . The final bound of the lemma follows from Proposition 17.

## Appendix A

### A.1Trace and Newton–Girard identities

Let be a monic polynomial in of degree , let be a linear form on , and let . We study the modular power projection problem, which corresponds to computing .

Let represent . If we take , where denotes the usual trace function of over , then the latter power projections write as

It is well known that these traces are related to the characteristic polynomial of modulo by the Newton–Girard identities:

where is the reverse polynomial of . If there exist given inverses of in , then it is possible to integrate the latter differential equation in with operations in , which directly leads to an algorithm to compute from the power sums of the roots of (see for instance [9, Section 2] for details, which also cover the case when is a finite field).

First, let us explain how to compute efficiently. Recall that . In the basis , the linear form may be computed as

which uses operations in . If represents the multiplication endomorphism by in , then we can write and therefore . By transposing the modular multiplication by with the techniques from [2, Sections 4 and 5], the computation of requires operations in .

### A.2Modular inversions

Let be a field. Often in computer algebra, we are led to compute in monogen algebras such as , where is separable of degree , but not necessarily irreducible. Ring operations in require operations in . Algorithms such as the determinant are often slower over rings than their counterparts over fields. For this reason, we often like to conduct our computations as if were a field. From the programming point of view, this is easy to achieve: whenever we need to invert an element or test whether , we factor , where and is invertible modulo . We then restart the relevant part of the computations with and/or in the role of . This approach is known as the dynamic evaluation principle in computer algebra [6, 7]. We borrow the following complexity result for extended gcds from Dahan et al. [6]:

Proposition 19. Let be a separable polynomial of degree over and let and be univariate polynomials over of degrees . Using operations in , one can compute a factorization of and triples of polynomials respectively over , such that , is monic, generates the extension of the ideal (,) to , and the Bézout relation holds with and , for all .

Proof. This statement rephrases [6, Propositions 2.4 and 4.1], where we replace the assumption “ is square-free over a perfect field” by “ is separable”. In fact all arguments of [6] apply in this setting mutatis mutandis. An alternative point of view is to apply [6, Propositions 2.4 and 4.1] over the algebraic closure of , while noticing that all actual computations are really done over .

Corollary 20. Let be a separable polynomial of degree over and let and be univariate polynomials over of degrees , such that is monic and is invertible modulo . Then the inverse of modulo may be computed using operations in .

Proof. The triples of the previous proposition satisfy over . We recover the inverse of modulo from the using the Chinese remainder theorem.

## 5.References

[1] D. J. Bernstein. Composing power series over a finite ring in essentially linear time. J. Symbolic Comput., 26(3):339–341, 1998.

[2] A. Bostan, G. Lecerf, and É. Schost. Tellegen's principle into practice. In Hoon Hong, editor, Proceedings of the 2003 International Symposium on Symbolic and Algebraic Computation, ISSAC '03, pages 37–44, New York, NY, USA, 2003. ACM.

[3] R. P. Brent and H. T. Kung. Fast algorithms for manipulating formal power series. J. ACM, 25(4):581–595, 1978.

[4] P. Bürgisser, M. Clausen, and M. A. Shokrollahi. Algebraic complexity theory, volume 315 of Grundlehren der Mathematischen Wissenschaften. Springer-Verlag, 1997.

[5] D. G. Cantor and E. Kaltofen. On fast multiplication of polynomials over arbitrary algebras. Acta Infor., 28(7):693–701, 1991.

[6] X. Dahan, M. Moreno Maza, É. Schost, and Yuzhen Xie. On the complexity of the D5 principle. In J.-G. Dumas, editor, Proceedings of Transgressive Computing 2006: a conference in honor of Jean Della Dora, pages 149–168. U. J. Fourier, Grenoble, France, 2006.

[7] J. Della Dora, C. Dicrescenzo, and D. Duval. A new method for computing in algebraic number fields. In G. Goos and J. Hartmanis, editors, Eurocal'85 (2), volume 174 of Lect. Notes in Comp. Science, pages 321–326. Springer, 1985.

[8] J. von zur Gathen and J. Gerhard. Modern computer algebra. Cambridge University Press, New York, 2 edition, 2003.

[9] B. Grenet, J. van der Hoeven, and G. Lecerf. Deterministic root finding over finite fields using Graeffe transforms. Appl. Alg. Eng. Comm. Comp., 27(3):237–257, 2016.

[10] D. Harvey, J. van der Hoeven, and G. Lecerf. Faster polynomial multiplication over finite fields. J. ACM, 63(6), 2017. Article 52.

[11] J. van der Hoeven. Relax, but don't be too lazy. J. Symbolic Comput., 34(6):479–542, 2002.

[12] J. van der Hoeven. Fast composition of numeric power series. Technical Report 2008-09, Université Paris-Sud, Orsay, France, 2008.

[13] J. van der Hoeven and G. Lecerf. Modular composition via factorization. Technical report, CNRS & École polytechnique, 2017. http://hal.archives-ouvertes.fr/hal-01457074.

[14] Xiaohan Huang and V. Y. Pan. Fast rectangular matrix multiplication and applications. J. Complexity, 14(2):257–299, 1998.

[15] E. Kaltofen and V. Shoup. Fast polynomial factorization over high algebraic extensions of finite fields. In Proceedings of the 1997 International Symposium on Symbolic and Algebraic Computation, ISSAC '97, pages 184–188, New York, NY, USA, 1997. ACM.

[16] E. Kaltofen and V. Shoup. Subquadratic-time factoring of polynomials over finite fields. Math. Comp., 67(223):1179–1197, 1998.

[17] K. S. Kedlaya and C. Umans. Fast modular composition in any characteristic. In FOCS'08: IEEE Conference on Foundations of Computer Science, pages 146–155, Washington, DC, USA, 2008. IEEE Computer Society.

[18] K. S. Kedlaya and C. Umans. Fast polynomial factorization and modular composition. SIAM J. Comput., 40(6):1767–1802, 2011.

[19] F. Le Gall. Powers of tensors and fast matrix multiplication. In Proceedings of the 39th International Symposium on Symbolic and Algebraic Computation, ISSAC '14, pages 296–303, New York, NY, USA, 2014. ACM.

[20] G. Lecerf. On the complexity of the Lickteig-Roy subresultant algorithm. Technical report, CNRS & École polytechnique, 2017. https://hal.archives-ouvertes.fr/hal-01450869.

[21] M. S. Paterson and L. J. Stockmeyer. On the number of nonscalar multiplications necessary to evaluate polynomials. SIAM J.Comput., 2(1):60–66, 1973.

[22] P. Ritzmann. A fast numerical algorithm for the composition of power series with complex coefficients. Theoret. Comput. Sci., 44:1–16, 1986.