
Modular composition is the problem of composing two univariate polynomials
modulo a third one. For polynomials with coefficients in a finite field,
Kedlaya and Umans proved in 2008 that the theoretical bit complexity for
performing this task could be made arbitrarily close to linear.
Unfortunately, beyond its major theoretical impact, this result has not
led to practically faster implementations yet. In this paper, we study
the more specific case of composition modulo the power of a polynomial.
First we extend previously known algorithms for power series composition
to this context. We next present a fast direct reduction of our problem
to power series composition.
Preprint version, April 30, 2017
Keywords: Modular composition; Series composition; Algorithm; Complexity
Let $\mathbb{K}$ be an effective field and let $f$, $g$, $h$ be polynomials in $\mathbb{K}[x]$ of respective degrees $<n$, $<n$, and $n$. The problem of modular composition is to compute $f(g)$ modulo $h$. This is a fundamental problem in complexity theory because of its applications to polynomial factorization [16, 17, 18].
From a theoretical point of view, Kedlaya and Umans achieved a major breakthrough by showing that modular compositions over a finite field $\mathbb{F}_q$ can be computed in time $(n \log q)^{1+o(1)}$ [17, 18]. Unfortunately, the practical impact of these results has been limited so far. The best current implementations still admit nearly quadratic complexities in $n$. More precisely, over a field $\mathbb{K}$, denoting by $\mathsf{M}(n)$ the cost to multiply two polynomials in $\mathbb{K}[x]$ of degrees $<n$, Brent and Kung [3] gave an algorithm with cost $O(\sqrt{n}\,\mathsf{M}(n) + n^2)$. It uses the baby-step giant-step technique due to Paterson and Stockmeyer [21] and even yields a subquadratic cost when using fast linear algebra (see [15, p. 185] and our Theorem 2 below).
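To make the baby-step giant-step idea concrete, here is a minimal sketch over a small prime field, with schoolbook arithmetic throughout. The prime and all names are our own illustrative choices; a serious implementation would replace the inner loops by fast polynomial multiplication and would perform the recombination step with fast linear algebra, as discussed above.

```python
from math import isqrt

P = 97  # an illustrative prime; polynomials are coefficient lists, low degree first

def poladd(a, b):
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    return [(x + y) % P for x, y in zip(a, b)]

def polmul(a, b):
    """Schoolbook product over GF(P)."""
    if not a or not b:
        return []
    c = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            c[i + j] = (c[i + j] + x * y) % P
    return c

def polrem(a, b):
    """Remainder of a modulo b over GF(P)."""
    a = a[:]
    inv = pow(b[-1], P - 2, P)
    while len(a) >= len(b):
        coef = a[-1] * inv % P
        shift = len(a) - len(b)
        for j, bj in enumerate(b):
            a[shift + j] = (a[shift + j] - coef * bj) % P
        a.pop()
    return a

def modcomp(f, g, h):
    """f(g) rem h using about 2*sqrt(deg f) modular multiplications."""
    k = isqrt(len(f)) + 1
    baby = [[1]]                              # baby steps: g^0, ..., g^(k-1) rem h
    for _ in range(k - 1):
        baby.append(polrem(polmul(baby[-1], g), h))
    gk = polrem(polmul(baby[-1], g), h)       # giant step: g^k rem h
    res = []
    for i in reversed(range(0, len(f), k)):   # Horner scheme in g^k
        chunk = [0]
        for j, c in enumerate(f[i:i + k]):
            chunk = poladd(chunk, [c * x % P for x in baby[j]])
        res = poladd(polrem(polmul(res, gk), h), chunk)
    return polrem(res, h)
```

For instance, `modcomp([1, 0, 1], [2, 1], [1, 0, 1])` composes $f = x^2 + 1$ with $g = 2x + 1$ modulo $h = x^2 + 1$ and yields `[4, 4]`, i.e. $4x + 4$.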
In the special case when $h = x^n$, composition modulo $h$ corresponds to the composition of truncated power series at order $n$. Better practical algorithms are known for this particular situation. Brent and Kung gave an algorithm [3] with complexity $O(\sqrt{n \log n}\,\mathsf{M}(n))$, under the conditions that $g'(0)$ is invertible and that the characteristic is at least $\sqrt{n / \log n}$. A variant proposed by van der Hoeven [11, Section 3.4.3] removes the condition on $g'(0)$. For fields of small positive characteristic $p$, Bernstein [1] proposed an algorithm that is softly linear in the precision $n$ but linear in the characteristic $p$. Series with integer, rational or floating point coefficients can often be composed in quasilinear time as well in suitable bit complexity models, as shown by Ritzmann [22]; see also [12].
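For reference, truncated power series composition itself can be sketched naively as a plain Horner scheme with $O(n)$ truncated products. This is far from the fast algorithms cited above; the prime below is an arbitrary choice of ours.

```python
P = 97  # arbitrary illustrative prime; series are coefficient lists, low order first

def mul_trunc(a, b, n):
    """Product of a and b truncated at order n over GF(P)."""
    c = [0] * min(n, max(len(a) + len(b) - 1, 1))
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if i + j < n:
                c[i + j] = (c[i + j] + x * y) % P
    return c

def series_compose(f, g, n):
    """f(g) mod x^n by a plain Horner scheme; requires g(0) = 0."""
    assert g[0] == 0, "power series composition requires g(0) = 0"
    res = [0]
    for c in reversed(f):
        res = mul_trunc(res, g, n)     # Horner step: res <- res * g + c
        res[0] = (res[0] + c) % P
    return (res + [0] * n)[:n]
```

For example, composing $f = x^2$ with $g = x + x^2$ at order 4 gives `series_compose([0, 0, 1], [0, 1, 1], 4) == [0, 0, 1, 2]`, i.e. $x^2 + 2x^3 + O(x^4)$.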
In this paper, we focus on the intermediate case between power series and general modular compositions. More precisely, we assume $h$ to be a power $\varphi^r$ of some polynomial $\varphi$. Composition modulo $\varphi^r$ can also be regarded as a special “base” case of composition modulo a polynomial $h$ that can be factored into $h = \varphi_1^{r_1} \cdots \varphi_s^{r_s}$. Using Chinese remaindering, one may reduce composition modulo such an $h$ to compositions modulo the powers $\varphi_i^{r_i}$ of irreducible polynomials. In a separate paper [13], we exploit this observation for the design of fast practical (although not fully general) algorithms for modular composition over finite fields.
We develop two main approaches for fast composition modulo $\varphi^r$. In section 3, we generalize Brent and Kung's and Bernstein's algorithms. We achieve a complexity exponent close to $3/2$, but only under the condition that $r$ is much larger than $d = \deg \varphi$. In section 4, we directly reduce composition modulo $\varphi^r$ to power series composition. Our main result is Theorem 11: one composition modulo $\varphi^r$ is essentially reduced to $r$ compositions modulo $\varphi$ and one power series composition at order $r$ over $\mathbb{A} = \mathbb{K}[y]/(\varphi(y))$. An important ingredient of this reduction is a softly optimal algorithm for converting between $\mathbb{K}[x]/(\varphi(x)^r)$ and $\mathbb{A}[t]/(t^r)$, where $x$ is sent to $y + t$. Potential applications of these conversions might also concern algebraic algorithms for plane curves, such as integral basis computations and Puiseux expansions, which usually require several Taylor expansions at roots of discriminants.
Our paper is organized as follows. In section 2, we
introduce basic notations and recall several well known complexity
results. We also specify the complexity model that we work with. Further
technical complexity results are gathered in the appendix. In sections
3 and 4, we present our main algorithms. For
testing purposes, our algorithms are implemented in the
Recall that an effective ring is a ring with unity whose elements can be represented on a computer and such that we have algorithms for performing the ring operations. Effective fields and effective algebras over an effective ring are defined similarly.
Given an effective ring $\mathbb{A}$, algebraic complexity models express running times in terms of the number of ring operations in $\mathbb{A}$. Unless otherwise stated, we will analyze the costs of the algorithms in this paper in this way. More precisely, our results apply to both the straight-line program and the computation tree models [4, Chapter 4].
Let $\mathbb{A}$ be an effective ring with unity and let $n \in \mathbb{N}$.
We write $\mathsf{M}(n)$ for a cost function such that two polynomials in $\mathbb{A}[x]$ of degrees $<n$ can be multiplied using $\mathsf{M}(n)$ operations in $\mathbb{A}$. It will be convenient to assume that $\mathsf{M}(n)/n$ is a nondecreasing function in $n$. Notice that this assumption implies the superadditivity of $\mathsf{M}$, namely $\mathsf{M}(n_1) + \mathsf{M}(n_2) \le \mathsf{M}(n_1 + n_2)$ for all $n_1$ and $n_2$.
The schoolbook algorithm allows us to take $\mathsf{M}(n) = O(n^2)$. The fastest currently known algorithm [5] yields $\mathsf{M}(n) = O(n \log n \log\log n)$. If $\mathbb{A}$ is a field of finite characteristic, then we even have $\mathsf{M}(n) = O(n \log n\, 8^{\log^* n})$, where $\log^*$ denotes the iterated logarithm [10].
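As the simplest illustration of a subquadratic multiplication cost function, here is a textbook Karatsuba sketch over the integers, giving $\mathsf{M}(n) = O(n^{\log_2 3})$. This is not the algorithm of [5] or [10]; it merely shows how the schoolbook bound is beaten by a divide-and-conquer product.

```python
from itertools import zip_longest

def karatsuba(a, b):
    """Karatsuba product of two coefficient lists (integer coefficients):
    three recursive half-size products instead of four."""
    if not a or not b:
        return []
    if len(a) == 1:
        return [a[0] * c for c in b]
    if len(b) == 1:
        return [b[0] * c for c in a]
    k = max(len(a), len(b)) // 2
    a0, a1 = a[:k], a[k:]                 # a = a0 + x^k a1
    b0, b1 = b[:k], b[k:]
    low, high = karatsuba(a0, b0), karatsuba(a1, b1)
    asum = [x + y for x, y in zip_longest(a0, a1, fillvalue=0)]
    bsum = [x + y for x, y in zip_longest(b0, b1, fillvalue=0)]
    mid = karatsuba(asum, bsum)           # (a0+a1)(b0+b1)
    res = [0] * (len(a) + len(b) - 1)
    for i, c in enumerate(low):           # low + (mid-low-high) x^k + high x^2k
        res[i] += c
        res[i + k] -= c
    for i, c in enumerate(mid):
        res[i + k] += c
    for i, c in enumerate(high):
        res[i + k] -= c
        res[i + 2 * k] += c
    return res
```

For instance, `karatsuba([1, 2, 3], [4, 5])` returns `[4, 13, 22, 15]`, the coefficients of $(1 + 2x + 3x^2)(4 + 5x)$.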
The constant represents a feasible exponent for the multiplication cost of matrices: two square matrices of size can be multiplied using operations in their coefficient ring. The constant is defined similarly but for multiplying a matrix by a rectangular one. The best known bound is due to Le Gall [19]. This naturally yields . However the latter bound does not improve upon the earlier bound due to Huang and Pan [14, Theorem 10.1].
If $f$ and $g$ are polynomials in $\mathbb{A}[x]$, then $f \operatorname{rem} g$ represents the remainder in the division of $f$ by $g$. Let $\mathbb{K}$ be an effective ring and let $\mathbb{A}$ be an effective $\mathbb{K}$-algebra. We introduce the following cost functions:
$\mathsf{C}(n)$: the cost of computing the power series composition $f(g) \operatorname{rem} x^n$, where $f, g \in \mathbb{K}[x]$ have degrees $<n$ and $g(0) = 0$.
$\mathsf{C}_{\mathbb{A}}(n)$: the cost of computing such a power series composition with coefficients in $\mathbb{A}$, in terms of the number of operations in $\mathbb{K}$.
$\mathsf{MC}(n)$: the cost of computing the modular composition $f(g) \operatorname{rem} h$, where $h \in \mathbb{K}[x]$ is a monic polynomial of degree $n$ and $f, g \in \mathbb{K}[x]$ have degrees $<n$.
$\mathsf{CP}(n)$: the cost of computing the characteristic polynomial of $g$ modulo a monic polynomial $h$ of degree $n$. This is the characteristic polynomial of the multiplication endomorphism by $g$ in $\mathbb{K}[x]/(h)$.
$\mathsf{PP}(n)$: the cost of modular power projections, i.e. the cost to compute $\ell(1), \ell(g \operatorname{rem} h), \dots, \ell(g^{n-1} \operatorname{rem} h)$, where $h$ is monic of degree $n$ and $\ell$ is a linear form on $\mathbb{K}[x]_{<n}$.
Let us recall a few known results about these functions. For , we denote and .
operations in , or
or operations in .
For a fixed monic polynomial $h$ in $\mathbb{K}[x]$ of degree $n$, the modular composition $f(g) \operatorname{rem} h$ is a linear operation in $f$, for $f$ and $g$ in $\mathbb{K}[x]_{<n}$. The corresponding transposed application is precisely the operation of modular power projections: it corresponds to computing $\ell(1), \ell(g \operatorname{rem} h), \dots, \ell(g^{n-1} \operatorname{rem} h)$, where $\ell$ is a linear form on $\mathbb{K}[x]_{<n}$. If a modular composition algorithm with cost $C$ can be transposed in the sense of [2], then this leads to a power projection algorithm with cost $O(C)$.
operations in , including divisions in (the partial division in is supposed to be implemented), or
operations in , if is a field, or
operations in , if is a field with elements, or
operations in , if there exist given inverses of in .
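The duality between modular composition and power projections described above can be checked on a toy instance: the matrix of $f \mapsto f(g) \operatorname{rem} h$ has the polynomials $g^j \operatorname{rem} h$ as columns, so applying its transpose to a linear form $\ell$ yields exactly the projections $\ell(g^j \operatorname{rem} h)$. All data below (prime, modulus, linear form) are illustrative choices of ours.

```python
P = 97
n = 3
h = [2, 0, 1, 1]       # h = x^3 + x^2 + 2, monic (our toy modulus)
g = [1, 2, 0]          # g = 2x + 1
ell = [5, 7, 11]       # an arbitrary linear form, given by its coefficients

def polmulrem(a, b, h):
    """(a * b) rem h over GF(P), schoolbook; h monic of degree n."""
    m = len(h) - 1
    c = [0] * (2 * m)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] = (c[i + j] + ai * bj) % P
    for i in range(2 * m - 1, m - 1, -1):
        q = c[i]
        for j in range(m + 1):
            c[i - m + j] = (c[i - m + j] - q * h[j]) % P
    return c[:m]

# columns of the matrix of f -> f(g) rem h are the powers g^j rem h
cols, gj = [], [1] + [0] * (n - 1)
for _ in range(n):
    cols.append(gj)
    gj = polmulrem(gj, g, h)

# transposed map: ell -> (ell(g^j rem h))_j, the modular power projections
proj = [sum(l * c for l, c in zip(ell, col)) % P for col in cols]

# transposition identity: <ell, f(g) rem h> = <proj, f> for any f
fvec = [3, 1, 4]
comp = [0] * n
for c in reversed(fvec):
    comp = polmulrem(comp, g, h)
    comp[0] = (comp[0] + c) % P
assert sum(l * c for l, c in zip(ell, comp)) % P == \
       sum(fj * pj for fj, pj in zip(fvec, proj)) % P
```

The assertion is exactly the statement that the power projection problem is the transpose of modular composition.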
For the particular kind of moduli that interest us here, the problem of computing characteristic polynomials quite easily reduces to the case when $r = 1$:
From now on, we assume the modulus $h$ to be the $r$th power of a polynomial $\varphi$ of degree $d$. In this section, we extend the two known fast algorithms for power series composition from Theorem 1 to composition modulo $\varphi^r$. We directly state and prove the generalized algorithms. For more explanations, we refer to the original papers [1, 3] and [11, Section 3.4.3].
If $\mathbb{K}$ has positive characteristic $p$, then Bernstein's algorithm [1] exploits the Frobenius map in order to recursively reduce the degrees of the polynomials to be composed by a factor $p$.
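Bernstein's algorithm itself is not reproduced here, but the Frobenius identity it rests on is easy to check experimentally: over $\mathbb{F}_p$ one has $f(x)^p = f(x^p)$, since $p$th powering is additive in characteristic $p$ and fixes the coefficients. A quick sanity check (toy prime and polynomial ours):

```python
p = 5
f = [2, 3, 1]                      # f = x^2 + 3x + 2 over GF(5)

def polmul(a, b):
    c = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            c[i + j] = (c[i + j] + x * y) % p
    return c

fp = [1]
for _ in range(p):                 # fp = f^p over GF(p)
    fp = polmul(fp, f)

# Frobenius / "freshman's dream": f(x)^p = f(x^p) over GF(p)
expected = [0] * (p * (len(f) - 1) + 1)
for i, c in enumerate(f):
    expected[p * i] = pow(c, p, p)  # = c by Fermat's little theorem
assert fp == expected
```

It is precisely this degree-$p$ "spreading" of exponents that allows the composition problem to be cut into $p$ pieces of smaller degree.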
Algorithm
If , then return , computed by a general modular composition algorithm of cost .
Split into polynomials such that . Let and , be such that and .
Recursively compute for all .
Return .
which ensures the correctness of the algorithm.
In step 2 we need: operations in to compute all the necessary th powers, for , and for the division to obtain . In step 4, by [8, Exercise 9.16], the computation of for may be done in time
The remaining calculations of step 4 amount to operations in .
Let be the time complexity for precision and let be such that . By what precedes, we have
for all and a sufficiently large constant . Unrolling this inequality times, we obtain
Using and the superadditivity of , we conclude that
We now extend Brent and Kung's series composition algorithm to composition modulo $\varphi^r$. This method uses Taylor series expansion in order to reduce the degrees of the polynomials to be composed. For a given polynomial $f$, we write $f^{(i)}$ for its $i$th derivative. Algorithm 2 is a subalgorithm that is used when the arguments of the composition both have small degree.
Algorithm
If then return .
Split with and .
Recursively compute for ,
Return .
Algorithm
Compute and .
If then compute the valuation of in and the modular inverse .
Compute using Algorithm 2.
If then compute for all from to . Otherwise, for from to , compute .
Return .
which makes use of and . If then holds for all , so the algorithm is correct. Otherwise we may write with and of degree , which implies . Since and separable, it follows that , whence is well defined.
Let us show that , by induction on . This is clear for . Assume that the identity holds for some . From we see that the valuation of in is necessarily and that . The induction hypothesis is thus satisfied for because . We are done with the correctness.
The adic expansion of takes operations and dominates the cost of steps 1 and 2. Steps 4 and 5 amount to operations. As to step 3, we enter Algorithm 2 with .
Let be the cost function of Algorithm 2. The degrees of , and are , so that
for some constant . Let be the largest integer such that . Unrolling times the inequality for and , we obtain
In a similar way we obtain . Consequently, . The conclusion follows from .
Notice that for fixed , this corollary gives a cost , independently of the characteristic.
Recall that the modulus is the $r$th power $h = \varphi^r$ of a polynomial $\varphi$ of degree $d$. From now on, we assume that $\varphi$ is separable. So far, our algorithms for composition modulo $\varphi^r$ relied on extensions of known algorithms dedicated to power series. In this section, we investigate the other natural approach: we design algorithms that reduce one composition modulo $\varphi^r$ to one series composition at order $r$ over a suitable extension of the ground field. We set $\mathbb{A} = \mathbb{K}[y]/(\varphi(y))$ and write $\alpha$ for the residue class of $y$ in $\mathbb{A}$. The associated trace function is denoted by $\operatorname{Tr}: \mathbb{A} \to \mathbb{K}$. This function naturally extends to a map from $\mathbb{A}[[t]]$ to $\mathbb{K}[[t]]$ in a coefficientwise fashion.
The first method we study is rather elementary. It works well in any characteristic, but it is only efficient when $r$ is much smaller than $d$. We compute the composition at all roots of $\varphi$ up to precision $r$ and recover $f(g) \operatorname{rem} \varphi^r$ by Chinese remaindering. The simultaneous computation with all roots of $\varphi$ is emulated via standard algebraic techniques.
Since $\varphi$ is not assumed to be irreducible, $\mathbb{A}$ is not necessarily a field. We will compute inverses in $\mathbb{A}$ using the technique of dynamic evaluation, which is briefly recalled in Appendix A.2. For the convenience of the reader, we use the auxiliary variable $t$ for power series and keep the variable $x$ for polynomials.
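On a toy instance where $\varphi$ splits over the ground field (so that no extension and no dynamic evaluation are needed), the local data at each root can be checked against a direct modular composition; for $r = 2$ the local expansion is just the chain rule. All polynomials below are our own arbitrary choices.

```python
P = 97

def pmul(a, b):
    if not a or not b:
        return []
    c = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            c[i + j] = (c[i + j] + x * y) % P
    return c

def prem(a, b):
    """a rem b for monic b over GF(P), trailing zeros stripped."""
    a = a[:]
    while len(a) >= len(b):
        coef, shift = a[-1], len(a) - len(b)
        for j, bj in enumerate(b):
            a[shift + j] = (a[shift + j] - coef * bj) % P
        a.pop()
    while a and a[-1] == 0:
        a.pop()
    return a

def peval(a, x):
    v = 0
    for c in reversed(a):
        v = (v * x + c) % P
    return v

def pderiv(a):
    return [i * c % P for i, c in enumerate(a)][1:]

f = [3, 1, 4, 1, 5]      # arbitrary f
g = [2, 7, 1, 8]         # arbitrary g
phi = [2, 94, 1]         # phi = (x - 1)(x - 2), separable, split over GF(97)
h = pmul(phi, phi)       # modulus h = phi^2, i.e. r = 2

# reference: direct modular composition f(g) rem h by Horner
comp = [0]
for c in reversed(f):
    comp = prem(pmul(comp, g), h) or [0]
    comp[0] = (comp[0] + c) % P

# at each root a of phi, the chain rule gives the local expansion
#   f(g) = f(g(a)) + f'(g(a)) g'(a) (x - a) + O((x - a)^2)
for a in (1, 2):
    u0, u1 = peval(g, a), peval(pderiv(g), a)
    c0, c1 = peval(f, u0), peval(pderiv(f), u0) * u1 % P
    local = [(c0 - c1 * a) % P, c1]      # c0 + c1 (x - a), low degree first
    while local and local[-1] == 0:
        local.pop()
    assert prem(comp, [a * a % P, (-2 * a) % P, 1]) == local
```

Recombining the two local residues by Chinese remaindering would recover `comp` exactly; the algorithm below emulates all roots at once through arithmetic in $\mathbb{A}$.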
Algorithm
Compute in .
Compute in .
Perform the power series composition
in .
Compute in .
With the algorithm underlying Corollary 20, compute the modular inverse .
Compute
and then .
Return .
operations in .
so equals for all roots of . This proves the correctness of the algorithm thanks to the Chinese remainder theorem.
For the complexity analysis, step 3 amounts to operations in . In step 5, we use Corollary 20 to compute with operations in . Steps 1, 2, 4, and 6 require additional ring operations in . In step 7, the row matrix of may be computed with operations in by formula (20), so each trace takes operations in . Therefore step 7 amounts to operations in , which is negligible.
When $r$ is much smaller than $d$, we may directly benefit from fast power series composition. But overall this does not improve much on Corollary 7. In the rest of this section, we aim at decreasing the dependency on $r$ in the complexity bounds.
In order to improve on Algorithm 4, we shall use a better way to expand $f$ and $g$ modulo $\varphi^r$. For this purpose we introduce the algebra homomorphism
$$\sigma: \mathbb{K}[x]/(\varphi(x)^r) \to \mathbb{A}[t]/(t^r), \qquad x \mapsto \alpha + t,$$
where $\alpha$ denotes the residue class of $y$ in $\mathbb{A} = \mathbb{K}[y]/(\varphi(y))$. We regard $\sigma$ and $\sigma^{-1}$ as conversions that we respectively call the untangling and tangling mappings.
Remark. The separability assumption on is necessary: if and , then we have .
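A naive model of the untangling mapping (for toy parameters of our choosing) evaluates a polynomial at $\alpha + t$ by Horner in the ring $\mathbb{A}[t]/(t^r)$. This costs quadratic time, unlike the fast conversions developed below, but it makes the homomorphism property easy to test.

```python
P = 7
d, r = 2, 3
phi = [5, 0, 1]                  # phi(y) = y^2 - 2 over GF(7), separable

def pmul(a, b):
    if not a or not b:
        return []
    c = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            c[i + j] = (c[i + j] + x * y) % P
    return c

def prem(a, b):
    """a rem b for monic b, padded to the fixed length len(b) - 1."""
    a = a[:]
    while len(a) >= len(b):
        coef, shift = a[-1], len(a) - len(b)
        for j, bj in enumerate(b):
            a[shift + j] = (a[shift + j] - coef * bj) % P
        a.pop()
    return a + [0] * (len(b) - 1 - len(a))

def amul(u, v):                  # product in A = GF(P)[y]/(phi)
    return prem(pmul(u, v), phi)

def smul(U, V):                  # product in A[t]/(t^r)
    W = [[0] * d for _ in range(r)]
    for i in range(r):
        for j in range(r - i):
            W[i + j] = [(x + y) % P for x, y in zip(W[i + j], amul(U[i], V[j]))]
    return W

def untangle(p):
    """Image of p(x) under x -> alpha + t, by Horner in A[t]/(t^r)."""
    X = [[0, 1], [1, 0], [0, 0]]         # alpha + t, for d = 2 and r = 3
    res = [[0] * d for _ in range(r)]
    for c in reversed(p):
        res = smul(res, X)
        res[0][0] = (res[0][0] + c) % P
    return res

h = pmul(pmul(phi, phi), phi)            # modulus phi^3, of degree d * r
p = [1, 2, 3, 4, 5, 6]
q = [2, 0, 1, 1, 0, 3]
assert untangle([0, 1]) == [[0, 1], [1, 0], [0, 0]]
# sigma is a ring homomorphism: sigma(p q) = sigma(p) sigma(q)
assert untangle(prem(pmul(p, q), h)) == smul(untangle(p), untangle(q))
```

The second assertion also silently uses the fact that $\varphi(x)^r$ is mapped to $0$: since $\varphi(\alpha) = 0$, the element $\varphi(\alpha + t)$ is a multiple of $t$, so its $r$th power vanishes modulo $t^r$.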
Algorithm
Compute .
Let and compute the characteristic polynomial of modulo .
Compute .
Write , where is the class of in , and the have degrees . For each , compute and let .
Perform the power series composition in .
Return .
From now on, in cost analyses, the expression (resp. ) represents a cost function for computing the direct image (resp. the preimage) of . If good algorithms are known for and , for characteristic polynomials and for compositions modulo , then we observe that Algorithm 5 boils down to one power series composition at precision over .
operations in .
The rest of this section is devoted to algorithms with softly linear costs for computing and its inverse. More precisely, we will prove that we may take and to be in the latter proposition. This leads to our main theorem:
operations in .
The following lemma is elementary, but we shall refer to it several times.
We begin with the easiest situation, when the characteristic of $\mathbb{K}$ is zero or sufficiently large, and achieve softly linear time for the untangling mapping.
Algorithm
If then return , otherwise let .
Recursively apply the algorithm to , giving .
Recursively apply the algorithm to , giving .
Return .
In order to handle the case of small positive characteristic $p$, we need to pay attention to vanishing coefficients when computing Taylor expansions using higher order derivatives. For this, we rely on the following adaptation of the Taylor expansion of $f$:
$$f(x + t) = \sum_{k \ge 0} f^{[k]}(x)\, t^k.$$
The polynomial $f^{[k]}$ is called the $k$th order Hasse derivative of $f$. The operator $f \mapsto f^{[k]}$ is linear and we may compute it as follows: if $f = x^n$ is a monomial and $k > n$, then $f^{[k]} = 0$; otherwise
$$f^{[k]} = \binom{n}{k}\, x^{n-k}.$$
Now if the characteristic divides the integer , then it also divides , and the lemma remains correct.
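Hasse derivatives are straightforward to implement with binomial coefficients reduced modulo $p$ (computed below via Lucas' theorem). The sketch also checks the Taylor identity in characteristic $5$, where the factorial denominators of ordinary Taylor coefficients vanish from order $5$ on; the prime and the test polynomial are our own choices.

```python
P = 5

def peval(a, x):
    v = 0
    for c in reversed(a):
        v = (v * x + c) % P
    return v

def binom_mod(n, k):
    """binomial(n, k) mod the prime P, via Lucas' theorem."""
    r = 1
    while n or k:
        nd, kd = n % P, k % P
        if kd > nd:
            return 0
        num = den = 1
        for i in range(kd):
            num = num * (nd - i) % P
            den = den * (i + 1) % P
        r = r * num * pow(den, P - 2, P) % P
        n //= P
        k //= P
    return r

def hasse(f, k):
    """k-th order Hasse derivative: maps x^n to binomial(n, k) x^(n-k)."""
    return [binom_mod(n, k) * c % P for n, c in enumerate(f)][k:]

# Taylor identity f(x + t) = sum_k hasse(f, k)(x) t^k, valid in any
# characteristic, checked exhaustively over GF(5)
f = [3, 1, 4, 1, 2, 4]
for x in range(P):
    for t in range(P):
        rhs = sum(peval(hasse(f, k), x) * pow(t, k, P)
                  for k in range(len(f))) % P
        assert peval(f, (x + t) % P) == rhs
```

Note that `hasse([0]*5 + [1], 2)` is identically zero over $\mathbb{F}_5$, since $\binom{5}{2} = 10 \equiv 0 \pmod 5$; this is precisely the kind of vanishing that the algorithms of this subsection must work around.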
It is convenient to also introduce the Pochhammer symbol
The next lemma expresses conditions to compute Hasse derivatives from ones of lower orders, in positive characteristic.
(1) 
for all . In positive characteristic, we claim that the ratios have nonnegative valuations in .
Let us first consider the case when . If , with , then . If , then . This shows that is well defined and may be computed in whenever is a multiple of .
At a second stage, we make the induction hypothesis that has been computed in for such that . The value for may then be obtained via
since . This deals with the case when .
Finally, for any , the ratio is also well defined since it rewrites into
which concludes the proof.
We are now in a position to adapt the “divide and conquer” Algorithm 6 to the case when $\mathbb{K}$ has small positive characteristic $p$.
Algorithm
If , then return .
Let .
Compute .
By using formula (1), compute from .
Recursively apply the algorithm to , , , , , and . This yields for all .
Recursively apply the algorithm to , , , , , , and . This yields , for all .
Return , for all .
Algorithm
If , then use Algorithm 6 to compute and return .
Let be minimal with , let , and .
Compute for all using Algorithm 7 (called with , , and ).
If , then compute from with formula (1).
For all , use the algorithm recursively with in order to obtain .
If then use the algorithm recursively with in order to obtain .
Return .
operations in .
Now we suppose that . We have , so is a power of , we have , and the assumption is preserved through recursive calls. The conditions of Algorithm 7 are satisfied, whence the correctness of step 3 by Proposition 16. Step 4 is a consequence of Lemmas 12 and 15. We are done with the correctness.
Step 1 costs by Proposition 13. Step 3 costs by Proposition 16 and step 4 takes operations in . The total cost function of the algorithm satisfies
where is a sufficiently large constant and when . This leads to .
The next algorithm is independent of the characteristic. It relies on the untangling algorithms from the previous subsections, by using the “divide and conquer” strategy. For a polynomial or series in and , we denote and . For convenience, we assume that is a nondecreasing function of .
Algorithm
If , then let be the preimage of in and return .
Let and compute , .
Return .
Kronecker substitution [8, Chapter 8, Section 4] allows us to multiply two series in $\mathbb{A}[t]$ at precision $r$ using $O(\mathsf{M}(d r))$ operations in $\mathbb{K}$.
The computation of requires one inversion in that takes operations in , together with operations for the series inversion. The other amount to more operations in . The cost function satisfies
for some constant , which leads to . The final bound of the lemma follows from Proposition 17.
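The Kronecker substitution invoked in this proof can be sketched directly (toy parameters ours): pack the $y$-coordinates of each $t$-coefficient into blocks of width $2d - 1$ so that $y$-degrees in a product cannot overlap, multiply once in a single variable, then reduce each block modulo $\varphi$.

```python
P = 7
d, r = 2, 3
phi = [5, 0, 1]          # phi = y^2 - 2 over GF(7) (our toy choice)

def pmul(a, b):
    c = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            c[i + j] = (c[i + j] + x * y) % P
    return c

def prem(a, b):
    """a rem b for monic b, padded to a fixed length len(b) - 1."""
    a = a[:]
    while len(a) >= len(b):
        coef, shift = a[-1], len(a) - len(b)
        for j, bj in enumerate(b):
            a[shift + j] = (a[shift + j] - coef * bj) % P
        a.pop()
    return a + [0] * (len(b) - 1 - len(a))

def amul(u, v):                      # product in A = GF(P)[y]/(phi)
    return prem(pmul(u, v), phi)

def smul(U, V):                      # direct product in A[t]/(t^r)
    W = [[0] * d for _ in range(r)]
    for i in range(r):
        for j in range(r - i):
            W[i + j] = [(x + y) % P for x, y in zip(W[i + j], amul(U[i], V[j]))]
    return W

def kronecker_smul(U, V):
    """Same product via one univariate multiplication: substitute
    y -> z and t -> z^s with s = 2d - 1, so blocks cannot overlap."""
    s = 2 * d - 1
    pack = lambda W: [c for blk in W for c in blk + [0] * (s - d)]
    prod = pmul(pack(U), pack(V)) + [0] * (s * r)
    return [prem(prod[j * s:(j + 1) * s], phi) for j in range(r)]

U = [[1, 1], [2, 0], [0, 3]]
V = [[0, 1], [1, 1], [3, 0]]
assert kronecker_smul(U, V) == smul(U, V)
```

The point of the padding to width $2d - 1$ is that a product of two $y$-polynomials of degree $<d$ has degree $\le 2d - 2$, so each $t$-block of the univariate product can be read off and reduced modulo $\varphi$ independently.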
Let $h$ be a monic polynomial in $\mathbb{K}[x]$ of degree $n$, let $\ell$ be a linear form on $\mathbb{K}[x]_{<n}$, and let $g \in \mathbb{K}[x]$ be of degree $<n$. We study the modular power projection problem, which corresponds to computing $\ell(1), \ell(g \operatorname{rem} h), \dots, \ell(g^{n-1} \operatorname{rem} h)$.
Let $\bar{g}$ represent the class of $g$ in $\mathbb{K}[x]/(h)$. If we take $\ell = \operatorname{Tr}$, where $\operatorname{Tr}$ denotes the usual trace function of $\mathbb{K}[x]/(h)$ over $\mathbb{K}$, then the latter power projections write as the power sums $p_k = \operatorname{Tr}(\bar{g}^k)$ of the roots of the characteristic polynomial $\chi$ of $g$ modulo $h$. It is well known that these power sums are related to $\chi$ by the Newton–Girard identities:
$$\frac{\hat{\chi}'(t)}{\hat{\chi}(t)} = -\sum_{i \ge 0} p_{i+1}\, t^i,$$
where $\hat{\chi}(t) = t^n \chi(1/t)$ is the reverse polynomial of $\chi$. If there exist given inverses of $2, \dots, n$ in $\mathbb{K}$, then it is possible to integrate the latter differential equation in $t$ with $O(\mathsf{M}(n))$ operations in $\mathbb{K}$, which directly leads to an algorithm to compute $\chi$ from the power sums of its roots (see for instance [9, Section 2] for details, which also cover the case when $\mathbb{K}$ is a finite field).
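The recovery step can be sketched via the classical recursive form of the Newton–Girard identities; this is a naive $O(n^2)$ version rather than the $O(\mathsf{M}(n))$ integration, with data of our choosing, but it shows where the required inverses of $1, \dots, n$ enter.

```python
P = 97
roots = (1, 2, 3, 4)
n = len(roots)

# chi(x) = (x-1)(x-2)(x-3)(x-4), coefficients low degree first
chi = [1]
for a in roots:
    chi = [0] + chi
    chi = [(chi[i] - a * (chi[i + 1] if i + 1 < len(chi) else 0)) % P
           for i in range(len(chi))]

# power sums p_k of the roots, for k = 1..n
ps = [sum(pow(a, k, P) for a in roots) % P for k in range(1, n + 1)]

def charpoly_from_power_sums(ps, n):
    """Recover chi from p_1..p_n with the Newton-Girard identities;
    requires the invertibility of 1..n in the coefficient field."""
    e = [1]                      # elementary symmetric functions e_0..e_n
    for k in range(1, n + 1):
        s = sum((-1) ** (i - 1) * e[k - i] * ps[i - 1]
                for i in range(1, k + 1))
        e.append(s * pow(k, P - 2, P) % P)   # k e_k = sum_i (-1)^(i-1) e_{k-i} p_i
    # chi(x) = sum_k (-1)^k e_k x^(n-k), returned low degree first
    return [(-1) ** (n - j) * e[n - j] % P for j in range(n + 1)]

assert charpoly_from_power_sums(ps, n) == chi
```

The divisions by $k$ are exactly the inverses of $2, \dots, n$ mentioned above; in small characteristic they are unavailable, which is why [9, Section 2] treats the finite field case separately.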
First, let us explain how to compute efficiently. Recall that . In the basis , the linear form may be computed as
which uses operations in . If represents the multiplication endomorphism by in , then we can write and therefore . By transposing the modular multiplication by with the techniques from [2, Sections 4 and 5], the computation of requires operations in .
Let $\mathbb{K}$ be a field. Often in computer algebra, we are led to compute in monogenic algebras such as $\mathbb{A} = \mathbb{K}[y]/(\varphi(y))$, where $\varphi$ is separable of degree $d$, but not necessarily irreducible. Ring operations in $\mathbb{A}$ require $O(\mathsf{M}(d))$ operations in $\mathbb{K}$. Algorithms such as the determinant are often slower over rings than their counterparts over fields. For this reason, we often like to conduct our computations as if $\mathbb{A}$ were a field. From the programming point of view, this is easy to achieve: whenever we need to invert an element $a$ or test whether $a = 0$, we factor $\varphi = \varphi_1 \varphi_2$, where $\varphi_1 = \gcd(a, \varphi)$ and $a$ is invertible modulo $\varphi_2$. We then restart the relevant part of the computations with $\varphi_1$ and/or $\varphi_2$ in the role of $\varphi$. This approach is known as the dynamic evaluation principle in computer algebra [6, 7]. We borrow the following complexity result for extended gcds from Dahan et al. [6]:
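A minimal sketch of this principle (toy field and modulus ours): attempt an inversion with an extended gcd; a nontrivial gcd exposes a factorization of the modulus, and the computation would then be restarted on each branch.

```python
P = 7
phi = [1, 4, 1, 1]     # (x-1)(x-2)(x-3) = x^3 - 6x^2 + 11x - 6 over GF(7)

def pmul(a, b):
    if not a or not b:
        return []
    c = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            c[i + j] = (c[i + j] + x * y) % P
    return c

def pdivmod(a, b):
    a = a[:]
    q = [0] * max(len(a) - len(b) + 1, 1)
    inv = pow(b[-1], P - 2, P)
    while len(a) >= len(b):
        if a[-1] == 0:
            a.pop()
            continue
        coef = a[-1] * inv % P
        shift = len(a) - len(b)
        q[shift] = coef
        for j, bj in enumerate(b):
            a[shift + j] = (a[shift + j] - coef * bj) % P
        a.pop()
    while a and a[-1] == 0:
        a.pop()
    return q, a

def pgcdext(a, b):
    """Monic g = gcd(a, b) and s such that s * a = g modulo b."""
    r0, s0, r1, s1 = a[:], [1], b[:], [0]
    while r1:
        q, rr = pdivmod(r0, r1)
        qs = pmul(q, s1)
        ns = [((s0[i] if i < len(s0) else 0) - (qs[i] if i < len(qs) else 0)) % P
              for i in range(max(len(s0), len(qs)))]
        r0, s0, r1, s1 = r1, s1, rr, ns
    c = pow(r0[-1], P - 2, P)
    return [x * c % P for x in r0], [x * c % P for x in s0]

def try_invert(u, phi):
    """D5-style inversion attempt of u modulo phi."""
    g, s = pgcdext(u, phi)
    if len(g) == 1:                          # gcd = 1: u is invertible
        return ("inv", s)
    return ("split", g, pdivmod(phi, g)[0])  # zero divisor: split the modulus

# x - 4 is invertible modulo phi
kind, s = try_invert([3, 1], phi)
assert kind == "inv"
assert pdivmod(pmul(s, [3, 1]), phi)[1] == [1]

# x - 1 is a zero divisor: the modulus splits as (x-1) * (x-2)(x-3)
kind, g, cof = try_invert([6, 1], phi)
assert kind == "split" and g == [6, 1] and cof == [6, 2, 1]
```

In a complete dynamic evaluation scheme, the branches $\varphi_1$ and $\varphi_2$ would each carry their own copy of the pending computation, which is exactly the restart mechanism described above.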
[1] D. J. Bernstein. Composing power series over a finite ring in essentially linear time. J. Symbolic Comput., 26(3):339–341, 1998.
[2] A. Bostan, G. Lecerf, and É. Schost. Tellegen's principle into practice. In Hoon Hong, editor, Proceedings of the 2003 International Symposium on Symbolic and Algebraic Computation, ISSAC '03, pages 37–44, New York, NY, USA, 2003. ACM.
[3] R. P. Brent and H. T. Kung. Fast algorithms for manipulating formal power series. J. ACM, 25(4):581–595, 1978.
[4] P. Bürgisser, M. Clausen, and M. A. Shokrollahi. Algebraic complexity theory, volume 315 of Grundlehren der Mathematischen Wissenschaften. Springer-Verlag, 1997.
[5] D. G. Cantor and E. Kaltofen. On fast multiplication of polynomials over arbitrary algebras. Acta Inform., 28(7):693–701, 1991.
[6] X. Dahan, M. Moreno Maza, É. Schost, and Yuzhen Xie. On the complexity of the D5 principle. In J.-G. Dumas, editor, Proceedings of Transgressive Computing 2006: a conference in honor of Jean Della Dora, pages 149–168. U. J. Fourier, Grenoble, France, 2006.
[7] J. Della Dora, C. Dicrescenzo, and D. Duval. A new method for computing in algebraic number fields. In G. Goos and J. Hartmanis, editors, Eurocal'85 (2), volume 174 of Lect. Notes in Comp. Science, pages 321–326. Springer, 1985.
[8] J. von zur Gathen and J. Gerhard. Modern computer algebra. Cambridge University Press, New York, 2nd edition, 2003.
[9] B. Grenet, J. van der Hoeven, and G. Lecerf. Deterministic root finding over finite fields using Graeffe transforms. Appl. Alg. Eng. Comm. Comp., 27(3):237–257, 2016.
[10] D. Harvey, J. van der Hoeven, and G. Lecerf. Faster polynomial multiplication over finite fields. J. ACM, 63(6), 2017. Article 52.
[11] J. van der Hoeven. Relax, but don't be too lazy. J. Symbolic Comput., 34(6):479–542, 2002.
[12] J. van der Hoeven. Fast composition of numeric power series. Technical Report 2008-09, Université Paris-Sud, Orsay, France, 2008.
[13] J. van der Hoeven and G. Lecerf. Modular composition via factorization. Technical report, CNRS & École polytechnique, 2017. http://hal.archives-ouvertes.fr/hal-01457074.
[14] Xiaohan Huang and V. Y. Pan. Fast rectangular matrix multiplication and applications. J. Complexity, 14(2):257–299, 1998.
[15] E. Kaltofen and V. Shoup. Fast polynomial factorization over high algebraic extensions of finite fields. In Proceedings of the 1997 International Symposium on Symbolic and Algebraic Computation, ISSAC '97, pages 184–188, New York, NY, USA, 1997. ACM.
[16] E. Kaltofen and V. Shoup. Subquadratic-time factoring of polynomials over finite fields. Math. Comp., 67(223):1179–1197, 1998.
[17] K. S. Kedlaya and C. Umans. Fast modular composition in any characteristic. In FOCS'08: IEEE Conference on Foundations of Computer Science, pages 146–155, Washington, DC, USA, 2008. IEEE Computer Society.
[18] K. S. Kedlaya and C. Umans. Fast polynomial factorization and modular composition. SIAM J. Comput., 40(6):1767–1802, 2011.
[19] F. Le Gall. Powers of tensors and fast matrix multiplication. In Proceedings of the 39th International Symposium on Symbolic and Algebraic Computation, ISSAC '14, pages 296–303, New York, NY, USA, 2014. ACM.
[20] G. Lecerf. On the complexity of the Lickteig–Roy subresultant algorithm. Technical report, CNRS & École polytechnique, 2017. https://hal.archives-ouvertes.fr/hal-01450869.
[21] M. S. Paterson and L. J. Stockmeyer. On the number of nonscalar multiplications necessary to evaluate polynomials. SIAM J. Comput., 2(1):60–66, 1973.
[22] P. Ritzmann. A fast numerical algorithm for the composition of power series with complex coefficients. Theoret. Comput. Sci., 44:1–16, 1986.