Fast composition of numeric power series

Joris van der Hoeven

CNRS, Département de Mathématiques

Bâtiment 425

Université Paris-Sud

91405 Orsay Cedex

France

Email: joris@texmacs.org

Web: http://www.math.u-psud.fr/~vdhoeven

November 3, 2023

This work was partially supported by the ANR Gecko project.

Let f and g be two convergent power series in R[[z]] or C[[z]], whose first n terms are given numerically with a λn-bit precision for a fixed constant λ > 0. Assuming that g(0) = 0, we will show in this paper that the first n coefficients of f ∘ g can be computed with a λn-bit precision in time Õ(n²). Using Newton iteration, a similar complexity bound holds for power series reversion of g. Our method relies on fast multi-point evaluation, which will be recalled and further detailed for numeric polynomials. We also discuss relaxed variants of our algorithm.

Keywords: power series, composition, FFT, multi-point evaluation, algorithm

A.M.S. subject classification: 40-04, 42-04, 68W40

1. Introduction

Let R be an effective ring of coefficients (i.e. we have algorithms for performing the ring operations). Let f, g ∈ R[[z]] be two power series with g(0) = 0, so that the composition f ∘ g is well-defined. We are interested in algorithms for fast composition: given the first n coefficients of f and g, how many arithmetic operations in R are needed in order to compute the first n coefficients of f ∘ g?

A first efficient general purpose algorithm of time complexity O((n log n)^(1/2) M(n)) was given in [BK78, CT65, SS71]. Here M(n) denotes the complexity for the multiplication of two polynomials of degrees ≤ n and we have M(n) = O(n log n log log n) [CK91]. In the case when g is polynomial [BK78] or algebraic [vdH02], this complexity further reduces to O(M(n) log n). For some very special series g, there even exist O(M(n)) algorithms; see [BSS08] for an overview. In positive characteristic p, right composition can also be performed in quasi-linear time [Ber98].

In this paper, we are interested in efficient algorithms in the case when the coefficients are numbers, such as elements of R or C, or elements of a ring of floating point numbers. In that case, we are interested in the bit complexity of the composition, which means that we also have to take into account the bit precision of the underlying integer arithmetic. In particular, we will denote by I(n) the time needed for multiplying two n-bit integers. We have I(n) = O(n log n log log n) [SS71] and even I(n) = O(n log n 2^(O(log* n))) [Für07], where the iterated logarithm log* n satisfies log* n = min {k : log … log n ≤ 1 (k iterations of log)}. If all coefficients f_i, g_i and (f ∘ g)_i correspond to n-bit numbers, then we will search for a composition algorithm which runs in quasi-linear time with respect to the total bit-size n² of the input and output. Throughout the paper, we will assume that I(n)/n is increasing and that I(O(n)) = O(I(n)).

In section 2, we start by reviewing multiplication and division of polynomials with numeric coefficients. Multiplication of two polynomials of degrees with -bit coefficients can be done in time using Kronecker's substitution [Kro82]. Division essentially requires a higher complexity [Sch82], due to the fact that we may lose bits of precision when dividing by a polynomial of degree . Nevertheless, we will see that a best possible complexity can be achieved in terms of the output precision.

In section 3, we will study multi-point evaluation of a polynomial at points in the case of numeric coefficients. Adapting the classical binary splitting algorithm [AHU74] of time complexity to the numeric context, we will prove the complexity bound . In the case when , this complexity bound is again non-optimal in many cases. It would be nice if a bound of the form could be achieved and we will present some ideas in this direction in section 3.2.

In section 4, we turn to the problem of numeric power series composition. Under the assumptions and , we first show that the computation of the first coefficients of a convergent power series is essentially equivalent to its evaluation at equally spaced points on a given circle (this fact was already used in [Sch82], although not stated as explicitly). The composition problem can therefore be reduced to one fast Fourier transform, one multi-point evaluation and one inverse Fourier transform, leading to a complexity . We first prove this result under certain normalization conditions for floating point coefficients. We next extend the result to different types of numeric coefficients (including integers and rationals) and also consider non-convergent series. The good complexity of these algorithms should not come as a complete surprise: under the analogy between numbers and power series, it was already known [BK77] how to compose multivariate power series fast.

In the last section, we conclude by giving relaxed versions [vdH02] of our composition algorithm. This allows for the resolution of functional equations involving composition of numeric power series. Unfortunately, we have not yet achieved a quasi-linear bound, even though some progress has been made on the exponent. The special case of power series reversion may be reduced more directly to composition, using Newton's method [BK75].

2. Fundamental operations

Let be a normed vector space. Given and , we say that is an -approximation of if . We will also say that is an approximation of with absolute error or an absolute precision of bits. For polynomials , we will use the norm , and notice that . For vectors , we will use the norm .

2.1. Multiplication

Given two polynomials with , , and , take . Then the product can be read off from the integer product , since the coefficients of the result all fit into bits. This method is called Kronecker multiplication [Kro82] and its cost is bounded by . Clearly, the method generalizes to the case when or . A direct consequence for the computation with floating point polynomials is the following:
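The packing step can be sketched in a few lines. The following is a minimal illustration (the function name and slot-width choice are ours, not the paper's), relying on Python's built-in big-integer arithmetic for the single large multiplication:

```python
# Kronecker multiplication sketch: pack each polynomial into one big integer
# with fixed-width slots, perform a single big-integer product, and unpack.
# The slot width 2*k + bit_length(n) guarantees that no coefficient of the
# product (a sum of at most n products of k-bit numbers) overflows its slot.
# Coefficients are assumed to be non-negative integers below 2**k.

def kronecker_mul(p, q, k):
    """Multiply coefficient lists p and q with 0 <= c < 2**k."""
    n = max(len(p), len(q))
    slot = 2 * k + n.bit_length()
    P = sum(c << (i * slot) for i, c in enumerate(p))
    Q = sum(c << (i * slot) for i, c in enumerate(q))
    R = P * Q                      # one multiplication of large integers
    mask = (1 << slot) - 1
    return [(R >> (i * slot)) & mask for i in range(len(p) + len(q) - 1)]
```

For instance, `kronecker_mul([1, 2, 3], [4, 5], 8)` returns `[4, 13, 22, 15]`, the coefficients of (1 + 2x + 3x²)(4 + 5x).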

Lemma 1. Let be two polynomials with . Given with and , we may compute a -approximation of in time .

Proof. Let and consider polynomials with

By assumption, the bit-sizes of the coefficients of and are bounded by . Therefore, we may compute the exact product in time , using Kronecker multiplication. We have

This proves that is a -approximation of . One may optionally truncate the mantissas of the coefficients of so as to fit into bits.

Remark 2. In order to increase readability, we will loosely use real and complex numbers as inputs of our algorithms. In practice, such inputs are really floating point numbers whose precisions should be clear from the context.

2.2. Inversion

Lemma 3. Let be such that for all and

Let and . Let be such that and . Given , we may compute a -approximation of in time .

Proof. Assume that we are given an approximation of . Then we may compute a better approximation using the classical Newton iteration

(1)

If is “small”, then

(2)

is about twice as “small”. We will apply the Newton iteration for a polynomial whose first coefficients are good approximations of and the remaining coefficients less good approximations.

Given a polynomial , we write

Assume that and , with . Setting and , we have

The relation (2) yields

whence

When starting with , we may take . If is sufficiently large, then one Newton iteration yields . Applying the Newton iteration once more for , we obtain . In view of lemma 1 and the assumption , the cost of two such Newton iterations is bounded by for a suitable constant .

We are now in a position to prove the lemma. Since an increase of by leaves the desired complexity bound unaltered, we may assume without loss of generality that . Then the cost of the inversion at order satisfies . We conclude that .
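As an illustration of the Newton iteration (1), here is a minimal exact-arithmetic sketch, with rationals standing in for the fixed-precision floating point coefficients analysed above and with helper names of our own; each pass through the loop roughly doubles the number of correct coefficients:

```python
# Power series inversion by Newton iteration: h <- h + h*(1 - f*h).
# Exact Fractions keep the sketch free of the rounding issues that the
# lemma's proof has to track.

from fractions import Fraction

def series_mul(a, b, n):
    """Product of two coefficient lists, truncated at order n."""
    c = [Fraction(0)] * n
    for i, x in enumerate(a[:n]):
        for j, y in enumerate(b[:n - i]):
            c[i + j] += x * y
    return c

def series_inv(f, n):
    """First n coefficients of 1/f, assuming f[0] != 0."""
    h = [Fraction(1) / f[0]]
    while len(h) < n:
        m = min(2 * len(h), n)
        delta = [-c for c in series_mul(f, h, m)]
        delta[0] += 1                      # delta = 1 - f*h
        h = [x + y for x, y in
             zip(h + [Fraction(0)] * m, series_mul(h, delta, m))][:m]
    return h
```

For example, inverting 1 − z yields the geometric series 1 + z + z² + ….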

Remark 4. If only is given, then a constant for the bound is not known a priori. Assume that and write . Then and

Assuming , it follows that

We may thus take . Conversely, we have

In other words, our choice of is at most a factor two worse than the optimal value.

Remark 5. The following alternative algorithm for the approximation of is a variant of the method described in [Sch82, Section 4]:

Because of (5) below, we may always take and , which gives a complexity bound of the form . In fact the FFT of a numeric -bit polynomial of degree can be computed in time [Sch82, Section 3], which drops the bound further down to .

In practice, we may also start with and double until the computed approximation of satisfies the equation up to a sufficient precision. This leads to the same complexity bound as in the lemma.

It is not clear which of the two methods is more efficient: Newton's method performs a certain amount of recomputation, whereas the alternative method requires us to work at a sufficiently large degree for which (3) holds.

Given power series and , where , we will say that majorates and write if for all coefficients of . This notation applies in particular to polynomials.

Lemma 6. For and as in lemma 3, we may compute a -approximation of in time .

Proof. Then and , whence

(4)
(5)

We conclude by applying lemma 3 for and .

2.3. Euclidean division

The following result was first proved in [Sch82, Section 4].

Lemma 7. Let be polynomials with , and

for with for all . Consider the Euclidean division

with . Given , we may compute -approximations of and in time .

Proof. Setting and , we may write

Setting , we then have

Our lemma now follows from lemmas 6 and 1.
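The reduction of Euclidean division to power series inversion that underlies this proof can be sketched as follows; exact rationals replace the floating point coefficients, and all helper names are ours:

```python
# Euclidean division A = Q*B + R via power-series inversion: reversing the
# coefficient lists turns division by B into multiplication by the truncated
# inverse of rev(B), computed by Newton iteration.

from fractions import Fraction

def smul(a, b, n):
    """Truncated product at order n."""
    c = [Fraction(0)] * n
    for i, x in enumerate(a[:n]):
        for j, y in enumerate(b[:n - i]):
            c[i + j] += x * y
    return c

def sinv(f, n):
    """Truncated inverse 1/f at order n (f[0] != 0), by Newton iteration."""
    h = [Fraction(1) / f[0]]
    while len(h) < n:
        m = min(2 * len(h), n)
        d = [-x for x in smul(f, h, m)]
        d[0] += 1
        h = [x + y for x, y in zip(h + [Fraction(0)] * m, smul(h, d, m))][:m]
    return h

def divmod_poly(A, B):
    """Quotient and remainder of A by B (coefficient lists, low to high)."""
    dA, dB = len(A) - 1, len(B) - 1
    if dA < dB:
        return [Fraction(0)], A[:]
    n = dA - dB + 1
    Qrev = smul(A[::-1], sinv(B[::-1], n), n)   # rev(Q) = rev(A)/rev(B) mod z^n
    Q = Qrev[::-1]
    QB = smul(Q, B, dA + 1)
    R = [a - c for a, c in zip(A, QB)][:dB]
    return Q, R
```

Dividing x³ + 2x² + 3x + 4 by x + 1, for instance, gives quotient x² + x + 2 and remainder 2.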

Remark 8. Assuming that , , and , it is again possible to prove the improved estimate . However, the proof relies on Schönhage's original method, using the same doubling strategy as in remark 5. Indeed, when using Newton's method for inverting , nice bounds and do not necessarily imply a nice bound for , where is the truncated inverse of . Using a relaxed division algorithm [vdH02] for , one may still achieve the improved bound, up to an additional overhead.

For applications where Euclidean division is used as a subalgorithm, it is necessary to investigate the effect of small perturbations of the inputs and on the outputs and .

Lemma 9. Consider two Euclidean divisions

with similar hypotheses as in lemma 7. Then

Proof. With the notations of the proof of lemma 7, let and be the truncations of the power series and at order . Let us first consider the case when , so that

Then and

Let us next consider the case when . Then

whence and

In general, the successive application of the second and the first case yields

We have also seen in the proof of lemma 6 that , and .

Lemma 10. Let and be as in lemma 7, while allowing for the case when . Then

Proof. With the notations from the proof of lemma 7, we have

whence

and

3. Multi-point evaluation

Consider a complex polynomial

which has been normalized so that . Let

be pairwise distinct points with for all . The problem of multi-point evaluation is to find an efficient algorithm for the simultaneous evaluation of at . A -approximation of will also be called a -evaluation of at . In order to simplify our exposition, we will assume that is a power of two; this assumption only affects the complexity analysis by a constant factor.

3.1. The binary splitting algorithm

An efficient and classical algorithm for multi-point evaluation [AHU74] relies on binary splitting: let , . Denoting by the remainder of the Euclidean division of by , we compute

Then

In other words, we have reduced the original problem to two problems of the same type, but of size .

For the above algorithm to be fast, the partial products and are also computed using binary splitting, but in the opposite direction. In order to avoid recomputations, the partial products are stored in a binary tree of depth . At depth , we have nodes, labeled by polynomials , where

For instance, for , the tree is given by

We have

(6)

For a given polynomial of degree we may now compute

(7)

This second computation can be carried out in place. At the last stage, we obtain

More generally,

(8)

for all and .
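A compact recursive rendering of this binary splitting scheme follows; it works in exact arithmetic and rebuilds the subproducts on the fly rather than storing the tree, so it is a sketch of the structure rather than of the complexity, and the helper names are ours:

```python
# Multi-point evaluation by binary splitting: split the points in two halves,
# reduce P modulo the subproduct of each half, and recurse; at the leaves the
# remainder modulo (z - z_i) is the value P(z_i).

from fractions import Fraction

def pmul(a, b):
    """Full product of two coefficient lists (low to high)."""
    c = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            c[i + j] += x * y
    return c

def pmod(a, b):
    """Remainder of a modulo b (coefficient lists, low to high)."""
    a = a[:]
    while len(a) >= len(b):
        k = len(a) - len(b)
        q = a[-1] / b[-1]
        for i, y in enumerate(b):
            a[i + k] -= q * y
        a.pop()
    return a

def multipoint_eval(P, pts):
    """Evaluate P at every point of pts by binary splitting."""
    if len(pts) == 1:
        return [pmod(P, [-pts[0], Fraction(1)])[0]]
    h = len(pts) // 2
    left, right = [Fraction(1)], [Fraction(1)]
    for p in pts[:h]:
        left = pmul(left, [-p, Fraction(1)])
    for p in pts[h:]:
        right = pmul(right, [-p, Fraction(1)])
    return (multipoint_eval(pmod(P, left), pts[:h]) +
            multipoint_eval(pmod(P, right), pts[h:]))
```

Evaluating z² at the points 1, 2, 3 returns 1, 4, 9, as expected.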

Lemma 11. Let and be such that , and for all . Given , we may -evaluate at in time .

Proof. In the algorithm, let and assume that all multiplications (6) and all Euclidean divisions (7) are computed up to an absolute error . Let and denote the results of these approximate computations. We have

It follows by induction that

(9)
(10)

From (8) and lemma 10, we have

(11)

In view of (7) and lemma 9, we also have

By induction over , we obtain

Using our choice of , we conclude that

Let us now estimate the complexity of the algorithm. In view of (9) and lemma 1, we may compute in time . In view of (11) and lemma 7, the computation of takes a time . For fixed , the computation of all and thus takes a time . Since there are stages, the time complexity of the complete algorithm is bounded by .

3.2. Low precision methods

If , then the bound from lemma 11 reduces to , which is not very satisfactory. Indeed, in the very special case when for , we may achieve the complexity , by using the fast Fourier transform. In general, we may strive for a bound of the form . We have not yet been able to obtain this complexity, but we will now describe some partial results in this direction. Throughout the section, we assume that .

Proposition 12. With the notations of lemma 11, assume that . Then we may -evaluate at in time .

Proof. Let , and for all . Each of the points is at distance at most from one of the points . We will compute using a Taylor series expansion of at . We first observe that

(12)

Using fast Fourier transforms of size , we first compute -approximations of for all and . This can be done in time

From (12) it follows that

The -evaluation of sums of the form using Horner's rule takes a time .

Proposition 13. With the notations of lemma 11, we may -evaluate at in time .

Proof. The proof of the above proposition adapts to the case when for all . Let us now subdivide the unit disk into annuli and the disk of radius . For a fixed annulus, we may evaluate at each of the inside the annulus in time . For , we have

provided that . Consequently, taking , we may evaluate at the remaining points in time . Under the condition , and up to logarithmic terms, the sum

becomes minimal for and .

Proposition 14. With the notations of lemma 11, assume that there exists an with and for all . Then we may -evaluate at in time .

Proof. We will only provide a sketch of the proof. As in the proof of proposition 12, we first compute the Taylor series expansion of at up to order . This task can be performed in time via a division of by followed by a Taylor shift. We next evaluate this expansion at for all . According to proposition 16, this can be done in time .

Proposition 15. Let be a fixed constant. With the notations of lemma 11, assume that for all . Then we may -evaluate at in time .

Proof. Again, we will only provide a sketch of the proof. The main idea is to use the binary splitting algorithm from the previous subsection. Let . If for all , then we obtain and the algorithm reduces to a fast Fourier transform. If each is a slight perturbation of , then becomes a slight perturbation of . In particular, the Taylor coefficients of only have a polynomial growth in . Consequently, the Euclidean division by accounts for a loss of at most instead of bits of precision. The binary splitting algorithm can therefore be carried out using a fixed precision of bits.

Sometimes, it is possible to decompose , such that a fast multi-point evaluation algorithm is available for each set of points . This leads to the question of evaluating at points.

Proposition 16. Let and be such that , and for all . Let be the time needed in order to compute -evaluations of polynomials with at . Then -evaluations of at can be computed in time .

Proof. Without loss of generality, we may assume that and are powers of two. We decompose as follows:

We may compute -evaluations of the at in time . We may also -approximate in time , using binary powering. Using Horner's rule, we may finally -evaluate

for in time .
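The block decomposition in this proof can be illustrated as follows; a naive evaluation stands in for the assumed fast algorithm on each block, and the function name and float coefficients are ours:

```python
# Evaluate a polynomial at several points by splitting it into blocks of
# length p: evaluate each block at the points (here naively, standing in for
# the assumed fast p-point algorithm), obtain z^p by binary powering (Python's
# built-in ** on integers/floats), and recombine the blocks by Horner's rule.

def eval_by_blocks(P, pts, p):
    """Evaluate P (low-to-high coefficients) at pts, using blocks of length p."""
    blocks = [P[i:i + p] for i in range(0, len(P), p)]
    bvals = [[sum(c * z ** j for j, c in enumerate(b)) for z in pts]
             for b in blocks]                 # fast p-point evaluation goes here
    zp = [z ** p for z in pts]                # binary powering
    out = []
    for i, z in enumerate(pts):
        acc = 0.0
        for bv in reversed(bvals):
            acc = acc * zp[i] + bv[i]         # Horner's rule over the blocks
        out.append(acc)
    return out
```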

4. Composition of power series

4.1. Evaluation of truncated power series

Let be a convergent power series with radius of convergence . Given , we denote

Using Cauchy's formula, we have

(13)

For with , it follows that

(14)

Let be the time needed to compute -approximations of . Given , let be the time needed to -evaluate at for .

Lemma 17.

  a) We have .

  b) We have .

Proof. We first consider the case when . Let be the smallest power of two with

(15)

For all , the bound (14) implies

(16)

The computation of -approximations of takes a time . The -evaluation of at primitive roots of unity can be done using fast Fourier transforms of size . This requires a time . If , we thus obtain the bound

The general case is reduced to the case via a change of variables . This requires computations to be carried out with an additional precision of bits, leading to the complexity bound in (a).

As to (b), we again start with the case when . Taking to be the smallest power of two with (15), we again obtain (16) for all . If , then we also notice that for all . We may -evaluate at the primitive -th roots of unity in time . We next retrieve the coefficients of the polynomial up to precision using one inverse fast Fourier transform of size . In the case when , we thus obtain

In general, replacing by yields the bound in (b).
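In code, the evaluation-interpolation principle behind this lemma reads as follows; a plain O(N²) inverse discrete Fourier transform is used for clarity (an inverse FFT would be used in practice), and the exponential function serves merely as an example of a convergent series:

```python
# Recover the first N Taylor coefficients of an analytic function from its
# values at the N-th roots of unity.  The recovered coefficient j actually
# equals the sum of the coefficients j, j+N, j+2N, ..., so the error is
# bounded by the tail of the series, as in the lemma.

import cmath

def coeffs_from_values(f, N):
    """Approximate the Taylor coefficients f_0, ..., f_{N-1} of f at 0."""
    vals = [f(cmath.exp(2j * cmath.pi * k / N)) for k in range(N)]
    return [sum(vals[k] * cmath.exp(-2j * cmath.pi * j * k / N)
                for k in range(N)) / N
            for j in range(N)]
```

With f = exp and N = 16, the first recovered coefficients agree with 1/j! up to the (tiny) tail 1/(j+16)! + ….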

4.2. Composition of power series

Let us now consider two convergent power series with , and . Then is well-defined and . In fact, the series , and still converge on a compact disc of radius . Let be such that , and . Then (14) becomes

(17)

and similarly for and . In view of lemma 17, it is natural to use an evaluation-interpolation scheme for the computation of -approximations of .

For a sufficiently large and , we evaluate on the -th roots of unity using one direct FFT on . Using the fact that for all and the algorithm for multi-point evaluation from the previous section, we next compute . Using one inverse FFT, we finally recover the coefficients of the polynomial with for all . Using the tail bound (17) for , , and a sufficiently large , the differences can be made as small as needed. More precisely, the algorithm goes as follows:

Algorithm compose

Input: with and ,

such that we have bounds , and

, and for certain

Output: -approximations for

Step 1. [Determine auxiliary degree and precision]

Let be smallest with and

Let , so that

Step 2. [Evaluate on roots of unity ]

Let

Compute a -approximation with

We will show that for all

Step 3. [Evaluate on ]

Let

Compute a -approximation

We will show that for all

Step 4. [Interpolate]

Compute a -approximation

We will show that for all

Return
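The four steps of compose can be mimicked on truncated polynomial data as follows. This is a toy model with our own names: Horner evaluation stands in for both the direct FFT on g and the multi-point evaluation of f, and a plain inverse DFT replaces the inverse FFT of step 4.

```python
# Evaluation-interpolation composition: evaluate g at the N-th roots of
# unity, evaluate f at those values, and interpolate f∘g by an inverse DFT.
# N must exceed the degree of f∘g for the recovery to be exact up to rounding.

import cmath

def horner(coeffs, z):
    r = 0j
    for c in reversed(coeffs):
        r = r * z + c
    return r

def compose(f, g, n, N=64):
    """First n coefficients of f(g(z)) for polynomials f, g with g[0] == 0."""
    roots = [cmath.exp(2j * cmath.pi * k / N) for k in range(N)]
    gv = [horner(g, w) for w in roots]       # step 2: values of g on the roots
    fv = [horner(f, y) for y in gv]          # step 3: multi-point evaluation of f
    return [sum(fv[k] * roots[k] ** (-j) for k in range(N)) / N
            for j in range(n)]               # step 4: inverse DFT
```

With f = 1 + y + y² and g = z/2, the result approximates the coefficients 1, 1/2, 1/4 of f ∘ g.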

Theorem 18. Let be power series with and . Assume that and . Given , we may compute -approximations for (as a function of and ) in time .

Proof. Let us first prove the correctness of the algorithm. The choice of and the tail bound (17) for imply . This ensures that we indeed have for all at the end of step 2. For , we also have

This proves the bound stated at the end of step 3. As to the last bound, let . Then and . Using the fact that for all vectors of length , we obtain

This proves the second bound. Our choice of implies the correctness of the algorithm.

As to the complexity bound, the FFT in step 2 can be done in time

By lemma 11, the multi-point evaluation in step 3 can be performed in time

The inverse FFT in step 4 can be performed within the same time as a direct FFT. This leads to the overall complexity bound

since .

4.3. Variants of the main theorem

Corollary 19. Let be convergent power series with and . Given , we may compute -approximations for (as a function of and ) in time .

Proof. Let be sufficiently small such that and . Consider the series and , so that

Let . By the theorem, we may compute -approximations of in time . Using additional -bit multiplications, we may compute

with for all . This can again be done in time .

Corollary 20. Let be convergent power series with , such that is an ordinary generating function. Then we may compute (as a function of and ) in time .

Proof. It suffices to take in corollary 19 and round the results.

Corollary 21. Let be convergent power series with such that is an exponential generating function. Then we may compute (as a function of and ) in time .

Proof. Applying corollary 19 for , we obtain -approximations of in time . Using additional -bit multiplications of total cost , we may compute

with for all . The are obtained by rounding the to the nearest integers.

Corollary 22. Let be power series with and . Let and be positive increasing functions with and . Given , we may compute -approximations for (as a function of and ) in time .

Proof. Without loss of generality, we may assume that for . In the proof of corollary 19, we may therefore choose , which yields and . It follows that and we conclude in a similar way as in the proof of corollary 19.

5. Relaxed composition

Let be such that and . Given an order , we have shown in the previous section how to compute efficiently, as a function of and . Since only depends on and , we may consider the relaxed composition problem in which and are given progressively, and where we require each coefficient to be output as soon as and are known. Similarly, in the semi-relaxed composition problem, the coefficients are known beforehand, but are given progressively. In that case, we require to be output as soon as are known. The relaxed and semi-relaxed settings are particularly useful for the resolution of implicit equations involving functional composition. We refer to [vdH02, vdH07] for more details about relaxed computations with power series.

5.1. Semi-relaxed composition

Let . In this section, we consider the problem of computing the semi-relaxed composition up to order . If , then we simply have . For , we denote

and similarly for and . Our algorithm relies on the identity

(18)

The first part of is computed recursively, using one semi-relaxed composition at order :

As soon as is completely known, we compute at order , using one of the algorithms for composition from the previous section. We also compute at order using binary powering. We are now allowed to recursively compute the second part of , using

This involves one semi-relaxed composition at order and one semi-relaxed multiplication at order .

Assume now that for all and . Consider the problem of computing -approximations of . In our algorithm, it will suffice to conduct the various computations with the following precisions:

Let us show that this indeed enables us to obtain the desired -approximations for . This is clear for the coefficients of . As to the second half, we have the bounds

and

These bounds justify the extra number of bits needed in the computations of and respectively.

Let us finally analyze the complexity of the above algorithm. We will denote by the complexity of multiplying two -bit integer polynomials of degrees . Using Kronecker multiplication, we have . We denote by the cost of a semi-relaxed multiplication of two -bit integer polynomials of degrees . Using the fast relaxed multiplication algorithm from [vdH02], we have . We will denote by and the cost of classical and semi-relaxed composition for , and as described above. By theorem 18, we have . The complexity of the semi-relaxed composition satisfies the following recursive bound:

Using theorem 18, it follows by induction that

(19)

From [BK75, vdH02], we also have

(20)

Applying (19) for and (20) on , we obtain

We have proved the following theorem:

Theorem 23. Let be power series with and . Assume that and . The computation of the semi-relaxed composition at order and up to an absolute error can be done in time .

5.2. Relaxed composition

Let be as in the previous section and . Assume also that . In order to compute the relaxed composition , we again use the formula (18), in combination with

The first coefficients of are still computed recursively, by performing a relaxed composition at order . As soon as and are completely known, we may compute the composition at order using the algorithm from section 4. Differentiation and division by also yields at order . The product can therefore be computed using one semi-relaxed multiplication.

Let . This time, the intermediate computations should be conducted with the following precisions:

Indeed, we have

Denoting by the complexity of relaxed composition, we obtain

It follows that

Theorem 24. Let be power series with , and . Assume that and . The computation of the relaxed composition at order and up to an absolute error can be done in time .

Remark 25. From [BK75, vdH02], we have

Theorem 24 improves on this bound in the frequent case when . Unfortunately, we have not yet been able to prove a quasi-linear bound .

Remark 26. For certain types of functional equations, one may avoid applying theorem 24. For instance, power series reversion can be reduced directly to composition using Newton's method [BK75]. The solutions of certain other equations, such as

can be evaluated efficiently on small disks. Consequently, the Taylor coefficients of can be computed efficiently using lemma 17.
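The reduction of reversion to composition via Newton's method mentioned above can be sketched as follows; exact rational arithmetic and a naive truncated composition replace the fast algorithms of section 4, and all helper names are ours:

```python
# Power series reversion by Newton's method, as in [BK75]: to find h with
# g(h(z)) = z, iterate h <- h - (g(h) - z)/g'(h), doubling the order of
# agreement at every step.  Each step costs two truncated compositions.

from fractions import Fraction

def smul(a, b, n):
    """Truncated product at order n."""
    c = [Fraction(0)] * n
    for i, x in enumerate(a[:n]):
        for j, y in enumerate(b[:n - i]):
            c[i + j] += x * y
    return c

def scomp(f, g, n):
    """Truncated composition f(g(z)) at order n, assuming g[0] == 0."""
    r = [Fraction(0)] * n
    p = [Fraction(1)] + [Fraction(0)] * (n - 1)
    for c in f[:n]:
        r = [x + c * y for x, y in zip(r, p)]
        p = smul(p, g, n)
    return r

def sinv(f, n):
    """Truncated inverse 1/f at order n (f[0] != 0), by Newton iteration."""
    h = [Fraction(1) / f[0]]
    while len(h) < n:
        m = min(2 * len(h), n)
        d = [-x for x in smul(f, h, m)]
        d[0] += 1
        h = [x + y for x, y in zip(h + [Fraction(0)] * m, smul(h, d, m))][:m]
    return h

def reversion(g, n):
    """h with g(h(z)) = z mod z^n, assuming g[0] == 0 and g[1] != 0."""
    h = [Fraction(0), Fraction(1) / g[1]]
    m = 2
    while m < n:
        m = min(2 * m, n)
        gp = [(i + 1) * c for i, c in enumerate(g[1:m + 1])]   # g'
        err = scomp(g, h, m)
        err[1] -= 1                                            # g(h) - z
        corr = smul(err, sinv(scomp(gp, h, m), m), m)
        h = [x - y for x, y in zip(h + [Fraction(0)] * m, corr)][:m]
    return h
```

For g = z + z², the reversion starts z − z² + 2z³ − …, and composing back with g recovers z.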

Bibliography

[AHU74]

A. Aho, J. Hopcroft, and J. Ullman. The Design and Analysis of Computer Algorithms. Addison-Wesley, Reading, Massachusetts, 1974.

[Ber98]

D.J. Bernstein. Composing power series over a finite ring in essentially linear time. JSC, 26(3):339–341, 1998.

[BK75]

R.P. Brent and H.T. Kung. O((n log n)^(3/2)) algorithms for composition and reversion of power series. In J.F. Traub, editor, Analytic Computational Complexity, Pittsburgh, 1975. Proc. of a symposium on analytic computational complexity held by Carnegie-Mellon University.

[BK77]

R.P. Brent and H.T. Kung. Fast algorithms for composition and reversion of multivariate power series. In Proc. Conf. Th. Comp. Sc., pages 149–158, Waterloo, Ontario, Canada, August 1977. University of Waterloo.

[BK78]

R.P. Brent and H.T. Kung. Fast algorithms for manipulating formal power series. Journal of the ACM, 25:581–595, 1978.

[BSS08]

A. Bostan, B. Salvy, and É. Schost. Power series composition and change of basis. In J.R. Sendra and L. González-Vega, editors, ISSAC, pages 269–276, Linz/Hagenberg, Austria, July 2008. ACM.

[CK91]

D.G. Cantor and E. Kaltofen. On fast multiplication of polynomials over arbitrary algebras. Acta Informatica, 28:693–701, 1991.

[CT65]

J.W. Cooley and J.W. Tukey. An algorithm for the machine calculation of complex Fourier series. Math. Computat., 19:297–301, 1965.

[Für07]

M. Fürer. Faster integer multiplication. In Proceedings of the Thirty-Ninth ACM Symposium on Theory of Computing (STOC 2007), pages 57–66, San Diego, California, 2007.

[Kro82]

L. Kronecker. Grundzüge einer arithmetischen Theorie der algebraischen Grössen. Jour. für die reine und ang. Math., 92:1–122, 1882.

[Sch82]

A. Schönhage. Asymptotically fast algorithms for the numerical multiplication and division of polynomials with complex coefficients. In J. Calmet, editor, EUROCAM '82: European Computer Algebra Conference, volume 144 of Lect. Notes Comp. Sci., pages 3–15, Marseille, France, April 1982. Springer.

[SS71]

A. Schönhage and V. Strassen. Schnelle Multiplikation grosser Zahlen. Computing, 7:281–292, 1971.

[vdH02]

J. van der Hoeven. Relax, but don't be too lazy. JSC, 34:479–542, 2002.

[vdH07]

J. van der Hoeven. New algorithms for relaxed multiplication. JSC, 42(8):792–802, 2007.