Fast composition of numeric power series

Joris van der Hoeven

CNRS, Département de Mathématiques

Bâtiment 425

Université Paris-Sud

91405 Orsay Cedex

France

Email: joris@texmacs.org

Web: http://www.math.u-psud.fr/~vdhoeven

November 3, 2023

This work was partially supported by the ANR Gecko project.

Let f and g be two convergent power series in R[[z]] or C[[z]], whose first n terms are given numerically with a λn-bit precision for a fixed constant λ > 0. Assuming that g(0) = 0, we will show in this paper that the first n coefficients of f ∘ g can be computed with a λn-bit precision in time Õ(n²). Using Newton iteration, a similar complexity bound holds for power series reversion of g. Our method relies on fast multi-point evaluation, which will be recalled and further detailed for numeric polynomials. We also discuss relaxed variants of our algorithm.

Keywords: power series, composition, FFT, multi-point evaluation, algorithm

A.M.S. subject classification: 40-04, 42-04, 68W40

1. Introduction

Let R be an effective ring of coefficients (i.e. we have algorithms for performing the ring operations). Let f, g ∈ R[[z]] be two power series with g(0) = 0, so that the composition f ∘ g is well-defined. We are interested in algorithms for fast composition: given the first n coefficients of f and g, how many arithmetic operations in R are needed in order to compute the first n coefficients of f ∘ g?

A first efficient general purpose algorithm of time complexity O((n log n)^(1/2) M(n)) was given in [BK78, CT65, SS71]. Here M(n) denotes the complexity for the multiplication of two polynomials of degrees ≤ n and we have M(n) = O(n log n log log n) [CK91]. In the case when g is polynomial [BK78] or algebraic [vdH02], this complexity further reduces to O(M(n) log n). For some very special series g, there even exist O(M(n)) algorithms; see [BSS08] for an overview. In positive characteristic p, right composition can also be performed in quasi-linear time [Ber98].

In this paper, we are interested in efficient algorithms in the case when the coefficients are numbers, such as elements of R or C, or elements of a ring of floating point numbers. In that case, we are interested in the bit complexity of the composition, which means that we also have to take into account the bit precision of the underlying integer arithmetic. In particular, we will denote by I(n) the time needed for multiplying two n-bit integers. We have I(n) = O(n log n log log n) [SS71] and even I(n) = O(n log n 2^(O(log* n))) [Für07], where the iterated logarithm log* n satisfies log* n = min {k : log … log n ≤ 1 (k iterations of log)}. If all coefficients f_i, g_i and (f ∘ g)_i correspond to n-bit numbers, then we will search for a composition algorithm which runs in quasi-linear time with respect to the total bit-size n² of the input and output. Throughout the paper, we will assume that I(n)/n is increasing and that I(O(n)) = O(I(n)).

In section 2, we start by reviewing multiplication and division of polynomials with numeric coefficients. Multiplication of two polynomials of degrees with -bit coefficients can be done in time using Kronecker's substitution [Kro82]. Division essentially requires a higher complexity [Sch82], due to the fact that we may lose bits of precision when dividing by a polynomial of degree . Nevertheless, we will see that a best possible complexity can be achieved in terms of the output precision.

In section 3, we will study multi-point evaluation of a polynomial at points in the case of numeric coefficients. Adapting the classical binary splitting algorithm [AHU74] of time complexity to the numeric context, we will prove the complexity bound . In the case when , this complexity bound is again non-optimal in many cases. It would be nice if a bound of the form could be achieved and we will present some ideas in this direction in section 3.2.

In section 4, we turn to the problem of numeric power series composition. Under the assumptions and , we first show that the computation of the first coefficients of a convergent power series is essentially equivalent to its evaluation at equally spaced points on a given circle (this fact was already used in [Sch82], although not stated as explicitly). The composition problem can therefore be reduced to one fast Fourier transform, one multi-point evaluation and one inverse Fourier transform, leading to a complexity . We first prove this result under certain normalization conditions for floating point coefficients. We next extend the result to different types of numeric coefficients (including integers and rationals) and also consider non-convergent series. The good complexity of these algorithms should not come as a complete surprise: under the analogy between numbers and power series, it was already known [BK77] how to compose multivariate power series fast.

In the last section, we conclude by giving relaxed versions [vdH02] of our composition algorithm. This allows for the resolution of functional equations involving composition of numeric power series. Unfortunately, we have not yet achieved a quasi-linear bound, even though some progress has been made on the exponent. The special case of power series reversion may be reduced more directly to composition, using Newton's method [BK75].

2. Fundamental operations

Let be a normed vector space. Given and , we say that is an -approximation of if . We will also say that is an approximation of with absolute error or an absolute precision of bits. For polynomials , we will use the norm , and notice that . For vectors , we will use the norm .

2.1. Multiplication

Given two polynomials with , , and , take . Then the product can be read off from the integer product , since the coefficients of the result all fit into bits. This method is called Kronecker multiplication [Kro82] and its cost is bounded by . Clearly, the method generalizes to the case when or . A direct consequence for the computation with floating point polynomials is the following:
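The packing step can be sketched in a few lines. The following is a minimal illustration (the function name and slot-width choice are ours, not the paper's), relying on Python's built-in big-integer arithmetic for the single large multiplication:

```python
# Kronecker multiplication sketch: pack each polynomial into one big integer
# with fixed-width slots, perform a single big-integer product, and unpack.
# The slot width 2*k + bit_length(n) guarantees that no coefficient of the
# product (a sum of at most n products of k-bit numbers) overflows its slot.
# Coefficients are assumed to be non-negative integers below 2**k.

def kronecker_mul(p, q, k):
    """Multiply coefficient lists p and q with 0 <= c < 2**k."""
    n = max(len(p), len(q))
    slot = 2 * k + n.bit_length()
    P = sum(c << (i * slot) for i, c in enumerate(p))
    Q = sum(c << (i * slot) for i, c in enumerate(q))
    R = P * Q                      # one multiplication of large integers
    mask = (1 << slot) - 1
    return [(R >> (i * slot)) & mask for i in range(len(p) + len(q) - 1)]
```

For instance, `kronecker_mul([1, 2, 3], [4, 5], 8)` returns `[4, 13, 22, 15]`, the coefficients of (1 + 2x + 3x²)(4 + 5x).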

Lemma 1. Let be two polynomials with . Given with and , we may compute a -approximation of in time .

Proof. Let and consider polynomials with

By assumption, the bit-sizes of the coefficients of and are bounded by . Therefore, we may compute the exact product in time , using Kronecker multiplication. We have

This proves that is a -approximation of . One may optionally truncate the mantissas of the coefficients of so as to fit into bits.

Remark 2. In order to increase readability, we will loosely use real and complex numbers as inputs of our algorithms. In practice, such inputs are really floating point numbers whose precisions should be clear from the context.

2.2. Inversion

Lemma 3. Let be such that for all and

Let and . Let be such that and . Given , we may compute a -approximation of in time .

Proof. Assume that we are given an approximation of . Then we may compute a better approximation using the classical Newton iteration

(1)

If is “small”, then

(2)

is about twice as “small”. We will apply the Newton iteration for a polynomial whose first coefficients are good approximations of and the remaining coefficients less good approximations.

Given a polynomial , we write

Assume that and , with . Setting and , we have

The relation (2) yields

whence

When starting with , we may take . If is sufficiently large, then one Newton iteration yields . Applying the Newton iteration once more for , we obtain . In view of lemma 1 and the assumption , the cost of two such Newton iterations is bounded by for a suitable constant .

We are now in a position to prove the lemma. Since an increase of by leaves the desired complexity bound unaltered, we may assume without loss of generality that . Then the cost of the inversion at order satisfies . We conclude that .
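As an illustration of the Newton iteration (1), here is a minimal exact-arithmetic sketch, with rationals standing in for the fixed-precision floating point coefficients analysed above and with helper names of our own; each pass through the loop roughly doubles the number of correct coefficients:

```python
# Power series inversion by Newton iteration: h <- h + h*(1 - f*h).
# Exact Fractions keep the sketch free of the rounding issues that the
# lemma's proof has to track.

from fractions import Fraction

def series_mul(a, b, n):
    """Product of two coefficient lists, truncated at order n."""
    c = [Fraction(0)] * n
    for i, x in enumerate(a[:n]):
        for j, y in enumerate(b[:n - i]):
            c[i + j] += x * y
    return c

def series_inv(f, n):
    """First n coefficients of 1/f, assuming f[0] != 0."""
    h = [Fraction(1) / f[0]]
    while len(h) < n:
        m = min(2 * len(h), n)
        delta = [-c for c in series_mul(f, h, m)]
        delta[0] += 1                      # delta = 1 - f*h
        h = [x + y for x, y in
             zip(h + [Fraction(0)] * m, series_mul(h, delta, m))][:m]
    return h
```

For example, inverting 1 − z yields the geometric series 1 + z + z² + ….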

Remark 4. If only is given, then a constant for the bound is not known a priori. Assume that and write . Then and

Assuming , it follows that

We may thus take . Conversely, we have

In other words, our choice of is at most a factor two worse than the optimal value.

Remark 5. The following alternative algorithm for the approximation of is a variant of the method described in [Sch82, Section 4]:

Because of (5) below, we may always take and , which gives a complexity bound of the form . In fact the FFT of a numeric -bit polynomial of degree can be computed in time [Sch82, Section 3], which drops the bound further down to .

In practice, we may also start with and double until the computed approximation of satisfies the equation up to a sufficient precision. This leads to the same complexity bound as in the lemma.

It is not clear which of the two methods is more efficient: Newton's method performs a certain amount of recomputation, whereas the alternative method requires us to work at a sufficiently large degree for which (3) holds.

Given power series and , where , we will say that majorates and write if for all coefficients of . This notation applies in particular to polynomials.

Lemma 6. For and as in lemma 3, we may compute a -approximation of in time .

Proof. Then and , whence

(4)
(5)

We conclude by applying lemma 3 for and .

2.3. Euclidean division

The following result was first proved in [Sch82, Section 4].

Lemma 7. Let be polynomials with , and

for with for all . Consider the Euclidean division

with . Given , we may compute -approximations of and in time .

Proof. Setting and , we may write

Setting , we then have

Our lemma now follows from lemmas 6 and 1.
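The reduction of Euclidean division to power series inversion that underlies this proof can be sketched as follows; exact rationals replace the floating point coefficients, and all helper names are ours:

```python
# Euclidean division A = Q*B + R via power-series inversion: reversing the
# coefficient lists turns division by B into multiplication by the truncated
# inverse of rev(B), computed by Newton iteration.

from fractions import Fraction

def smul(a, b, n):
    """Truncated product at order n."""
    c = [Fraction(0)] * n
    for i, x in enumerate(a[:n]):
        for j, y in enumerate(b[:n - i]):
            c[i + j] += x * y
    return c

def sinv(f, n):
    """Truncated inverse 1/f at order n (f[0] != 0), by Newton iteration."""
    h = [Fraction(1) / f[0]]
    while len(h) < n:
        m = min(2 * len(h), n)
        d = [-x for x in smul(f, h, m)]
        d[0] += 1
        h = [x + y for x, y in zip(h + [Fraction(0)] * m, smul(h, d, m))][:m]
    return h

def divmod_poly(A, B):
    """Quotient and remainder of A by B (coefficient lists, low to high)."""
    dA, dB = len(A) - 1, len(B) - 1
    if dA < dB:
        return [Fraction(0)], A[:]
    n = dA - dB + 1
    Qrev = smul(A[::-1], sinv(B[::-1], n), n)   # rev(Q) = rev(A)/rev(B) mod z^n
    Q = Qrev[::-1]
    QB = smul(Q, B, dA + 1)
    R = [a - c for a, c in zip(A, QB)][:dB]
    return Q, R
```

Dividing x³ + 2x² + 3x + 4 by x + 1, for instance, gives quotient x² + x + 2 and remainder 2.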

Remark 8. Assuming that , , and , it is again possible to prove the improved estimate . However, the proof relies on Schönhage's original method, using the same doubling strategy as in remark 5. Indeed, when using Newton's method for inverting , nice bounds and do not necessarily imply a nice bound for , where is the truncated inverse of . Using a relaxed division algorithm [vdH02] for , one may still achieve the improved bound, up to an additional overhead.

For applications where Euclidean division is used as a subalgorithm, it is necessary to investigate the effect of small perturbations of the inputs and on the outputs and .

Lemma 9. Consider two Euclidean divisions

with similar hypotheses as in lemma 7. Then

Proof. With the notations of the proof of lemma 7, let and be the truncations of the power series and at order . Let us first consider the case when , so that

Then and

Let us next consider the case when . Then

whence and

In general, the successive application of the second and the first case yields

We have also seen in the proof of lemma 6 that , and .

Lemma 10. Let and be as in lemma 7, while allowing for the case when . Then

Proof. With the notations from the proof of lemma 7, we have

whence

and

3. Multi-point evaluation

Consider a complex polynomial

which has been normalized so that . Let

be pairwise distinct points with for all . The problem of multi-point evaluation is to find an efficient algorithm for the simultaneous evaluation of at . A -approximation of will also be called a -evaluation of at . In order to simplify our exposition, we will assume that is a power of two; this assumption only affects the complexity analysis by a constant factor.

3.1. The binary splitting algorithm

An efficient and classical algorithm for multi-point evaluation [AHU74] relies on binary splitting: let , . Denoting by the remainder of the Euclidean division of by , we compute

Then

In other words, we have reduced the original problem to two problems of the same type, but of size .

For the above algorithm to be fast, the partial products and are also computed using binary splitting, but in the opposite direction. In order to avoid recomputations, the partial products are stored in a binary tree of depth . At depth , we have nodes, labeled by polynomials , where

For instance, for , the tree is given by

We have

(6)

For a given polynomial of degree we may now compute

(7)

This second computation can be carried out in place. At the last stage, we obtain

More generally,

(8)

for all and .
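A compact recursive rendering of this binary splitting scheme follows; it works in exact arithmetic and rebuilds the subproducts on the fly rather than storing the tree, so it is a sketch of the structure rather than of the complexity, and the helper names are ours:

```python
# Multi-point evaluation by binary splitting: split the points in two halves,
# reduce P modulo the subproduct of each half, and recurse; at the leaves the
# remainder modulo (z - z_i) is the value P(z_i).

from fractions import Fraction

def pmul(a, b):
    """Full product of two coefficient lists (low to high)."""
    c = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            c[i + j] += x * y
    return c

def pmod(a, b):
    """Remainder of a modulo b (coefficient lists, low to high)."""
    a = a[:]
    while len(a) >= len(b):
        k = len(a) - len(b)
        q = a[-1] / b[-1]
        for i, y in enumerate(b):
            a[i + k] -= q * y
        a.pop()
    return a

def multipoint_eval(P, pts):
    """Evaluate P at every point of pts by binary splitting."""
    if len(pts) == 1:
        return [pmod(P, [-pts[0], Fraction(1)])[0]]
    h = len(pts) // 2
    left, right = [Fraction(1)], [Fraction(1)]
    for p in pts[:h]:
        left = pmul(left, [-p, Fraction(1)])
    for p in pts[h:]:
        right = pmul(right, [-p, Fraction(1)])
    return (multipoint_eval(pmod(P, left), pts[:h]) +
            multipoint_eval(pmod(P, right), pts[h:]))
```

Evaluating z² at the points 1, 2, 3 returns 1, 4, 9, as expected.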

Lemma 11. Let and be such that , and for all . Given , we may -evaluate at in time .

Proof. In the algorithm, let and assume that all multiplications (6) and all Euclidean divisions (7) are computed up to an absolute error . Let and denote the results of these approximate computations. We have

It follows by induction that

(9)
(10)

From (8) and lemma 10, we have

(11)

In view of (7) and lemma 9, we also have

By induction over , we obtain

Using our choice of , we conclude that

Let us now estimate the complexity of the algorithm. In view of (9) and lemma 1, we may compute in time . In view of (11) and lemma 7, the computation of takes a time . For fixed , the computation of all and thus takes a time . Since there are stages, the time complexity of the complete algorithm is bounded by .

3.2. Low precision methods

If , then the bound from lemma 11 reduces to , which is not very satisfactory. Indeed, in the very special case when for , we may achieve the complexity , by using the fast Fourier transform. In general, we may strive for a bound of the form . We have not yet been able to obtain this complexity, but we will now describe some partial results in this direction. Throughout the section, we assume that .

Proposition 12. With the notations of lemma 11, assume that . Then we may -evaluate at in time .

Proof. Let , and for all . Each of the points is at distance at most from one of the points . We will compute using a Taylor series expansion of at . We first observe that

(12)

Using fast Fourier transforms of size , we first compute -approximations of for all and . This can be done in time

From (12) it follows that

The -evaluation of sums of the form using Horner's rule takes a time .

Proposition 13. With the notations of lemma 11, we may -evaluate at in time .

Proof. The proof of the above proposition adapts to the case when for all . Let us now subdivide the unit disk into annuli and the disk of radius . For a fixed annulus, we may evaluate at each of the inside the annulus in time . For , we have

provided that . Consequently, taking , we may evaluate at the remaining points in time . Under the condition , and up to logarithmic terms, the sum

becomes minimal for and .

Proposition 14. With the notations of lemma 11, assume that there exists an with and for all . Then we may -evaluate at in time .

Proof. We will only provide a sketch of the proof. As in the proof of proposition 12, we first compute the Taylor series expansion of at up to order . This task can be performed in time via a division of by followed by a Taylor shift. We next evaluate this expansion at for all . According to proposition 16, this can be done in time .

Proposition 15. Let be a fixed constant. With the notations of lemma 11, assume that for all . Then we may -evaluate at in time .

Proof. Again, we will only provide a sketch of the proof. The main idea is to use the binary splitting algorithm from the previous subsection. Let . If for all , then we obtain and the algorithm reduces to a fast Fourier transform. If each is a slight perturbation of , then becomes a slight perturbation of . In particular, the Taylor coefficients of only have a polynomial growth in . Consequently, the Euclidean division by accounts for a loss of at most instead of bits of precision. The binary splitting algorithm can therefore be carried out using a fixed precision of bits.

Sometimes, it is possible to decompose , such that a fast multi-point evaluation algorithm is available for each set of points . This leads to the question of evaluating at points.

Proposition 16. Let and be such that , and for all . Let be the time needed in order to compute -evaluations of polynomials with at . Then -evaluations of at can be computed in time .

Proof. Without loss of generality, we may assume that and are powers of two. We decompose as follows:

We may compute -evaluations of the at in time . We may also -approximate in time , using binary powering. Using Horner's rule, we may finally -evaluate

for in time .
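The block decomposition in this proof can be illustrated as follows; a naive evaluation stands in for the assumed fast algorithm on each block, and the function name and float coefficients are ours:

```python
# Evaluate a polynomial at several points by splitting it into blocks of
# length p: evaluate each block at the points (here naively, standing in for
# the assumed fast p-point algorithm), obtain z^p by binary powering (Python's
# built-in ** on integers/floats), and recombine the blocks by Horner's rule.

def eval_by_blocks(P, pts, p):
    """Evaluate P (low-to-high coefficients) at pts, using blocks of length p."""
    blocks = [P[i:i + p] for i in range(0, len(P), p)]
    bvals = [[sum(c * z ** j for j, c in enumerate(b)) for z in pts]
             for b in blocks]                 # fast p-point evaluation goes here
    zp = [z ** p for z in pts]                # binary powering
    out = []
    for i, z in enumerate(pts):
        acc = 0.0
        for bv in reversed(bvals):
            acc = acc * zp[i] + bv[i]         # Horner's rule over the blocks
        out.append(acc)
    return out
```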

4. Composition of power series

4.1. Evaluation of truncated power series

Let be a convergent power series with radius of convergence . Given , we denote

Using Cauchy's formula, we have

(13)

For with , it follows that

(14)

Let be the time needed to compute -approximations of . Given , let be the time needed to -evaluate at for .

Lemma 17.

  a) We have .

  b) We have .

Proof. We first consider the case when . Let be the smallest power of two with

(15)

For all , the bound (14) implies

(16)

The computation of -approximations of takes a time . The -evaluation of at primitive roots of unity can be done using fast Fourier transforms of size . This requires a time . If , we thus obtain the bound

The general case is reduced to the case via a change of variables . This requires computations to be carried out with an additional precision of bits, leading to the complexity bound in (a).

As to (b), we again start with the case when . Taking to be the smallest power of two with (15), we again obtain (16) for all . If , then we also notice that for all . We may -evaluate at the primitive -th roots of unity in time . We next retrieve the coefficients of the polynomial up to precision using one inverse fast Fourier transform of size . In the case when , we thus obtain

In general, replacing by yields the bound in (b).
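In code, the evaluation-interpolation principle behind this lemma reads as follows; a plain O(N²) inverse discrete Fourier transform is used for clarity (an inverse FFT would be used in practice), and the exponential function serves merely as an example of a convergent series:

```python
# Recover the first N Taylor coefficients of an analytic function from its
# values at the N-th roots of unity.  The recovered coefficient j actually
# equals the sum of the coefficients j, j+N, j+2N, ..., so the error is
# bounded by the tail of the series, as in the lemma.

import cmath

def coeffs_from_values(f, N):
    """Approximate the Taylor coefficients f_0, ..., f_{N-1} of f at 0."""
    vals = [f(cmath.exp(2j * cmath.pi * k / N)) for k in range(N)]
    return [sum(vals[k] * cmath.exp(-2j * cmath.pi * j * k / N)
                for k in range(N)) / N
            for j in range(N)]
```

With f = exp and N = 16, the first recovered coefficients agree with 1/j! up to the (tiny) tail 1/(j+16)! + ….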

4.2. Composition of power series

Let us now consider two convergent power series with , and . Then is well-defined and . In fact, the series , and still converge on a compact disc of radius . Let be such that , and . Then (14) becomes

(17)

and similarly for and . In view of lemma 17, it is natural to use an evaluation-interpolation scheme for the computation of -approximations of .

For a sufficiently large and , we evaluate on the -th roots of unity using one direct FFT on . Using the fact that for all and the algorithm for multi-point evaluation from the previous section, we next compute . Using one inverse FFT, we finally recover the coefficients of the polynomial with for all . Using the tail bound (17) for , , and a sufficiently large , the differences can be made as small as needed. More precisely, the algorithm goes as follows:

Algorithm compose

Input: with and ,

such that we have bounds , and

, and for certain

Output: -approximations for

Step 1. [Determine auxiliary degree and precision]

Let be smallest with and

Let , so that

Step 2. [Evaluate on roots of unity ]

Let

Compute a -approximation with

We will show that for all

Step 3. [Evaluate on ]

Let

Compute a -approximation

We will show that for all

Step 4. [Interpolate]

Compute a -approximation

We will show that for all

Return
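The four steps of compose can be mimicked on truncated polynomial data as follows. This is a toy model with our own names: Horner evaluation stands in for both the direct FFT on g and the multi-point evaluation of f, and a plain inverse DFT replaces the inverse FFT of step 4.

```python
# Evaluation-interpolation composition: evaluate g at the N-th roots of
# unity, evaluate f at those values, and interpolate f∘g by an inverse DFT.
# N must exceed the degree of f∘g for the recovery to be exact up to rounding.

import cmath

def horner(coeffs, z):
    r = 0j
    for c in reversed(coeffs):
        r = r * z + c
    return r

def compose(f, g, n, N=64):
    """First n coefficients of f(g(z)) for polynomials f, g with g[0] == 0."""
    roots = [cmath.exp(2j * cmath.pi * k / N) for k in range(N)]
    gv = [horner(g, w) for w in roots]       # step 2: values of g on the roots
    fv = [horner(f, y) for y in gv]          # step 3: multi-point evaluation of f
    return [sum(fv[k] * roots[k] ** (-j) for k in range(N)) / N
            for j in range(n)]               # step 4: inverse DFT
```

With f = 1 + y + y² and g = z/2, the result approximates the coefficients 1, 1/2, 1/4 of f ∘ g.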

Theorem 18. Let be power series with and . Assume that and . Given , we may compute -approximations for (as a function of and ) in time .

Proof. Let us first prove the correctness of the algorithm. The choice of and the tail bound (17) for imply . This ensures that we indeed have for all at the end of step 2. For , we also have

This proves the bound stated at the end of step 3. As to the last bound, let . Then and . Using the fact that for all vectors of length , we obtain

This proves the second bound. Our choice of implies the correctness of the algorithm.

As to the complexity bound, the FFT in step 2 can be done in time

By lemma 11, the multi-point evaluation in step 3 can be performed in time

The inverse FFT in step 4 can be performed within the same time as a direct FFT. This leads to the overall complexity bound

since .

4.3. Variants of the main theorem

Corollary 19. Let be convergent power series with and . Given , we may compute -approximations for (as a function of and ) in time .

Proof. Let be sufficiently small such that and . Consider the series and , so that

Let . By the theorem, we may compute -approximations of in time . Using additional -bit multiplications, we may compute

with for all . This can again be done in time .

Corollary 20. Let be convergent power series with , such that is an ordinary generating function. Then we may compute (as a function of and ) in time .

Proof. It suffices to take in corollary 19 and round the results.

Corollary 21. Let be convergent power series with such that is an exponential generating function. Then we may compute (as a function of and ) in time .

Proof. Applying corollary 19 for , we obtain -approximations of in time . Using additional -bit multiplications of total cost , we may compute

with for all . The are obtained by rounding the to the nearest integers.

Corollary 22. Let be power series with and . Let and be positive increasing functions with and . Given , we may compute -approximations for (as a function of and ) in time .

Proof. Without loss of generality, we may assume that for . In the proof of corollary 19, we may therefore choose , which yields and . It follows that and we conclude in a similar way as in the proof of corollary 19.

5. Relaxed composition

Let be such that and . Given an order , we have shown in the previous section how to compute efficiently, as a function of and . Since only depends on and , we may consider the relaxed composition problem in which and are given progressively, and where we require each coefficient to be output as soon as and are known. Similarly, in the semi-relaxed composition problem, the coefficients are known beforehand, but are given progressively. In that case, we require to be output as soon as are known. The relaxed and semi-relaxed settings are particularly useful for the resolution of implicit equations involving functional composition. We refer to [vdH02, vdH07] for more details about relaxed computations with power series.

5.1. Semi-relaxed composition

Let . In this section, we consider the problem of computing the semi-relaxed composition up to order . If , then we simply have . For , we denote

and similarly for and . Our algorithm relies on the identity

(18)

The first part of is computed recursively, using one semi-relaxed composition at order :

As soon as is completely known, we compute at order , using one of the algorithms for composition from the previous section. We also compute at order using binary powering. We are now allowed to recursively compute the second part of , using

This involves one semi-relaxed composition at order and one semi-relaxed multiplication at order .

Assume now that for all and . Consider the problem of computing -approximations of . In our algorithm, it will suffice to conduct the various computations with the following precisions:

Let us show that this indeed enables us to obtain the desired -approximations for . This is clear for the coefficients of . As to the second half, we have the bounds

and

These bounds justify the extra number of bits needed in the computations of and respectively.

Let us finally analyze the complexity of the above algorithm. We will denote by the complexity of multiplying two -bit integer polynomials of degrees . Using Kronecker multiplication, we have . We denote by the cost of a semi-relaxed multiplication of two -bit integer polynomials of degrees . Using the fast relaxed multiplication algorithm from [vdH02], we have . We will denote by and the cost of classical and semi-relaxed composition for , and as described above. By theorem 18, we have . The complexity of the semi-relaxed composition satisfies the following recursive bound:

Using theorem 18, it follows by induction that

(19)

From [BK75, vdH02], we also have

(20)

Applying (19) for and (20) on , we obtain

We have proved the following theorem:

Theorem 23. Let be power series with and . Assume that and . The computation of the semi-relaxed composition at order and up to an absolute error can be done in time .

5.2. Relaxed composition

Let be as in the previous section and . Assume also that . In order to compute the relaxed composition , we again use the formula (18), in combination with

The first coefficients of are still computed recursively, by performing a relaxed composition at order . As soon as and are completely known, we may compute the composition at order using the algorithm from section 4. Differentiation and division by also yields at order . The product can therefore be computed using one semi-relaxed multiplication.

Let . This time, the intermediate computations should be conducted with the following precisions:

Indeed, we have

Denoting by the complexity of relaxed composition, we obtain

It follows that

Theorem 24. Let be power series with , and . Assume that and . The computation of the relaxed composition at order and up to an absolute error can be done in time .

Remark 25. From [BK75, vdH02], we have

Theorem 24 improves on this bound in the frequent case when . Unfortunately, we have not yet been able to prove a quasi-linear bound .

Remark 26. For certain types of functional equations, one may avoid applying theorem 24. For instance, power series reversion can be reduced directly to composition using Newton's method [BK75]. The solutions of certain other equations, such as

can be evaluated efficiently on small disks. Consequently, the Taylor coefficients of can be computed efficiently using lemma 17.
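The reduction of reversion to composition via Newton's method mentioned above can be sketched as follows; exact rational arithmetic and a naive truncated composition replace the fast algorithms of section 4, and all helper names are ours:

```python
# Power series reversion by Newton's method, as in [BK75]: to find h with
# g(h(z)) = z, iterate h <- h - (g(h) - z)/g'(h), doubling the order of
# agreement at every step.  Each step costs two truncated compositions.

from fractions import Fraction

def smul(a, b, n):
    """Truncated product at order n."""
    c = [Fraction(0)] * n
    for i, x in enumerate(a[:n]):
        for j, y in enumerate(b[:n - i]):
            c[i + j] += x * y
    return c

def scomp(f, g, n):
    """Truncated composition f(g(z)) at order n, assuming g[0] == 0."""
    r = [Fraction(0)] * n
    p = [Fraction(1)] + [Fraction(0)] * (n - 1)
    for c in f[:n]:
        r = [x + c * y for x, y in zip(r, p)]
        p = smul(p, g, n)
    return r

def sinv(f, n):
    """Truncated inverse 1/f at order n (f[0] != 0), by Newton iteration."""
    h = [Fraction(1) / f[0]]
    while len(h) < n:
        m = min(2 * len(h), n)
        d = [-x for x in smul(f, h, m)]
        d[0] += 1
        h = [x + y for x, y in zip(h + [Fraction(0)] * m, smul(h, d, m))][:m]
    return h

def reversion(g, n):
    """h with g(h(z)) = z mod z^n, assuming g[0] == 0 and g[1] != 0."""
    h = [Fraction(0), Fraction(1) / g[1]]
    m = 2
    while m < n:
        m = min(2 * m, n)
        gp = [(i + 1) * c for i, c in enumerate(g[1:m + 1])]   # g'
        err = scomp(g, h, m)
        err[1] -= 1                                            # g(h) - z
        corr = smul(err, sinv(scomp(gp, h, m), m), m)
        h = [x - y for x, y in zip(h + [Fraction(0)] * m, corr)][:m]
    return h
```

For g = z + z², the reversion starts z − z² + 2z³ − …, and composing back with g recovers z.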

Bibliography

[AHU74]

A. Aho, J. Hopcroft, and J. Ullman. The Design and Analysis of Computer Algorithms. Addison-Wesley, Reading, Massachusetts, 1974.

[Ber98]

D.J. Bernstein. Composing power series over a finite ring in essentially linear time. JSC, 26(3):339–341, 1998.

[BK75]

R.P. Brent and H.T. Kung. O((n log n)^(3/2)) algorithms for composition and reversion of power series. In J.F. Traub, editor, Analytic Computational Complexity, Pittsburgh, 1975. Proc. of a symposium on analytic computational complexity held by Carnegie-Mellon University.

[BK77]

R.P. Brent and H.T. Kung. Fast algorithms for composition and reversion of multivariate power series. In Proc. Conf. Th. Comp. Sc., pages 149–158, Waterloo, Ontario, Canada, August 1977. University of Waterloo.

[BK78]

R.P. Brent and H.T. Kung. Fast algorithms for manipulating formal power series. Journal of the ACM, 25:581–595, 1978.

[BSS08]

A. Bostan, B. Salvy, and É. Schost. Power series composition and change of basis. In J.R. Sendra and L. González-Vega, editors, ISSAC, pages 269–276, Linz/Hagenberg, Austria, July 2008. ACM.

[CK91]

D.G. Cantor and E. Kaltofen. On fast multiplication of polynomials over arbitrary algebras. Acta Informatica, 28:693–701, 1991.

[CT65]

J.W. Cooley and J.W. Tukey. An algorithm for the machine calculation of complex Fourier series. Math. Computat., 19:297–301, 1965.

[Für07]

M. Fürer. Faster integer multiplication. In Proceedings of the Thirty-Ninth ACM Symposium on Theory of Computing (STOC 2007), pages 57–66, San Diego, California, 2007.

[Kro82]

L. Kronecker. Grundzüge einer arithmetischen Theorie der algebraischen Grössen. Jour. für die reine und ang. Math., 92:1–122, 1882.

[Sch82]

A. Schönhage. Asymptotically fast algorithms for the numerical multiplication and division of polynomials with complex coefficients. In J. Calmet, editor, EUROCAM '82: European Computer Algebra Conference, volume 144 of Lect. Notes Comp. Sci., pages 3–15, Marseille, France, April 1982. Springer.

[SS71]

A. Schönhage and V. Strassen. Schnelle Multiplikation grosser Zahlen. Computing, 7:281–292, 1971.

[vdH02]

J. van der Hoeven. Relax, but don't be too lazy. JSC, 34:479–542, 2002.

[vdH07]

J. van der Hoeven. New algorithms for relaxed multiplication. JSC, 42(8):792–802, 2007.