Joris van der Hoeven

Laboratoire d'informatique, UMR 7161 CNRS, École polytechnique, 91128 Palaiseau Cedex, France

 Email: vdhoeven@lix.polytechnique.fr

 Web: http://www.lix.polytechnique.fr/~vdhoeven

 October 21, 2020

This work has been supported by the ANR Gecko and ANR-09-JCJC-0098-01 MaGiX projects. The work has also been supported by the Digiteo 2009-36HD grant and Région Ile-de-France.

Let $\mathbb{K}[[z]]$ be the ring of power series over an effective ring $\mathbb{K}$. In [BK78], it was first shown that differential equations over $\mathbb{K}[[z]]$ may be solved in an asymptotically efficient way using Newton's method. More precisely, if $\mathsf{M}(n)$ denotes the complexity for multiplying two polynomials of degree $< n$ over $\mathbb{K}$, then the first $n$ coefficients of the solution can be computed in time $O(\mathsf{M}(n))$. However, this complexity does not take into account the dependency on the order $r$ of the equation, which is exponential for the original method [vdH02a] and quadratic for a recent improvement [BCO+07]. In this paper, we present a technique to further improve the dependency on $r$, by applying Newton's method up to a lower order and trading the remaining Newton steps against a lazy or relaxed algorithm in a suitable FFT model. The technique leads to improved asymptotic complexities for several basic operations on formal power series, such as division, exponentiation and the resolution of more general linear and non-linear systems of equations.

Keywords: power series, Newton's method, differential equation, FFT

A.M.S. subject classification: 68W25, 37M99, 90C53, 42-04, 68W30, 33F05

## 1. Introduction

Let $\mathbb{K}[[z]]$ be the ring of power series over an effective ring $\mathbb{K}$. It will be convenient to assume that $\mathbb{Q} \subseteq \mathbb{K}$. In fact, it will suffice that all natural numbers up to the desired series expansion order can be inverted in $\mathbb{K}$. In this paper, we are concerned with the efficient resolution of implicit equations over $\mathbb{K}[[z]]$. Such an equation may be presented in fixed-point form as

$f = \Phi(f)$ (1)

where $f$ is an indeterminate vector in $\mathbb{K}[[z]]^r$ with $r \in \mathbb{N}$. The operator $\Phi$ is constructed using classical operations like addition, multiplication, integration or postcomposition with a series $g \in \mathbb{K}[[z]]$ with $g_0 = 0$. In addition, we require that the coefficient $\Phi(f)_n$ of $z^n$ in $\Phi(f)$ depends only on coefficients $f_i$ with $i < n$, which allows for the recursive determination of all coefficients.

In particular, linear and algebraic differential equations may be put into the form (1). Indeed, given a linear differential system

$f' = A f$ (2)

where $A$ is an $r \times r$ matrix with coefficients in $\mathbb{K}[[z]]$, we may take $\Phi(f) = c + \int A f$, where $c = f(0)$ is the given initial condition. Similarly, if $P$ is a tuple of $r$ polynomials in $\mathbb{K}[[z]][f_1, \dots, f_r]$, then the initial value problem

$f' = P(f), \quad f(0) = c$ (3)

may be put in the form (1) by taking $\Phi(f) = c + \int P(f)$.

For our complexity results, and unless stated otherwise, we will always assume that polynomials are multiplied using evaluation-interpolation. If $\mathbb{K}$ contains all $2^p$-th roots of unity with $p \in \mathbb{N}$, then it is classical that two polynomials of degrees $< n$ can be multiplied using $\mathsf{M}(n) = O(n \log n)$ operations over $\mathbb{K}$, using the fast Fourier transform [CT65]. In general, such roots of unity can be added artificially [SS71, CK91, vdH02a] and the complexity becomes $\mathsf{M}(n) = O(n \log n \log \log n)$. We will respectively refer to these two cases as the standard and the synthetic FFT models. More details about evaluation-interpolation will be provided in section 2.

Let $\mathbb{K}^{r \times r}$ be the set of $r \times r$ matrices over $\mathbb{K}$. It is classical that two matrices in $\mathbb{K}^{r \times r}$ can be multiplied in time $O(r^\omega)$ with $\omega < 2.376$ [Str69, Pan84, CW87]. We will denote by $\mathsf{MM}(r, n)$ the cost of multiplying two polynomials of degrees $< n$ with coefficients in $\mathbb{K}^{r \times r}$. When using evaluation-interpolation in the standard FFT model, one has $\mathsf{MM}(r, n) = O(r^2\, n \log n + r^\omega n)$.

In [BK78], it was shown that Newton's method may be applied in the power series context for the computation of the first $n$ coefficients of the solution to (2) or (3) in time $O(\mathsf{M}(n))$. However, this complexity does not take into account the dependence on the order $r$, which was shown to be exponential in [vdH02a]. Recently [BCO+07], this dependence in $r$ has been reduced to a quadratic factor. In particular, the first $n$ coefficients of the solution to (3) can be computed in time $O(r^2 \mathsf{M}(n))$. In fact, the resolution of (2) in the case when $f$ and $c$ are replaced by matrices in $\mathbb{K}[[z]]^{r \times r}$ resp. $\mathbb{K}^{r \times r}$ can also be done in time $O(r^\omega \mathsf{M}(n))$. Taking $c = \mathrm{Id}$, this corresponds to the computation of a fundamental system of solutions.

However, the new complexity is not optimal in the case when the matrix $A$ is sparse. This occurs in particular when a linear differential equation

 (4)

is rewritten in matrix form. In this case, the method from [BCO+07] for the asymptotically efficient resolution of the vector version of (4) as a function of $n$ gives rise to an overhead of $r^{\omega - 1}$, due to the fact that we need to compute a full basis of solutions in order to apply the Newton iteration.

In [vdH97, vdH02a], the alternative approach of relaxed evaluation was presented in order to solve equations of the form (1). More precisely, let $\mathsf{R}(n)$ be the cost to gradually compute $n$ terms of the product $h = f g$ of two series $f, g \in \mathbb{K}[[z]]$. This means that the terms of $f$ and $g$ are given one by one, and that we require $h_i$ to be returned as soon as $f_0, \dots, f_i$ and $g_0, \dots, g_i$ are known ($0 \leq i < n$). In [vdH97, vdH02a], we proved that $\mathsf{R}(n) = O(\mathsf{M}(n) \log n)$. In the standard FFT model, this bound was further reduced in [vdH07a]. We also notice that the additional $O(\log n)$ or smaller overhead only occurs in FFT models: when using Karatsuba or Toom-Cook multiplication [KO63, Too63, Coo66], one has $\mathsf{R}(n) \sim \mathsf{M}(n)$. One particularly nice feature of relaxed evaluation is that the mere relaxed evaluation of $\Phi(f)$ provides us with the solution $f$ to (1). Therefore, the complexity of the resolution of systems like (2) or (3) only depends on the sparse size of $\Phi$ as an expression, without any additional overhead in $r$.
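The online constraint behind relaxed multiplication can be made concrete with a small Python sketch. This is only a naive $O(n^2)$ scheme with illustrative names; the relaxed algorithms of [vdH97, vdH02a] provide the same interface in softly linear time.

```python
class LazyProduct:
    """Online product h = f*g: as soon as f[i] and g[i] have been fed,
    the coefficient h[i] of the product is returned.  Naive O(n^2) inner
    loop; relaxed algorithms achieve this interface in softly linear time."""

    def __init__(self):
        self.f, self.g, self.h = [], [], []

    def feed(self, fi, gi):
        self.f.append(fi)
        self.g.append(gi)
        i = len(self.f) - 1
        # h_i = sum_{j<=i} f_j * g_{i-j} only involves coefficients known so far
        self.h.append(sum(self.f[j] * self.g[i - j] for j in range(i + 1)))
        return self.h[i]
```

For instance, feeding the coefficients of $f = g = 1/(1-z)$ one by one yields the coefficients $1, 2, 3, \dots$ of $1/(1-z)^2$ online.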

Let $\mathsf{L}(n)$ denote the complexity of computing the first $n$ coefficients of the solution to (4). By what precedes, we have both $\mathsf{L}(n) = O(r^\omega \mathsf{M}(n))$ and $\mathsf{L}(n) = O(r\, \mathsf{R}(n))$. A natural question is whether we may further reduce this bound to $O(r\, \mathsf{M}(n))$ or even $\sim r\, \mathsf{M}(n)$. This would be optimal in the sense that the cost of resolution would be the same as the cost of the verification that the result is correct. A similar problem may be stated for the resolution of systems (2) or (3).

In this paper, we present several results in this direction. The idea is as follows. Given the expansion order $n$, we first choose a suitable block size $k \leq n$. Then we use Newton iterations for determining successive blocks of $k$ coefficients of $f$ in terms of the previous coefficients of $f$ and $\Phi(f)$. The products involved are computed using a lazy or relaxed method, but on FFT-ed blocks of coefficients. Roughly speaking, we apply Newton's method up to the order $k$, where the overhead of the method is not yet perceptible. The remaining Newton steps are then traded against an asymptotically less efficient lazy or relaxed method without this overhead, but which is actually more efficient when working on FFT-ed blocks of coefficients.

In fact, FFT trading is already useful in the more elementary case of power series division. In order to enhance the readability of the paper, we will therefore introduce the technique on this example in section 3. In the FFT model, this leads to an order $n$ division algorithm of time complexity $\frac{5}{3} \mathsf{M}(n)$, which improves on the best previously known bound [HZ04]. Notice that $\frac{5}{3} \mathsf{M}(n)$ should be read $(\frac{5}{3} + o(1))\, \mathsf{M}(n)$, and likewise for other fractions of this kind. Division with remainder of a polynomial of degree $2n$ by a polynomial of degree $n$ can be done in a similar time; this improves on the best previously known bound (private communication by Paul Zimmermann).

In sections 4 and 5, we treat the cases of linear and algebraic differential equations. The main results are summarized in Tables 3 and 4 and analyzed in detail in section 7. In particular, the exponential of a power series can now be computed at order in time instead of [Ber00].

In two recent papers [Har09a, Har09b] David Harvey has further improved the technique of FFT trading. In the standard FFT model, the FFT coincides up to a constant factor with the inverse of its transpose. This has been exploited in [Har09a] to get better bounds and for power series inversion and square roots. In [Har09b], the complexity for exponentiation has been further improved to . In table 1, we have summarized the new results for elementary operations on power series.

It is well known that FFT multiplication allows for tricks of the above kind, in the case when a given argument is used in several multiplications. In the case of FFT trading, we artificially replace an asymptotically fast method by a slower method on FFT-ed blocks, so as to use this property. We refer to [Ber00] (see also remark 14 below) for a variant and further applications of this technique (called FFT caching by the author). The central idea behind [vdH07a] is also similar. In section 6, we outline yet another application to the truncated multiplication of dense polynomials.

The efficient resolution of differential equations in power series admits several interesting applications, which are discussed in more detail in [vdH02a]. In particular, certified integration of dynamical systems at high precision is a topic which currently interests us [Moo66, Loh88, MB96, Loh01, MB04, vdH07b].

Remark 1. This paper is a thoroughly reworked version of an earlier preprint [vdH06], integrating several helpful remarks by two of the referees and David Harvey.

| Operation | Previous bound | This paper | [Har09a, Har09b] |
|---|---|---|---|
| Inversion | | | |
| Division | | | |
| Division + remainder | | | |
| Square root | | | |
| Exponentiation | | | |

Table 1. Table with the best asymptotic time complexities of several operations on power series with respect to the cost of multiplication. The first column shows the best previously known bounds and the latter two columns the best current bounds, which all involve FFT trading. For simplicity, we assume FFT multiplication. Harvey's bounds for inversion and square roots also assume the standard FFT model.

## 2. Prerequisites

### 2.1. Evaluation-interpolation schemes

Let be a ring in which is not a zero-divisor and assume that contains a primitive -th root of unity. Given a polynomial of degree , and identifying with its vector of coefficients, its discrete Fourier transform is defined by

If is a power of two, then the fast Fourier transform [CT65] allows us to perform the transformation and its inverse

using operations in . If are two polynomials with , then we clearly have for all , whence . Consequently, we may compute using

where stands for the componentwise product of the vectors and . If is a power of two, this takes operations in .
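For readers who wish to experiment, the evaluation-interpolation multiplication just described can be sketched as follows, using a number-theoretic transform over the prime $998244353$ (a common choice admitting many $2^p$-th roots of unity). The prime and the function names are illustrative choices, not taken from the paper.

```python
P = 998244353          # prime with P - 1 divisible by 2**23
G = 3                  # primitive root modulo P

def ntt(a, invert=False):
    """Iterative radix-2 Cooley-Tukey transform over Z/P; len(a) a power of two."""
    a = a[:]
    n = len(a)
    j = 0
    for i in range(1, n):            # bit-reversal permutation
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    length = 2
    while length <= n:
        w = pow(G, (P - 1) // length, P)
        if invert:
            w = pow(w, P - 2, P)     # inverse root for the inverse transform
        for start in range(0, n, length):
            wn = 1
            for k in range(start, start + length // 2):
                u, v = a[k], a[k + length // 2] * wn % P
                a[k], a[k + length // 2] = (u + v) % P, (u - v) % P
                wn = wn * w % P
        length <<= 1
    if invert:
        ninv = pow(n, P - 2, P)
        a = [x * ninv % P for x in a]
    return a

def fft_multiply(f, g):
    """Evaluate, multiply componentwise, interpolate."""
    n = 1
    while n < len(f) + len(g) - 1:
        n <<= 1
    F = ntt(f + [0] * (n - len(f)))
    Gv = ntt(g + [0] * (n - len(g)))
    H = ntt([x * y % P for x, y in zip(F, Gv)], invert=True)
    return H[:len(f) + len(g) - 1]
```

The transform and its inverse each cost $O(n \log n)$ operations, and the componentwise product costs $n$ multiplications, as stated above.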

More generally, let denote the set of polynomials of degree . Then an evaluation-interpolation scheme at degree and points consists of two computable -linear mappings

with the property that

for all with . We will denote by the maximum of the time complexities of and . Given with , we may then compute in time .

An evaluation-interpolation model is a recipe which associates an evaluation-interpolation scheme to any degree . Most fast multiplication schemes in the literature are actually based on evaluation-interpolation models. In the sequel, we will therefore assume that the cost of multiplying two polynomials of degrees is given by

 (5)

for a suitable evaluation-interpolation model. Similarly, if scalar matrices can be multiplied in time , then we will assume that the cost of multiplying two matrices whose entries are polynomials of degrees is given by

 (6)

Notice also that a matrix-vector product takes a time

 (7)

### 2.2. Classical FFT models

Let again be a ring in which is not a zero-divisor and assume that contains a primitive -th root of unity. Then we have seen that the FFT provides us with an evaluation-interpolation scheme at degree , with

In fact, if , then the truncated Fourier transform [vdH04, vdH05] still provides us with an evaluation-interpolation scheme with and . We will call this evaluation-interpolation model the standard FFT model.

If does not contain a primitive -th root of unity, then we may artificially adjoin a suitable root of unity to as follows [SS71, CK91]. We first decompose , , , with and . Any polynomial in corresponds to a unique polynomial in . We will consider the problem of multiplying in the latter ring. Consider the following sequence:

The first map is a natural identification when setting . The injection corresponds to writing elements of as polynomials in of degrees , and padding with zeros. Since is a primitive -th root of unity and or , we may finally perform an FFT in with respect to or . Each of the arrows can be reversed; in the case of , we take

In particular, we have for all . Repeating the construction on , we end up with an evaluation-interpolation model with

We will call this model the synthetic FFT model. Using similar ideas to those in [CK91], the model adapts to the case when is a zero-divisor.

### 2.3. Classical evaluation-interpolation models

If is infinite, then we may also use multipoint evaluation and interpolation in order to construct an evaluation-interpolation scheme at any degree. Using arbitrary points, we obtain [MB72, Str73, BM74]

If it is possible to take points in a geometric progression, then one even has [BS05]

Of course, these evaluation-interpolation models are already based on fast multiplication, whence they are not suitable for designing the fast multiplication (5). On the other hand, for large values of , they may perform better than the synthetic FFT model on matrix and matrix-vector products (6) and (7).

For small values of , it is sometimes interesting to use simpler, but asymptotically slower evaluation-interpolation models. For instance, we may iterate the construction

where

This yields an evaluation-interpolation model with

This “Karatsuba model” corresponds to even-odd Karatsuba multiplication. In a similar way, one may construct Toom-Cook models.
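A minimal sketch of even-odd Karatsuba multiplication: the "evaluation" maps a polynomial to its even part, its odd part and their sum, three polynomials of half the size, which are then multiplied recursively. Names are illustrative; inputs are assumed to have equal power-of-two length (pad with zeros otherwise).

```python
def karatsuba_eo(a, b):
    """Even-odd Karatsuba product of two coefficient lists of equal
    power-of-two length n; returns the 2n-1 coefficients of a*b."""
    n = len(a)
    if n == 1:
        return [a[0] * b[0]]
    ae, ao = a[0::2], a[1::2]        # "evaluation": even and odd parts
    be, bo = b[0::2], b[1::2]
    p0 = karatsuba_eo(ae, be)
    p1 = karatsuba_eo(ao, bo)
    p01 = karatsuba_eo([x + y for x, y in zip(ae, ao)],
                       [x + y for x, y in zip(be, bo)])
    # "interpolation": a*b = p0(x^2) + x*(p01 - p0 - p1)(x^2) + x^2*p1(x^2)
    mid = [x - y - z for x, y, z in zip(p01, p0, p1)]
    res = [0] * (2 * n - 1)
    for i, c in enumerate(p0):
        res[2 * i] += c
    for i, c in enumerate(mid):
        res[2 * i + 1] += c
    for i, c in enumerate(p1):
        res[2 * i + 2] += c
    return res
```

Three recursive half-size products per level give the familiar $O(n^{\log_2 3})$ cost.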

A lot of the complexity results for polynomials also hold for integers, by considering them as the evaluations of polynomials in for a suitable word length . For integer matrix multiplications, several evaluation-interpolation models are of interest. First of all, one may use approximate floating point arithmetic of bit length . Secondly, one may fit the -bit coefficients in where has many -th roots of unity (e.g. ). These two models are counterparts of the standard FFT model. One may also use the Schönhage-Strassen model (which is the counterpart of the synthetic FFT model). For large matrix sizes, one may finally use Chinese remaindering, which is the natural counterpart of multipoint evaluation.

In practice, operations in do not have a constant cost. Nevertheless, when computing with truncations of a power series , it is usually the case that the bit size of is proportional to (or a power of ). Consequently, the worst cost of an operation in is usually bounded by a constant times the average cost of the same operation over the complete computation.

### 2.4. Middle products

Let be the product of two power series . In order to efficiently compute only a part of , a useful tool is the so called “middle product” [HQZ04]. Let be two polynomials with and . Then we define their middle product (or simply if is clear from the context) by

Notice that this definition generalizes to the case when and are arbitrary rings with a multiplication , and , . In matrix form, we have

This formula is almost the transposed form of a usual product. More precisely, if with and , then we have

In other words, , where denotes the transpose of a matrix and .

For a fixed evaluation-interpolation scheme, the product is computed efficiently using evaluation and interpolation. More precisely, the operator at degree , restricted to polynomials of degree corresponds to an matrix :

Let be the diagonal matrix with entries . Then

Finally, the operator at degree corresponds to a matrix :

Since this equality holds for all , it follows that

Assuming that the algorithms and for evaluation and interpolation only use -linear operations, the actions of and on vectors can be computed by the transpositions and of these algorithms in time [Bor56, Ber]. We may thus compute the middle product using

 (8)

in time . In the standard FFT model, the matrix is actually symmetric and is the upper half part of a symmetric matrix. Hence, (8) becomes

One may also use the alternative formula [Har09a]

which is based on the fact that standard FFT multiplication of and really computes the exact product of modulo . Writing with , we then notice that coincides with the middle product, whereas modulo .
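The wrap-around argument can be checked with a small sketch, in which naive quadratic loops stand in for the FFTs (all names are illustrative). The point is that a cyclic convolution of length $2n$ already contains the exact middle coefficients, since the wrapped-around coefficients only pollute the low positions.

```python
def middle_product(f, g):
    """Middle product of f (length n) and g (length 2n-1): the n middle
    coefficients h_{n-1}, ..., h_{2n-2} of the full product h = f*g."""
    n = len(f)
    assert len(g) == 2 * n - 1
    h = [0] * (3 * n - 2)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            h[i + j] += fi * gj
    return h[n - 1:2 * n - 1]

def cyclic_middle_product(f, g):
    """Same result via a cyclic convolution of length 2n: indices i+j >= 2n
    wrap to positions <= n-3, so the middle range n-1..2n-2 is untouched."""
    n = len(f)
    m = 2 * n
    c = [0] * m
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            c[(i + j) % m] += fi * gj
    return c[n - 1:2 * n - 1]
```

Replacing the two quadratic loops by FFTs of length $2n$ recovers the cost of a single standard FFT multiplication for the middle product.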

## 3. Division

Given a power series (and similarly for vectors or matrices of power series, or power series of vectors or matrices) and integers , we will use the notations:

By convention, and for all .

### 3.1. Blockwise products

Let be fixed. Any series in may be rewritten blockwise as a power series in with coefficients , :

 (9)

Let us now consider the computation of a product , where . The coefficients of the blockwise product are polynomials of degrees instead of . In order to recover the coefficients of , we define the “contraction operator” : given a series , whose coefficients are polynomials of degrees , we let

Then we have

 (10)

Alternatively, we may first “extend” the series using

and then compute using middle products:

 (11)

These formulas are illustrated in Figure 1. From now on, and for reasons which are detailed in remark 2, we will use formula (11) for all product computations.

Assume now that we have fixed an evaluation-interpolation model for polynomials of degrees . Then we may replace the polynomial representations of the blockwise coefficients and by their transforms

 (12) (13)

compute convolution products in the transformed model

and apply (8) in order to recover the coefficients

 (14)

In particular, assuming and known, we may compute using scalar multiplications and using an additional time .

Remark 2. It turns out that the formula (11) is slightly more convenient and efficient than (10): in the applications below, the coefficients will be computed one by one as a function of the previous diagonal sums . In particular, when using (10), the computation of the high part of will need to be done separately at the next iteration. When using middle products, this computation is naturally integrated into the product .

Figure 1. Two ways to compute a block coefficient of the product: at the left-hand side, using plain block products followed by contraction, as in (10); at the right-hand side, using middle products, as in (11).
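A minimal sketch of the blockwise truncated product, with plain block multiplications standing in for the products on transformed blocks (names and the slack bookkeeping are illustrative):

```python
def blockwise_truncated_product(f, g, n, k):
    """Truncated product h = f*g mod z^n via blocks of size k.  Blocks F_i,
    G_j are polynomials of degree < k; each block product has degree < 2k-1
    and is 'contracted' back onto the coefficients of h at offset (i+j)*k."""
    nb = -(-n // k)                      # number of blocks, ceil(n/k)
    F = [f[i * k:(i + 1) * k] for i in range(nb)]
    G = [g[i * k:(i + 1) * k] for i in range(nb)]
    h = [0] * (n + 2 * k)                # slack for the high parts of blocks
    for i in range(nb):
        for j in range(nb - i):          # blocks with i+j >= nb only feed z^n..
            for a, fa in enumerate(F[i]):
                for b, gb in enumerate(G[j]):
                    h[(i + j) * k + a + b] += fa * gb
    return h[:n]
```

In the algorithms of this section, the inner block products are carried out on FFT-ed blocks, so that each pair of blocks costs only componentwise multiplications.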

### 3.2. Division

Let be two power series such that is invertible. Assume that we want to compute the first coefficients of . Denoting , we first compute using a classical Newton iteration [BK78, Sch00]. Given , assume that have been computed, and let

Setting , we then must have

It thus suffices to take

 (15)

Carrying out this iterative method in an evaluation-interpolation model for polynomials of degrees yields the following algorithm:

Algorithm divide

Input: two truncated power series at order , such that is invertible.

Output: the truncated series , where .

Compute and

For do

Return
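The overall shape of the algorithm can be sketched as follows, with exact rational arithmetic and plain truncated products standing in for the evaluation-interpolation steps; only the control structure, not the complexity, is reproduced, and all names are illustrative.

```python
from fractions import Fraction

def truncated(f, g, n):
    """Truncated product f*g mod z^n (naive; stands in for FFT multiplication)."""
    h = [0] * n
    for i, fi in enumerate(f[:n]):
        for j, gj in enumerate(g[:n - i]):
            h[i + j] += fi * gj
    return h

def newton_inverse(g, k):
    """First k coefficients of 1/g via the Newton iteration
    u <- u + u*(1 - g*u), doubling the precision at every step
    (g[0] must be invertible)."""
    u = [Fraction(1) / g[0]]
    prec = 1
    while prec < k:
        prec = min(2 * prec, k)
        e = [-c for c in truncated(g[:prec], u, prec)]   # -(g*u) mod z^prec
        e[0] += 1                                        # e = 1 - g*u
        u = (u + [0] * prec)[:prec]
        u = [u[i] + c for i, c in enumerate(truncated(u, e, prec))]
    return u[:k]

def block_divide(f, g, n, k):
    """q = f/g mod z^n, computed k coefficients at a time: Newton is only
    used up to order k; each further block of q is obtained from the residue
    of f - g*q and one truncated product by the precomputed inverse."""
    u = newton_inverse(g, k)
    q = []
    for i in range(0, n, k):
        m = min(k, n - i)
        # e = coefficients i .. i+m-1 of f - g*q, where q has i terms so far
        e = [Fraction(f[i + t]) if i + t < len(f) else Fraction(0)
             for t in range(m)]
        for a, qa in enumerate(q):
            for b in range(max(0, i - a), min(len(g), i + m - a)):
                e[a + b - i] -= qa * g[b]
        q += truncated(u, e, m)
    return q
```

In the paper's algorithm, the residue updates and the products by the inverse are performed on FFT-ed blocks, which is where the gain over a plain Newton iteration comes from.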

Remark 3. In the above algorithm, the coefficients of are computed in a naive manner using

Alternatively, we may rewrite (15) as an implicit equation in the transformed model and use a relaxed algorithm for its resolution [vdH02a, vdH07a]. For this purpose, we first extend the operators , , etc. blockwise to series in using

Then the equation (15) may be rewritten as

which leads to the blockwise implicit equation

In the transformed model, this equation becomes

and we solve it using a relaxed multiplication algorithm.

### 3.3. Complexity analysis

From now on, it will be convenient to restrict our attention to evaluation-interpolation models for which and are increasing functions in and . Given two functions and in , we will write if for any we have for all sufficiently large .

Theorem 4. The quotient of two power series in can be computed at order in time

Proof. Let us analyze the complexity of . The precomputation of can be done in time when using a fast algorithm [HQZ04] based on Newton's method. The main loop accounts for

• Five evaluation-interpolations of cost .

• One naive order product in the transformed model of cost . In view of Remark 3, the naive product may be replaced by a relaxed product, which leads to a cost .

• scalar multiplications in the transformed model of cost .

Adding up these costs, the complete division takes a time

 (16)

Choosing such that and , the theorem follows. The choice works both in the standard and the synthetic FFT model.

Remark 5. In practice, the number should be chosen not too large, so as to keep reasonably small. According to (16), we need in order to beat the best previously known division algorithm [HZ04], which happens for .

Remark 6. For small values of , the fact that we perform more multiplications in the transformed model is compensated by the fact that the FFTs are computed for smaller sizes. Let us compare in detail a truncated FFT multiplication at order and a blockwise multiplication as in section 3.1 at order .

For simplicity, we will assume the standard FFT model and naive inner multiplication. The inner multiplications in the classical FFT multiplication are replaced by inner multiplications in the blockwise model, accounting for extra multiplications. Every FFT at size is replaced by FFTs of size , thereby saving approximately multiplications. For , the blockwise algorithm therefore saves multiplications. For , we save multiplications. For , we perform more multiplications.

Using relaxed Karatsuba multiplication, we only need inner multiplications in the blockwise model for , so we save multiplications. We also notice that the division algorithm requires five FFTs instead of three for multiplication. For and naive inner multiplication, this means that we actually “save” multiplications. In any case, the analysis shows that blockwise algorithms should perform well for moderate values of , at least for the standard FFT model.

Remark 7. For , the coefficients

can be computed using additional transforms of cost and additional inner multiplications. This implies that the quotient and the remainder of a division of a polynomial of degree by a polynomial of degree can be computed in time . Indeed, it suffices to take , , and apply the above argument.

Remark 8. The division algorithm should also apply for integers and floating point numbers instead of polynomials and truncated power series. Of course, this requires a certain amount of extra work in order to handle carries correctly. We also expect FFT trading to be more efficient for standard FFT models (FFT multiplication over complex doubles or over a field with a large -th root of unity) than for the synthetic model (Schönhage-Strassen multiplication).

## 4. Linear differential equations

In order to simplify our exposition, it is convenient to write all differential equations in terms of the operator $\delta = z \frac{\partial}{\partial z}$. The inverse of $\delta$ is defined by $\delta^{-1} z^k = z^k / k$ for all $k \geq 1$.

### 4.1. Newton iteration

Given a matrix with , the equation

 (17)

admits a unique solution with . The main idea of [BCO+07] is to provide a Newton iteration for the computation of . More precisely, assume that and are known. Then we have

 (18)

Indeed, setting

we have

so that and . Computing and together using (18) and the usual Newton iteration [Sch33, MC79]

 (19)

for inverses yields an algorithm of time complexity . The quantities and may be computed efficiently using the middle product algorithm.

Instead of doubling the precision at each step, we may also increment the number of known terms with a fixed number of terms . More precisely, given , we have

 (20)

This relation is proved in a similar way as (18). The same recurrence may also be applied for computing blocks of coefficients of the unique solution to the vector linear differential equation

 (21)

with initial condition :

or

 (22)

Both the right-hand sides of the equations (20) and (22) may be computed efficiently using the middle product algorithm.
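As a baseline, the coefficients of the solution to the vector equation may also be computed one by one from the defining recurrence. The sketch below is written with the ordinary derivative $f' = A f$ for concreteness (the section itself works with the operator $\delta$; names are illustrative); it is this quadratic-in-order computation that the Newton and blockwise methods accelerate.

```python
from fractions import Fraction

def linear_ode_solve(A, c, n):
    """First n coefficient vectors of the solution of f' = A*f, f(0) = c,
    where A is an r x r matrix of truncated power series (each entry a list
    of coefficients).  Uses the plain recurrence f_{m+1} = (A*f)_m / (m+1)."""
    r = len(c)
    f = [[Fraction(x) for x in c]]       # f[m][i] = coefficient of z^m in f_i
    for m in range(n - 1):
        nxt = [Fraction(0)] * r
        for i in range(r):
            for j in range(r):
                aij = A[i][j]
                # (A*f)_m = sum_{p+q=m} (A_ij)_p * (f_j)_q
                for p in range(min(len(aij), m + 1)):
                    nxt[i] += Fraction(aij[p]) * f[m - p][j]
        f.append([x / (m + 1) for x in nxt])
    return f
```

For instance, with $r = 1$, $A = (1)$ and $c = (1)$ one recovers the coefficients $1/m!$ of the exponential series.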

### 4.2. Blockwise resolution

The block notations and from section 3.1 naturally generalize to series of matrices and series of vectors. The derivation operates in a blockwise fashion:

We define the blockwise operator , with

by

In practice, we may compute by

Now (22) yields a formula for the blockwise resolution of (17):

 (23)

Assume that we have fixed an evaluation-interpolation scheme at degree . Replacing the blockwise coefficients and by their transforms (12-13) and applying (23), we may compute by evaluation-interpolation:

Of course, and only need to be computed once. This leads to the following algorithm for the computation of :

Algorithm lin_solve

Input: a linear initial value problem (21) and orders and

Output: the truncated series

Compute , and as in section 4.1

For do

Return

Remark 9. In the above algorithm, the coefficients of are computed in a naive manner. In a similar way as in Remark 3, we may use a relaxed algorithm instead. More precisely, the equation (23) may be rewritten as

which leads to the blockwise implicit equation

In the transformed model, this equation becomes

We understand that is computed blockwise in the transformed model.

### 4.3. Complexity analysis

Assuming that has non-zero entries, we denote by the time complexity in order to compute the truncated solution to (21) at order .

Theorem 10. Consider the differential equation (21), where has non-zero entries. Assume that and . Then there exists an algorithm which computes the truncated solution to (21) at order in time

 (24)

Proof. In our algorithm, let be a function which increases towards infinity, such that and . We take and , so that . Let us carefully examine the cost of our algorithm for these choices of and :

1. By the choice of , the precomputation of , and requires a time

Similarly, the precomputation of and can be done in time

2. The computation of the transforms , and the inverse transforms can be done in time

3. The computation of products in the transformed model requires a time

Using a relaxed multiplication algorithm, this cost further reduces to

4. The computation of involves vectorial evaluations-interpolations, of cost

and matrix-vector multiplications in the transformed model, of cost

Adding up the different contributions, we obtain the bound

By construction, , and the result follows.

Corollary 11. In the standard FFT model, and under the assumption , we have

The same bounds are obtained in the synthetic FFT model if .

Remark. Of course, should be read and likewise below.

Proof. In the standard FFT model, we have , and . If , we may therefore apply the theorem and the second term in (24) becomes negligible with respect to the first one.

In the synthetic FFT model, we have , and . If , we may again apply the theorem and the second term in (24) becomes negligible.

### 4.4. Further observations

Remark 12. In practice, one should choose just sufficiently large such that the precomputation has a small cost with respect to the remainder of the computation. This is already the case for close to .

Remark 13. The use of middle products was needed in order to achieve the factor in Corollary 11. As explained in Remark 2, using a more straightforward multiplication algorithm seems to require one additional transform. This leads to the factor .

Remark 14. Corollary 11 applies in particular to the exponentiation of a power series. We obtain an algorithm whose time complexity yields an improvement over [Ber00, HZ04]. Notice that FFT trading is a variant of Newton caching in [Ber00], but not exactly the same: in our case, we use an “order $k$” Newton iteration, whereas Bernstein uses classical Newton iterations on block-decomposed series.

Remark 15. With minor changes, the algorithm can be adapted in order to compute the unique solution of the matrix differential equation with . The unique solution corresponds to a fundamental system of solutions to (21). A similar complexity analysis to the one in the proof of Theorem 10 yields the bound

Under the additional hypotheses of the corollary, we thus get

Remark 16. In the standard FFT model, the conditions of Theorem 10 reduce to

 (25)

If , then we obtain

This complexity should be compared to the bound provided by a relaxed approach

If , our new approach gains a factor . On the other hand, the relaxed approach becomes more efficient for moderate orders .

In the case when , the theorem yields

whereas the relaxed approach yields the bound

We thus gain a factor under the sole assumption (25).

Remark 17. In the case when does not admit many -th roots of unity, we have the choice between the synthetic FFT model and multipoint evaluation-interpolation. In the synthetic FFT model, the almost optimal bounds from Corollary 11 are reached under the rather harsh assumption . This makes the method interesting only for particularly low orders (maybe for really huge values of ).

If admits an infinity of points in geometric progression, then we may also use multipoint evaluation-interpolation with and for some constant . In a similar way as in Corollary 11, one obtains the bound

under the assumption , since . If , then the assumption may even be replaced by . Recall that in this context. Therefore, we potentially gain a factor compared to the relaxed approach.

Remark 18. One may wonder whether the technique of FFT trading is useful in asymptotically less efficient models such as the Karatsuba model. Recall however that in any model with for . The Karatsuba model is even essentially relaxed, in the sense that . Therefore, the use of Newton's method at best allows for the gain of a constant factor. Moreover, FFT trading also does not help, since in such models, so the second term in (24) can never be neglected with respect to the first term.

Remark 19. It is instructive to compare our complexity bounds with the complexity bounds if we only use Newton's method and neither FFT trading nor relaxed computations. In that case, let denote the complexity of computing both and . One has

since the product and the formulas (18) and (19) give rise to matrix multiplications. This yields

Notice that we may subtract the cost if the final is not needed. It follows that

Using a trick from [Sch00], one may even prove that

which yields

## 5. Algebraic differential equations

Assuming that one is able to solve the linearized version of an implicit equation (1), it is classical that Newton's method can again be used to solve the equation itself [BK78, vdH02a, BCO+07]. Before we show how to do this for algebraic systems of differential equations, let us first give a few definitions for polynomial expressions.

Given a vector of series variables, we will represent polynomials in by dags (directed acyclic graphs), whose leaves are either series in or variables , and whose inner nodes are additions, subtractions or multiplications. An example of such a dag is shown in Figure 2. We will denote by and the number of nodes which occur as an operand resp. result of a multiplication. We call the multiplicative size of the dag and the total number of nodes the total size of the dag. Using evaluation-interpolation, one may compute in terms of in time .

 Figure 2. Example of a polynomial expression in , represented by a dag. In this particular example, the multiplicative size of the polynomial is (since and ) and its total size .

Now assume that we are given an -dimensional system

 (26)

with initial condition , and where is a tuple of polynomials in . Given the unique solution to this initial value problem, consider the Jacobian matrix

Assuming that is known, we may compute in time using the standard differentiation rules. For , we have

so that

 (27)
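As a baseline for (26), the solution can again be computed coefficient by coefficient, since the $m$-th coefficient of $P(y)$ only depends on $y_0, \dots, y_m$, mirroring the fixed-point form (1). A scalar sketch with illustrative names and exact rational arithmetic:

```python
from fractions import Fraction

def ivp_solve(P, y0, n):
    """First n coefficients of the solution of y' = P(y), y(0) = y0, for a
    scalar polynomial P given as a callable on truncated series.  The
    coefficient y_{m+1} = P(y)_m / (m+1) only uses y_0..y_m."""
    y = [Fraction(y0)]
    for m in range(n - 1):
        pm = P(y, m + 1)[m]          # coefficient of z^m in P(y)
        y.append(pm / (m + 1))
    return y

def square_trunc(y, n):
    """Example right-hand side P(y) = y^2, truncated at order n."""
    h = [Fraction(0)] * n
    for i, a in enumerate(y[:n]):
        for j, b in enumerate(y[:n - i]):
            h[i + j] += a * b
    return h
```

For instance, $y' = y^2$ with $y(0) = 1$ yields $y = 1/(1-z)$, whose coefficients are all equal to $1$. The algorithms of this section perform exactly this kind of recursive evaluation, but on FFT-ed blocks of coefficients of the subexpressions of $P$.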

Let us again adopt the notation (9). We will compute and for any subexpression of in a relaxed manner. Each series will actually be broken up into its head and its tail , so that sums and products are really computed using

Assume that and have been evaluated for all and notice that

The advantage of our modified way to compute the is that it also allows us to efficiently evaluate . Indeed, since , we have

We may finally compute using

 (28)

where is the blockwise operator which acts on by

Let us now analyze the time complexity of the computation of .

Theorem 20. Consider an -dimensional system (26), where is a polynomial, given by a dag of multiplicative size and total size . Assume that and . Then there exists an algorithm which computes in time

Proof. In order to perform all multiplications in the transformed model, we have to compute both and for each argument of a multiplicative subexpression of and and for each multiplicative subexpression of . This amounts to a total of evaluations-interpolations of size , of cost . The computations of the using (28) induce an additional cost . The relaxed multiplications in the transformed model correspond to a cost . The additions are done in the untransformed model, in time . The precomputation of , and its transforms have a negligible cost .

Corollary 21. In the standard FFT model, and assuming that , we have

The same bound holds in the synthetic FFT model, assuming that .

Remark 22. In the case when most multiplications in depend only linearly on , it is possible to adapt the technique from the previous section and perform these multiplications using the middle product. This allows for a reduction of the factor to something between and .

Remark 23. When solving (26) using Newton's method [BCO+07] with the optimization from [Sch00], one obtains the bound

However, the factor is quite pessimistic. For instance, if the expressions do not share any common subexpressions, then we may use automatic differentiation [BS83] for the computation of . The multiplicative size for this circuit is given by and , whence and

## 6.Truncated multiplication

Assume the standard FFT model. It is well-known that discrete FFTs are most efficient on blocks of size with . In particular, without taking particular care, one may lose a factor when computing the product of two polynomials and of degrees with . One strategy to avoid this problem is to use the TFT (truncated Fourier transform), as detailed in [vdH04, vdH05]. Another way to smooth the complexity is to cut and into smaller blocks, and to trade superfluous and asymptotically expensive FFTs against asymptotically less expensive multiplications in the FFT model.

More precisely, we cut and into parts of size , where grows slowly to infinity with . With the notation (9), and using FFTs at size for evaluation-interpolation, we compute as follows:

1. We first transform and for .

2. We compute the naive product of the polynomials and in .

3. We compute for and return .
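The three steps above can be sketched end-to-end. The following is a toy model (our own function names, a naive recursive FFT, and small power-of-two sizes), not a tuned implementation: the inputs are cut into r blocks of size m, each block is transformed at length 2m, the block polynomials are multiplied naively in the transformed model, and the result is recombined after inverse transforms.

```python
import cmath

def fft(a, sign=-1):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even, odd = fft(a[0::2], sign), fft(a[1::2], sign)
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + w, even[k] - w
    return out

def ifft(a):
    return [x / len(a) for x in fft(a, sign=1)]

def naive_mul(a, b):
    c = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            c[i + j] += x * y
    return c

def fft_trade_mul(a, b, m):
    """Full product of a and b by FFT trading: blocks of size m,
    FFTs at length 2*m, naive product of the block polynomials."""
    r = (max(len(a), len(b)) + m - 1) // m
    def transform(p):  # step 1: cut into r blocks and transform each
        blocks = []
        for i in range(r):
            blk = list(p[i * m:(i + 1) * m])
            blocks.append(fft(blk + [0] * (2 * m - len(blk))))
        return blocks
    A, B = transform(a), transform(b)
    # Step 2: naive product of the block polynomials, pointwise in the model.
    C = [[0j] * (2 * m) for _ in range(2 * r - 1)]
    for i in range(r):
        for j in range(r):
            for k in range(2 * m):
                C[i + j][k] += A[i][k] * B[j][k]
    # Step 3: inverse transforms and recombination at block offsets i*m.
    prod = [0.0] * (len(a) + len(b) - 1 + 2 * m)
    for i, blk in enumerate(C):
        for k, c in enumerate(ifft(blk)):
            prod[i * m + k] += c.real
    return [round(c) for c in prod[:len(a) + len(b) - 1]]

a, b = list(range(1, 17)), list(range(16, 0, -1))
assert fft_trade_mul(a, b, 4) == naive_mul(a, b)
```

With r growing slowly with n, the r² naive block products at size 2m replace transforms at full size, which is precisely the trade analyzed in the operation count below.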

Let be the constant such that for . Then the above algorithm requires

operations in . If we only need the truncated product , then we may save inverse transforms and half of the inner multiplications, so the complexity reduces to

Both complexities depend smoothly on and admit no major jumps at powers of two.

In this particular case, it turns out that the TFT is always better, because both the full and the truncated product can be computed using only

operations in . However, in the multivariate setting, the TFT also has its pitfalls. More precisely, consider two multivariate polynomials whose supports have a “dense flavour”. Typically, we may assume the supports to be convex subsets of . In addition one may consider truncated products, where we are only interested in certain monomials of the product. In order to apply the TFT, one typically has to require in addition that the supports of and are initial segments of . Even then, the overhead for certain types of supports may increase if gets large.

One particularly interesting case for complexity studies is the computation of the truncated product of two dense polynomials and with total degree . This is typically encountered in the integration of dynamical systems using Taylor models. Although the TFT is a powerful tool for small dimensions , FFT trading might be an interesting complement for moderate dimensions . For even larger dimensions, one may use [LS03] or [vdH02a, Section 6.3.5]. The idea is again to cut in blocks

where is small (and preferably a power of two). Each block is then transformed using an FFT (or a suitable TFT, since the supports of the blocks are still initial segments when restricted to the block). We next compute the truncated product of the transformed polynomials and in a naive way and transform back.
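The cutting step can be illustrated in two variables with a toy model (all names are ours; naive block products stand in for the transformed ones, so only the block decomposition and the truncation bookkeeping are shown):

```python
def mul_trunc(p, q, n):
    """Truncated product of sparse dicts {exponent pair: coeff},
    keeping only monomials of total degree < n."""
    r = {}
    for (a1, a2), c in p.items():
        for (b1, b2), d in q.items():
            if a1 + a2 + b1 + b2 < n:
                k = (a1 + b1, a2 + b2)
                r[k] = r.get(k, 0) + c * d
    return r

def cut(p, m):
    """Cut p into blocks indexed by (e1 // m, e2 // m)."""
    blocks = {}
    for (e1, e2), c in p.items():
        blocks.setdefault((e1 // m, e2 // m), {})[(e1 % m, e2 % m)] = c
    return blocks

def block_mul_trunc(p, q, n, m):
    bp, bq = cut(p, m), cut(q, m)
    r = {}
    for (i1, i2), pb in bp.items():
        for (j1, j2), qb in bq.items():
            if (i1 + j1 + i2 + j2) * m >= n:
                continue  # the whole block product lies beyond the truncation
            for (e1, e2), c in mul_trunc(pb, qb, 4 * m).items():
                E = ((i1 + j1) * m + e1, (i2 + j2) * m + e2)
                if E[0] + E[1] < n:
                    r[E] = r.get(E, 0) + c
    return r

# Two dense polynomials with all monomials of total degree < 4.
n, m = 4, 2
p = {(e1, e2): 1 for e1 in range(n) for e2 in range(n) if e1 + e2 < n}
q = dict(p)
assert block_mul_trunc(p, q, n, m) == mul_trunc(p, q, n)
```

Note how the truncated support of total degree < n induces, at the block level, a support of the same shape on the block indices, which is the geometry-preservation property discussed below.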

Let us analyze the complexity of this algorithm. The number of monomials of total degree is given by

In particular, and . In order to compute the monomials in of total degree , we need

since . In total, we thus need

multiplications of TFT-ed blocks, since . For large , we have

We may therefore hope for some gain with respect to plain FFT multiplication whenever

i.e. if

In Table 2, we have shown the values of for small values of and . It is clear from the table that FFT trading can be used quite systematically in order to improve the performance. For larger dimensions, the gain becomes particularly important. This should not come as a surprise, because naive multiplication is more efficient than FFT multiplication for .
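The monomial count used in this analysis is the standard stars-and-bars binomial (the number of exponent vectors in d variables with total degree at most n-1); a quick numerical sanity check against explicit enumeration:

```python
from math import comb
from itertools import product

def num_monomials(n, d):
    """Monomials in d variables of total degree < n: comb(n + d - 1, d)."""
    return comb(n + d - 1, d)

# Cross-check against brute-force enumeration for small n and d.
for n in range(1, 6):
    for d in range(1, 4):
        brute = sum(1 for e in product(range(n), repeat=d) if sum(e) < n)
        assert num_monomials(n, d) == brute
```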

 Table 2. Numerical values of for small and .

The main advantage of the above method over other techniques, such as the TFT, is that the shape of the support is preserved during the reduction (as well as for the “destination support”). However, the TFT also allows for some additional tricks [vdH05, Section 9] and it is not yet clear to us which approach is best in practice. Of course, the above technique becomes even more useful in the case of more general truncated multiplications for dense supports with shapes which do not allow for TFT multiplication.

For small values of , we notice that the even/odd version of Karatsuba multiplication presents the same advantage of geometry preservation (see [HZ02] for the one-dimensional case). In fact, fast multiplication using FFT trading is quite analogous to this method, which generalizes to Toom-Cook multiplication. In the context of numerical computations, the property of geometry preservation is reflected in increased numerical stability.
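For completeness, here is a minimal sketch of one even/odd Karatsuba step (with naive sub-products and our own names): the decomposition a(x) = a_e(x²) + x·a_o(x²) turns one product into three half-size ones, and the even/odd split preserves the shape of truncations.

```python
def naive_mul(a, b):
    c = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            c[i + j] += x * y
    return c

def karatsuba_even_odd(a, b):
    """One even/odd Karatsuba step for two lists of the same even length:
    three half-size products instead of four (cf. [HZ02])."""
    assert len(a) == len(b) and len(a) % 2 == 0
    ae, ao, be, bo = a[0::2], a[1::2], b[0::2], b[1::2]
    pe = naive_mul(ae, be)                            # ae*be
    po = naive_mul(ao, bo)                            # ao*bo
    pm = naive_mul([x + y for x, y in zip(ae, ao)],
                   [x + y for x, y in zip(be, bo)])
    mid = [m - e - o for m, e, o in zip(pm, pe, po)]  # ae*bo + ao*be
    # Recombine: a*b = pe(x^2) + x*mid(x^2) + x^2*po(x^2).
    c = [0] * (len(a) + len(b) - 1)
    for i, v in enumerate(pe):
        c[2 * i] += v
    for i, v in enumerate(mid):
        c[2 * i + 1] += v
    for i, v in enumerate(po):
        c[2 * i + 2] += v
    return c

a, b = [1, 2, 3, 4], [5, 6, 7, 8]
assert karatsuba_even_odd(a, b) == naive_mul(a, b)
```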

To finish, we would like to draw the reader's attention to another advantage of FFT trading: for really huge values of , it leads to a reduction in memory usage. Indeed, when computing the coefficients of a product sequentially, we only need to store the transform of one coefficient of the result at a time.

## 7.Conclusion

We have summarized the main results of this paper in Tables 3 and 4. We recall that in the standard FFT model and otherwise. Consequently, the new approach allows at best for a gain in the standard FFT model and in the synthetic FFT model. In practice, the factor behaves very much like a constant, so the new algorithms become interesting only for quite large values of . As pointed out in Remark 18, FFT trading loses its interest in asymptotically slower evaluation-interpolation models, such as the Karatsuba model. We plan to come back to practical complexity issues as soon as implementations of all algorithms are available in the Mathemagix system [vdH+02b]. Notice also that Newton iterations are better suited to parallel computing than is relaxed evaluation.

 Resolution of an -dimensional system of linear differential equations Algorithm Fundamental system One solution Relaxed Newton New

Table 3. Complexities for the resolution of an -dimensional system of linear differential equations up to terms. We either compute a fundamental system of solutions or a single solution with a prescribed initial condition. The parameter stands for the number of non-zero coefficients of the matrix (we always have ). We assume that in the standard FFT model and in the synthetic FFT model.

 Resolution of an -dimensional system of algebraic differential equations Algorithm Complexity Relaxed Newton New

Table 4. Complexities for the resolution of an -dimensional system up to terms, where is a polynomial of multiplicative size and total size . For the bottom line, we assume the standard FFT model and we require that . In the synthetic FFT model, the bound becomes , under the assumption .

One interesting remaining problem is to reduce the cost of computing a fundamental system of solutions to (4). This would be possible if one could speed up the joint computation of the FFTs of .

Another interesting question is to what extent Newton's method can be generalized. Clearly, it is not hard to consider more general equations of the kind

since the series merely act as perturbations. However, it seems harder (but maybe not impossible) to deal with equations of the type

since it is not clear a priori how to generalize the concept of a fundamental system of solutions and its use in the Newton iteration.

In the case of partial differential equations with initial conditions on a hyperplane, the fundamental system of solutions generally has infinite dimension, so essentially new ideas would be needed here. Nevertheless, the strategy of relaxed evaluation applies in all these cases, with the usual overhead in general and overhead in the standard FFT model.

## Bibliography

[BCO+07]

A. Bostan, F. Chyzak, F. Ollivier, B. Salvy, É. Schost, and A. Sedoglavic. Fast computation of power series solutions of systems of differential equations. In Proceedings of the 18th ACM-SIAM Symposium on Discrete Algorithms, New Orleans, Louisiana, U.S.A., January 2007.

[Ber]

D. Bernstein. The transposition principle. http://cr.yp.to/transposition.html.

[Ber00]

D. Bernstein. Removing redundancy in high precision Newton iteration. Available from http://cr.yp.to/fastnewton.html, 2000.

[BK78]

R.P. Brent and H.T. Kung. Fast algorithms for manipulating formal power series. Journal of the ACM, 25:581–595, 1978.

[BM74]

A. Borodin and R.T. Moenck. Fast modular transforms. Journal of Computer and System Sciences, 8:366–386, 1974.

[Bor56]

J. L. Bordewijk. Inter-reciprocity applied to electrical networks. Applied Scientific Research B: Electrophysics, Acoustics, Optics, Mathematical Methods, 6:1–74, 1956.

[BS83]

W. Baur and V. Strassen. The complexity of partial derivatives. Theor. Comput. Sci., 22:317–330, 1983.

[BS05]

A. Bostan and É. Schost. Polynomial evaluation and interpolation on special sets of points. Journal of Complexity, 21(4):420–446, August 2005. Festschrift for the 70th Birthday of Arnold Schönhage.

[CK91]

D.G. Cantor and E. Kaltofen. On fast multiplication of polynomials over arbitrary algebras. Acta Informatica, 28:693–701, 1991.

[Coo66]

S.A. Cook. On the minimum computation time of functions. PhD thesis, Harvard University, 1966.

[CT65]

J.W. Cooley and J.W. Tukey. An algorithm for the machine calculation of complex Fourier series. Math. Computat., 19:297–301, 1965.

[CW87]

D. Coppersmith and S. Winograd. Matrix multiplication via arithmetic progressions. In Proc. of the Annual Symposium on Theory of Computing, pages 1–6, New York City, May 25–27, 1987.

[Har09a]

D. Harvey. Faster algorithms for the square root and reciprocal of power series, 2009. http://arxiv.org/abs/0910.1926.

[Har09b]

D. Harvey. Faster exponentials of power series, 2009. http://arxiv.org/abs/0911.3110.

[HQZ04]

G. Hanrot, M. Quercia, and P. Zimmermann. The middle product algorithm I. Speeding up the division and square root of power series. AAECC, 14(6):415–438, 2004.

[HZ02]

G. Hanrot and P. Zimmermann. A long note on Mulders' short product. Research Report 4654, INRIA, December 2002. Available from http://www.loria.fr/~hanrot/Papers/mulders.ps.

[HZ04]

G. Hanrot and P. Zimmermann. Newton iteration revisited. http://www.loria.fr/~zimmerma/papers/fastnewton.ps.gz, 2004.

[KO63]

A. Karatsuba and J. Ofman. Multiplication of multidigit numbers on automata. Soviet Physics Doklady, 7:595–596, 1963.

[Loh88]

R. Lohner. Einschließung der Lösung gewöhnlicher Anfangs- und Randwertaufgaben und Anwendungen. PhD thesis, Universität Karlsruhe, 1988.

[Loh01]

R. Lohner. On the ubiquity of the wrapping effect in the computation of error bounds. In U. Kulisch, R. Lohner, and A. Facius, editors, Perspectives on enclosure methods, pages 201–217, Wien, New York, 2001. Springer.

[LS03]

G. Lecerf and É. Schost. Fast multivariate power series multiplication in characteristic zero. SADIO Electronic Journal on Informatics and Operations Research, 5(1):1–10, September 2003.

[MB72]

R.T. Moenck and A. Borodin. Fast modular transforms via division. In Thirteenth annual IEEE symposium on switching and automata theory, pages 90–96, Univ. Maryland, College Park, Md., 1972.

[MB96]

K. Makino and M. Berz. Remainder differential algebras and their applications. In M. Berz, C. Bischof, G. Corliss, and A. Griewank, editors, Computational differentiation: techniques, applications and tools, pages 63–74, SIAM, Philadelphia, 1996.

[MB04]

K. Makino and M. Berz. Suppression of the wrapping effect by Taylor model-based validated integrators. Technical Report MSU Report MSUHEP 40910, Michigan State University, 2004.

[MC79]

R.T. Moenck and J.H. Carter. Approximate algorithms to derive exact solutions to systems of linear equations. In Symbolic and algebraic computation (EUROSAM '79, Marseille), volume 72 of LNCS, pages 65–73, Berlin, 1979. Springer.

[Moo66]

R.E. Moore. Interval Analysis. Prentice Hall, Englewood Cliffs, N.J., 1966.

[Pan84]

V. Pan. How to multiply matrices faster, volume 179 of Lect. Notes in Math. Springer, 1984.

[Sch33]

G. Schulz. Iterative Berechnung der reziproken Matrix. Z. Angew. Math. Mech., 13:57–59, 1933.

[Sch00]

A. Schönhage. Variations on computing reciprocals of power series. Inform. Process. Lett., 74:41–46, 2000.

[SS71]

A. Schönhage and V. Strassen. Schnelle Multiplikation grosser Zahlen. Computing, 7:281–292, 1971.

[Str69]

V. Strassen. Gaussian elimination is not optimal. Numer. Math., 13:352–356, 1969.

[Str73]

V. Strassen. Die Berechnungskomplexität von elementarsymmetrischen Funktionen und von Interpolationskoeffizienten. Numer. Math., 20:238–251, 1973.

[Too63]

A.L. Toom. The complexity of a scheme of functional elements realizing the multiplication of integers. Soviet Mathematics, 4(2):714–716, 1963.

[vdH97]

J. van der Hoeven. Lazy multiplication of formal power series. In W. W. Küchlin, editor, Proc. ISSAC '97, pages 17–20, Maui, Hawaii, July 1997.

[vdH02a]

J. van der Hoeven. Relax, but don't be too lazy. JSC, 34:479–542, 2002.

[vdH+02b]

J. van der Hoeven et al. Mathemagix, 2002. http://www.mathemagix.org.

[vdH04]

J. van der Hoeven. The truncated Fourier transform and applications. In J. Gutierrez, editor, Proc. ISSAC 2004, pages 290–296, Univ. of Cantabria, Santander, Spain, July 4–7 2004.

[vdH05]

J. van der Hoeven. Notes on the Truncated Fourier Transform. Technical Report 2005-5, Université Paris-Sud, Orsay, France, 2005.

[vdH06]

J. van der Hoeven. Newton's method and FFT trading. Technical Report 2006-17, Univ. Paris-Sud, 2006. https://www.texmacs.org/joris/fnewton/fnewton-abs.html.

[vdH07a]

J. van der Hoeven. New algorithms for relaxed multiplication. JSC, 42(8):792–802, 2007.

[vdH07b]

J. van der Hoeven. On effective analytic continuation. MCS, 1(1):111–175, 2007.