On high precision integration of
stiff differential equations

Joris van der Hoeven

CNRS, Laboratoire d'informatique

Campus de l'École polytechnique

1, rue Honoré d'Estienne d'Orves

Bâtiment Alan Turing, CS35003

91120 Palaiseau

France

Email: vdhoeven@lix.polytechnique.fr

Draft version, November 4, 2023

In this paper, we present various new algorithms for the integration of stiff differential equations that allow the step size to increase proportionally with time. We mainly focus on high precision integrators, which leads us to use high order Taylor schemes. We will also present various algorithms for certifying the numerical results, again with the property that the step size increases with time.

Keywords: stiff differential equation, numerical method, reliable computation, rigorous integration, Taylor models, multiple precision computations

A.M.S. subject classification: 65L04, 65L20, 65G20, 37-04

1. Introduction

Consider the differential equation

(1)

where is a diagonal matrix with positive real entries , and where is a polynomial forcing term. We are interested in analytic solutions , where is either an interval of the form with , or a larger subset of that contains an interval of this form. The components of this mapping will be written , whereas subscripts will be reserved for the coefficients of local power series solutions. A similar notation will be used for the components of other vectors and matrices.

If the largest eigenvalue of gets large with respect to the forcing term , then the equation (1) is said to be stiff. Generic numerical integration schemes experience difficulties with this kind of equation, whether we use Euler's method, a Runge–Kutta scheme, or high order algorithms based on Taylor expansions. Roughly speaking, the problem is that all generic methods become inaccurate when the step size exceeds . One of the most difficult cases is when is large, but each of the quotients remains reasonably close to one. This happens for instance when , a case that is naturally encountered when discretizing certain partial differential equations.

Implicit integration schemes allow for larger step sizes than generic methods, but involve the computation and inversion of Jacobian matrices, which may be expensive in high dimension . But even such more expensive schemes only seem to deal with the step size issue in a somewhat heuristic manner: one typically selects a scheme that is “stable” when applied to any equation for all , and then hopes that the scheme continues to behave well for the actual target equation (1). We refer to [25, Section 17.5] and [4, Section 6] for more information on classical integration schemes for stiff differential equations.

We are mainly interested in high precision computations, in which case it is natural to use high order integration schemes that are based on Taylor expansions. In this context, we are not aware of any numerical method that allows the step size to increase proportionally with time. The main aim of this paper is to present such a method, together with several variants, as well as a way to compute rigorous bounds for the error. Our fastest method is an explicit scheme, but its convergence deteriorates when the eigenvalues are close. The slower, rigorous counterparts rely on the computation of Jacobian matrices.

In sections 3 and 4, we recall various approaches that can be used to compute truncated series solutions to initial value problems and how to derive high order integration schemes from this. More precisely, given a numerical approximation for at time , we compute the first terms of a power series solution to the initial value problem

(2)

and return as an approximation of for some suitable step size . The inaccuracy of such schemes for larger step sizes is due to the fact that the -th coefficient of tends to change proportionally to for slight perturbations of . If is large, then even small multiples of quickly dominate , which severely limits the step sizes that we can use.

This numerical instability is an unpleasant artifact of traditional methods for the integration of differential equations. It is well known that solutions of stiff equations tend to become very smooth after a certain period of time (in which case one says that the system has reached its steady state), but this does not lead to larger step sizes, as one would hope.

In order to make this idea of “smoothness after a certain period of time” more precise, it is instructive to study the analytic continuation of in the complex plane. In section 2, for a fixed initial condition and a fixed forcing term , we show the existence of a compact half disk , , on which the solution of (1) is analytic, for any choice of . In particular, for , the radius of convergence of the exact Taylor expansion of at time is at least . Setting , Cauchy's formula thus yields the bound for all .

If , then this means that efficient integration schemes should allow for step sizes of the order at time , for some fixed constant . In section 5, we will present various such schemes. They are all based on the computation of good approximations of the exact Taylor expansion of at time and order . By what precedes, such an approximation cannot be obtained by solving (2). Instead, for a suitable so-called critical index that depends on , we will solve the following so-called steady-state problem:

(3)

Intuitively speaking, the system has reached a steady state for the eigenvalues with . Such eigenvalues are “large”, so it is more accurate to determine coefficients as a function of later coefficients rather than earlier ones. Furthermore, the coefficients are so “small” that they can be approximated by zeros; this explains the steady-state conditions . The remaining eigenvalues with are small enough for the step size under consideration that the dependence of on the initial condition is sufficiently moderate for (3) to admit an accurate solution. We say that the system (1) is in a transient state for such eigenvalues at time .

A final topic is the computation of rigorous error bounds for the numerical integration process. For differential equations that are not stiff, this is a classical theme in interval analysis [21, 15, 5, 16, 18, 19, 11]. It has long been an open problem to develop efficient reliable integrators for stiff differential equations. We report on progress in this direction in section 6.

2. Analytic continuation

Consider the equation (1) for a fixed initial condition and a fixed forcing term , but where we will allow ourselves to vary . Our main aim is to show that there exists a compact “half disk”

with on which the solution to (1) exists and is analytic, independently of the choice of . We will prove both a weaker, simpler version of this result and a stronger version that allows us to take larger radii and that is also more convenient when we actually want to compute a radius that works.

2.1. Notations

We will need a few more notations. We denote the interior of by

Given a non-empty open subset , we write for the Banach space of analytic functions with finite sup-norm

We will often use vector notation: given , , and , we write

In addition to such “componentwise” norms, we will also use more traditional sup-norms: given , , and an matrix , we define

2.2. Reminders from complex analysis

We will also need two well known results from complex analysis, namely an explicit version of the Cauchy-Kovalevskaya theorem and a variant of Montel's theorem.

Theorem 1. Let , , and consider the initial value problem

(4)

Writing , let

Then (4) admits a unique analytic solution on the open disk with center and radius

Proof. Our assumptions imply that

constitutes a “majorant equation” for (4) in the sense of Cauchy-Kovalevskaya. By symmetry, each of the components of this equation satisfies the simpler equation

which admits the explicit solution

of radius . Since the formal power series solution of (4) satisfies for all and , it follows that has radius of convergence at least .

Theorem 2. Given a non-empty subset of and a bounded sequence , we can extract a subsequence that converges uniformly to a limit on every compact subset of .

Proof. If , then this is Montel's theorem. The general case is obtained using an easy induction on : we first extract a subsequence such that converges to a limit in and then apply the induction hypothesis to this subsequence for the remaining components.

2.3. Analyticity on open half disks

Let denote the Jacobian matrix of .

Theorem 3. Let , , and be such that for all , we have

(5)
(6)

Then, for any choice of , the initial value problem

(7)

admits a unique analytic solution on . In addition, we have for all .

Proof. Consider the operator with

The fixed points of this operator are precisely the solutions of (7). Let us prove the existence of such a fixed point. The uniqueness follows by analytic continuation from the well known uniqueness of the solution of the initial value problem (7) on a neighbourhood of the origin.

Let us first notice that maps the ball

with center and radius into itself. Indeed, given and , our hypothesis (5) implies

For all , it follows that

whence , as desired.

Let us next show that is actually contracting on . Given , consider the homotopy with . From (6), we get

It follows that

and

This shows that is indeed contracting on . Since is complete as a closed bounded subspace of , we conclude that converges to a fixed point of .

2.4. A refinement for larger compact half disks

It turns out that the condition (6) on the Jacobian is not really needed. Moreover, the solution can further be extended to the closure of .

Theorem 4. Let and be such that for all , we have

(8)

Then, for any choice of , the initial value problem

(9)

admits a unique analytic solution on . In addition, we have for all .

Proof. Let be as in the proof of Theorem 3 and notice that still maps into itself. Now consider the sequence with . Applying Montel's theorem, we obtain a subsequence that converges uniformly to a limit on every compact subset of .

We claim that is a fixed point of . Indeed, for a sufficiently small with , we have for all with . As in the proof of Theorem 3, this means that is contracting on . Consequently, the restrictions of the functions to with tend to a fixed point , and so does the subsequence . Now this fixed point coincides with on and also solves the initial value problem (7). By analytic continuation, therefore solves the same initial value problem on , which completes the proof of our claim.

It remains to be shown that can be extended to an analytic function on that satisfies for all . Since on , we first notice that is bounded on , whence continuously extends to a function that is defined on the whole of , and we clearly have for all .

Now let us consider the unique solution to the initial value problem with . Theorem 1 provides us with a lower bound for the radius of convergence of , which depends continuously on . By the compactness of , it follows that there exists some with for all . Now consider on the boundary of and let be such that . Then and on some neighbourhood of . We conclude that is an analytic extension of to the open disk that contains .

3. Computing formal power series solutions

Before we present algorithms for the numeric integration of (1), let us first consider the question of how to compute formal power series solutions. Since we will later be considering power series solutions at various times , we will use the letter instead of for such solutions. Notice that a vector of formal power series can also be regarded as a formal power series with coefficients in .

Let be the initial condition. Setting , we are thus interested in the computation of truncated power series solutions to the equation

(10)

Alternatively, this equation can be rewritten in integral form

(11)

where the integration operator sends to .

In this section, we recall several approaches to the computation of power series solutions to such equations. The efficiency of each method can be measured in terms of the number of field operations in that are required to compute the first terms of the solution. For the time being, we assume that all computations are done at a fixed bit precision , so that operations in are done at unit cost. One may also take into account the number of scalar operations that are necessary for the evaluation of . A further refinement is to separately count the number of multiplications that are needed. For instance, if and , then we have and . We always assume that .

Iterative method
For the first method, we systematically compute with truncated power series of order in , which are also called jets of order . One addition in reduces to additions in and similarly for scalar multiplications with an element in . A naive multiplication in requires operations in , although this cost can be reduced to using FFT techniques. The integration operator sends to and can be computed in linear time.

Now assume that is an approximate solution to (11) whose first terms are correct. In other words, if is the actual solution and is a preimage of with , then . Given such an approximation, one iteration

(12)

of (11) yields a new approximation whose first terms are correct. Starting with , we thus obtain a solution modulo of (11) after at most iterations. The total cost of this computation is bounded by , or by when using FFT techniques.
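
To make this concrete, here is a minimal Python sketch of the iteration on truncated power series. The normalized right-hand side Φ(u) − Λu and the toy forcing term used at the end are assumptions for illustration only; truncated series ("jets") are stored as lists of coefficient vectors, and each pass through the integral equation fixes at least one further coefficient, as described above.

```python
import numpy as np

# A truncated power series ("jet") of order N in dimension d is stored as a
# list of N coefficient vectors; jet[k] is the k-th Taylor coefficient.

def jet_mul(a, b):
    """Naive Cauchy product of two jets, truncated at the common order."""
    N = min(len(a), len(b))
    return [sum(a[j] * b[k - j] for j in range(k + 1)) for k in range(N)]

def jet_integrate(a, c0):
    """Antiderivative with constant term c0: (c0, a0/1, a1/2, a2/3, ...)."""
    return [c0] + [a[k] / (k + 1) for k in range(len(a) - 1)]

def solve_series(phi_jet, lam, c, N, iters=None):
    """Iterate u <- c + integral(Phi(u) - Lambda*u) on jets of order N.

    This is the spirit of the iteration (12): each pass fixes at least one
    more coefficient, so N passes always suffice.  phi_jet maps a jet to the
    jet of the forcing term Phi evaluated at it; lam holds the diagonal of
    Lambda.  The precise shape of equation (1) is an assumption here.
    """
    c = np.asarray(c, dtype=float)
    u = [c] + [np.zeros(len(c)) for _ in range(N - 1)]
    for _ in range(N if iters is None else iters):
        phi_u = phi_jet(u)
        rhs = [phi_u[k] - lam * u[k] for k in range(N)]
        u = jet_integrate(rhs, c)[:N]
    return u

# Toy example (an assumption, not from the paper): Phi(u) = u*u componentwise.
lam = np.array([1.0, 100.0])
u = solve_series(lambda v: jet_mul(v, v), lam, c=[0.5, 0.5], N=8)
print(u[:3])
```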

Recurrence relations
Since is a polynomial, it is built from the components of using additions, multiplications, and scalar multiplications with constants in . For sums, scalar products, general products, and integrals of power series, we may extract their coefficients in using the following rules

(13)
(14)
(15)
(16)

Applying this recursively to the polynomial expression , the iteration (12) yields a recursion relation

(17)

that allows us to compute from , by induction on . Proceeding in this way, the computation of the solution at order requires operations.
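
As a toy illustration of these extraction rules (not an example taken from the paper, and assuming that (15) is the product rule and (16) the integration rule in the list above), consider the scalar equation u' = u² − λu with u(0) = c; the product and integration rules combine into a direct recurrence for the coefficients.

```python
def taylor_coeffs(c, lam, N):
    """Coefficients of the scalar toy problem u' = u^2 - lam*u, u(0) = c.

    The product rule gives [u^2]_k = sum_j u_j * u_{k-j}, and integrating the
    equation gives (k+1) * u_{k+1} = [u^2]_k - lam * u_k.
    """
    u = [c]
    for k in range(N - 1):
        square_k = sum(u[j] * u[k - j] for j in range(k + 1))
        u.append((square_k - lam * u[k]) / (k + 1))
    return u

print(taylor_coeffs(c=0.5, lam=10.0, N=6))
```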

The lazy power series approach
The above approach of computing the coefficients successively using the recursion relation (17) can be reformulated elegantly in the framework of lazy power series computations. The idea is to regard a power series as given by its stream of coefficients. Basic arithmetic operations on power series are implemented in a lazy manner: when computing the -th coefficient of a sum , a product , or an integral , we compute “just the coefficients that are needed” from and . The natural way to do this is precisely to use the relations (13–16).

The lazy power series approach has the important property that the -th coefficient of a product (say) becomes available as soon as and are given. This makes it possible to let and depend on the result , as long as the computation of and only involves previous coefficients of . As a consequence, the fixed point equation (11) can be solved simply by evaluating the right hand side using lazy power series arithmetic. Indeed, the -th coefficient only depends on previous coefficients of .
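
A minimal Python sketch of this style of computation (again for the toy scalar equation u' = u² − λu, an assumption for illustration): a series is a memoized coefficient rule, and the solution is obtained by literally writing down the fixed-point equation (11), since coefficient k of the right-hand side only consults coefficients of the solution below k.

```python
class Lazy:
    """A lazy power series: coefficient k is computed on demand and memoized."""
    def __init__(self, rule):
        self._rule, self._memo = rule, {}
    def __getitem__(self, k):
        if k not in self._memo:
            self._memo[k] = self._rule(k)
        return self._memo[k]

def lazy_mul(a, b):
    return Lazy(lambda k: sum(a[j] * b[k - j] for j in range(k + 1)))

def lazy_integrate(a, c0):
    return Lazy(lambda k: c0 if k == 0 else a[k - 1] / k)

def lazy_solution(c, lam):
    u = Lazy(None)                          # placeholder; its rule is tied below
    sq = lazy_mul(u, u)
    rhs = lazy_integrate(Lazy(lambda k: sq[k] - lam * u[k]), c)
    u._rule = rhs.__getitem__               # u now *is* the right-hand side of (11)
    return u

u = lazy_solution(c=0.5, lam=10.0)
print([u[k] for k in range(6)])             # matches taylor_coeffs above
```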

The relaxed power series approach
The main drawback of the lazy power series approach is that the computation of a product at order using (15) requires operations. Here we recall that FFT techniques allow us to compute a product in using only operations.

One essential observation is that, in order to solve (11) using the lazy power series approach, we only relied on the fact that each coefficient becomes available as soon as and are given. In fact, it is possible to design faster multiplication methods that still have this property; such methods are said to be relaxed or on-line. A relaxed multiplication method that computes a product at order in time was presented in [8] and can be traced back to [6]. An even faster algorithm of time complexity was given in [9].

Denoting by the cost of relaxed multiplication at order , the resolution of (11) at order now requires only operations.

Newton's method
Yet another idea to speed up computations is to replace the iteration (12) with a Newton-style iteration with faster convergence. This idea was first used by Brent and Kung [2] to show that (11) can be solved at order in time . However, this complexity analysis does not take into account the dependence on and . In particular, the dependence on of Brent and Kung's method is exponential [8]. Faster algorithms were proposed in [26, 1], based on the simultaneous computation of the solution and its first variation. This allows for the computation of a solution of (10) at order in time . Whenever , the computation time can be further reduced to [10]. For small , this leads to the asymptotically most efficient method for solving (10). For large , the computation of the first variation induces a overhead, and the relaxed method usually becomes more efficient.

4. Numerical integration using Taylor expansions

Let as before. The aim of this section is to present a naive algorithm for the numerical integration of the differential equation

(18)

based on the computation of truncated power series solutions at successive times . We will use a fixed expansion order , a fixed bit precision , but an adaptive step size . At , we are given an initial condition and our aim is to find a good numeric approximation for at time . The algorithm is not designed to be efficient when the equation gets stiff and we will present a heuristic discussion on what goes wrong when this happens.

4.1. Naive integration using Taylor series

Let us write for the Taylor series expansion of at time and for its truncation at order . In other words, setting

we have and

If the time at which we expand is clear from the context, then we will simply drop the postfix and write instead of . Conversely, for any other quantities that implicitly depend on , we will use the postfix to make this dependence explicit.

So let be the power series expansion of at a fixed time . In view of (18), this power series satisfies the equation

(19)

In the previous section, we have recalled various methods for computing truncated power series solutions as a function of the initial condition at time . In what follows, we will use any of these algorithms as a black box, and show how to devise a numerical integration scheme from that.

Obviously, given and an appropriate step size at time , the idea is to compute and simply evaluate

We next continue with in the role of and with a suitably adapted step size . The main question is therefore to find a suitable step size . Now the expected order of magnitude of is given by

(20)

Since we are computing with bits of precision, we wish to keep the relative error of our numeric integration scheme below , approximately. Therefore, we need to ensure that the truncation error

(21)

remains bounded by . Although we have not computed any of the coefficients for , a reasonable approximate upper bound for is given by

(22)

where is a small positive integer, called the number of guard terms. In order to protect ourselves against the occasional vanishing of , it is wise to take . Nevertheless, a small value such as or should usually provide acceptable upper estimates for . We now simply take the step size to be maximal with . This leads to the following algorithm for the numerical integration of (18):

Algorithm 1

Input: an initial condition and

Output: a numerical approximation for , where satisfies (18) with

,

while do

Compute the truncated solution to (19) with

Let be maximal such that , with and as in (20) and (22)

,

return
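
The following Python sketch assembles these ingredients into Algorithm 1, reusing solve_series from the sketch in section 3. The magnitude and tail estimates only mirror the spirit of (20)–(22); their exact form, and the doubling search for the largest admissible step size, are assumptions for illustration.

```python
import numpy as np

def step_size(u, p, g=2):
    """Roughly the largest h with estimated truncation error <= 2^-p times the
    estimated magnitude of the solution at t + h; the tail is estimated from
    the last g computed coefficients ("guard terms")."""
    N = len(u)
    def ok(h):
        mag = max(np.max(np.abs(u[k])) * h**k for k in range(N))
        tail = max(np.max(np.abs(u[N - 1 - i])) * h**(N - 1 - i) for i in range(g))
        return tail <= 2.0**(-p) * mag
    h = 2.0 ** -60
    while h < 2.0 ** 60 and ok(2 * h):      # double h while the tail stays small
        h *= 2
    return h

def integrate(phi_jet, lam, u0, T, N, p):
    """Naive Taylor integrator (Algorithm 1): expand, choose h, evaluate, repeat."""
    t, c = 0.0, np.asarray(u0, dtype=float)
    while t < T:
        u = solve_series(phi_jet, lam, c, N)        # truncated series at time t
        h = min(step_size(u, p), T - t)
        c = sum(u[k] * h**k for k in range(N))      # evaluate the truncation at t + h
        t += h
    return c
```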

4.2. Step size and error analysis

Let denote the exact solution of (18) with and let denote the approximation computed by Algorithm 1. In order to analyze various aspects of the algorithm, it is instructive to look at the radius of convergence of at time . Roughly speaking, with the notations from the previous subsection, the coefficients of the power series expansion of at time grow as

Ideally speaking, if we were able to compute with sufficient accuracy, then the coefficients should grow in a similar way. Setting for the step size in this ideal situation, we would then expect that , whence

(23)

This suggests taking the expansion order to be proportional to , after which the step size should be proportional to the radius of convergence .

Let us pursue this line of wishful thinking a little further. Theorem 4 implies that is analytic on a compact half disk that is independent of . In particular, we get that for . It can also be shown that the radius of convergence of at the origin is of the order or more. For stiff equations, we typically have . In order to integrate the equation until a time , we thus hope for a step size that increases geometrically from to . The entire integration would then require approximately steps. Using the most efficient algorithms from section 3, each step requires floating point operations (here the “flat Oh” notation stands for ). Since additions and multiplications of bit floating point numbers can be performed in time , the overall bit cost of the entire integration would thus be bounded by .

But are we indeed able to compute with enough accuracy in order to ensure that the coefficients grow according to ? Let us carefully analyze each of the sources of error for one step of our integration scheme. By construction, we ensured that the truncation error is of the order . One of the most intrinsic sources of error comes from the initial condition : since we are computing with a precision of bits, the mere representation of induces a relative error of the order of . Even when computing from with infinite precision, this intrinsic source of error cannot be removed.

Let us now study how the error in the initial condition affects the errors in the other coefficients . For this, we need to investigate the first variation of , which describes the sensitivity of the flow to the initial condition. More precisely, let be the power series solution of (19) at time as a function of the initial condition . Then the first variation is the matrix with entries . Dropping suffixes when they are clear from the context, the first variation satisfies the linear differential equation

Here we recall that stands for the Jacobian matrix of . If our equation is very stiff, then is small with respect to , which leads to the extremely crude approximation for the first variation.

Now the relative error in the initial condition is at best , as explained above, which means that for . In combination with the relation , this leads to the following errors for the other coefficients:

Now if , then the error dominates the actual value , which yields instead of . When choosing our step size such that , as in Algorithm 1, this yields and

(24)

instead of the desired step size . The actual bit cost of the complete integration is therefore bounded by instead of .

An interesting aspect of this analysis is the fact that the step size still turns out to be times larger than , whence larger orders allow for larger step sizes. However, this comes at the expense of a larger relative error than . Indeed, we have

This error is maximal for , in which case we have

This means that the relative error for one step of our integration method is instead of . In other words, we have “sacrificed” bits of precision, so that the method admits an “effective precision” of only bits.

The last source of errors for Algorithm 1 comes from rounding errors during the computation of from . The nature of these errors depends on the particular method that we used for computing the power series . Nevertheless, for most methods, the rounding errors only contribute marginally to the total error. This is due to the fact that for , so the rounding errors are absorbed by the errors induced by the error in the initial condition .

5. Fighting stiffness

Let us continue with the notations from section 4 and its subsection 4.2. In particular, we assume that the exact solution to (1) with initial condition is analytic on the compact half disk . We also denote by an upper bound for on , so that and

for all and , by Cauchy's theorem. From now on, we assume that , so that Algorithm 1 only allows us to use a step size of the order (24) instead of (23). The aim of this section is to present a new way to compute truncated power series solutions of the equation

(25)

at time , with the property that is a good approximation of the true truncated solution . In a similar way as in section 4, we will then use this to derive an algorithm for the numerical integration of (1). Contrary to before, the property that for allows us to take step sizes of the desired order (23) for .

5.1. Integrating stiff equations using Taylor series

We stress once more that the reduced step size for Algorithm 1 is a consequence of our choice to compute the truncated solution of (25) in terms of the initial condition (that can only be known approximately) and not of the choice of the particular algorithm that is used for this computation.

Indeed, as explained in section 4.2, even a small change in of the order of can have a dramatic effect on the solution , since the coefficient can change by as much as . Since for large , it follows that a minor change in leads to a completely incorrect computation of the coefficients with close to .

For small , the error generally does remain bounded by the actual value . In the theory of stiff differential equations, it is customary to say that the system is in a transient state for such . As soon as the error exceeds , we say that the system has reached its steady state for the largest eigenvalue of . When this happens, the system may still be in a transient state for some of the other eigenvalues . This motivates the definition of the critical index as the largest index such that we are in a transient state for ; we take if we have reached the steady state for all eigenvalues .

The concepts of transient state, steady state, and critical index are deliberately somewhat vague. As a tentative definition, we might say that we reached the steady state for if . However, for computational purposes, it is convenient to interpret this inequality as an approximate one. The crucial property of reaching the steady state for is that the smallness of essentially determines the -th component of the initial condition up to the last -th bit.

The main idea behind our algorithm is to use this property as a “boundary condition” for the computation of . More precisely, we boldly impose the boundary conditions as a replacement for the initial conditions , while keeping the remaining initial conditions for the transient states. In other words, we wish to find the truncated solution of the system

(26)

We will call the steady-state condition for and (26) a steady-state problem.

In order to solve (26), it is natural to adapt the iterative method from section 3 and introduce the operator with

(27)

The actual arithmetic is performed over , but it will be more convenient to view as an operator from into itself. The computation of as a fixed point of leaves us with two questions: how to choose a suitable ansatz for the iteration and how to determine the critical index ?

For the ansatz, we go back to the solution from the previous step at time and simply use as a first approximation of the solution at time . For , we fall back to the traditional method from section 4. For the initial and steady-state conditions to “propagate to the other end”, at least iterations are required in order to find a fixed point of , whereas iterations usually suffice. One may thus take .

As to the critical index , we determine it as a function of the step size through

(28)

This choice is meant to ensure the fastest convergence of the iteration and we used as an approximation of . The step size itself is chosen in the same way as in section 4. Since is not yet available for the computation of , we may use the previous step size as an approximation for in (28). Optionally, one may also wish to implement a few additional sanity checks in order to ensure convergence of the -iteration and decrease the step size in case of failure. One useful such check is to verify that .

Altogether, this leads to the following algorithm:

Algorithm 2

Input: an initial condition and

Output: a numerical approximation for , where satisfies (1) with

,

Compute the truncated solution to (19) with

while do

Let be maximal such that , with and as in (20) and (22)

,

Replace by the approximate fixed point of as in (27)

return
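
For concreteness, here is a Python sketch of one pass of the steady-state iteration and of a stand-in for the critical index. It assumes the normalized form u' + Λu = Φ(u), so that the transient components are propagated forward from their initial values while the steady-state components are solved backward from the imposed condition that the coefficient beyond the truncation order vanishes; the exact operator (27) and the formula (28) are not reproduced in this draft, so both sweeps and the heuristic below are assumptions in their details.

```python
import numpy as np

def steady_state_pass(u, phi_jet, lam, c, kappa):
    """One pass of the steady-state iteration on a jet u of order N.

    Assuming u' + Lambda*u = Phi(u), the coefficient relation reads
        (k+1) * u_{k+1} = [Phi(u)]_k - lam * u_k.
    Transient components (index < kappa) use it forward from the initial value;
    steady-state components (index >= kappa) solve it backward for u_k, with
    the condition u_N ~ 0 playing the role of the steady-state conditions.
    """
    N = len(u)
    c = np.asarray(c, dtype=float)
    phi_u = phi_jet(u)                     # forcing term evaluated on the current jet
    v = [coeff.copy() for coeff in u]
    v[0][:kappa] = c[:kappa]               # keep the transient initial conditions
    for k in range(N - 1):                 # forward sweep (transient part)
        v[k + 1][:kappa] = (phi_u[k][:kappa] - lam[:kappa] * v[k][:kappa]) / (k + 1)
    nxt = np.zeros_like(v[0])              # imposed condition: coefficient N ~ 0
    for k in range(N - 1, -1, -1):         # backward sweep (steady-state part)
        v[k][kappa:] = (phi_u[k][kappa:] - (k + 1) * nxt[kappa:]) / lam[kappa:]
        nxt = v[k]
    return v

def critical_index(lam, h, N):
    """Heuristic stand-in for (28), assuming lam is sorted increasingly: treat
    an eigenvalue as 'steady' once lam * h exceeds the expansion order N."""
    return int(np.searchsorted(lam, N / h))
```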

5.2. Convergence of the -iteration

Let us now study the convergence of as a mapping on the -dimensional vector space over . We define a norm on this space by

For a linear map on this space, represented as a matrix , we have the corresponding matrix norm

Recall that and stand for the Jacobian matrices of and .

Theorem 5. Assume that is a fixed point of and that and are constants such that for all and . Then

Proof. Consider an infinitesimal perturbation of with . Given and , we have

Similarly, given and , we have

Putting both relations together, we obtain .

Assume that

(29)

Then the theorem implies the existence of a small neighbourhood of the fixed point of on which the -iteration converges to . Whenever this condition is met, the -iteration actually tends to converge on a rather large neighbourhood that includes our ansatz; see the next subsection for a more detailed discussion. Intuitively speaking, the condition requires the eigenvalues to be sufficiently separated with respect to the norm of the forcing term . Even when the condition does not hold, the -iteration usually still displays an initial convergence for our ansatz, but the quality of the approximate solution to (26) ceases to improve after a while.

If the condition (29) is satisfied for all critical indices that we encounter when integrating from until time , then Algorithm 2 should produce an accurate result. The idealized analysis from section 4.2 then also applies, so the algorithm takes steps. Since each step now requires floating point operations at bit precision , we finally obtain the bound for the total running time.

5.3. Quality of the computed solution

Let us now investigate in more detail why fixed points of the -iteration indeed approximate the true solution quite well. For this, we will determine a more precise small neighbourhood of the fixed point on which the -iteration converges and show that this neighbourhood in particular contains . We start with two lemmas.

Lemma 6. Let be a matrix of analytic functions defined on the closed ball with center zero and radius . Let . Then for all .

Proof. Given with , we have

It follows that .

Lemma 7. Let , , , , and

Then for all with and , we have

Proof. Setting , we first notice that

It follows that

and we conclude using the previous lemma.

Theorem 8. Let and be such that

Let and be such that

Then the sequence tends to a unique fixed point on the set

Proof. A straightforward adaptation of the proof of Theorem 5 shows that on the ball , which means that on this ball. By induction on , it follows that and . We conclude that converges to a fixed point of in .

Returning to the Taylor series expansion of the exact solution of (1) at time , we notice that

It follows that

Now assuming that the intended step size was more or less achieved at the previous step, it follows that is of the desired order. If the condition (29) is indeed satisfied for , we thus should be able to apply Theorem 8 and conclude that the computed fixed point is within distance for the norm. Using a similar reasoning, we also see that the ansatz at the next step will be sufficiently close to the true solution for the -iteration to converge.

5.4. Solving steady-state problems through the first variation

A more robust but costly approach to solve the steady-state problem (26) is to compute the initial values with sufficient precision using Newton's method. Given a tentative approximation for the initial condition, we both compute and the first variation , after which we update by solving the linear system

This method admits quadratic convergence, but it requires us to compute with a precision of bits at least in order to be accurate. Indeed, this comes from the fact that grows roughly as . On the upside, we may compute and using any of the algorithms from section 3. The total running time is therefore bounded by . Notice also that it actually suffices to compute the last rows of , due to the requirement that .
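
A schematic Newton update in Python, following the verbal description only: series_and_variation is an assumed routine returning the truncated flow and its first variation for a given initial condition, and the choice of driving the trailing coefficients of the steady-state components to zero while adjusting exactly those components of the initial condition is an illustrative reading of the text, not the paper's precise linear system.

```python
import numpy as np

def newton_steady_state(series_and_variation, c, kappa, N, tol=1e-16, max_iter=20):
    """Newton iteration on the steady-state part of the initial condition.

    series_and_variation(c, N) is assumed to return (u, V), where u[k] is the
    k-th Taylor coefficient for initial condition c and V[k] is the k-th
    coefficient of the first variation (an n x n matrix).  The components
    c[kappa:] are adjusted so that the trailing coefficients u[N-1][kappa:]
    vanish, which stands in for the steady-state conditions.
    """
    c = np.array(c, dtype=float)
    for _ in range(max_iter):
        u, V = series_and_variation(c, N)
        residual = u[N - 1][kappa:]                # coefficients to be driven to zero
        if np.max(np.abs(residual)) < tol:
            break
        J = V[N - 1][kappa:, kappa:]               # only the last rows are needed
        c[kappa:] -= np.linalg.solve(J, residual)  # Newton update
    return c
```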

The main disadvantage of the above method is that the computation of the first variation along with induces an overhead of , which may be a problem for systems of high dimension. Let us now sketch how one might reduce this overhead by combining the variational approach with the -iteration. We intend to return to a more detailed analysis in future work.

Instead of using a single critical index , the first idea is to use a range of indices starting at , and such that the condition (29) holds for

The second idea is to only vary the components of the initial condition and use the steady-state conditions for the indices .

More specifically, for a tentative initial condition , we first solve the steady-state problem

using the -iteration technique. In a similar way, we next solve the following variational steady-state problem for the matrix :

As our ansatz, we may use for all . Having computed and at precision , we finally update by solving the linear system

and setting . We repeat this whole process until is sufficiently close to zero for .

Remark 9. The algorithms in this section are reminiscent of implicit numerical schemes for the integration of (1). One interesting difference is that our second optimized method only needs to compute a small part of the full Jacobian matrix.

Remark 10. Instead of imposing the exact steady-state conditions , yet another approach would be to minimize the norm under the condition that . This approach admits the advantages that one does not need to know and that it might be applied to more general complex matrices . However, it requires the computation of the full Jacobian matrix.

6. Certification

In the previous two sections, we have described numerical algorithms for integrating (1). An interesting question is how to compute a tight error bound such that the distance between the true and the computed solutions lies within this error bound. Ball arithmetic provides a suitable framework for such computations. This variant of interval arithmetic is well suited for high precision computations with complex numbers and we will briefly recall its basic principles in section 6.1. As a first application, we show how to make Theorem 4 more effective.

The certified integration of differential equations that are not stiff (i.e. the robust counterpart of section 4) is a classical topic in interval arithmetic [20, 21, 15, 5, 24, 16, 7, 22, 18, 14, 17, 23, 19]. For recent algorithms of good complexity, much in the spirit of the present paper, we refer to [11]. Most of the existing algorithms rely on Taylor series expansions as we do, while providing rigorous tail bounds for the truncation error.

The effective counterpart of Theorem 4 provides suitable tail bounds in the case of stiff differential equations. In sections 6.3 and 6.4, we show how to use this for the certified resolution of steady-state problems using the -iteration from section 5.1. Unfortunately, our first algorithms are rather naive and typically lead to heavily overestimated error bounds. The classical way to reduce this overestimation is to also compute the first variation and apply the mean value theorem: see section 6.5 for how to do this in our context.

6.1. Ball arithmetic

Given and , we write for the closed ball with center and radius . The set of such balls is denoted by or . One may lift the ring operations in to balls in , by setting:

These formulas are the simplest ones that satisfy the so-called inclusion principle: given , and , we have . This arithmetic for computing with balls is called exact ball arithmetic. It extends to other operations that might be defined on , as long as the ball lifts of operations satisfy the inclusion principle. Any ordinary complex number can be reinterpreted as a ball . Given a ball , we also notice that and provide us with reliable upper and lower bounds for in .

Another interesting operation on balls that we will need below is intersection. Assuming that the set intersection is non-empty, we define the ball intersection to be the ball of smallest radius that contains . In order to determine this ball intersection, we may assume without loss of generality that . If , then . Otherwise, let be the two (possibly identical) intersections of the circles and . If , then we still have . Otherwise, .
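
A minimal Python sketch of complex ball arithmetic with the ring operations above, using exact arithmetic (no outward rounding of the radii, which a floating point implementation would add, cf. Remark 11). The intersection shown is deliberately crude: it returns the smaller of the two balls, which always encloses the set intersection, whereas the text describes how to compute the smallest enclosing ball.

```python
from dataclasses import dataclass

@dataclass
class Ball:
    """Closed complex ball with center c and radius r."""
    c: complex
    r: float

    def __add__(self, other):
        # |x + y - (c1 + c2)| <= r1 + r2 for all x in self, y in other
        return Ball(self.c + other.c, self.r + other.r)

    def __mul__(self, other):
        # |x*y - c1*c2| <= |c1| r2 + |c2| r1 + r1 r2
        return Ball(self.c * other.c,
                    abs(self.c) * other.r + abs(other.c) * self.r + self.r * other.r)

    def contains(self, z):
        return abs(z - self.c) <= self.r

def crude_intersection(b1, b2):
    """A valid but non-minimal enclosure of the set intersection: the
    intersection is contained in each ball, so the smaller one encloses it."""
    return b1 if b1.r <= b2.r else b2

x, y = Ball(1.0 + 0.5j, 1e-10), Ball(2.0 - 1.0j, 1e-12)
print(x + y, x * y)
```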

It will also be convenient to extend vector notations to balls. First of all, we identify vectors of balls with “ball vectors” . Given and , we also write if and only if for . Similar remarks apply to ball matrices, ball series, etc.

Remark 11. For effective computations, one can only work with approximations of real and complex numbers of finite precision. The IEEE 754 standard specifies real floating point arithmetic based on the concept of “correct rounding”. It naturally generalizes to floating point numbers with a mantissa of bits and an exponent of bits [3]. Let and be the sets of real and complex floating point numbers of this type (strictly speaking, one also has for the reliable rounding of overflowing operations). One may adapt the above arithmetic on exact balls to floating point balls in while preserving the inclusion principle: it suffices to slightly increase the radii of output balls so as to take care of rounding errors. For precise formulas and interesting variations, we refer to [13]. For simplicity, the sequel will be presented for exact ball arithmetic only, but it is not hard to adapt the results to take into account rounding errors.

6.2. Computing bounds on compact half disks

The polynomials in (1) can either be represented as linear combinations of monomials, or using a dag (directed acyclic graph). In both cases, the ball arithmetic from the previous subsection allows us to reliably evaluate at balls in .

For the effective counterpart of Theorem 4, there is a trade-off between the qualities of the radius (larger radii being better) and the bound (smaller values of being better). For a given , it is simple to compute the “best” corresponding that satisfies the condition (8), by taking . Conversely, for a fixed , let us recall a classical technique from interval analysis that can be used to compute an “almost best” corresponding bound that satisfies (8).

We simply construct a sequence in by taking and for all . If there exists a finite for which (8) holds, then the sequence usually converges to a minimal such quite quickly. After say ten iterations, we should therefore have a reasonable approximation. One then slightly inflates the last value by taking . If , then we have succeeded in finding a suitable . If not, then we return “failed”.

Using the above procedure, we may also compute a reasonably large compact half disk on which is analytic, together with a bound for : we simply perform a dichotomic search for the largest for which the computation of a corresponding does not fail. If we also wish the bound to remain reasonably small, one may simply divide by two at the end of the search and compute the corresponding .
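
A Python sketch of this procedure. The map bound_from, which returns the best bound corresponding to a tentative value as in condition (8), is left abstract because the formula is not reproduced in this draft; the iteration count, inflation factor, and search bounds are illustrative choices.

```python
def find_bound(bound_from, m0=1.0, iters=10, inflate=1.01):
    """Iterate m <- bound_from(m), inflate slightly, then verify.

    bound_from(m) is assumed to return the best bound that condition (8)
    yields for the tentative value m.  Returns a certified bound, or None."""
    m = m0
    for _ in range(iters):
        m = bound_from(m)
    m *= inflate                          # slight inflation of the last iterate
    return m if bound_from(m) <= m else None

def largest_radius(certify, hi=1.0, steps=30):
    """Dichotomic search for (roughly) the largest rho <= hi for which
    certify(rho) succeeds; certify returns None on failure, as find_bound does."""
    lo = 0.0
    for _ in range(steps):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if certify(mid) is not None else (lo, mid)
    return lo
```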

6.3. Certified integration of stiff differential equations

Let us now return to the integration of (1) and let be the exact solution for the initial condition . Let and be computed as above such that (8) holds. Assume that we were able to reliably integrate (1) until a given time and let , so that .

In order to adapt the -iteration to ball arithmetic, we first deduce from Theorem 4 that for . This provides us with the required steady-state conditions , in addition to the initial conditions . The ball counterpart of our steady-state problem thus becomes

(30)

We correspondingly define the ball version of for by

This map has the property that . In our ball context, we may actually iterate using the following improved version of :

Here we understand that the intersections are taken coefficientwise. Since iterations using can only improve our enclosures, we do not need to worry much about the ansatz. We may deduce a reasonably good ansatz from the power series expansion at the previous time , and the Cauchy bounds for all . But it is perfectly reasonable as well to use

Applying a sufficient number of times to this ansatz, we obtain the desired ball enclosure of the truncated power series expansion of at time . Assuming that , we may then deduce the enclosure

(31)

of at time . Notice that the cost of the computation of certified solutions in this way is similar to the cost of Algorithm 1, up to a constant factor.
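
Schematically, the certified iteration can be written as below, reusing the Ball class and crude_intersection from section 6.1; psi_ball, the ball lift of the steady-state operator, is assumed to be available and to satisfy the inclusion principle. Because each iterate is intersected coefficientwise with the previous enclosure, the enclosures can only improve, which is why the choice of ansatz matters little.

```python
def certified_steady_state(psi_ball, ansatz, iters=20):
    """Iterate the ball version of the steady-state operator, intersecting each
    new enclosure coefficientwise with the previous one.  Coefficients are Ball
    objects (one per scalar coefficient of the truncated series)."""
    u = ansatz
    for _ in range(iters):
        v = psi_ball(u)
        u = [crude_intersection(a, b) for a, b in zip(u, v)]
    return u
```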

6.4. Certification of approximate solutions

The above method relies on the property that the steady-state problem (30) for balls specializes into a numerical steady-state problem that admits as a solution, since and . Given a numerical approximation of , as computed in section 5, a related problem is whether we can certify the existence of an actual solution in a small neighbourhood of the approximate solution.

In order to make this more precise, let us introduce a few more notations. Given and , consider the numeric steady-state problem

(32)

We will denote by any exact solution to this problem if such a solution exists. It will be convenient to regard and as elements of , through padding with zeros. Now consider the operator defined by

Through the iterated application of to a suitable ansatz, we may obtain a numerical approximation of .

Now consider balls and , again padded with zeros. Then we have the following ball analogue of (32):

(33)

Let and denote the centers of and . Starting from a numerical approximation of , we wish to compute the truncation of a solution to (33), with the property that for all and .

In order to do this, the idea is again to use a suitable ball version of , given by

and to keep applying on the ansatz until we are sufficiently close to a fixed point. Using an inflation technique similar to the one in section 6.2, we finally obtain a truncated series with , or “failed”. Now for any and , we notice that this also yields . Thinking of as a compact set of series in , this means in particular that admits a fixed point , as desired.

6.5. Curbing the wrapping effect

If , then the algorithm from subsection 6.3 specializes to a classical way to compute the enclosure (31) of a solution to (1) at time as a function of the enclosure at time . However, it is well known that this method suffers from a large overestimation of the errors, due to the so-called “wrapping effect”. A well known technique to reduce this overestimation is to first compute a certified solution for the initial condition instead of and then to bound the error due to this replacement by investigating the first variation. Let us now show how to extend this technique to general critical indices .

With in the role of and in the role of , we start with the computation of a truncated certified solution to (33). Recall that this ball solution has the property that for all . Since the initial conditions are exact, this solution is generally fairly accurate. Now for any , we may write

(34)

We next observe that

where denotes the exact solution of the following equation for the first steady-state variation:

This equation again admits a ball analogue

where is a solution to (30). We only compute a crude solution to this equation, for a crude solution to (30), both truncated to the order . These solutions can be obtained using the techniques from section 6.3. From (34) we finally conclude that

is a ball enclosure for . This enclosure is usually of a much better quality than . However, its computational cost is roughly times higher, due to the fact that we need to determine the first variation.

6.6. Handling close eigenvalues

In the case when is too small for (29) to hold, we may generalize the above strategy and incorporate some of the ideas from section 5.4. With and from there, let us briefly outline how to do this. We proceed in four stages:

7. Conclusion

There are several directions for generalizations and further improvements of the results in this paper. For simplicity, we assumed to be a diagonal matrix with positive eigenvalues. It should not be hard to adapt the results to arbitrary matrices with positive real eigenvalues only (as long as the forcing term does not explode under the change of coordinates that puts in Jordan normal form).

A more interesting generalization would be to consider complex eigenvalues with strictly positive real parts. In that case, the half disks would need to be replaced by smaller compact sectors of the form with . Even more generally, one may investigate how to develop accurate Taylor series methods for integrating arbitrary differential equations with the property that the step size is proportional to the radius of convergence of the exact solution; see also Remark 10.

In this paper, we focussed on high precision computations using high order Taylor schemes. Another interesting question is whether it is possible to develop analogues of our methods in the same spirit as more traditional Runge–Kutta schemes. How would such analogues compare to implicit integration schemes?

Concerning certified integration, the theory of Taylor models [18, 23, 19] allows for higher order expansions of the flow as a function of the initial condition (that is, beyond the computation of the first variation as in this paper). We think that it should be relatively straightforward to adapt our techniques to such higher order expansions. In a similar vein, we expect that the techniques from [7, 11] to further curb the wrapping effect can be adapted to the stiff setting.

From a more practical point of view, it would finally be interesting to have more and better machine implementations. For the moment, we have only made a toy implementation of the main algorithm from section 5.1 in the Mathemagix system [12]. We found this implementation to work as predicted on various simple examples. Detailed machine experiments with larger systems and the algorithms from sections 5.4 and 6 should make it possible to further improve the new techniques.

Bibliography

[1]

A. Bostan, F. Chyzak, F. Ollivier, B. Salvy, É. Schost, and A. Sedoglavic. Fast computation of power series solutions of systems of differential equations. In Proceedings of the 18th ACM-SIAM Symposium on Discrete Algorithms, pages 1012–1021. New Orleans, Louisiana, U.S.A., January 2007.

[2]

R. P. Brent and H. T. Kung. Fast algorithms for manipulating formal power series. Journal of the ACM, 25:581–595, 1978.

[3]

R. P. Brent and P. Zimmermann. Modern Computer Arithmetic. Cambridge University Press, 2010.

[4]

J. C. Butcher. Numerical methods for ordinary differential equations in the 20th century. J. of Computational and Applied Mathematics, 125:1–29, 2000.

[5]

J.-P. Eckmann, H. Koch, and P. Wittwer. A computer-assisted proof of universality in area-preserving maps. Memoirs of the AMS, 47(289), 1984.

[6]

M. J. Fischer and L. J. Stockmeyer. Fast on-line integer multiplication. Proc. 5th ACM Symposium on Theory of Computing, 9:67–72, 1974.

[7]

T. N. Gambill and R. D. Skeel. Logarithmic reduction of the wrapping effect with application to ordinary differential equations. SIAM J. Numer. Anal., 25(1):153–162, 1988.

[8]

J. van der Hoeven. Relax, but don't be too lazy. J. Symbolic Comput., 34:479–542, 2002.

[9]

J. van der Hoeven. New algorithms for relaxed multiplication. J. Symbolic Comput., 42(8):792–802, 2007.

[10]

J. van der Hoeven. Newton's method and FFT trading. J. Symbolic Comput., 45(8):857–878, 2010.

[11]

J. van der Hoeven. Certifying trajectories of dynamical systems. In I. Kotsireas, S. Rump, and C. Yap, editors, Mathematical Aspects of Computer and Information Sciences: 6th International Conference, MACIS 2015, Berlin, Germany, November 11-13, 2015, Revised Selected Papers, pages 520–532. Cham, 2016. Springer International Publishing.

[12]

J. van der Hoeven, G. Lecerf, B. Mourrain, et al. Mathemagix. From 2002. http://www.mathemagix.org.

[13]

J. van der Hoeven and G. Lecerf. Evaluating straight-line programs over balls. In Paolo Montuschi, Michael Schulte, Javier Hormigo, Stuart Oberman, and Nathalie Revol, editors, 2016 IEEE 23rd Symposium on Computer Arithmetic, pages 142–149. IEEE, 2016.

[14]

W. Kühn. Rigourously computed orbits of dynamical systems without the wrapping effect. Computing, 61:47–67, 1998.

[15]

O. E. Lanford. A computer-assisted proof of the Feigenbaum conjectures. Bull. of the AMS, 6:427–434, 1982.

[16]

R. Lohner. Einschließung der Lösung gewöhnlicher Anfangs- und Randwertaufgaben und Anwendungen. PhD thesis, Universität Karlsruhe, 1988.

[17]

R. Lohner. On the ubiquity of the wrapping effect in the computation of error bounds. In U. Kulisch, R. Lohner, and A. Facius, editors, Perspectives on enclosure methods, pages 201–217. Wien, New York, 2001. Springer.

[18]

K. Makino and M. Berz. Remainder differential algebras and their applications. In M. Berz, C. Bischof, G. Corliss, and A. Griewank, editors, Computational differentiation: techniques, applications and tools, pages 63–74. SIAM, Philadelphia, 1996.

[19]

K. Makino and M. Berz. Suppression of the wrapping effect by Taylor model-based validated integrators. Technical Report MSU Report MSUHEP 40910, Michigan State University, 2004.

[20]

R. E. Moore. Automatic local coordinate transformations to reduce the growth of error bounds in interval computation of solutions to ordinary differential equations. In L. B. Rall, editor, Error in Digital Computation, volume 2, pages 103–140. John Wiley, 1965.

[21]

R. E. Moore. Interval Analysis. Prentice Hall, Englewood Cliffs, N.J., 1966.

[22]

A. Neumaier. The wrapping effect, ellipsoid arithmetic, stability and confidence regions. Computing Supplementum, 9:175–190, 1993.

[23]

A. Neumaier. Taylor forms - use and limits. Reliable Computing, 9:43–79, 2002.

[24]

K. Nickel. How to fight the wrapping effect. In Proc. of the Intern. Symp. on Interval Mathematics, pages 121–132. Springer-Verlag, 1985.

[25]

W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery. Numerical Recipes: The Art of Scientific Computing. Cambridge University Press, 3rd edition, 2007.

[26]

A. Sedoglavic. Méthodes seminumériques en algèbre différentielle ; applications à l'étude des propriétés structurelles de systèmes différentiels algébriques en automatique. PhD thesis, École polytechnique, 2001.