Reliable homotopy continuation

Disclaimer. This paper is a preliminary version, which is mainly intended as a working program for an ongoing implementation in the Mathemagix system. It is expected that several adjustments and corrections will have to be made during the implementation of the algorithms in this paper and their proof reading by colleagues.

1.Introduction

Besides Gröbner basis computations, homotopy methods are a popular technique for solving systems of polynomial equations. In this paper, we will only consider zero dimensional systems. Given such a system

and to compute the solutions to

by following the solutions of

from

. Two main approaches exist:

It is surprising that little effort has been undertaken so far in order to bring both approaches closer together. Particularly interesting challenges are how to make numeric homotopy as reliable as possible and how to reconstruct exact end results from the numeric output. Part of this situation might be due to the fact that interval analysis [Moo66, AH83, Neu90, JKDW01, Kul08, MKC09, Rum10] is not so well-known in the communities where homotopy methods were developed, with the exception of one early paper [Kea94]. The main objective paper is to systematically exploit interval analysis techniques in the context of homotopy continuation. We will show how to certify homotopy continuations as well as single and multiple solutions of the polynomial system

Section 3 is devoted to preliminaries from the area of reliable computation. In section 3.1, we start by recalling the basic principles of ball arithmetic [vdH09], which is a more suitable variant of interval arithmetic for our purposes. In section 3.2, we pursue by recalling the concept of a Taylor model [MB96, MB04], which is useful in order to compute with reliable enclosures of multivariate analytic functions on polydisks. We also introduce a variant of Taylors models in section 3.3, which simultaneously encloses an analytic function and a finite number of its derivatives. In sections 3.4 and 3.5, we discuss the well known problem of overestimation which is inherent to ball arithmetic. We will provide some techniques to analyze, quantify and reduce overestimation.

Before attacking the topic of certified path tracking, it is useful to review the theory of numeric path tracking first. In section 4, we start with the case of non singular paths, in which case we use a classical predictor corrector approach based on Euler-Newton's method. The goal of a numeric path tracker is to advance as fast as possible on the solution path while minimizing the risk of errors. Clearly, the working precision has to be sufficiently large in order to ensure that function evaluations are reasonably accurate. In section 4.4, we show how to find a suitable working precision using ball arithmetic. We consider this approach to be simpler, more robust and more general than the one proposed in [BSHW08]. In order to reduce the risk of jumping from one path to another path, we also need a criterion for checking whether our numeric approximations stay reasonably close to the true solution path. A numerically robust way to do this is to ensure that the Jacobian of

does not change to rapidly during each step; see section 4.5 and [BSHW08] for a related approach. Another technique is to detect near collisions of paths and undertake special action in this case; see section 4.6.

In section 5, we turn our attention to homotopies (3) such that the end system (1) admits multiple solutions. We will see that Euler-Newton iterations only admit a linear convergence near multiple solutions. Therefore, it is useful to search for alternative iterations which admit a better convergence. Now the solution path near a multiple solution is given by a convergent Puiseux series in

. When letting

turn around the origin, we thus fall on another solution path. The collection of paths which are obtained through repeated rotations of this kind is called a herd. In sections 5.2 and 5.3, we will describe a new path tracking method with quadratic convergence, which operates simultaneously on all paths in a herd. The remaining issue of how to detect clusters and herds will be described in sections 5.4, 5.5 and 5.6.

In section 6, we turn our attention to the certification of single roots of (1) and single steps of a path tracker. An efficient and robust method for the certification of solutions to systems of non linear equations is Krawczyk's method [Kra69], with several improvements by Rump [Rum80]. In section 6.1, we adapt this classical method to the setting of ball arithmetic. In section 6.2, we will see that an easy generalization of this method provides an algorithm for certified path tracking. An alternative such algorithm was given in [Kea94], but the present algorithm presents similar advantages as Krawczyk's method with respect to other methods for the certification of solutions to systems of non linear equations. However, both methods still suffer from overestimation due to the fact that error bounds are computed on a polydisk which contains the solution path. Using the technique of Taylor models, we will show in section 6.3 that it possible to compute the error bounds in small tubes around the actual solution path, thereby reducing the problem of overestimation.

In the last section, we consider the more difficult problem of certifying multiple roots. Deflation is a classical technique in order to solve this difficulty. However, deflation usually requires the computation of a large number of derivatives of the system, which becomes prohibitive for large clusters of solutions. Notice that solutions which tend to infinity should also be considered as being part of one or more large clusters if we want to compute all solutions to (1). Our certification strategy is again based on the simultaneous consideration of all solution paths in a herd. In sections 7.2 and 7.3, we will show that a herd of solution paths can be considered as a single isolated solution path of a new “suitable fattened” system of equations. From the complexity point of view, if the herd contains

paths, then the evaluation of the fattened system is only

times more expensive than the evaluation of the original system, up to logarithmic factors. Moreover, a single large cluster often contains many different herds, which can be considered separately for our technique. This is particularly useful for the separation of the various paths which tend to infinity. In section 7.4, we will show how to certify the global set of solutions to the system

and how to reconstruct equations for the exact solutions.

2.Notations

Positive elements

Vector notation

Matrix notation

Directed acyclic graphs

is a typical dag for the expression

. We will denote by

the size of a dag

. For instance, the size of the above dag is

3.Reliable arithmetic

3.1.Ball arithmetic

Let us briefly recall the principles behind ball arithmetic. Given a normed vector space

, we will denote by

the set of closed balls with centers in

and radii in

. Given such a ball

, we will denote its center by

and its radius by

. Conversely, given

and

, we will denote by

the closed ball with center

and radius

A continuous operation

is said to lift into an operation

on balls, which is usually also denoted by

, if the inclusion property

is satisfied for any

and

. We also say that

is an enclosure for the set

, whenever (5) holds. For instance, if

is a Banach algebra, then we may take

Similar formulas can be given for division and elementary functions. Certified upper and lower bounds for

will be denoted by

and

It is convenient to extend the notion of a ball to more general radius types, which only carry a partial ordering. This allows us for instance to regard a vector

of balls as a “vectorial ball” with center

and radius

. If

, then we write

if and only if

for all

. A similar remark holds for matrices and power series with ball coefficients.

In concrete machine computations, numbers are usually approximated by floating point numbers with a finite precision. Let

be the set of floating point numbers at a given working precision, which we will assume fixed. It is customary to include the infinities

as well. The IEEE754 standard [ANS08] specifies how to perform basic arithmetic with floating point numbers in a predictable way, by specifying a rounding mode

among “down”, “up” and “nearest”. A multiple precision implementation of this standard is available in the Mpfr library [HLRZ00]. Given an operation

, we will denote by

its approximation using floating pointing arithmetic with rounding mode

. This notation extends to the case when

and

are replaced by their complexifications

and

Let

and

. We will denote by

the set of closed balls in

with centers in

and radii in

. In this case, we will also allow for balls with an infinite radius. A continuous operation

is again said to lift to an operation

on balls if (5) holds for any

and

. The formulas for the ring operations may now be adapted to

where

and

are reliable bounds for the rounding errors induced by the corresponding floating point operations on the centers; see [vdH09] for more details.

In order to ease the remainder of our exposition, we will avoid technicalities related to rounding problems, and compute with “idealized” balls with centers in

and radii in

. For those who are familiar with rounding errors, it should not be difficult though to adapt our results to more realistic machine computations.

Remark 1. In classical interval analysis so called interval lifts of operations

are sometimes required to satisfy the inclusion monotonicity property

for all

, which clearly implies the usual inclusion property (5). For floating intervals, it is easy to ensure this stronger property using correct rounding. In the ball setting, the exact ring operations in

and

are clearly inclusion monotonic, but it seems cumbersome to preserve this stronger property for floating balls. For this reason, we systematically develop our theory without assuming inclusion monotonicity.

3.2.Taylor models

If we are computing with analytic functions on a disk, or multivariate analytic functions on a polydisk, then Taylor models [MB96, MB04] provide a suitable functional analogue for ball arithmetic. We will use a multivariate setup with

as our coordinates and a polydisk

for a fixed

. Taylor models come in different blends, depending on whether we use a global error bound on

or individual bounds for the coefficients of the polynomial approximation. Individual bounds are sharper (especially if we truncate up to an small order such that the remainder is not that small), but more expensive to compute. Our general setup covers all possible blends of Taylor models.

We first need some more definitions and notations. Assume that

is given the natural partial ordering. Let

denote the

-th canonical basis vector of

, so that

and

for

. For every

, recall that

. A subset

is called an initial segment, if for any

and

with

, we have

. In that case, we write

and

. In what follows, we assume that

and

are fixed initial segments of

with

. For instance, we may take

and

Let

. Given a series

, we will write

for its support. Given a subset

and a subset

, we write

and

. If

is analytic on

, then we denote its sup-norm by

A Taylor model is a tuple

, where

and

are as above,

and

. We will write

for the set of such Taylor models. Given

and

, we will also denote

and

. Given an analytic function

, we write

, if there exists a decomposition

for any

. Given two Taylor models

, we will say that

is included in

, and we write

for any

. This holds in particular if

for all

, in which case we say that

is strongly included in

and write

. We finally define

Addition, subtraction and scalar multiplication are defined in a natural way on Taylor models. For multiplication, we need a projection

with

for all

and

. One way to construct such a mapping is as follows. For

, we must take

. For

, let

be largest such that

. Then we recursively define

. Given

, we now define their product by

Using the observation that

, this product satisfies the inclusion property that

for any analytic functions

and

3.3.

-stable Taylor models

For some applications, it is convenient to use Taylor models for enclosing both an analytic function and a certain number of its derivatives. Let us show how to incorporate this in our formalism. Throughout this section, we assume that

and that

is an initial segment with

Given a Taylor model

and

, we notice that

can be regarded as a Taylor model in

with

. Let

be an analytic function and

. We define the relations

and

Let

be an operation. Then

is said to

-lift to

, if for all

and all

, we have

. Addition, subtraction and scalar multiplication

-lift in the usual way. As to multiplication, we take

In order to see that

satisfies the

-inclusion property, it suffices to check that

3.4.Overestimation

The major problem in the area of ball arithmetic is overestimation. For example, even though the expression

evaluates to zero for any

, its evaluation at any ball in

with a non zero radius is not identically equal to zero. For instance,

Algorithms which rely on ball arithmetic have to be designed with care in order to avoid this kind of overly pessimistic error bounds. In particular, if we evaluate a dag using ball arithmetic, then a symbolically equivalent dag might lead to better error bounds.

Consider a continuous function

with

as in section 3.1. We recall that

is said to lift into an operation

if the inclusion property

is satisfied for all

and

. Clearly, such a lift is not unique: for any

with

for all

, the function

is also a lift of

. If we require that

, then the best possible lift is given by

In general, this lift may be expensive to compute. Nevertheless, its existence suggest the following definition of the quality of a lift. The overestimation

is defined by

This quantity is easier to study if we let

tend to zero. Accordingly, we also define the pointwise overestimation function

is computed by evaluating a dag

, then it would be nice to have explicit formulas for the pointwise overestimation. For

and assuming that the lift

is evaluated using the default ball implementations of

and

from section 3.1 , we claim that there exists a dag

with

The example shows that we have an infinite amount of overestimation near double zeros, except if the dag is explicitly given as a square near the double zero. More generally, for the dag

with an

-fold zero, we obtain

At a distance

of the zero, ball arithmetic thus produces bounds which are

times too pessimistic.

Remark 3. An interesting problem is whether a good understanding of the pointwise overestimation also helps us to bound the overestimation on more general balls. One concrete question is whether we have

for all polynomial dags

and balls

. This inequality seems to hold in all easy cases that we have looked at, but we do not have a proof that it holds in general.

3.5.Reducing the overestimation

The example 2 shows that standard ball arithmetic generally produces an infinite amount of overestimation near double or multiple zeros. This raises the problem how to compute better ball lifts which do not present this drawback.

One possible remedy is to systematically compute the ball lifts using Taylor models. Indeed, assume that we want to evaluate

at the ball

. Let

and

be as in section 3.2 and let

be the corresponding domain of Taylor models in

. Let

and consider the Taylor model evaluation of

yields an enclosure of

. Although the evaluation of

is usually far more expensive than the evaluation of

, let us now study how much the overestimation has been reduced.

Let

and let us introduce the operator

, which generalizes the mapping

. The operator is defined by induction over the size of

4.Numeric path tracking

4.1.General framework

Let

be an open subset of

and

an analytic function. We consider

as a function

and the time

, where

and

, and also call

a homotopy. Assuming that

for some

and that we are not in a “degenerate” case, there exists a unique analytic function

with

for all

. We are interested in the value of

when

. More generally, given a vector

of vectors, there exists a unique function

with

for all

The goal of a numeric path tracker is to approximate the function

as well and as quickly possible and, above all, to compute its value

at the “end point”

. In what follows, we will denote by

the set of floating point numbers with

bit mantissas. We also define

and assume that we have a program for computing a numeric approximation

. Given

with

, we thus want to compute

with

, by following the homotopy.

of polynomial equations. The number

of solutions to a generic system of this kind is given by the Bezout number

, where

is the total degree of

for each

. For suitable scaling parameters

, we now define

If the system (11) is zero dimensional and the values

are complex and sufficiently random (we also say that the homotopy is in general position), then the system

is also zero dimensional for every

. In what follows we will always assume that the homotopy has been chosen in such a way.

4.2.Solutions at infinity

One classical difficulty with homotopy methods for solving a polynomial system (11) is that many of the solution paths

may tend to infinity in the sense that

for some

and

. Computations which infinities can be avoided by rewriting the equations in projective coordinates. More precisely, setting

, the projectivation

of a polynomial

is defined by

the composite system (12–13) is again zero dimensional, but without solutions at infinity. It is easy to reconstruct solutions to (11) from solutions to (12–13) and vice versa.

4.3.Predictor corrector methods

Assume that we have a way to approximate the Jacobian

. For instance, if

is given by a dag, then a dag for

can be computed using forward differentiation, and

just corresponds to the approximated evaluation of this dag.

Assume that we are given

and

at a certain point where

. We may write

as the horizontal join of two matrices

and

. Given

close to

, we may find a

for which

using Euler-Newton's method

The replacement

is called a prediction step. We may still apply the formula when

, in which case

is usually a better approximation than

to a genuine zero of

than

. In this situation, the replacement

is called a correction step.

From the computational point of view, the evaluation of the Jacobian

is usually about

times more expensive than the evaluation of the function

itself (except for large

and sparse

). Instead of reevaluating the Jacobian after the prediction step at

, it may therefore be worth it to perform a few correction steps using the Jacobian at

instead:

Since the convergence of

is only linear, the number

is typically chosen quite small (

). One full prediction-correction cyclus now just consists of the replacement

From the complexity point of view, the evaluation of

and

is usually far more expensive than the cost

of linear algebra at size

, at least for the examples we will be interested in here. Therefore, it will not be necessary to device the linear algebra algorithms with special care (for instance, we may simply compute the inverse

once and for all, instead of using LU decompositions). On the other hand, we typically want to increase the step size

as much as possible, while trying to stay reasonably close to the true solution path.

4.4.Precision control

One obvious source of numeric errors is when the numeric precision being used is insufficient for producing sensible results. In [BSHW08], a strategy has been proposed for selecting a sufficient precision for homotopy methods to be numerically reliable. We will now propose an alternative method for finding such a precision, whose justification is based on a simpler argument.

Let

be the current working precision. Our method is based on the following idea: when evaluating

, the actual precision

of the result is usually smaller than

and of the form

for some fixed constant. We will call

the effective precision and we may expect the numeric evaluations to be reliable as long as

is picked sufficiently large such that

remains above a certain threshold

(e.g.

We still need a more precise definition of the effective precision or a simple way to compute it. Assuming that

admits a ball lift, we may evaluate

at the ball

. Then

provides an estimate for the relative precision of

. If

, then this precision is potentially quite low. In that case, we may also consider

at the next time

. Instead of performing one extra ball evaluation, we may also use the following approximation of

4.5.Step size control

Since purely numeric homotopy methods are usually being designed for speed, the main focus is not on being 100% fool proof. Nevertheless, it remains worth it to search for cheap ways in order to detect errors and adapt the stepsize so as to avoid potential errors.

Now assume that we perform one full prediction correction cyclus

. We first need a criterion for when to accept such a step. The main problem with the design of numeric criteria is there is no way to decide whether a numeric quantity is small or large; such checks can only be performed with respect to other quantities. Instead of checking whether we remain close to the genuine solution path, it is therefore more robust to check that the Jacobian

does not change not change to quickly on the interval

for a fixed threshold

(e.g.

). Here we may use any matrix norm

, so it is most convenient to chose one which is easy to compute:

The condition (14) is not fully satisfactory yet, since it relies on the expensive computation of a Jacobian

. This is acceptable if the step has a good chance of being accepted (since we will need the Jacobian anyway for the next step), but annoying if the step is to be rejected. Before checking (14), it is therefore wise to perform a few cheaper checks in order to increase the probability that (14) will hold indeed. In particular, if

, then we may verify that

for the max-norm on vectors, where

(e.g.

) and

. This simplified check is linked to (14) by remarking that

. The new check (15) should not be applied when

and

are too close for

and

to be computed with sufficient precision. More precisely, it should really be replaced by the check

where

is slightly smaller than one (e.g.

) and

stands for the “effective working precision” from section 4.4.

In addition to the above checks, one might wish to ensure that

is reasonably small after each step. Unfortunately, there is no satisfactory reference with respect which smallness can be checked, except for

. The best we can do therefore consists of checking whether

tend to

at some indicated rate:

for all

, where

(e.g.

). Again, we need to insert a safety exemption for the case when the convergence is exceptionally good.

Once that we have a criterion on whether a step

should be accepted, an algorithm for automatic stepsize control is easily implemented: assuming that we are walking from

, we start by setting

. Given

and

, we try a step

until

. If the step fails, then we set

with

(e.g.

), and retry for the smaller stepsize. Otherwise, we accept the step

and set

for the next step, where

(e.g.

4.6.Near collisions

Another way to look at the numerical error problem is to investigate what can actually go wrong. Theoretically speaking, around each true solution path

, there exists a small tube

of variable polyradius

, where Newton's method converges to the true solution

. As long as our current approximation

at time

remains in this tube

, no errors will occur. Now the Newton iterations have a strong tendency of projecting back into the tubes, especially if we use the additional safeguard (17). Nevertheless, it might happen that we jump from one tube into another tube, whenever two solution paths come close together.

If we are considering a homotopy for solving a polynomial system

, then various solution paths will actually meet at

if the system admits multiple roots. Such multiple roots are an intrinsic difficulty and we will need dedicated “end game” strategies to ensure good numeric convergence in this case (see section 5 below).

For

, and for suitably prepared functions

, the Lebesgue probability that two solutions paths meet at a point is zero. Nevertheless, we may have near collisions, which usually occur in pairs: the probability that more than two paths simultaneously pass close to a same point is extremely low.

So assume that we have a near collision of two solution paths. Then we have a true collision at

for some complex time

near the real axis. Locally around this collision point, the two paths are then given by

for some vector

. If we only know

at a few points, then we may try to compute

and

, and also check whether the second path

indeed exists.

Now assume that we have approximated

and derivative

at two times

. Denote these approximations by

and

. Then

We next perform several safety checks. First of all, we obtained

as the division of two vectors; we may use the mean value of the componentwise divisions and check that the variance remain small. We next verify that

and

are reasonably close. We also verify that the Newton iteration starting at

converges to a solution close to

. We finally verify that the same thing holds for

instead of

, where

We will not go into technical details on the precise numerical checks here, since section 5.3 below contains a similar discussion for the case of multiple roots at

. We may also adapt the herd iteration from section 5.2 below to near collisions, which allows for the simultaneous continuation of

and

. Contrary to the case when

, we also need to recompute better estimations of

at every step, which can be done via the simultaneous computation of

and the two “conjugate” paths

with

. Indeed, using the higher order expansion

from which we may deduce high quality approximations of

and

. As soon as

is small with respect to

, then the junction between paths and their conjugates occurs and we know how to traverse the near collision.

5.Multiple roots

5.1.Straightforward Euler-Newton type methods

Consider a homotopy induced by a polynomial system (11) with a zero dimensional set of solutions. It frequently occurs that some of the solutions are multiple roots, in which case the predictor corrector algorithm slows down significantly when

approaches

. This is due to the fact that Newton's method only has a linear convergence if we are approaching a multiple root, whereas the convergence is quadratic for single roots.

In order to get a better understanding of this phenomenon, it is instructive to quantify the slow down in the case of an

-fold root of a univariate polynomial

, which is more or less representative for the general case. In the neighbourhood of the root

, we have

In particular, we see that we need roughly

iterations in order to divide

. We also notice that

is roughly divided by

at every iteration. For complexity measures, it is more reasonable to study the speed of convergence of

rather than

itself. Indeed, the relative precision of an

-fold root is intrinsically

times smaller than the working precision.

If we are rather considering a homotopy

, then we usually have

. Locally, we may thus write

Roughly speaking, this means that

. Hence,

is multiplied by

at every step and

is multiplied by

every

steps.

5.2.The herd iteration

For high precision computations, it would be nice to have an algorithm with quadratic convergence in

. Before we give such an algorithm, let us first introduce some terminology and study the behaviour of the solutions paths when

By assumption, we are given a system (11) with an

-fold root

. Consider a solution path

for the homotopy with

. Since

is algebraic in

, we may expand

as a Puiseux series in

for a certain ramification index

(which we assume to be taken minimal). Now letting

turn around

once, we have

where

. When turning repeatedly, we thus obtain

pairwise distinct solutions paths

with

. We will call such a family of solution paths a herd.

Contrary to the homotopy methods from section 4, which operate on individual paths, the iteration that we will present now simultaneously operates on all paths in a herd. Consider a solution path

with

as above and the corresponding herd

with

. We assume that both

and

are known for a given

and all

. Let

and

denote the FFT-transforms of the vectors

and

with respect to

. Then we have

for all

. We call (18) the herd prediction. This prediction may be corrected using

conventional Newton iterations at time

, for a fixed constant

. A complete cyclus of this type will be called a herd iteration.

5.3.Step size control for herd iterations

Several technical details need to be settled in order to obtain a robust implementation of herd iterations. First of all, we need a numeric criterion for deciding when the approximations

and

are of a sufficient quality for starting our herd iteration. Clearly, the error of the approximation should be in

We may first ensure ourselves that the approximation can not substantially be improved using Newton iterations: let

be the result of applying one Newton iteration to

at time

. Then we check whether

for some threshold

, such as

(although this check becomes unstable if

, we notice that this situation cannot arise systematically for

The check (19) for

does not yet guarantee that the

correspond to approximate evaluations of the Puiseux expansions. In order to check that this is indeed the case, we first compute the

as described in the previous section. Defining

we next evaluate

for all

and apply one Newton iteration at time

to the results, yielding

. We now check whether

for some threshold

, such as

, and all

. Of course, this second check is more expensive than the first check (19). The thresholds should therefore be adjusted in such a way that the second check is likely to succeed whenever the first one does.

The above criteria can also be used for deciding whether a proposed herd iteration from

should be accepted or not. We still have to decide how to chose

. For a fixed constant

and a positive integer

which may change at every step, we will take

If a step is accepted, then we increase

by one or a larger integer smaller than

. If a step is not accepted, then we decrease

by one and repeat the same procedure until acceptance or

. If

, then we have either reached the best possible accuracy for the current working precision, or our

paths did not really converge to the same point

. The first case occurs whenever the effective precision from section 4.4 drops below a given threshold. In the latter case, we revert to individual homotopies for further continuation.

5.4.Detection of clusters

Let us now go back to the initial polynomial system (11) and assume that we have computed numerical approximations of all

individual homotopies

up till a certain time

. We need a way to partition the individual paths into herds. One obvious way is to follow all solution paths from

and deduce the corresponding permutation of

. However, this computation is quite expensive, so it would be nice to have something faster.

A first step towards the detection of herds is to find all clusters, i.e. all groups of paths which tend to the same limit

. Here we notice that one cluster may contain several herds, as in the example

where all four solution paths

with

tend to the quadruple root

. This cluster contains two herds

and

The radii of these balls has been chosen with care, such that, with high probability, any two paths which belong to the same herd are also in the same connected component of

. This is best verified on the case of path

. Then the next path in the cluster is

and

An efficient way to separate different connected components of

is via projection. Let

be a random vector of real numbers of length

. Then any point

may be projected to the vector product

. Applying this projection to our balls

, we obtain intervals

. We may sort the

(and the corresponding

) on their centers in time

and compute the various connected components of

using a linear pass. Whenever

and

are in different connected components, then so are

and

. Assuming that

is sufficiently small, application of this procedure for

random vectors

results with probability one in the separation of all connected components corresponding to different clusters.

5.5.Detection of herds

Let

be a set of indices such that the

with

form a cluster with limit

. We still need a way to find the various herds inside the cluster. In a similar way as in section 5.3, we may improve the quality of our approximations

and

via Newton iteration until

and

. From now on, we assume that we have done this.

is not too large (so that

has a small numerator and denominator), then we also obtain reasonably accurate approximations

and

is indeed close to some

with

. Doing this for all

, we thus obtain a candidate permutation

with

for all

. Each cycle in this permutation induces a candidate herd. Using the criteria from 5.3, we may next check whether the quality of the candidate herd is sufficient. If not, then we may always resort to the more expensive computation of the solution path from

5.6.Synchronization

Our algorithms for the previous sections for cluster and herd detection rely on the availability of approximations

on all paths at the same time

. Usually the individual homotopies are launched in parallel and advance at different speeds. Consequently, the synchronization of all paths at the same time

is a non trivial matter.

Strictly speaking, we notice that it is not necessary to synchronize all paths, but rather those paths which belong to the same cluster or herd. In particular, we will concentrate on those paths which tend to multiple roots.

So consider a path

which tends to a multiple root

. As long as

is approximated using an individual continuation, we have seen that the convergence to

is linear. For a fixed

(such as

), the computation of

at all “checkpoints”

thus only requires a constant overhead. At every checkpoint, we may now launch the algorithm for the detection of clusters. For every candidate cluster

, we next determine the checkpoint

with highest

at which

is available for all

. We launch our algorithm for the detection of herds at this checkpoint

In addition, it is a good practice to check that we still have points on all

paths at every checkpoint. For paths

which tend to a single root, we may approximate

for large

using a single step continuation from

. For the approximation of

using (21), we notice that it important that no paths of the cluster are missing or counted twice. Indeed, in the contrary case, we only have

with

for all

, which is insufficient for the computation of accurate approximations of

and

6.Certified homotopies

6.1.Certification of Newton's method

Consider an analytic function

on some open subset

and assume that

admits a ball lift. Given an isolated root

, it is well known that Newton's method converges to

in a small neighbourhood of

. It is a natural question to explicitly compute a ball neighbourhood for which this is the case. One method which is both efficient and quite tight was proposed by Krawczyk [Kra69]. Recall that

denotes the Jacobian of

It follows that

is an analytic function from the compact ball

into itself. By Brouwer's fixed point theorem, we conclude that there exists a

with

The above method is still a bit unsatisfactory in the sense that it does not guarantee the uniqueness of the solution. Denoting by

the interior of a subset

, the following sharpening of the method is due to Rump [Rum80].

Proof. Let us first show that the spectral norm (i.e. the norm of the largest eigenvalue) of any

. Indeed, our assumption implies

Now consider

. By what precedes, any matrix

is invertible. For any two distinct points

, we have

By what precedes, it follows that

. We conclude that

. The existence of a fixed point follows from theorem 4.

Assuming that we have computed a numeric approximation

to a root

, a second question is how to find a suitable ball

for which the corollaries apply. Starting with

, a simple solution is consider the sequence defined by

Whenever

, then we are done. In order to ensure the convergence of this method, we need to tweak the recurrence (22) and replace it by

for suitable small positive constants

and

. We refer to [Rum80] for more details on this technique, which is called

-inflation.

6.2.Certification of a numeric homotopy continuation

Assume that the polynomial system (11) admits only simple roots and that we have obtained numeric approximations

for all these roots using a numeric path tracker. Then theorem ? suffices for the joint certification of the numeric approximations

. Indeed, using the above technique, we first compute balls

for which theorem ? applies. To conclude, it then suffices to check that these balls are pairwise disjoint. This can be done using the same algorithm as for the detection of clusters, which was described in section 5.4.

In the case when two balls

and

do intersect, then we recompute approximations for the paths

and

using a smaller step size, that is, by lowering the constant

in (14). We keep doing so until none of the balls

intersect; even if some of the paths

may have been permuted due to numerical errors, the final set of all

is correct if none of the balls intersect. Indeed, each of the balls contains a solution and there can be no more solutions than the number predicted by the Bezout bound.

and

intersect then, instead of recomputing the paths

and

using smaller and smaller step sizes, we may also search for a way to certify the entire homotopy computations. This will be the topic of the remainder of this section. Let us first show how to adapt the theory from the previous section to certified path tracking. From now on, we assume that

is an analytic function which admits a ball lift.

Proof. Let

and consider the function

. Then

encloses

and

encloses

, and we conclude by theorem 6.

Clearly, for any

, theorem 8 ensures the existence of a unique solution path from

in the tube

. As at the end of the previous section, the question again arises how to compute balls

and

for which the conditions of the theorem are likely to be satisfied. Since the computation of

is expensive, it is important to keep down the number of iterations of the type (22) or (23) as much as possible (say at most one iteration).

Now assume that we performed a numeric homotopy computation from

. Then a reasonable first guess is to take

for some

, say

. Unfortunately, if one of the components of

tends to zero, then this guess turns out to be inadequate. Therefore, it is recommended to use an additional inflation proportional to the norm of

for some small

, say

. Another idea is to use the radius of the previous step as a reference (except for the very first step, of course). For instance, if our previous step went from

, then we may take

6.3.Certification via tubular models

One important disadvantage of the method from the previous section for the certification of one path tracking step is that we use global error bounds on the tube

. Consequently, the inaccuracy

is proportional to the step size

, whence any overestimation in the evaluation of

due to the inaccuracy in

requires a reduction of the step size.

For this reason, it is much better to follow the solution path as closely as possible instead of enclosing it in a “square tube”. This can be achieved via the use of Taylor models. Using

-stable Taylor models, it is possible to simultaneously compute of accurate enclosures for

and

on the tube.

More precisely, let

and

. For a fixed

, let

be an initial segment of

of the form

and let

. A

-stable Taylor model in

will also be called a tubular model. We will write

for the set of tubular models. Given

, we let

and

be such that

Proof. For an illustration of the proof, see figure 1. Let

and

. By construction, and using the facts that

and

, we have

for any

and

. For a fixed

, it follows that

encloses

on the disk

. Our hypothesis (24) also implies that

In order to apply the theorem, it remains to be shown how to find a good tube, i.e. how to choose

and

. For a fixed order

of the approximation, the idea is to adjust

and

such that

can be chosen minimal.

Let us first consider the first order case

. Assume that we performed a numeric path continuation from

and that both

and

are approximatively known. Then there exists a unique curve

of degree three with

and

. Let

be a linear curve which minimizes the maximum

for every

. Then we take

and

. We may also take

for some fixed

such as

. However, for better performance it is recommended to apply an additional inflation to

, similar to what we did in the previous section.

For higher orders

, we proceed in an essentially similar way. We first compute a high order numeric polynomial approximation

. For orders

, this may require the accurate approximation of additional points

with

on the solution path. We next find a

-th order polynomial

which approximates

as good as possible and choose our tube in a similar way as above. It should be noticed that the evaluation

in theorem 9 is at least thrice as expensive as the numeric evaluation of

. This makes it worth it to improve the quality of the numeric approximations of points

on the curve using one or more additional Newton iterations. The use of higher order approximations makes it possible to choose

very small, thereby avoiding a great deal of the overestimation due to the use of ball arithmetic.

7.Certification of multiple roots

7.1.The univariate case

In section 5.1, we have studied in detail the numeric determination of a multiple root of a univariate polynomial. It is instructive to take up this study and examine how we certify such multiple roots. Since the property of being an

-fold root is lost under small perturbations, this is actually impossible using ball arithmetic. The best we can hope for is to certify the existence of

roots in a small ball, or the existence of an

-fold root of a small perturbation of the polynomial (see also [Rum10]). In this section we adopt the first point of view; a variant of the approach to perturb the polynomial itself will be studied in the next sections.

So consider a polynomial

with an approximate

-fold root at

and assume that we wish to certify that

admits exactly

roots in the ball

, for some

. One first strategy is to make use of the Taylor series expansion of

. More precisely, let

be the set of univariate Taylor models in

with

and

for some

. Evaluating

, we obtain a Taylor model

with the property that

for any

. It remains to be shown that any

admits

roots in

. We claim that this is the case if

for all

with

. By Rouché's theorem, it follows that

and

admit the same number of roots on

. If

becomes large, or if

admits other roots close to

, then the bound (25) often does not hold. In that case, one may use more sophisticated techniques from [Sch82, vdH11] in order to certify that

admits

roots in

. From the complexity point of view, the series expansion method requires

evaluations of

, where

denotes the cost of multiplying two polynomials of degrees

Another approach is to apply Rouché's theorem in a more direct way by computing

on a path

starting at

and which circles around the origin once. If the reliable image

of this path avoids the origin, then the number of roots of

coincides with the number of times that

turns around the origin. More precisely, let

for a suitable

(see also below) and let

for

. Then we evaluate

and check whether

for all

. If this is the case, then

yields the exact number of roots of

inside

. This method requires

evaluations of

, but

needs to be sufficiently large if we want to ensure a reasonable chance of success for the method.

Let us investigate the choice of an appropriate

in more detail on the simplest example when

and

We recall from example 2 that

also corresponds to the punctual overestimation of the ball evaluation of

. If we want to reduce

to a quantity which does not depend on

, then it follows from the considerations in section 3.5 that we need to evaluate

using Taylor models of order at least

. However, in that case, we might just as well use the first method based on a direct series expansion of

7.2.Certification of herd homotopies

Let us now consider a more general system (11) and assume that we are given a herd of solution paths

which all tend (at least approximately) to the same limit

. Instead of viewing the

as distinct individual paths, we would like to consider the whole herd

as a single multivalued path.

From the algebraic point of view, it is more convenient to rather consider the ideal

which annihilates

instead of

itself. There are several ways to represent this ideal

by a system

of polynomial equations. One option is to require that

be a reduced Gröbner basis for

. Another option is to use Kronecker representations. Since we are computing with balls of a fixed bit precision, coefficient growth is not a problem, so it best to choose a simple representation which minimizes the number of coefficient parameters.

Now recall that each

can be considered as a vector

of Puiseux series

of valuations

. Setting

we notice that

is invariant if we turn

once around the origin. This means that

is really an analytic function in

at the origin. Assuming general position,

is actually the minimal annihilator of

. In what follows, we will represent

by the system of polynomials

where

for each

. Now instead of evaluating the homotopy

at an ordinary point

, we may evaluate

at a point

, where

is the quotient algebra

. When evaluating at the multivalued point

represented by (26), we would take

. This leads to a lift of

as a homotopy

, for some open subset

In order to apply theorem 8, we need a homotopy over

rather than

. Therefore, let us show how to reformulate

as a homotopy

for a suitable open subset

. The idea is to encode the system (26) by a vector

we denote

for each

. Let

and consider the evaluation

and

. This is possible if

for any root

of the system

. There exists a unique point

such that

for all

. We take

. If

can be computed by a dag of size

then

can be computed by a dag of size

, since a multiplication in

can be done using a dag of size

Proof. By theorem 8, there exists a unique solution

to the system

, for each

, whence a unique set

with

. The uniqueness implies that this set coincides with the set of analytic continuations of the solution paths of

from

for each

. After reordering, this shows that there exist unique paths

with

and

, for all

and

. Now for

, solutions of monic equations of the form

remain bounded. By continuity, the

therefore tend to limits in

, and

Using the univariate root certification methods from section 7.1, we may also compute ball enclosures for

. The theorem therefore provides a way to certify all solutions of a numeric homotopy associated to a polynomial system (11), even in the presence of multiple solutions.

Indeed, let

be the set of all solution paths. For some small

, we perform a certified homotopy continuation from

until

, using the techniques from section 6. This is possible since the

are pairwise distinct, when assuming general position. We next partition

, such that

is either a singleton or a herd for each

. For each singleton, we try to apply theorem 8 for

and for each herd, we try to apply theorem 10. If this works, then we obtain the desired enclosures for the solutions of (11), counted with multiplicities. If not, then we choose a smaller

and repeat the same procedure.

For the termination of this algorithm, it remains to be checked that theorem 10 indeed applies if

is sufficiently small. In other words, setting

, we have to show that

is invertible. We will only give a rough justification, which we intend to work out in a forthcoming paper. Assuming the contrary and “general position”, the perturbation

would exhibit a non trivial monodromy in

, and contradict our assumption that the set

is stable under monodromy. We finally notice that our algorithm also works in degenerate situations if we let

be a cluster of paths instead of a herd.

Remark 11. As we already noticed before, there is no purely numeric test for knowing whether a herd tends to an

-fold root

. Nevertheless, if we assume that this is indeed the case, then we notice that

can be approximated with a precision which is close to the current working precision. Indeed, if the herd is given by the system (26), then we may approximate

using

7.3.Algorithmic improvements

Although we have shown how to translate the homotopy

into a homotopy

, we do want to exploit the multiplication in

for computational purposes. In particular, we want to exploit this structure for the computation of the Jacobian

. As in the case of usual Jacobians, we may compute

by evaluating

and

in the deformed algebra

Denoting the result of this evaluation by

, we may write

, where

and

are polynomials of degrees

. Reinterpreting the

as elements of

(which is correct modulo an error in

), we obtain

Now for any

, multiplication by

can be represented by a matrix

in the monomial basis

. Similarly, any matrix

induces a

block matrix

Now the computation of

and its inverse can be done using dags of sizes

, whence multiplication of

or its inverse with a vector in

can be done a dag of size

. In particular, the evaluation of the left hand side of (27) can be done using a dag of size

over

. This is better than a direct computation of

which requires a dag of size

A second issue which has been hidden by the current presentation concerns numeric stability. If the herd indeed tends to an

-fold root

when

, then

tends to

. Now straightforward arithmetic in

tends to be quite unstable. For instance, the evaluation of a polynomial of degree

typically gives rise to a precision loss of

digits. This can be avoided using a shift: instead of working in

and evaluating at

, we rather work in

and evaluate at

. More generally, if

, then we take

, where

, and evaluate at

Another question is whether we can avoid using the extra time parameter

for the final certification. Indeed, corollary 7 is sufficient for the certification of a single isolated root. More generally, let

be a cluster of solutions which tend to an

-fold root

. We may forget about the last equation

and, for some small

, compute all solutions to the system

which are close to

(e.g. by homotopies starting at the

). The system

may then be regarded as a homotopy with respect to the time

. Given a herd

of solution paths, we may then try to construct the system

as usual and consider

as a polynomial equation on the

-dimensional solution surface of this system. This more intrinsic technique is particularly effective if

, in which case we really have to compute the roots in a disk of a univariate analytic equation. For

, the homotopy

does not need to be in general position, and we have not yet worked out the corresponding theory in detail.

7.4.Global certification

Even if the coefficients of the system (11) are all rational or algebraic, then the computed solutions are only numeric approximations. For some applications, it is useful to have exact representations of the solutions. This allows for instance to check whether a given other polynomial with rational coefficients vanishes on the solution set or on some points of the solutions set.

The Kronecker representation provides one useful exact representation for the set of solutions. Modulo a generic linear change of coordinates, we may assume without loss of generality that for any distinct solutions

of (11), their first coordinates

and

are also distinct. Let

be the distinct solutions of (11). Then the Kronecker representation for

is the unique

-tuple

of univariate polynomials with

, such that

is annihilated by the system

for all

. This yields an efficient dichotomic algorithm for the numeric computation of

Assume now that (11) has rational coefficients and that we have computed numeric approximations

with bit precision

. By remark 11, even if

approximates a multiple root, then

is still known with a precision close to

. Using the above method, we may thus compute a numeric approximation

with an accuracy of approximately

bits. We apply rational number reconstruction [GG02, Chapter 5] in order to provide a guess for

with rational coefficients. We may check whether this guess is correct by evaluating

over

. By evaluating over suitable algebras with nilpotent elements, we may check in a similar way whether the multiplicity of each root matches with the numeric multiplicity. If one of these checks fails, then we double the bit precision, use Newton's method to improve the approximations

, and keep iterating.

7.5.Local algebras

Consider a cluster

of paths which tend to a common root

and decompose

such that

is a herd in the cluster for each

. Each herd

gives rise to an ideal

which is represented by a system of the form (26). The intersection

can be computed by Gröbner basis techniques and the limit

yields the local ideal of the system (11) at the multiple root.

Bibliography

[AH83]: G. Alefeld and J. Herzberger. Introduction to interval analysis. Academic Press, New York, 1983.
[ANS08]: ANSI/IEEE. IEEE standard for binary floating-point arithmetic. Technical report, ANSI/IEEE, New York, 2008. ANSI-IEEE Standard 754-2008. Revision of IEEE 754-1985, approved on June 12, 2008 by IEEE Standards Board.
[BHSW06]: D. Bates, J. Hauenstein, A. Sommese, and C. Wampler. Bertini: Software for numerical algebraic geometry. http://www.nd.edu/~sommese/bertini/, 2006.
[BSHW08]: D.J. Bates, A.J. Sommese, J.D. Hauenstein, and C.W. Wampler. Adaptive multiprecision path tracking. SIAM Journal on Numerical Mathematics, 46(2):722–746, 2008.
[Dur08]: C. Durvye. Algorithmes pour la décomposition primaire des idéaux polynomiaux de dimension nulle donnés en évaluation. PhD thesis, Univ. de Versailles (France), 2008.
[GG02]: J. von zur Gathen and J. Gerhard. Modern Computer Algebra. Cambridge University Press, 2-nd edition, 2002.
[GHH+97]: M. Giusti, K. Hägele, J. Heintz, J.E. Morais, J.L Monta na, and L.M. Pardo. Lower bounds for diophantine approximation. Journal of Pure and Applied Algebra, 117–118:277–317, 1997.
[GHMP95]: M. Giusti, J. Heintz, J.E. Morais, and L.M. Pardo. When polynomial equation systems can be solved fast? In G. Cohen, M. Giusti, and T. Mora, editors, Proc. AAECC'11, volume 948 of Lecture Notes in Computer Science. Springer Verlag, 1995.
[HLRZ00]: G. Hanrot, V. Lefèvre, K. Ryde, and P. Zimmermann. MPFR, a C library for multiple-precision floating-point computations with exact rounding. http://www.mpfr.org, 2000.
[JKDW01]: L. Jaulin, M. Kieffer, O. Didrit, and E. Walter. Applied interval analysis. Springer, London, 2001.
[Kea94]: R.B. Kearfott. An interval step control for continuation methods. SIAM J. Numer. Anal., 31(3):892–914, 1994.
[Kra69]: R. Krawczyk. Newton-algorithmen zur bestimmung von nullstellen mit fehler-schranken. Computing, 4:187–201, 1969.
[Kul08]: U.W. Kulisch. Computer Arithmetic and Validity. Theory, Implementation, and Applications. Number 33 in Studies in Mathematics. de Gruyter, 2008.
[Lec01]: G. Lecerf. Une alternative aux méthodes de réécriture pour la résolution des systes algébriques. PhD thesis, École polytechnique, 2001.
[Ley09]: A. Leykin. NAG. http://www.math.uic.edu/~leykin/NAG4M2, 2009. Macaulay 2 package for numerical algebraic geometry.
[MB96]: K. Makino and M. Berz. Remainder differential algebras and their applications. In M. Berz, C. Bischof, G. Corliss, and A. Griewank, editors, Computational differentiation: techniques, applications and tools, pages 63–74, SIAM, Philadelphia, 1996.
[MB04]: K. Makino and M. Berz. Suppression of the wrapping effect by Taylor model-based validated integrators. Technical Report MSU Report MSUHEP 40910, Michigan State University, 2004.
[MKC09]: R.E. Moore, R.B. Kearfott, and M.J. Cloud. Introduction to Interval Analysis. SIAM Press, 2009.
[Moo66]: R.E. Moore. Interval Analysis. Prentice Hall, Englewood Cliffs, N.J., 1966.
[Mor87]: A.P. Morgan. Solving polynomial systems using continuation for] engineering and scientific problems. Prentice-Hall, Englewood Cliffs, N.J., 1987.
[Neu90]: A. Neumaier. Interval methods for systems of equations. Cambridge university press, Cambridge, 1990.
[Rum80]: S.M. Rump. Kleine Fehlerschranken bei Matrixproblemen. PhD thesis, Universität Karlsruhe, 1980.
[Rum10]: S.M. Rump. Verification methods: Rigorous results using floating-point arithmetic. Acta Numerica, 19:287–449, 2010.
[Sch82]: A. Schönhage. The fundamental theorem of algebra in terms of computational complexity. Technical report, Math. Inst. Univ. of Tübingen, 1982.
[SW05]: A.J. Sommese and C.W. Wampler. The Numerical Solution of Systems of Polynomials Arising in Engineering and Science. World Scientific, 2005.
[vdH09]: J. van der Hoeven. Ball arithmetic. Technical report, HAL, 2009. http://hal.archives-ouvertes.fr/hal-00432152/fr/.
[vdH11]: J. van der Hoeven. Efficient root counting for analytic functions on a disk. Technical report, HAL, 2011. http://hal.archives-ouvertes.fr/hal-00583139/fr/.
[Ver96]: J. Verschelde. Homotopy continuation methods for solving polynomial systems. PhD thesis, Katholieke Universiteit Leuven, 1996.
[Ver99]: J. Verschelde. PHCpack: A general-purpose solver for polynomial systems by homotopy continuation. ACM Transactions on Mathematical Software, 25(2):251–276, 1999. Algorithm 795.

1.Introduction

2.Notations

Positive elements

Vector notation

Matrix notation

Directed acyclic graphs

3.Reliable arithmetic

3.1.Ball arithmetic

3.2.Taylor models

3.3.-stable Taylor models

3.4.Overestimation

3.5.Reducing the overestimation

4.Numeric path tracking

4.1.General framework

4.2.Solutions at infinity

4.3.Predictor corrector methods

4.4.Precision control

4.5.Step size control

4.6.Near collisions

5.Multiple roots

5.1.Straightforward Euler-Newton type methods

5.2.The herd iteration

5.3.Step size control for herd iterations

5.4.Detection of clusters

5.5.Detection of herds

5.6.Synchronization

6.Certified homotopies

6.1.Certification of Newton's method

6.2.Certification of a numeric homotopy continuation

6.3.Certification via tubular models

7.Certification of multiple roots

7.1.The univariate case

7.2.Certification of herd homotopies

7.3.Algorithmic improvements

7.4.Global certification

7.5.Local algebras

Bibliography

			(c∈𝕂)
			(k∈{1,…,r})

			(c∈𝕂)
			(k∈{1,…,r})

			(8)
			(9)
			(10)