Reliable homotopy continuation
Preliminary version

Joris van der Hoeven

LIX, CNRS

École polytechnique

91128 Palaiseau Cedex

France

Email: vdhoeven@lix.polytechnique.fr

Web: http://lix.polytechnique.fr/~vdhoeven

November 3, 2023

. This work has been supported by the ANR-09-JCJC-0098-01 MaGiX project, the Digiteo 2009-36HD grant and Région Ile-de-France.

In this paper, we present several algorithms for certified homotopy continuation. One typical application is to compute the roots of a zero dimensional system of polynomial equations. We both present algorithms for the certification of single and multiple roots. We also present several ideas for improving the underlying numerical path tracking algorithms, especially in the case of multiple roots.

Keywords: Homotopy continuation, polynomial system solving, ball arithmetic, interval arithmetic, reliable computing

A.M.S. subject classification: 65H20, 65H04, 65G20, 30C15, 13P15

Disclaimer. This paper is a preliminary version, which is mainly intended as a working program for an ongoing implementation in the Mathemagix system. It is expected that several adjustments and corrections will have to be made during the implementation of the algorithms in this paper and their proof reading by colleagues.

1.Introduction

Besides Gröbner basis computations, homotopy methods are a popular technique for solving systems of polynomial equations. In this paper, we will only consider zero dimensional systems. Given such a system

(1)

with and , the idea is to find a suitable starting system

(2)

of which all solutions are known, to introduce the homotopy

(3)

and to compute the solutions to by following the solutions of from to . Two main approaches exist:

Algebraic homotopies

In this setting, the polynomial equations have exact rational or algebraic coefficients. The homotopy continuation is done exactly using suitable resultants. At the end of the homotopy, the solutions of the system are again given exactly, as the solutions of simpler systems. The theory was developed in [GHMP95, GHH+97, Dur08] and a concrete implementation is available in the Kronecker system [Lec01].

Numeric homotopies

An alternative approach is to follow the solution paths using a numeric path tracking algorithm; see [Mor87, Ver96, SW05] and references therein. This approach is usually faster, partly because most of the operations can be done at a significantly lower precision. However, the end result is only approximate. In particular, it cannot be used for the reliable resolution of overdetermined systems. Several implementations exist for numeric path tracking [Ver99, BHSW06, Ley09].

It is surprising that little effort has been undertaken so far in order to bring both approaches closer together. Particularly interesting challenges are how to make numeric homotopy as reliable as possible and how to reconstruct exact end results from the numeric output. Part of this situation might be due to the fact that interval analysis [Moo66, AH83, Neu90, JKDW01, Kul08, MKC09, Rum10] is not so well-known in the communities where homotopy methods were developed, with the exception of one early paper [Kea94]. The main objective paper is to systematically exploit interval analysis techniques in the context of homotopy continuation. We will show how to certify homotopy continuations as well as single and multiple solutions of the polynomial system .

Section 3 is devoted to preliminaries from the area of reliable computation. In section 3.1, we start by recalling the basic principles of ball arithmetic [vdH09], which is a more suitable variant of interval arithmetic for our purposes. In section 3.2, we pursue by recalling the concept of a Taylor model [MB96, MB04], which is useful in order to compute with reliable enclosures of multivariate analytic functions on polydisks. We also introduce a variant of Taylors models in section 3.3, which simultaneously encloses an analytic function and a finite number of its derivatives. In sections 3.4 and 3.5, we discuss the well known problem of overestimation which is inherent to ball arithmetic. We will provide some techniques to analyze, quantify and reduce overestimation.

Before attacking the topic of certified path tracking, it is useful to review the theory of numeric path tracking first. In section 4, we start with the case of non singular paths, in which case we use a classical predictor corrector approach based on Euler-Newton's method. The goal of a numeric path tracker is to advance as fast as possible on the solution path while minimizing the risk of errors. Clearly, the working precision has to be sufficiently large in order to ensure that function evaluations are reasonably accurate. In section 4.4, we show how to find a suitable working precision using ball arithmetic. We consider this approach to be simpler, more robust and more general than the one proposed in [BSHW08]. In order to reduce the risk of jumping from one path to another path, we also need a criterion for checking whether our numeric approximations stay reasonably close to the true solution path. A numerically robust way to do this is to ensure that the Jacobian of does not change to rapidly during each step; see section 4.5 and [BSHW08] for a related approach. Another technique is to detect near collisions of paths and undertake special action in this case; see section 4.6.

In section 5, we turn our attention to homotopies (3) such that the end system (1) admits multiple solutions. We will see that Euler-Newton iterations only admit a linear convergence near multiple solutions. Therefore, it is useful to search for alternative iterations which admit a better convergence. Now the solution path near a multiple solution is given by a convergent Puiseux series in . When letting turn around the origin, we thus fall on another solution path. The collection of paths which are obtained through repeated rotations of this kind is called a herd. In sections 5.2 and 5.3, we will describe a new path tracking method with quadratic convergence, which operates simultaneously on all paths in a herd. The remaining issue of how to detect clusters and herds will be described in sections 5.4, 5.5 and 5.6.

In section 6, we turn our attention to the certification of single roots of (1) and single steps of a path tracker. An efficient and robust method for the certification of solutions to systems of non linear equations is Krawczyk's method [Kra69], with several improvements by Rump [Rum80]. In section 6.1, we adapt this classical method to the setting of ball arithmetic. In section 6.2, we will see that an easy generalization of this method provides an algorithm for certified path tracking. An alternative such algorithm was given in [Kea94], but the present algorithm presents similar advantages as Krawczyk's method with respect to other methods for the certification of solutions to systems of non linear equations. However, both methods still suffer from overestimation due to the fact that error bounds are computed on a polydisk which contains the solution path. Using the technique of Taylor models, we will show in section 6.3 that it possible to compute the error bounds in small tubes around the actual solution path, thereby reducing the problem of overestimation.

In the last section, we consider the more difficult problem of certifying multiple roots. Deflation is a classical technique in order to solve this difficulty. However, deflation usually requires the computation of a large number of derivatives of the system, which becomes prohibitive for large clusters of solutions. Notice that solutions which tend to infinity should also be considered as being part of one or more large clusters if we want to compute all solutions to (1). Our certification strategy is again based on the simultaneous consideration of all solution paths in a herd. In sections 7.2 and 7.3, we will show that a herd of solution paths can be considered as a single isolated solution path of a new “suitable fattened” system of equations. From the complexity point of view, if the herd contains paths, then the evaluation of the fattened system is only times more expensive than the evaluation of the original system, up to logarithmic factors. Moreover, a single large cluster often contains many different herds, which can be considered separately for our technique. This is particularly useful for the separation of the various paths which tend to infinity. In section 7.4, we will show how to certify the global set of solutions to the system and how to reconstruct equations for the exact solutions.

2.Notations

Positive elements
Given a subset , we denote

Vector notation
Unless stated otherwise, we will use the -norm for vectors :

(4)

This norm should not be confused with taking componentwise absolute values

For we also define

If are formal variables, then we write

Matrix notation
We write for the set of matrices over a set . The matrix norm of a matrix corresponding to the -norm (4) for vectors

Directed acyclic graphs
We recall that labeled directed acyclic graphs are often used for the representation of symbolic expressions with potential common subexpressions. For instance,

is a typical dag for the expression . We will denote by the size of a dag . For instance, the size of the above dag is .

3.Reliable arithmetic

3.1.Ball arithmetic

Let us briefly recall the principles behind ball arithmetic. Given a normed vector space , we will denote by or the set of closed balls with centers in and radii in . Given such a ball , we will denote its center by and its radius by . Conversely, given and , we will denote by the closed ball with center and radius .

A continuous operation is said to lift into an operation on balls, which is usually also denoted by , if the inclusion property

(5)

is satisfied for any and . We also say that is an enclosure for the set , whenever (5) holds. For instance, if is a Banach algebra, then we may take

Similar formulas can be given for division and elementary functions. Certified upper and lower bounds for will be denoted by and .

It is convenient to extend the notion of a ball to more general radius types, which only carry a partial ordering. This allows us for instance to regard a vector of balls as a “vectorial ball” with center and radius . If , then we write if and only if for all . A similar remark holds for matrices and power series with ball coefficients.

In concrete machine computations, numbers are usually approximated by floating point numbers with a finite precision. Let be the set of floating point numbers at a given working precision, which we will assume fixed. It is customary to include the infinities in as well. The IEEE754 standard [ANS08] specifies how to perform basic arithmetic with floating point numbers in a predictable way, by specifying a rounding mode among “down”, “up” and “nearest”. A multiple precision implementation of this standard is available in the Mpfr library [HLRZ00]. Given an operation , we will denote by its approximation using floating pointing arithmetic with rounding mode . This notation extends to the case when and are replaced by their complexifications and .

Let and or and . We will denote by or the set of closed balls in with centers in and radii in . In this case, we will also allow for balls with an infinite radius. A continuous operation is again said to lift to an operation on balls if (5) holds for any and . The formulas for the ring operations may now be adapted to

where , and are reliable bounds for the rounding errors induced by the corresponding floating point operations on the centers; see [vdH09] for more details.

In order to ease the remainder of our exposition, we will avoid technicalities related to rounding problems, and compute with “idealized” balls with centers in and radii in . For those who are familiar with rounding errors, it should not be difficult though to adapt our results to more realistic machine computations.

Remark 1. In classical interval analysis so called interval lifts of operations are sometimes required to satisfy the inclusion monotonicity property

for all , which clearly implies the usual inclusion property (5). For floating intervals, it is easy to ensure this stronger property using correct rounding. In the ball setting, the exact ring operations in and are clearly inclusion monotonic, but it seems cumbersome to preserve this stronger property for floating balls. For this reason, we systematically develop our theory without assuming inclusion monotonicity.

3.2.Taylor models

If we are computing with analytic functions on a disk, or multivariate analytic functions on a polydisk, then Taylor models [MB96, MB04] provide a suitable functional analogue for ball arithmetic. We will use a multivariate setup with as our coordinates and a polydisk for a fixed . Taylor models come in different blends, depending on whether we use a global error bound on or individual bounds for the coefficients of the polynomial approximation. Individual bounds are sharper (especially if we truncate up to an small order such that the remainder is not that small), but more expensive to compute. Our general setup covers all possible blends of Taylor models.

We first need some more definitions and notations. Assume that is given the natural partial ordering. Let denote the -th canonical basis vector of , so that and for . For every , recall that . A subset is called an initial segment, if for any and with , we have . In that case, we write and . In what follows, we assume that and are fixed initial segments of with . For instance, we may take and or or .

Let or . Given a series , we will write for its support. Given a subset and a subset , we write and . If is analytic on , then we denote its sup-norm by

A Taylor model is a tuple , where , and are as above, and . We will write for the set of such Taylor models. Given and , we will also denote and . Given an analytic function on , we write , if there exists a decomposition

with and for all . In particular, if , then

for any . Given two Taylor models , we will say that is included in , and we write if for any . This holds in particular if for all , in which case we say that is strongly included in and write . We finally define by

so that for all and .

Addition, subtraction and scalar multiplication are defined in a natural way on Taylor models. For multiplication, we need a projection with for all and if . One way to construct such a mapping is as follows. For , we must take . For , let be largest such that . Then we recursively define . Given , we now define their product by

Using the observation that , this product satisfies the inclusion property that for any analytic functions and on .

3.3.-stable Taylor models

For some applications, it is convenient to use Taylor models for enclosing both an analytic function and a certain number of its derivatives. Let us show how to incorporate this in our formalism. Throughout this section, we assume that and that is an initial segment with .

Given a Taylor model and , we notice that can be regarded as a Taylor model in with . Let be an analytic function and . We define the relations and by

Clearly, for all and .

Let be an operation. Then is said to -lift to , if for all and all , we have . Addition, subtraction and scalar multiplication -lift in the usual way. As to multiplication, we take

with

In order to see that satisfies the -inclusion property, it suffices to check that

for all . This is clear if . Otherwise,

For any with , there exists a with . Hence,

In the particularly useful case when , we notice that for all and for all .

3.4.Overestimation

The major problem in the area of ball arithmetic is overestimation. For example, even though the expression evaluates to zero for any , its evaluation at any ball in with a non zero radius is not identically equal to zero. For instance,

Algorithms which rely on ball arithmetic have to be designed with care in order to avoid this kind of overly pessimistic error bounds. In particular, if we evaluate a dag using ball arithmetic, then a symbolically equivalent dag might lead to better error bounds.

Consider a continuous function with as in section 3.1. We recall that is said to lift into an operation if the inclusion property

is satisfied for all and . Clearly, such a lift is not unique: for any with for all , the function is also a lift of . If we require that , then the best possible lift is given by

In general, this lift may be expensive to compute. Nevertheless, its existence suggest the following definition of the quality of a lift. The overestimation of at is defined by

(6)

This quantity is easier to study if we let tend to zero. Accordingly, we also define the pointwise overestimation function by

(7)

Here means that and .

If is computed by evaluating a dag , then it would be nice to have explicit formulas for the pointwise overestimation. For and assuming that the lift is evaluated using the default ball implementations of and from section 3.1 , we claim that there exists a dag with

for . Indeed, we may compute using the rules

(c∈𝕂)
(k∈{1,…,r})

where stands for the -th coordinate function. Now we also have

for . Consequently,

If , then this formula simplifies to

Example 2. With , let us compare the dags and . We have and , whence

The example shows that we have an infinite amount of overestimation near double zeros, except if the dag is explicitly given as a square near the double zero. More generally, for the dag with an -fold zero, we obtain

At a distance of the zero, ball arithmetic thus produces bounds which are times too pessimistic.

Remark 3. An interesting problem is whether a good understanding of the pointwise overestimation also helps us to bound the overestimation on more general balls. One concrete question is whether we have

for all polynomial dags and balls . This inequality seems to hold in all easy cases that we have looked at, but we do not have a proof that it holds in general.

3.5.Reducing the overestimation

The example 2 shows that standard ball arithmetic generally produces an infinite amount of overestimation near double or multiple zeros. This raises the problem how to compute better ball lifts which do not present this drawback.

One possible remedy is to systematically compute the ball lifts using Taylor models. Indeed, assume that we want to evaluate at the ball . Let , and be as in section 3.2 and let be the corresponding domain of Taylor models in . Let and consider the Taylor model evaluation of at

Then

yields an enclosure of . Although the evaluation of is usually far more expensive than the evaluation of , let us now study how much the overestimation has been reduced.

Let and let us introduce the operator , which generalizes the mapping . The operator is defined by induction over the size of :

(c∈𝕂)
(k∈{1,…,r})

For as above, we then have

Now assume that and let be the valuation of at . If , then we have

(8)
(9)
(10)

If , then we still have (8), but (9) and (10) become

If , then we generally have

although may occur in lucky cases.

4.Numeric path tracking

4.1.General framework

Let be an open subset of and an analytic function. We consider as a function in and the time , where and , and also call a homotopy. Assuming that for some and that we are not in a “degenerate” case, there exists a unique analytic function with for all . We are interested in the value of when . More generally, given a vector of vectors, there exists a unique function with for all .

The goal of a numeric path tracker is to approximate the function as well and as quickly possible and, above all, to compute its value at the “end point” . In what follows, we will denote by the set of floating point numbers with bit mantissas. We also define , and assume that we have a program for computing a numeric approximation of . Given with , we thus want to compute with , by following the homotopy.

In many cases, we will be interested in homotopies for solving a system

(11)

of polynomial equations. The number of solutions to a generic system of this kind is given by the Bezout number , where is the total degree of for each . For suitable scaling parameters , we now define by

Let

For any , the point

clearly satisfies , whereas any with satisfies .

If the system (11) is zero dimensional and the values are complex and sufficiently random (we also say that the homotopy is in general position), then the system is also zero dimensional for every . In what follows we will always assume that the homotopy has been chosen in such a way.

4.2.Solutions at infinity

One classical difficulty with homotopy methods for solving a polynomial system (11) is that many of the solution paths may tend to infinity in the sense that for some and . Computations which infinities can be avoided by rewriting the equations in projective coordinates. More precisely, setting , the projectivation of a polynomial is defined by

Applying this to the system (11), we obtain a new system

(12)

of homogeneous equations in . For a random hyperplane

(13)

the composite system (1213) is again zero dimensional, but without solutions at infinity. It is easy to reconstruct solutions to (11) from solutions to (1213) and vice versa.

4.3.Predictor corrector methods

Assume that we have a way to approximate the Jacobian of by . For instance, if is given by a dag, then a dag for can be computed using forward differentiation, and just corresponds to the approximated evaluation of this dag.

Assume that we are given and at a certain point where . We may write as the horizontal join of two matrices and . Given close to , we may find a for which using Euler-Newton's method

The replacement is called a prediction step. We may still apply the formula when , in which case is usually a better approximation than to a genuine zero of at than . In this situation, the replacement is called a correction step.

From the computational point of view, the evaluation of the Jacobian is usually about times more expensive than the evaluation of the function itself (except for large and sparse ). Instead of reevaluating the Jacobian after the prediction step at , it may therefore be worth it to perform a few correction steps using the Jacobian at instead:

Since the convergence of is only linear, the number is typically chosen quite small (). One full prediction-correction cyclus now just consists of the replacement .

From the complexity point of view, the evaluation of and is usually far more expensive than the cost of linear algebra at size , at least for the examples we will be interested in here. Therefore, it will not be necessary to device the linear algebra algorithms with special care (for instance, we may simply compute the inverse once and for all, instead of using LU decompositions). On the other hand, we typically want to increase the step size as much as possible, while trying to stay reasonably close to the true solution path.

4.4.Precision control

One obvious source of numeric errors is when the numeric precision being used is insufficient for producing sensible results. In [BSHW08], a strategy has been proposed for selecting a sufficient precision for homotopy methods to be numerically reliable. We will now propose an alternative method for finding such a precision, whose justification is based on a simpler argument.

Let be the current working precision. Our method is based on the following idea: when evaluating , the actual precision of the result is usually smaller than and of the form for some fixed constant. We will call the effective precision and we may expect the numeric evaluations to be reliable as long as is picked sufficiently large such that remains above a certain threshold (e.g. ).

We still need a more precise definition of the effective precision or a simple way to compute it. Assuming that admits a ball lift, we may evaluate at the ball . Then

provides an estimate for the relative precision of . If , then this precision is potentially quite low. In that case, we may also consider at the next time . Instead of performing one extra ball evaluation, we may also use the following approximation of :

We now take

for the current effective precision at and assuming a current step size .

4.5.Step size control

Since purely numeric homotopy methods are usually being designed for speed, the main focus is not on being 100% fool proof. Nevertheless, it remains worth it to search for cheap ways in order to detect errors and adapt the stepsize so as to avoid potential errors.

Now assume that we perform one full prediction correction cyclus . We first need a criterion for when to accept such a step. The main problem with the design of numeric criteria is there is no way to decide whether a numeric quantity is small or large; such checks can only be performed with respect to other quantities. Instead of checking whether we remain close to the genuine solution path, it is therefore more robust to check that the Jacobian does not change not change to quickly on the interval .

More precisely, let , , and . Then it is natural to only accept steps for which

(14)

for a fixed threshold (e.g. ). Here we may use any matrix norm , so it is most convenient to chose one which is easy to compute:

The condition (14) is not fully satisfactory yet, since it relies on the expensive computation of a Jacobian . This is acceptable if the step has a good chance of being accepted (since we will need the Jacobian anyway for the next step), but annoying if the step is to be rejected. Before checking (14), it is therefore wise to perform a few cheaper checks in order to increase the probability that (14) will hold indeed. In particular, if , then we may verify that

(15)

for the max-norm on vectors, where (e.g. ) and . This simplified check is linked to (14) by remarking that . The new check (15) should not be applied when and are too close for and to be computed with sufficient precision. More precisely, it should really be replaced by the check

(16)

where is slightly smaller than one (e.g. ) and stands for the “effective working precision” from section 4.4.

In addition to the above checks, one might wish to ensure that is reasonably small after each step. Unfortunately, there is no satisfactory reference with respect which smallness can be checked, except for . The best we can do therefore consists of checking whether tend to at some indicated rate:

(17)

for all , where (e.g. ). Again, we need to insert a safety exemption for the case when the convergence is exceptionally good.

Once that we have a criterion on whether a step should be accepted, an algorithm for automatic stepsize control is easily implemented: assuming that we are walking from to , we start by setting . Given and , we try a step until . If the step fails, then we set with (e.g. ), and retry for the smaller stepsize. Otherwise, we accept the step and set for the next step, where (e.g. ).

4.6.Near collisions

Another way to look at the numerical error problem is to investigate what can actually go wrong. Theoretically speaking, around each true solution path , there exists a small tube of variable polyradius , where Newton's method converges to the true solution . As long as our current approximation at time remains in this tube , no errors will occur. Now the Newton iterations have a strong tendency of projecting back into the tubes, especially if we use the additional safeguard (17). Nevertheless, it might happen that we jump from one tube into another tube, whenever two solution paths come close together.

If we are considering a homotopy for solving a polynomial system , then various solution paths will actually meet at if the system admits multiple roots. Such multiple roots are an intrinsic difficulty and we will need dedicated “end game” strategies to ensure good numeric convergence in this case (see section 5 below).

For , and for suitably prepared functions , the Lebesgue probability that two solutions paths meet at a point is zero. Nevertheless, we may have near collisions, which usually occur in pairs: the probability that more than two paths simultaneously pass close to a same point is extremely low.

So assume that we have a near collision of two solution paths. Then we have a true collision at for some complex time near the real axis. Locally around this collision point, the two paths are then given by

for some vector . If we only know at a few points, then we may try to compute , and , and also check whether the second path indeed exists.

Now assume that we have approximated and derivative at two times . Denote these approximations by , , and . Then

for , whence we may use the following approximations for and :

We next perform several safety checks. First of all, we obtained as the division of two vectors; we may use the mean value of the componentwise divisions and check that the variance remain small. We next verify that and are reasonably close. We also verify that the Newton iteration starting at converges to a solution close to . We finally verify that the same thing holds for instead of , where .

We will not go into technical details on the precise numerical checks here, since section 5.3 below contains a similar discussion for the case of multiple roots at . We may also adapt the herd iteration from section 5.2 below to near collisions, which allows for the simultaneous continuation of and . Contrary to the case when , we also need to recompute better estimations of at every step, which can be done via the simultaneous computation of and the two “conjugate” paths with . Indeed, using the higher order expansion

we get

from which we may deduce high quality approximations of and . As soon as is small with respect to , then the junction between paths and their conjugates occurs and we know how to traverse the near collision.

5.Multiple roots

5.1.Straightforward Euler-Newton type methods

Consider a homotopy induced by a polynomial system (11) with a zero dimensional set of solutions. It frequently occurs that some of the solutions are multiple roots, in which case the predictor corrector algorithm slows down significantly when approaches . This is due to the fact that Newton's method only has a linear convergence if we are approaching a multiple root, whereas the convergence is quadratic for single roots.

In order to get a better understanding of this phenomenon, it is instructive to quantify the slow down in the case of an -fold root of a univariate polynomial , which is more or less representative for the general case. In the neighbourhood of the root , we have

with . Hence, the Newton iteration becomes

In particular, we see that we need roughly iterations in order to divide by . We also notice that is roughly divided by at every iteration. For complexity measures, it is more reasonable to study the speed of convergence of rather than itself. Indeed, the relative precision of an -fold root is intrinsically times smaller than the working precision.

If we are rather considering a homotopy , then we usually have . Locally, we may thus write

Assume that we have for small and , so that

Then the Euler-Newton iteration for step size yields

Following our criterion (14), we should have

Roughly speaking, this means that . Hence, is multiplied by at every step and is multiplied by every steps.

5.2.The herd iteration

For high precision computations, it would be nice to have an algorithm with quadratic convergence in . Before we give such an algorithm, let us first introduce some terminology and study the behaviour of the solutions paths when .

By assumption, we are given a system (11) with an -fold root . Consider a solution path for the homotopy with . Since is algebraic in , we may expand

as a Puiseux series in for a certain ramification index (which we assume to be taken minimal). Now letting turn around once, we have

where . When turning repeatedly, we thus obtain pairwise distinct solutions paths with . We will call such a family of solution paths a herd.

Contrary to the homotopy methods from section 4, which operate on individual paths, the iteration that we will present now simultaneously operates on all paths in a herd. Consider a solution path with as above and the corresponding herd with . We assume that both and are known for a given and all . Let and denote the FFT-transforms of the vectors and with respect to . Then we have

for all . We now compute using the formulas

For of the order of , we now have

(18)

for all . We call (18) the herd prediction. This prediction may be corrected using conventional Newton iterations at time , for a fixed constant . A complete cyclus of this type will be called a herd iteration.

5.3.Step size control for herd iterations

Several technical details need to be settled in order to obtain a robust implementation of herd iterations. First of all, we need a numeric criterion for deciding when the approximations and are of a sufficient quality for starting our herd iteration. Clearly, the error of the approximation should be in .

We may first ensure ourselves that the approximation can not substantially be improved using Newton iterations: let be the result of applying one Newton iteration to at time . Then we check whether

(19)

for some threshold , such as (although this check becomes unstable if , we notice that this situation cannot arise systematically for ).

The check (19) for does not yet guarantee that the correspond to approximate evaluations of the Puiseux expansions. In order to check that this is indeed the case, we first compute the as described in the previous section. Defining

we next evaluate for all and apply one Newton iteration at time to the results, yielding . We now check whether

(20)

for some threshold , such as , and all . Of course, this second check is more expensive than the first check (19). The thresholds should therefore be adjusted in such a way that the second check is likely to succeed whenever the first one does.

The above criteria can also be used for deciding whether a proposed herd iteration from to should be accepted or not. We still have to decide how to chose . For a fixed constant and a positive integer which may change at every step, we will take

If a step is accepted, then we increase by one or a larger integer smaller than . If a step is not accepted, then we decrease by one and repeat the same procedure until acceptance or . If , then we have either reached the best possible accuracy for the current working precision, or our paths did not really converge to the same point . The first case occurs whenever the effective precision from section 4.4 drops below a given threshold. In the latter case, we revert to individual homotopies for further continuation.

5.4.Detection of clusters

Let us now go back to the initial polynomial system (11) and assume that we have computed numerical approximations of all individual homotopies up till a certain time . We need a way to partition the individual paths into herds. One obvious way is to follow all solution paths from to and deduce the corresponding permutation of . However, this computation is quite expensive, so it would be nice to have something faster.

A first step towards the detection of herds is to find all clusters, i.e. all groups of paths which tend to the same limit . Here we notice that one cluster may contain several herds, as in the example

where all four solution paths with tend to the quadruple root of . This cluster contains two herds and .

Now let and for all . For each , we consider the ball

The radii of these balls has been chosen with care, such that, with high probability, any two paths which belong to the same herd are also in the same connected component of . This is best verified on the case of path . Then the next path in the cluster is and

An efficient way to separate different connected components of is via projection. Let be a random vector of real numbers of length . Then any point may be projected to the vector product . Applying this projection to our balls , we obtain intervals . We may sort the (and the corresponding ) on their centers in time and compute the various connected components of using a linear pass. Whenever and are in different connected components, then so are and . Assuming that is sufficiently small, application of this procedure for random vectors results with probability one in the separation of all connected components corresponding to different clusters.

5.5.Detection of herds

Let be a set of indices such that the with form a cluster with limit . We still need a way to find the various herds inside the cluster. In a similar way as in section 5.3, we may improve the quality of our approximations and via Newton iteration until and . From now on, we assume that we have done this.

For each and , we may write

for some and . We obtain a good approximation using

(21)

If is not too large (so that has a small numerator and denominator), then we also obtain reasonably accurate approximations and by

and check whether

is indeed close to some with . Doing this for all , we thus obtain a candidate permutation with for all . Each cycle in this permutation induces a candidate herd. Using the criteria from 5.3, we may next check whether the quality of the candidate herd is sufficient. If not, then we may always resort to the more expensive computation of the solution path from to .

5.6.Synchronization

Our algorithms for the previous sections for cluster and herd detection rely on the availability of approximations on all paths at the same time . Usually the individual homotopies are launched in parallel and advance at different speeds. Consequently, the synchronization of all paths at the same time is a non trivial matter.

Strictly speaking, we notice that it is not necessary to synchronize all paths, but rather those paths which belong to the same cluster or herd. In particular, we will concentrate on those paths which tend to multiple roots.

So consider a path which tends to a multiple root . As long as is approximated using an individual continuation, we have seen that the convergence to is linear. For a fixed (such as ), the computation of at all “checkpoints” thus only requires a constant overhead. At every checkpoint, we may now launch the algorithm for the detection of clusters. For every candidate cluster , we next determine the checkpoint with highest at which is available for all . We launch our algorithm for the detection of herds at this checkpoint .

In addition, it is a good practice to check that we still have points on all paths at every checkpoint. For paths which tend to a single root, we may approximate for large using a single step continuation from to . For the approximation of using (21), we notice that it important that no paths of the cluster are missing or counted twice. Indeed, in the contrary case, we only have with for all , which is insufficient for the computation of accurate approximations of and .

6.Certified homotopies

6.1.Certification of Newton's method

Consider an analytic function on some open subset of and assume that admits a ball lift. Given an isolated root of , it is well known that Newton's method converges to in a small neighbourhood of . It is a natural question to explicitly compute a ball neighbourhood for which this is the case. One method which is both efficient and quite tight was proposed by Krawczyk [Kra69]. Recall that denotes the Jacobian of .

Theorem 4. Let , and let be an analytic function. Let be a ball enclosure of the set . If

then admits a fixed point .

Proof. For any , we have

Since is convex, we also have

Hence

It follows that is an analytic function from the compact ball into itself. By Brouwer's fixed point theorem, we conclude that there exists a with .

Corollary 5. Let , and let be an invertible matrix with . If and

then the equation admits a root .

Proof. We apply the theorem for .

The above method is still a bit unsatisfactory in the sense that it does not guarantee the uniqueness of the solution. Denoting by the interior of a subset of , the following sharpening of the method is due to Rump [Rum80].

Theorem 6. With the notations from theorem 4, if

then admits a unique fixed point in .

Proof. Let us first show that the spectral norm (i.e. the norm of the largest eigenvalue) of any is . Indeed, our assumption implies

Now consider the norm on . Then, for any and with , we have

whence . This is only possible if the spectral norm of is .

Now consider . By what precedes, any matrix in is invertible. For any two distinct points , we have

Since is convex, there exists a matrix with

By what precedes, it follows that . We conclude that or . The existence of a fixed point follows from theorem 4.

Corollary 7. With the notations of corollary 5, if

then the equation admits a unique root .

Proof. Application of theorem 6 for .

Assuming that we have computed a numeric approximation to a root of , a second question is how to find a suitable ball for which the corollaries apply. Starting with , a simple solution is consider the sequence defined by

(22)

where

Whenever , then we are done. In order to ensure the convergence of this method, we need to tweak the recurrence (22) and replace it by

(23)

for suitable small positive constants and . We refer to [Rum80] for more details on this technique, which is called -inflation.

6.2.Certification of a numeric homotopy continuation

Assume that the polynomial system (11) admits only simple roots and that we have obtained numeric approximations for all these roots using a numeric path tracker. Then theorem 5 suffices for the joint certification of the numeric approximations . Indeed, using the above technique, we first compute balls for which theorem 5 applies. To conclude, it then suffices to check that these balls are pairwise disjoint. This can be done using the same algorithm as for the detection of clusters, which was described in section 5.4.

In the case when two balls and do intersect, then we recompute approximations for the paths and using a smaller step size, that is, by lowering the constant in (14). We keep doing so until none of the balls intersect; even if some of the paths may have been permuted due to numerical errors, the final set of all is correct if none of the balls intersect. Indeed, each of the balls contains a solution and there can be no more solutions than the number predicted by the Bezout bound.

If and intersect then, instead of recomputing the paths and using smaller and smaller step sizes, we may also search for a way to certify the entire homotopy computations. This will be the topic of the remainder of this section. Let us first show how to adapt the theory from the previous section to certified path tracking. From now on, we assume that is an analytic function which admits a ball lift.

Theorem 8. Let be such that . Let and let be an invertible matrix with . If

then the equation admits a unique root for each .

Proof. Let and consider the function . Then encloses and encloses , and we conclude by theorem 6.

Clearly, for any , theorem 8 ensures the existence of a unique solution path from to in the tube . As at the end of the previous section, the question again arises how to compute balls and for which the conditions of the theorem are likely to be satisfied. Since the computation of is expensive, it is important to keep down the number of iterations of the type (22) or (23) as much as possible (say at most one iteration).

Now assume that we performed a numeric homotopy computation from to . Then a reasonable first guess is to take

for some , say . Unfortunately, if one of the components of tends to zero, then this guess turns out to be inadequate. Therefore, it is recommended to use an additional inflation proportional to the norm of :

for some small , say . Another idea is to use the radius of the previous step as a reference (except for the very first step, of course). For instance, if our previous step went from to , then we may take

for some small , say .

6.3.Certification via tubular models

One important disadvantage of the method from the previous section for the certification of one path tracking step is that we use global error bounds on the tube . Consequently, the inaccuracy of is proportional to the step size , whence any overestimation in the evaluation of or due to the inaccuracy in requires a reduction of the step size.

For this reason, it is much better to follow the solution path as closely as possible instead of enclosing it in a “square tube”. This can be achieved via the use of Taylor models. Using -stable Taylor models, it is possible to simultaneously compute of accurate enclosures for and on the tube.

More precisely, let , and . For a fixed in , let be an initial segment of of the form

and let . A -stable Taylor model in will also be called a tubular model. We will write for the set of tubular models. Given , we let and be such that

for all .

Figure 1. Illustration of a solution path in a tube.

Theorem 9. Let , , and let

be an approximation of the solution to . For instance, if and , then we may take , with and . Consider and with

Let , , . If

(24)

then the equation admits a unique solution , for every .

Proof. For an illustration of the proof, see figure 1. Let and . By construction, and using the facts that and , we have

for any and . For a fixed , it follows that encloses on the disk . Our hypothesis (24) also implies that

From theorem 6, we conclude that admits a unique fixed point .

In order to apply the theorem, it remains to be shown how to find a good tube, i.e. how to choose , , , and . For a fixed order of the approximation, the idea is to adjust and such that can be chosen minimal.

Let us first consider the first order case . Assume that we performed a numeric path continuation from to and that both and are approximatively known. Then there exists a unique curve of degree three with , , and . Let be a linear curve which minimizes the maximum of on for every . Then we take , , and . We may also take for some fixed such as . However, for better performance it is recommended to apply an additional inflation to , similar to what we did in the previous section.

For higher orders , we proceed in an essentially similar way. We first compute a high order numeric polynomial approximation of . For orders , this may require the accurate approximation of additional points with on the solution path. We next find a -th order polynomial which approximates as good as possible and choose our tube in a similar way as above. It should be noticed that the evaluation in theorem 9 is at least thrice as expensive as the numeric evaluation of . This makes it worth it to improve the quality of the numeric approximations of points on the curve using one or more additional Newton iterations. The use of higher order approximations makes it possible to choose very small, thereby avoiding a great deal of the overestimation due to the use of ball arithmetic.

7.Certification of multiple roots

7.1.The univariate case

In section 5.1, we have studied in detail the numeric determination of a multiple root of a univariate polynomial. It is instructive to take up this study and examine how we certify such multiple roots. Since the property of being an -fold root is lost under small perturbations, this is actually impossible using ball arithmetic. The best we can hope for is to certify the existence of roots in a small ball, or the existence of an -fold root of a small perturbation of the polynomial (see also [Rum10]). In this section we adopt the first point of view; a variant of the approach to perturb the polynomial itself will be studied in the next sections.

So consider a polynomial with an approximate -fold root at and assume that we wish to certify that admits exactly roots in the ball , for some . One first strategy is to make use of the Taylor series expansion of at . More precisely, let be the set of univariate Taylor models in with and for some . Evaluating at , we obtain a Taylor model with the property that for any . It remains to be shown that any admits roots in . We claim that this is the case if

(25)

Indeed, assume that we have (25) and let . Then

for all with . By Rouché's theorem, it follows that and admit the same number of roots on . If becomes large, or if admits other roots close to , then the bound (25) often does not hold. In that case, one may use more sophisticated techniques from [Sch82, vdH11] in order to certify that admits roots in . From the complexity point of view, the series expansion method requires evaluations of , where denotes the cost of multiplying two polynomials of degrees .

Another approach is to apply Rouché's theorem in a more direct way by computing on a path starting at and which circles around the origin once. If the reliable image of this path avoids the origin, then the number of roots of coincides with the number of times that turns around the origin. More precisely, let for a suitable (see also below) and let for . Then we evaluate and check whether for all . If this is the case, then

yields the exact number of roots of inside . This method requires evaluations of , but needs to be sufficiently large if we want to ensure a reasonable chance of success for the method.

Let us investigate the choice of an appropriate in more detail on the simplest example when and

Consider the evaluation of at . We have

For small , the condition thus implies

Roughly speaking, for , this means that

We recall from example 2 that also corresponds to the punctual overestimation of the ball evaluation of at . If we want to reduce to a quantity which does not depend on , then it follows from the considerations in section 3.5 that we need to evaluate using Taylor models of order at least . However, in that case, we might just as well use the first method based on a direct series expansion of at .

7.2.Certification of herd homotopies

Let us now consider a more general system (11) and assume that we are given a herd of solution paths which all tend (at least approximately) to the same limit . Instead of viewing the as distinct individual paths, we would like to consider the whole herd as a single multivalued path.

From the algebraic point of view, it is more convenient to rather consider the ideal which annihilates instead of itself. There are several ways to represent this ideal by a system of polynomial equations. One option is to require that be a reduced Gröbner basis for . Another option is to use Kronecker representations. Since we are computing with balls of a fixed bit precision, coefficient growth is not a problem, so it best to choose a simple representation which minimizes the number of coefficient parameters.

Now recall that each can be considered as a vector of Puiseux series in of valuations . Setting

we notice that is invariant if we turn once around the origin. This means that is really an analytic function in at the origin. Assuming general position, is actually the minimal annihilator of . In what follows, we will represent by the system of polynomials

(26)

where for each . Now instead of evaluating the homotopy at an ordinary point , we may evaluate at a point , where is the quotient algebra . When evaluating at the multivalued point represented by (26), we would take , . This leads to a lift of as a homotopy , for some open subset of .

In order to apply theorem 8, we need a homotopy over rather than . Therefore, let us show how to reformulate as a homotopy for a suitable open subset of . The idea is to encode the system (26) by a vector

More precisely, given a point

we denote for each . Let and consider the evaluation of at , and . This is possible if for any root of the system . There exists a unique point

such that for all . We take . If can be computed by a dag of size then can be computed by a dag of size , since a multiplication in can be done using a dag of size .

Theorem 10. Let be such that and . Assume that the system admits no multiple zeros for , and let and be such that for all and

Let and let be an invertible matrix with . If

(27)

then there exist unique paths with , and

for all and .

Proof. By theorem 8, there exists a unique solution to the system , for each , whence a unique set with . The uniqueness implies that this set coincides with the set of analytic continuations of the solution paths of from to for each . After reordering, this shows that there exist unique paths with , and , for all and . Now for , solutions of monic equations of the form remain bounded. By continuity, the therefore tend to limits in , and .

Using the univariate root certification methods from section 7.1, we may also compute ball enclosures for . The theorem therefore provides a way to certify all solutions of a numeric homotopy associated to a polynomial system (11), even in the presence of multiple solutions.

Indeed, let be the set of all solution paths. For some small , we perform a certified homotopy continuation from until , using the techniques from section 6. This is possible since the are pairwise distinct, when assuming general position. We next partition , such that is either a singleton or a herd for each . For each singleton, we try to apply theorem 8 for and for each herd, we try to apply theorem 10. If this works, then we obtain the desired enclosures for the solutions of (11), counted with multiplicities. If not, then we choose a smaller and repeat the same procedure.

For the termination of this algorithm, it remains to be checked that theorem 10 indeed applies if is sufficiently small. In other words, setting , we have to show that is invertible. We will only give a rough justification, which we intend to work out in a forthcoming paper. Assuming the contrary and “general position”, the perturbation of would exhibit a non trivial monodromy in , and contradict our assumption that the set is stable under monodromy. We finally notice that our algorithm also works in degenerate situations if we let be a cluster of paths instead of a herd.

Remark 11. As we already noticed before, there is no purely numeric test for knowing whether a herd tends to an -fold root . Nevertheless, if we assume that this is indeed the case, then we notice that can be approximated with a precision which is close to the current working precision. Indeed, if the herd is given by the system (26), then we may approximate using , .

7.3.Algorithmic improvements

Although we have shown how to translate the homotopy into a homotopy , we do want to exploit the multiplication in for computational purposes. In particular, we want to exploit this structure for the computation of the Jacobian . As in the case of usual Jacobians, we may compute by evaluating at , and in the deformed algebra

Denoting the result of this evaluation by , we may write , where and are polynomials of degrees in . Reinterpreting the as elements of (which is correct modulo an error in ), we obtain

Now for any , multiplication by in can be represented by a matrix in the monomial basis of . Similarly, any matrix induces a block matrix

By construction, we now have

Now the computation of and its inverse can be done using dags of sizes , whence multiplication of or its inverse with a vector in can be done a dag of size . In particular, the evaluation of the left hand side of (27) can be done using a dag of size over . This is better than a direct computation of which requires a dag of size .

A second issue which has been hidden by the current presentation concerns numeric stability. If the herd indeed tends to an -fold root when , then tends to . Now straightforward arithmetic in tends to be quite unstable. For instance, the evaluation of a polynomial of degree typically gives rise to a precision loss of digits. This can be avoided using a shift: instead of working in and evaluating at , we rather work in and evaluate at . More generally, if , then we take , where , and evaluate at .

Another question is whether we can avoid using the extra time parameter for the final certification. Indeed, corollary 7 is sufficient for the certification of a single isolated root. More generally, let be a cluster of solutions which tend to an -fold root . We may forget about the last equation and, for some small , compute all solutions to the system which are close to (e.g. by homotopies starting at the ). The system may then be regarded as a homotopy with respect to the time . Given a herd of solution paths, we may then try to construct the system as usual and consider as a polynomial equation on the -dimensional solution surface of this system. This more intrinsic technique is particularly effective if , in which case we really have to compute the roots in a disk of a univariate analytic equation. For , the homotopy does not need to be in general position, and we have not yet worked out the corresponding theory in detail.

7.4.Global certification

Even if the coefficients of the system (11) are all rational or algebraic, then the computed solutions are only numeric approximations. For some applications, it is useful to have exact representations of the solutions. This allows for instance to check whether a given other polynomial with rational coefficients vanishes on the solution set or on some points of the solutions set.

The Kronecker representation provides one useful exact representation for the set of solutions. Modulo a generic linear change of coordinates, we may assume without loss of generality that for any distinct solutions of (11), their first coordinates and are also distinct. Let be the distinct solutions of (11). Then the Kronecker representation for is the unique -tuple of univariate polynomials with , , , such that is annihilated by the system

Assume now that . Then we notice that can be computed in terms of and using

for all . This yields an efficient dichotomic algorithm for the numeric computation of .

Assume now that (11) has rational coefficients and that we have computed numeric approximations of with bit precision . By remark 11, even if approximates a multiple root, then is still known with a precision close to . Using the above method, we may thus compute a numeric approximation of with an accuracy of approximately bits. We apply rational number reconstruction [GG02, Chapter 5] in order to provide a guess for with rational coefficients. We may check whether this guess is correct by evaluating at over . By evaluating over suitable algebras with nilpotent elements, we may check in a similar way whether the multiplicity of each root matches with the numeric multiplicity. If one of these checks fails, then we double the bit precision, use Newton's method to improve the approximations , and keep iterating.

7.5.Local algebras

Consider a cluster of paths which tend to a common root and decompose such that is a herd in the cluster for each . Each herd gives rise to an ideal which is represented by a system of the form (26). The intersection

can be computed by Gröbner basis techniques and the limit at yields the local ideal of the system (11) at the multiple root.

Example 12. Let us consider the very simple example

with the homotopy

For small , we have the solution paths

Both and form a herd, with corresponding ideals

We have

and .

Bibliography

[AH83]

G. Alefeld and J. Herzberger. Introduction to interval analysis. Academic Press, New York, 1983.

[ANS08]

ANSI/IEEE. IEEE standard for binary floating-point arithmetic. Technical report, ANSI/IEEE, New York, 2008. ANSI-IEEE Standard 754-2008. Revision of IEEE 754-1985, approved on June 12, 2008 by IEEE Standards Board.

[BHSW06]

D. Bates, J. Hauenstein, A. Sommese, and C. Wampler. Bertini: Software for numerical algebraic geometry. http://www.nd.edu/~sommese/bertini/, 2006.

[BSHW08]

D.J. Bates, A.J. Sommese, J.D. Hauenstein, and C.W. Wampler. Adaptive multiprecision path tracking. SIAM Journal on Numerical Mathematics, 46(2):722–746, 2008.

[Dur08]

C. Durvye. Algorithmes pour la décomposition primaire des idéaux polynomiaux de dimension nulle donnés en évaluation. PhD thesis, Univ. de Versailles (France), 2008.

[GG02]

J. von zur Gathen and J. Gerhard. Modern Computer Algebra. Cambridge University Press, 2-nd edition, 2002.

[GHH+97]

M. Giusti, K. Hägele, J. Heintz, J.E. Morais, J.L Monta na, and L.M. Pardo. Lower bounds for diophantine approximation. Journal of Pure and Applied Algebra, 117–118:277–317, 1997.

[GHMP95]

M. Giusti, J. Heintz, J.E. Morais, and L.M. Pardo. When polynomial equation systems can be solved fast? In G. Cohen, M. Giusti, and T. Mora, editors, Proc. AAECC'11, volume 948 of Lecture Notes in Computer Science. Springer Verlag, 1995.

[HLRZ00]

G. Hanrot, V. Lefèvre, K. Ryde, and P. Zimmermann. MPFR, a C library for multiple-precision floating-point computations with exact rounding. http://www.mpfr.org, 2000.

[JKDW01]

L. Jaulin, M. Kieffer, O. Didrit, and E. Walter. Applied interval analysis. Springer, London, 2001.

[Kea94]

R.B. Kearfott. An interval step control for continuation methods. SIAM J. Numer. Anal., 31(3):892–914, 1994.

[Kra69]

R. Krawczyk. Newton-algorithmen zur bestimmung von nullstellen mit fehler-schranken. Computing, 4:187–201, 1969.

[Kul08]

U.W. Kulisch. Computer Arithmetic and Validity. Theory, Implementation, and Applications. Number 33 in Studies in Mathematics. de Gruyter, 2008.

[Lec01]

G. Lecerf. Une alternative aux méthodes de réécriture pour la résolution des systes algébriques. PhD thesis, École polytechnique, 2001.

[Ley09]

A. Leykin. NAG. http://www.math.uic.edu/~leykin/NAG4M2, 2009. Macaulay 2 package for numerical algebraic geometry.

[MB96]

K. Makino and M. Berz. Remainder differential algebras and their applications. In M. Berz, C. Bischof, G. Corliss, and A. Griewank, editors, Computational differentiation: techniques, applications and tools, pages 63–74, SIAM, Philadelphia, 1996.

[MB04]

K. Makino and M. Berz. Suppression of the wrapping effect by Taylor model-based validated integrators. Technical Report MSU Report MSUHEP 40910, Michigan State University, 2004.

[MKC09]

R.E. Moore, R.B. Kearfott, and M.J. Cloud. Introduction to Interval Analysis. SIAM Press, 2009.

[Moo66]

R.E. Moore. Interval Analysis. Prentice Hall, Englewood Cliffs, N.J., 1966.

[Mor87]

A.P. Morgan. Solving polynomial systems using continuation for] engineering and scientific problems. Prentice-Hall, Englewood Cliffs, N.J., 1987.

[Neu90]

A. Neumaier. Interval methods for systems of equations. Cambridge university press, Cambridge, 1990.

[Rum80]

S.M. Rump. Kleine Fehlerschranken bei Matrixproblemen. PhD thesis, Universität Karlsruhe, 1980.

[Rum10]

S.M. Rump. Verification methods: Rigorous results using floating-point arithmetic. Acta Numerica, 19:287–449, 2010.

[Sch82]

A. Schönhage. The fundamental theorem of algebra in terms of computational complexity. Technical report, Math. Inst. Univ. of Tübingen, 1982.

[SW05]

A.J. Sommese and C.W. Wampler. The Numerical Solution of Systems of Polynomials Arising in Engineering and Science. World Scientific, 2005.

[vdH09]

J. van der Hoeven. Ball arithmetic. Technical report, HAL, 2009. http://hal.archives-ouvertes.fr/hal-00432152/fr/.

[vdH11]

J. van der Hoeven. Efficient root counting for analytic functions on a disk. Technical report, HAL, 2011. http://hal.archives-ouvertes.fr/hal-00583139/fr/.

[Ver96]

J. Verschelde. Homotopy continuation methods for solving polynomial systems. PhD thesis, Katholieke Universiteit Leuven, 1996.

[Ver99]

J. Verschelde. PHCpack: A general-purpose solver for polynomial systems by homotopy continuation. ACM Transactions on Mathematical Software, 25(2):251–276, 1999. Algorithm 795.