
. This work has partially been supported by the ANR Gecko project.
Consider two polynomials and with multiple precision floating point coefficients. Although the product can in principle be computed efficiently using FFT multiplication, this algorithm is numerically stable only if the coefficients of are of the same order of magnitude and similarly for . In this paper, we present a new asymptotically fast multiplication algorithm which has a “better” numerically stability. We also provide some theoretical support for what we feel to be “better”, by introducing the concept of “relative Newton error”.

Let be the set of floating point numbers with an bit mantissa and an bit exponent. Each nonzero number of this kind can be written as , with
In what follows, is assumed to be fixed once and for all. For the sake of simplicity, we will not study overflows and underflows in detail and instead make the assumption that has been chosen large enough with respect to the input data. Consider two polynomials
with coefficients in . For definiteness, we will assume , , and . In this paper, we study the problem of multiplying and .
Using FFTmultiplication [CT65], the polynomials and can in principle be multiplied in time . Here is the complexity of bit integer multiplication [SS71] and it can be assumed that increases with . Alternatively, one may rewrite and as “floating point polynomials”, with an bit integer exponent and polynomials with bit integer coefficients as mantissas. Using Kronecker's method, we may then multiply and in time , where . If , then this complexity can be improved a bit further into by using multiplications of degrees .
In the case when the coefficients of are all of the same order of magnitude, and similarly for , the above algorithms are very efficient and numerically stable. In section 2, we will give a more detailed uniform error analysis. Unfortunately, the algorithms are very unstable if the coefficients of or are not of the same order of magnitude. For instance, taking with , the transformation of and into floating point polynomials and amounts to neglecting the coefficients and . In particular, when approximating , only the leading coefficient of the product will be computed with a satisfactory accuracy.
A situation which arises often in practice is that the above fast multiplication algorithms can be made numerically stable using a suitable scaling
For instance, in the above example when , we may multiply and using
This scaling method is particularly useful in the case when and are truncations of formal power series. In section 3, we will briefly explain why and we refer to [vdH02a, Section 6.2] for a more detailed analysis.
However, fast multiplication with scaling only works in the case when the coefficients of have similar orders of magnitude for a suitable . Geometrically speaking, this means that the roots of all lie in a “not so large” annulus around the circle of radius . In practice, this condition is not always satisfied, which leaves us with the question of designing a more general multiplication algorithm which is both asymptotically fast and numerically stable.
In its full generality, this problem is illposed. For instance, in cases when cancellation occurs, it is impossible to design an algorithm which computes all coefficients with a relative error bounded by . A simple example of this situation is the computation of for , . Nevertheless, such cancellations can only occur “below” the “numerical Newton polygon” of : when using the naive multiplication algorithm, based on the convolution sums
the coefficients which correspond to vertices of this Newton polygon are computed accurately, with a relative error bounded by . Numerical Newton polygons will be defined and discussed in section 4. They are the natural numeric counterpart of Newton polygons for polynomials with power series coefficients, based on the analogy between floating point numbers and (Laurent) series, which goes back to Newton himself.
A reasonable problem is therefore to compute with a similar kind of accuracy as the naive multiplication algorithm does, but faster. This also leads to the question of formalizing what can be meant by “similar kind of accuracy”. In section 5, we define the notion of “relative Newton error” for the computation of . We regard this notion as the natural generalization of the notion of “relative error”, when numbers are replaced by numerical polynomials. In section 9, we will state a few interesting properties of this concept, which provide further evidence for its usefulness.
The central part of the paper is devoted to the description of an asymptotically fast multiplication algorithm which has a good numerical stability in terms of the relative Newton error. The main ideas behind the algorithm are the following:
The product is modeled by a rectangle in which each pair corresponds to the computation of .
Only a subregion , which can be determined precisely (section 5), contributes substantially to the product . The complement can be neglected.
The region can be covered by a suitable disjoint union of rectangles (see sections 6 and 7).
The contribution of each such rectangle to can be computed efficiently using a fast multiplication algorithm with scaling (section 8).
The main result of the paper is the following:
Let us describe in more detail the second classical multiplication algorithm mentioned in the introduction, when the coefficients of are of the same order of magnitude, and similarly for . We start with the determination of the maximum exponents
and the approximation of and by integer polynomials and whose coefficients have bitsizes bounded by . At the second step, we use Kronecker's method for the multiplication of and . Taking , the product can be reconstructed [Kro82, Sch82] from the product of the integers and . Assuming that , these integers have sizes and , whence can be multiplied in time . We finally obtain an approximation of the product by fitting the coefficients of into the nearest available floating point numbers in .
Let us now analyze the errors induced by the above scheme, where we recall our assumption that no overflows or underflows occur during our computations. When computing and by rounding to the nearest, we obtain the uniform absolute error bounds
where
denotes the supnorm of if we consider as a vector of coefficients. Consequently,
On the other hand, we also have the bounds
The second bound follows by considering the FFT transforms of , and with respect to a th root of unity . Indeed, the supnorms of these transforms satisfy
whence
When , the combination of (1) and (2) yields the (rough) bound
In the case when , the combination of (1) and (3) leads to the same bound. In all cases, we thus obtain a uniform relative error bound
(4) 
Remark
In the case when the polynomials and are truncations of formal power series and , the asymptotic behaviour of the coefficients is usually of the form
(5) 
for some suitable slowly increasing function . It is natural to rescale by considering the polynomial
with , since its coefficients are comparable in magnitude:
If has a similar asymptotic behaviour
(6) 
we may then compute using the formula
and the algorithm from the previous section for the multiplication of and .
Now assume that we also have an asymptotic formula
Then it can be shown that the relative error of each individual coefficient is bounded by
(7) 
for all sufficiently large . In other words, the multiplication method with scaling amounts to a loss of bits of precision. Under the assumption that for a sufficiently large constant , we are thus left with at least correct leading bits in the mantissa of each coefficient. The asymptotic behaviour of is closely related to the dominant singularities of and and for many interesting applications in combinatorics or physics, one has . For more details, and the application of this technique to the relaxed computation of formal power series, we refer to [vdH02a, Section 6.2].
It sometimes happens that only “limsup versions”
of (5) and (6) are verified. In that case, the individual relative error estimates (7) cease to hold, but we still have the scaled uniform bound
(8) 
which is a direct consequence of the corresponding bound (4) in the previous section.
It also frequently occurs that does not have the same radius of convergence as , in which case we rather have an asymptotic form
Without loss of generality, we may assume that . When considering the convolution product
(9) 
we observe that the summands approximately form a geometric progression. In first approximation, only the last terms therefore contribute to the leading bits of the result (see figure 1). The computation of can thus be reduced to the computation of the products
using the notation
When computing each of these products using the scaling and a doubled precision , the individual coefficients of are again obtained with a relative precision of the form . The total cost of this algorithm is , which reduces to for large and fixed . A generalization of this idea will be analyzed in more detail in section 7.
Roughly speaking, the algorithms from sections 2 and 3 are numerically stable when the roots of and lie in an annulus around the unit circle resp. the circle of radius . When this is no longer the case, one may still try to apply the algorithm from section 3 for partial products , using scales which vary as a function of and .
A central concept for doing this is the numeric Newton polygon of , which is the convex hull of the halflines for all , with the convention that (see figure 2 for two examples). The numeric Newton polygon is characterized by its exponent polynomial
with coefficients
Equivalently, we may characterize by the first exponent and the slopes with . Notice that .
It is easy to compute the vertices , of the Newton polynomial of by induction over : at each step, we add at the right of the sequence and repeatedly remove the before last pair as long as it lies below the edge from to . Using this method, the vertices of , as well as the numbers and , can be computed in linear time.
The scaling operation has a simple impact on exponent polynomials:
The norms of and and their exponent polynomials are related by
since the maximum value of a convex hull of a set of pairs coincides with the maximum values of the points themselves.
Recall that the Minkowski sum of the convex sets and is defined by
An example of a Minkowski sum is shown in figure 2. It is convenient to regard and as “maxplus polynomials” over the semialgebra in which multiplication and addition are replaced by addition and the operator . These polynomials are convex in the sense that its sequences of coefficients are convex. The exponent polynomial of may then be reinterpreted as a “product” of polynomials:
Clearly, , and it is easily shown that the slopes of are obtained by merging the ordered lists and . This provides us with a lineartime algorithm for the computation of as a function of and . The crucial property of is that it approximates well.
Proof. Let . Applying (2) and (3), we have
Using (11) and the fact that , we obtain
Given arbitrary convex maxplus polynomials and with , it therefore suffices to show that
Indeed, for , let be such that . Then
The relation is proved similarly.
It is classical that the numeric properties of a polynomial can be greatly effected by slightly modifying one of its coefficients. For instance, if , then its root can move from to , when modifying its coefficient by an amount as small as . Nevertheless, as we will detail in section 9, one may expect that many of the numeric properties of do not sensibly change if we modify a coefficient by less than .
If the coefficients of are known with absolute errors , this motivates us to introduce the relative Newton error of by
For instance, if is an approximation of the product , then its relative Newton error would typically be given by
Alternatively, we may regard as the minimal uniform error of the approximations for all possible scales :
In this section and the subsequent sections, we propose an efficient algorithm for computing an approximation , which is numerically stable in the sense that is small.
The first main idea behind Newton multiplication is that only part of the products substantially contribute to : the pair and the corresponding product are said to be negligible if
Indeed, when neglecting the sum of all such products in the computation of , the corresponding increase of the relative Newton error is bounded by . This follows from proposition 3 and the fact that there are at most pairs which contribute to a fixed coefficient . The lefthand picture in figure 3 illustrates a typical zone of nonnegligible contributions . The lower and upper limits and of this zone are increasing functions in and can therefore be determined in linear time as a function of and .
A set of pairs is said to be negligible if each of its elements is negligible. A partial product is said to be negligible if is negligible and admissible if there exists no proper subrectangle of such that is negligible. Clearly, any nonnegligible partial product gives rise to a unique admissible partial product by taking to be the smallest subrectangle whose complement in is negligible. Any subdivision
(12) 
therefore gives rise to a new disjoint union
(13) 
of admissible subrectangles, whose complement in is negligible. The second disjoint union will be called the admissible part of the first one. The righthand side of figure 3 illustrates the admissible part of a socalled product subdivision (see section 7 below).
Our next task is to find a suitable subdivision (12) with admissible part (13), such that can be computed both efficiently and in a numerically stable way. The second main idea behind Newton multiplication is to search for a subdivision such that the products can be performed using suitable scalings and such that the rectangles are yet reasonably large. As a first step in this construction, we will approximate the Newton polygons of and by simpler ones, merging those edges whose slopes differ only by a small amount.
Given a slope and a convex maxplus polynomial , we define the discrepancy of by
Then the individual exponents of lie in a strip of height :
(14) 
In the particular case when equals the main slope of ,
we call the discrepancy of and also denote it by . For any and any second convex maxplus polynomial , the convexity of and implies
More generally, one defines the discrepancy of on a range by and similarly . We have
for any subdivision of the range into and .
Let us now return to the approximation of the Newton polygon by a new one with fewer edges. Let be an arbitrary but fixed constant, such as , and denote . We now construct the sequence by taking each maximal such that (see figure 4 below). Using the convexity of , it is not hard to check that this construction can be done in linear time using a single traversal of . The edges of between and are wellapproximated by a single edge from to . A similar sequence is constructed for .
A first candidate for the subdivision (12) is the product subdivision:
(18) 
where the and are as in the previous section. Figure 3 shows an example of a product subdivision.
Proof. Assume the contrary and let and be the top left and bottom right rectangles in the product subdivision, for which the admissible parts contribute to . We have or . By symmetry, we may assume . Now consider the convex maxplus polynomials
Using (17), we have . From (15) and (16), we thus get
Now coincides with , whence there exists an index with
which means that the pair is negligible. But this is only possible if either or is also negligible. This, in its turn, implies that either the rectangle or the rectangle is negligible, in contradiction with our hypothesis.
Proof. The first assertion follows from the facts that each diagonal with intersects at most rectangles and that there are less than such diagonals. The rectangles can be computed in time , by following the lower and upper limits and of the zone of nonnegligible pairs.
The subdivision we are really after refines (18). More precisely, consider a nonnegligible rectangle of the form . Let
be the slope of on . Swapping and if necessary, we may assume without loss of generality that . We now construct the sequence , by choosing each minimal such that , and subdivide
(19) 
We may reinterpret figure 1 as an illustration of the nonnegligible pairs inside the rectangle , together with the subdivision (19) and its corresponding admissible part. Performing the replacement (19) for all nonnegligible rectangles in (18), while leaving the negligible rectangles untouched, we obtain a finer subdivision of , which we call the Newton subdivision.
Proof. Consider a nonnegligible rectangle in the product subdivision. In view of lemma (4), it suffices to show that there exists at most rectangles in the admissible part of its subdivision (19) which contribute to .
Now assume that and satisfy and assume that satisfy . Setting , we then have
whence
This shows that the pair is negligible. We conclude that the diagonal intersects at most admissible rectangles.
We can now describe our algorithm for computing the product . We start by the computation of the admissible part of the Newton subdivision. For each rectangle in this admissible part, we compute its contribution to as follows. By construction, we have or for or . Setting , we compute the contribution of to using the multiplication method with scaling from section 3,
(20) 
using a precision of instead of bits. The final result is obtained by summing all partial products using straightforward floating point arithmetic. This algorithm leads to the following refinement of theorem 1.
(21) 
Proof. We claim that the computation of the product (20) using a precision of bits contributes to the relative Newton error of the global product by an amount which is bounded by . This clearly implies (21), since for a fixed coefficient in the product, at most partial products of the form contribute to .
Setting , and , the bound (8) yields
For any , the bounds (2), (14) and proposition 3 also imply
Combining both estimates, we obtain
This completes the proof of our claim.
The proof of the complexity bound relies on lemma 6. Let
be the admissible part of the Newton subdivision and let . Consider the total length
of the left borders and the lower borders of all rectangles . Since every diagonal hits the union of these borders in at most points, we have . The total cost of all multiplications is therefore bounded by
This completes the proof of our theorem.
Remark
Now that we have an efficient multiplication which is numerically stable in terms of the relative Newton error, it is interesting to investigate this concept a bit closer. First of all, its name is justified by the property
which follows from proposition 3. Some natural numeric quantities associated to a numeric polynomial are its coefficients, its roots and its evaluations at points. For our notion of relative Newton error to be satisfactory, a small relative Newton error should therefore imply small relative errors for the coefficients, the roots and evaluations at points.
Now the example from the introduction shows that it is impossible to obtain good relative errors for all coefficients. Nevertheless, given a polynomial with a small relative Newton error , we can at least guarantee small relative errors for those coefficients which correspond to vertices of the Newton polygon. For most practical purposes, we expect this to be sufficient. However, theoretically speaking, the naive multiplication algorithm of polynomials sometimes provides better relative errors for other coefficients. This is for instance the case when squaring a polynomial of the form , where is very small compared to .
For what follows, it will be convenient to redefine . Let denote the set of roots of , ordered by increasing magnitudes . By analogy with the power series setting, the norms roughly correspond to the slopes of the Newton polygon. We expect the existence of a bound
although we have not yet proved this. The worst case arises when admits a single root of multiplicity .
We define the relative distance between (not both zero) by
and the relative distance between and , counted with multiplicities, by
Clearly, the relative error of the evaluation of at increases when approaches . The following proposition shows that the relative error of can be kept small as long as is sufficiently large.
Proof. Modulo a suitable scaling and multiplication of by a constant, we may assume without loss of generality that and . We claim that . Indeed,
so, considering FFT transforms with respect to a th root of unity , we get
Since , its coefficients are known with absolute errors . We can therefore evaluate with absolute error and relative error .
Let us finally consider the computation of the roots in the case when is known with relative Newton error . The worst case again arises when has a single root of multiplicity . Indeed, a relative error inside gives rise to a relative error for , so that we cannot expect anything better than for each . Nevertheless, given an isolated root and , we have and
Using a similar argument as in the proof of proposition 10, it follows that
Indeed, after reduction to the case when and , we combine the facts that , and .
In this paper, we have presented a fast multiplication algorithm for polynomials over , which is numerically stable in a wide variety of cases. In order to capture the increased stability in a precise theorem, we have shown the usefulness of numeric Newton polygons and the relative Newton error. Even though our algorithm was presented in the case of real coefficients, it is straightforward to generalize it to the complex case.
Multiplication being the most fundamental operation on polynomials, it can be hoped that fast and numerically stable algorithms for other operations (division, g.c.d., multipoint evaluation, etc.) can be developed along similar lines. This might for instance have applications to polynomial root finding [Pan96]. Indeed, an operation such as the Graeffe transform can be done both efficiently and accurately using our algorithm.
The multiplication algorithm with scaling from [vdH02a,
Section 6.2] and section 3 has been implemented inside the
Only a preliminary version of Newton multiplication has been implemented
so far inside
J.W. Cooley and J.W. Tukey. An algorithm for the machine calculation of complex Fourier series. Math. Computat., 19:297–301, 1965.
T. Granlund et al. GMP, the GNU multiple precision arithmetic library.
http://www.swox.com/gmp, 1991.
G. Hanrot, V. Lefèvre, K. Ryde, and P. Zimmermann. MPFR, a C library for multipleprecision floatingpoint computations with exact rounding. http://www.mpfr.org, 2000.
L. Kronecker. Grundzüge einer arithmetischen Theorie der algebraischen Grössen. Jour. für die reine und ang. Math., 92:1–122, 1882.
Victor Y. Pan. On approximating complex polynomial zeros: Modified quadtree (Weyl's) construction and improved Newton's iteration. Technical Report RR2894, INRIA Sophia, 1996.
A. Schönhage. Asymptotically fast algorithms for the numerical multiplication and division of polynomials with complex coefficients. In J. Calmet, editor, EUROCAM '82: European Computer Algebra Conference, volume 144 of Lect. Notes Comp. Sci., pages 3–15, Marseille, France, April 1982. Springer.
A. Schönhage and V. Strassen. Schnelle Multiplikation grosser Zahlen. Computing, 7:281–292, 1971.
J. van der Hoeven. Relax, but don't be too lazy. JSC, 34:479–542, 2002.
J. van der Hoeven et al. Mathemagix, 2002. http://www.mathemagix.org.