Making fast multiplication of polynomials numerically stable

Consider two polynomials and with multiple precision floating point coefficients. Although the product can in principle be computed efficiently using FFT multiplication, this algorithm is numerically stable only if the coefficients of are of the same order of magnitude and similarly for . In this paper, we present a new asymptotically fast multiplication algorithm which has a “better” numerically stability. We also provide some theoretical support for what we feel to be “better”, by introducing the concept of “relative Newton error”.

Keywords: polynomial, floating point number, multiplication, algorithm, FFT

A.M.S. subject classification: 65T50, 42-04, 68W30

1.Introduction

Let

be the set of floating point numbers with an

-bit mantissa and an

-bit exponent. Each non-zero number

of this kind can be written as

, with

In what follows,

is assumed to be fixed once and for all. For the sake of simplicity, we will not study overflows and underflows in detail and instead make the assumption that

has been chosen large enough with respect to the input data. Consider two polynomials

with coefficients in

. For definiteness, we will assume

and

. In this paper, we study the problem of multiplying

and

Using FFT-multiplication [CT65], the polynomials

and

can in principle be multiplied in time

. Here

is the complexity of

-bit integer multiplication [SS71] and it can be assumed that

increases with

. Alternatively, one may rewrite

and

as “floating point polynomials”, with an

-bit integer exponent and polynomials with

-bit integer coefficients as mantissas. Using Kronecker's method, we may then multiply

and

in time

, where

. If

, then this complexity can be improved a bit further into

by using

multiplications of degrees

In the case when the coefficients of

are all of the same order of magnitude, and similarly for

, the above algorithms are very efficient and numerically stable. In section 2, we will give a more detailed uniform error analysis. Unfortunately, the algorithms are very unstable if the coefficients of

are not of the same order of magnitude. For instance, taking

with

, the transformation of

and

into floating point polynomials

and

amounts to neglecting the coefficients

and

. In particular, when approximating

, only the leading coefficient

of the product will be computed with a satisfactory accuracy.

A situation which arises often in practice is that the above fast multiplication algorithms can be made numerically stable using a suitable scaling

This scaling method is particularly useful in the case when

and

are truncations of formal power series. In section 3, we will briefly explain why and we refer to [vdH02a, Section 6.2] for a more detailed analysis.

However, fast multiplication with scaling only works in the case when the coefficients of

have similar orders of magnitude for a suitable

. Geometrically speaking, this means that the roots of

all lie in a “not so large” annulus around the circle of radius

. In practice, this condition is not always satisfied, which leaves us with the question of designing a more general multiplication algorithm which is both asymptotically fast and numerically stable.

In its full generality, this problem is ill-posed. For instance, in cases when cancellation occurs, it is impossible to design an algorithm which computes all coefficients

with a relative error bounded by

. A simple example of this situation is the computation of

for

. Nevertheless, such cancellations can only occur “below” the “numerical Newton polygon” of

: when using the naive

multiplication algorithm, based on the convolution sums

the coefficients

which correspond to vertices of this Newton polygon are computed accurately, with a relative error bounded by

. Numerical Newton polygons will be defined and discussed in section 4. They are the natural numeric counterpart of Newton polygons for polynomials with power series coefficients, based on the analogy between floating point numbers and (Laurent) series, which goes back to Newton himself.

A reasonable problem is therefore to compute

with a similar kind of accuracy as the naive

multiplication algorithm does, but faster. This also leads to the question of formalizing what can be meant by “similar kind of accuracy”. In section 5, we define the notion of “relative Newton error”

for the computation of

. We regard this notion as the natural generalization of the notion of “relative error”, when numbers are replaced by numerical polynomials. In section 9, we will state a few interesting properties of this concept, which provide further evidence for its usefulness.

The central part of the paper is devoted to the description of an asymptotically fast multiplication algorithm which has a good numerical stability in terms of the relative Newton error. The main ideas behind the algorithm are the following:

2.Multiplication of pre-conditioned polynomials

Let us describe in more detail the second classical multiplication algorithm mentioned in the introduction, when the coefficients of

are of the same order of magnitude, and similarly for

. We start with the determination of the maximum exponents

and the approximation of

and

by integer polynomials

and

whose coefficients have bit-sizes bounded by

. At the second step, we use Kronecker's method for the multiplication of

and

. Taking

, the product

can be reconstructed [Kro82, Sch82] from the product of the integers

and

. Assuming that

, these integers have sizes

and

, whence can be multiplied in time

. We finally obtain an approximation

of the product

by fitting the coefficients of

into the nearest available floating point numbers in

Let us now analyze the errors induced by the above scheme, where we recall our assumption that no overflows or underflows occur during our computations. When computing

and

by rounding to the nearest, we obtain the uniform absolute error bounds

denotes the sup-norm of

if we consider

as a vector of coefficients. Consequently,

The second bound follows by considering the FFT transforms of

and

with respect to a

-th root of unity

. Indeed, the sup-norms of these transforms satisfy

In the case when

, the combination of (1) and (3) leads to the same bound. In all cases, we thus obtain a uniform relative error bound

Remark 2. We have chosen Kronecker's method for the multiplication of integer polynomials, because it is relatively straightforward to implement on top of a fast library for integer arithmetic such as Gmp [Gra91]. Internally, such integer libraries rewrite numbers as polynomials, say in

, and rely on FFT multiplication of these polynomials. At the price of improving Gmp, better performance can be achieved by using a bi-variate FFT on both

and

. In general, this yields only a constant speed-up, but in the somewhat exotic case when

, this method still admits a complexity

instead of

3.Multiplication with scaling

In the case when the polynomials

and

are truncations of formal power series

and

, the asymptotic behaviour of the coefficients is usually of the form

for some suitable slowly increasing function

. It is natural to rescale

by considering the polynomial

Then it can be shown that the relative error of each individual coefficient

is bounded by

for all sufficiently large

. In other words, the multiplication method with scaling amounts to a loss of

bits of precision. Under the assumption that

for a sufficiently large constant

, we are thus left with at least

correct leading bits in the mantissa of each coefficient. The asymptotic behaviour of

is closely related to the dominant singularities of

and

and for many interesting applications in combinatorics or physics, one has

. For more details, and the application of this technique to the relaxed computation of formal power series, we refer to [vdH02a, Section 6.2].

of (5) and (6) are verified. In that case, the individual relative error estimates (7) cease to hold, but we still have the scaled uniform bound

which is a direct consequence of the corresponding bound (4) in the previous section.

It also frequently occurs that

does not have the same radius of convergence as

, in which case we rather have an asymptotic form

Without loss of generality, we may assume that

. When considering the convolution product

we observe that the summands

approximately form a geometric progression. In first approximation, only the last

terms therefore contribute to the leading

bits of the result (see figure 1). The computation of

can thus be reduced to the computation of the products

When computing each of these products using the scaling

and a doubled precision

, the individual coefficients

are again obtained with a relative precision of the form

. The total cost of this algorithm is

, which reduces to

for large

and fixed

. A generalization of this idea will be analyzed in more detail in section 7.

4.Numeric Newton polygons

Roughly speaking, the algorithms from sections 2 and 3 are numerically stable when the roots of

and

lie in an annulus around the unit circle resp. the circle of radius

. When this is no longer the case, one may still try to apply the algorithm from section 3 for partial products

, using scales

which vary as a function of

and

A central concept for doing this is the numeric Newton polygon

, which is the convex hull of the half-lines

for all

, with the convention that

(see figure 2 for two examples). The numeric Newton polygon

is characterized by its exponent polynomial

Equivalently, we may characterize

by the first exponent

and the slopes

with

. Notice that

It is easy to compute the vertices

of the Newton polynomial of

by induction over

: at each step, we add

at the right of the sequence and repeatedly remove the before last pair

as long as it lies below the edge from

. Using this method, the vertices of

, as well as the numbers

and

, can be computed in linear time.

since the maximum

-value of a convex hull of a set of pairs

coincides with the maximum

-values of the points themselves.

An example of a Minkowski sum is shown in figure 2. It is convenient to regard

and

as “max-plus polynomials” over the semi-algebra

in which multiplication and addition are replaced by addition and the operator

. These polynomials are convex in the sense that its sequences of coefficients are convex. The exponent polynomial of

may then be reinterpreted as a “product” of polynomials:

Clearly,

, and it is easily shown that the slopes of

are obtained by merging the ordered lists

and

. This provides us with a linear-time algorithm for the computation of

as a function of

and

. The crucial property of

is that it approximates

well.

Given arbitrary convex max-plus polynomials

and

with

, it therefore suffices to show that

5.Relative Newton error and truncation

It is classical that the numeric properties of a polynomial

can be greatly effected by slightly modifying one of its coefficients. For instance, if

, then its root can move from

, when modifying its coefficient

by an amount as small as

. Nevertheless, as we will detail in section 9, one may expect that many of the numeric properties of

do not sensibly change if we modify a coefficient

by less than

If the coefficients

are known with absolute errors

, this motivates us to introduce the relative Newton error of

For instance, if

is an approximation of the product

, then its relative Newton error would typically be given by

Alternatively, we may regard

as the minimal uniform error of the approximations

for all possible scales

In this section and the subsequent sections, we propose an efficient algorithm for computing an approximation

, which is numerically stable in the sense that

is small.

The first main idea behind Newton multiplication is that only part of the products

substantially contribute to

: the pair

and the corresponding product

are said to be negligible if

Indeed, when neglecting the sum of all such products in the computation of

, the corresponding increase of the relative Newton error is bounded by

. This follows from proposition 3 and the fact that there are at most

pairs

which contribute to a fixed coefficient

. The left-hand picture in figure 3 illustrates a typical zone of non-negligible contributions

. The lower and upper limits

and

of this zone are increasing functions in

and can therefore be determined in linear time as a function of

and

A set

of pairs

is said to be negligible if each of its elements is negligible. A partial product

is said to be negligible if

is negligible and admissible if there exists no proper subrectangle

such that

is negligible. Clearly, any non-negligible partial product

gives rise to a unique admissible partial product

by taking

to be the smallest subrectangle whose complement in

is negligible. Any subdivision

of admissible subrectangles, whose complement in

is negligible. The second disjoint union will be called the admissible part of the first one. The right-hand side of figure 3 illustrates the admissible part of a so-called product subdivision (see section 7 below).

6.Fitting the Newton polygon using fewer edges

Our next task is to find a suitable subdivision (12) with admissible part (13), such that

can be computed both efficiently and in a numerically stable way. The second main idea behind Newton multiplication is to search for a subdivision such that the products

can be performed using suitable scalings and such that the rectangles

are yet reasonably large. As a first step in this construction, we will approximate the Newton polygons of

and

by simpler ones, merging those edges whose slopes differ only by a small amount.

Given a slope

and a convex max-plus polynomial

, we define the

-discrepancy

we call

the discrepancy of

and also denote it by

. For any

and any second convex max-plus polynomial

, the convexity of

and

implies

More generally, one defines the

-discrepancy

on a range

and similarly

. We have

Let us now return to the approximation of the Newton polygon

by a new one with fewer edges. Let

be an arbitrary but fixed constant, such as

, and denote

. We now construct the sequence

by taking each

maximal such that

(see figure 4 below). Using the convexity of

, it is not hard to check that this construction can be done in linear time using a single traversal of

. The edges of

between

and

are well-approximated by a single edge from

. A similar sequence

is constructed for

7.The product and Newton subdivisions

where the

and

are as in the previous section. Figure 3 shows an example of a product subdivision.

Proof. Assume the contrary and let

and

be the top left and bottom right rectangles in the product subdivision, for which the admissible parts contribute to

. We have

. By symmetry, we may assume

. Now consider the convex max-plus polynomials

which means that the pair

is negligible. But this is only possible if either

is also negligible. This, in its turn, implies that either the rectangle

or the rectangle

is negligible, in contradiction with our hypothesis.

Proof. The first assertion follows from the facts that each diagonal

with

intersects at most

rectangles and that there are less than

such diagonals. The rectangles can be computed in time

, by following the lower and upper limits

and

of the zone of non-negligible pairs.

The subdivision we are really after refines (18). More precisely, consider a non-negligible rectangle

of the form

. Let

be the slope of

. Swapping

and

if necessary, we may assume without loss of generality that

. We now construct the sequence

, by choosing each

minimal such that

, and subdivide

We may re-interpret figure 1 as an illustration of the non-negligible pairs inside the rectangle

, together with the subdivision (19) and its corresponding admissible part. Performing the replacement (19) for all non-negligible rectangles

in (18), while leaving the negligible rectangles untouched, we obtain a finer subdivision of

, which we call the Newton subdivision.

Proof. Consider a non-negligible rectangle

in the product subdivision. In view of lemma (4), it suffices to show that there exists at most

rectangles in the admissible part of its subdivision (19) which contribute to

This shows that the pair

is negligible. We conclude that the diagonal

intersects at most

admissible rectangles.

8.Newton multiplication

We can now describe our algorithm for computing the product

. We start by the computation of the admissible part of the Newton subdivision. For each rectangle

in this admissible part, we compute its contribution to

as follows. By construction, we have

for

. Setting

, we compute the contribution

using the multiplication method with scaling from section 3,

using a precision of

instead of

bits. The final result

is obtained by summing all partial products

using straightforward floating point arithmetic. This algorithm leads to the following refinement of theorem 1.

Proof. We claim that the computation of the product (20) using a precision of

bits contributes to the relative Newton error of the global product

by an amount which is bounded by

. This clearly implies (21), since for a fixed coefficient

in the product, at most

partial products of the form

contribute to

be the admissible part of the Newton subdivision and let

. Consider the total length

of the left borders and the lower borders of all rectangles

. Since every diagonal hits the union of these borders in at most

points, we have

. The total cost of all multiplications

is therefore bounded by

Remark 9. Theorem 1 follows from theorem 8, when applying it for the increased precision

9.Properties of the relative Newton error

Now that we have an efficient multiplication which is numerically stable in terms of the relative Newton error, it is interesting to investigate this concept a bit closer. First of all, its name is justified by the property

which follows from proposition 3. Some natural numeric quantities associated to a numeric polynomial

are its coefficients, its roots and its evaluations at points. For our notion of relative Newton error to be satisfactory, a small relative Newton error should therefore imply small relative errors for the coefficients, the roots and evaluations at points.

Now the example

from the introduction shows that it is impossible to obtain good relative errors for all coefficients. Nevertheless, given a polynomial

with a small relative Newton error

, we can at least guarantee small relative errors

for those coefficients

which correspond to vertices

of the Newton polygon. For most practical purposes, we expect this to be sufficient. However, theoretically speaking, the naive multiplication algorithm of polynomials sometimes provides better relative errors for other coefficients. This is for instance the case when squaring a polynomial of the form

, where

is very small compared to

For what follows, it will be convenient to redefine

. Let

denote the set of roots of

, ordered by increasing magnitudes

. By analogy with the power series setting, the norms

roughly correspond to the slopes

of the Newton polygon. We expect the existence of a bound

although we have not yet proved this. The worst case arises when

admits a single root of multiplicity

Clearly, the relative error of the evaluation of

increases when

approaches

. The following proposition shows that the relative error of

can be kept small as long as

is sufficiently large.

Proof. Modulo a suitable scaling and multiplication of

by a constant, we may assume without loss of generality that

and

. We claim that

. Indeed,

Since

, its coefficients are known with absolute errors

. We can therefore evaluate

with absolute error

and relative error

Let us finally consider the computation of the roots

in the case when

is known with relative Newton error

. The worst case again arises when

has a single root of multiplicity

. Indeed, a relative error

inside

gives rise to a relative error

for

, so that we cannot expect anything better than

for each

. Nevertheless, given an isolated root

and

, we have

and

Indeed, after reduction to the case when

and

, we combine the facts that

and

10.Conclusion

In this paper, we have presented a fast multiplication algorithm for polynomials over

, which is numerically stable in a wide variety of cases. In order to capture the increased stability in a precise theorem, we have shown the usefulness of numeric Newton polygons and the relative Newton error. Even though our algorithm was presented in the case of real coefficients, it is straightforward to generalize it to the complex case.

Multiplication being the most fundamental operation on polynomials, it can be hoped that fast and numerically stable algorithms for other operations (division, g.c.d., multi-point evaluation, etc.) can be developed along similar lines. This might for instance have applications to polynomial root finding [Pan96]. Indeed, an operation such as the Graeffe transform can be done both efficiently and accurately using our algorithm.

The multiplication algorithm with scaling from [vdH02a, Section 6.2] and section 3 has been implemented inside the Mathemagix system [vdH02b]. We have experienced it to be very efficient for the expansion of power series. For instance, the accurate computation of

terms of

with a

-bit precision typically takes about one minute. One reason behind this efficiency stems from the reduction to Kronecker multiplication. Indeed, multiple precision libraries for floating point numbers, such as Mpfr [HLRZ00], involve a lot of overhead for every single operation. We rather reduce the whole problem to one big integer multiplication.

Only a preliminary version of Newton multiplication has been implemented so far inside Mathemagix. Although this implementation uses a simpler, but not always efficient subdivision, it already produces encouraging results. Work is in progress to improve the implementation and reduce its combinatorial overhead. It remains worth investigating whether there exist other, simpler and/or more efficient subdivision schemes.

Bibliography

[CT65]

J.W. Cooley and J.W. Tukey. An algorithm for the machine calculation of complex Fourier series. Math. Computat., 19:297–301, 1965.

[Gra91]

T. Granlund et al. GMP, the GNU multiple precision arithmetic library.

http://www.swox.com/gmp, 1991.

[HLRZ00]

G. Hanrot, V. Lefèvre, K. Ryde, and P. Zimmermann. MPFR, a C library for multiple-precision floating-point computations with exact rounding. http://www.mpfr.org, 2000.

[Kro82]

L. Kronecker. Grundzüge einer arithmetischen Theorie der algebraischen Grössen. Jour. für die reine und ang. Math., 92:1–122, 1882.

[Pan96]

Victor Y. Pan. On approximating complex polynomial zeros: Modified quadtree (Weyl's) construction and improved Newton's iteration. Technical Report RR-2894, INRIA Sophia, 1996.

[Sch82]

A. Schönhage. Asymptotically fast algorithms for the numerical multiplication and division of polynomials with complex coefficients. In J. Calmet, editor, EUROCAM '82: European Computer Algebra Conference, volume 144 of Lect. Notes Comp. Sci., pages 3–15, Marseille, France, April 1982. Springer.

[SS71]

A. Schönhage and V. Strassen. Schnelle Multiplikation grosser Zahlen. Computing, 7:281–292, 1971.

[vdH02a]

J. van der Hoeven. Relax, but don't be too lazy. JSC, 34:479–542, 2002.

[vdH02b]

J. van der Hoeven et al. Mathemagix, 2002. http://www.mathemagix.org.

			(2)
			(3)

			(10)
			(11)

			(15)
			(16)