On the bit-complexity ofsparse polynomial multiplication

It is classical [SS71, Sch77, CK91, Für07] that the product of two integers, or two univariate polynomials over any ring, can be performed in softly linear time for the usual dense representations. More precisely, two integers of bit-size at most

can be multiplied in time

on a Turing machine, and the product of two univariate polynomials can be done with

arithmetic operations in the coefficient ring.

Multivariate polynomials often admit many zero coefficients, so a sparse representation is usually preferred over a dense one: each monomial is a pair made of a coefficient and an exponent written in the dense binary representation. A natural problem is whether the product

of two sparse multivariate polynomials in

can be computed in softly linear time. We will assume to be given a subset

of size

that contains the support of

. We also let

be the minimal numbers with

For coefficient fields of characteristic zero, it is proved in [CKL89] that

can be computed using

operations over the ground field. This algorithm uses fast evaluation and interpolation at suitable points built from prime numbers. Unfortunately, the method hides an expensive growth of intermediate integers involved in the linear transformations, which prevents the algorithm from being softly linear in terms of bit-complexity.

In this paper, we turn our attention to various concrete coefficient rings for which it makes sense to study the problem in terms of bit-complexity. For these rings, we will present multiplication algorithms which admit softly linear bit-complexities, under the mild assumption that

. Our approach is similar to [CKL89], but relies on a different kind of evaluation points. Furthermore, finite fields form an important special case for our method.

Let us briefly outline the structure of this paper. In section 2, we start with a presentation of the main algorithm over an arbitrary effective algebra

with elements of sufficiently high order. In section 3, we treat various specific coefficient rings. In section 4 we give an application to multivariate power series multiplication. In the last section 5, we report on timings with our Mathemagix implementation [vdH+02b].

2.Sparse polynomial multiplication

2.1.General setting

Let

be an effective algebra over an effective field

, i.e. all algebra and field operations can be performed by algorithm.

We will denote by

the cost for multiplying two univariate polynomials of degree

over

in terms of the number of arithmetic operations in

. Similarly, we denote by

the time needed to multiply two integers of bit-size at most

. One can take

[CK91] and

[Für07], where

represents the iterated logarithm of

. Throughout the paper, we will assume that

and

are increasing. We also assume that

and

Given a multivariate polynomial

and an index

, we denote

and let

be the coefficient of

. The support of

is defined by

and we denote by

its cardinality.

In the sparse representation, the polynomial

is stored as a sequence of exponent-coefficient pairs

. Natural numbers are represented by their sequences of binary digits. The bit-size of an exponent

, where

. We let

be the bit-size of

In this and the next section, we are interested in the multiplication

of two sparse polynomials

. We assume given a finite set

, such that

. We will write

for its size and

for its bit-size. We also denote

and

. For each

, we introduce

, which sharply bounds the partial degree in

for each monomial in

. We denote

2.2.Evaluation, interpolation and transposition

Given

pairwise distinct points

and

, let

be the linear map which sends

, with

. In the canonical basis, this map corresponds to left multiplication by the generalized Vandermonde matrix

The computation of

and its inverse

(if

) correspond to the problems of multi-point evaluation and interpolation of a polynomial. Using binary splitting, it is classical [MB72, Str73, BM74] that both problems can be solved in time

. Notice that the algorithms only require vectorial operations in

(additions, subtractions and multiplications with elements in

Our main algorithm relies on the efficient computations of the transpositions

and

. The map

corresponds to left multiplication by

By the transposition principle [Bor56, Ber], the operations

and

can again be computed in time

There is an efficient direct approach for the computation of

[BLS03]. Given a vector

with entries

, the entries

are identical to the first

coefficients of the power series

The numerator and denominator of this rational function can be computed using binary splitting. If

, then this requires

vectorial operations in

[GG02, Theorem 10.10]. The truncated division of the numerator and denominator at order

requires

vectorial operations in

. If

, then we cut the sum into

parts of size

, and obtain the complexity bound

Inversely, assume that we wish to recover

from

, when

. For simplicity, we assume that the

are non-zero (this will be the case in the sequel). Setting

and

, we notice that

for all

. Hence, the computation of the

reduces to two multi-point evaluations of

and

divisions. This amounts to a total of

vectorial operations in

and

divisions in

2.3.General multiplication algorithm

Assume now that we are given an element

of multiplicative order at least

and consider the following evaluation map

Given

, let

be the matrix of

restricted to the space of polynomials with support included in

. Setting

, we have

Taking

resp.

, this allows us to compute

and

using our algorithm for transposed multi-point evaluation from section 2.2. We obtain

using one Hadamard product

. Taking

, the points

are pairwise distinct, since the

are smaller than the order of

. Hence

is invertible and we recover

from

using transposed multi-point interpolation.

Proof. By classical binary powering, the computation of the sequence

takes

operations in

because each

does appear in the entries of

. Then the computation of all the

for

(resp.

and

) requires

(resp.

and

) products in

. Using the complexity results from section 2.2, we may compute

and

using

vectorial operations in

. We deduce

using

more multiplications in

. Again using the results from section 2.2, we retrieve the coefficients

after

further vectorial operations in

and

divisions in

. Adding up the various contributions, we obtain the theorem.

Remark 2. Similar to FFT multiplication, our algorithm falls into the general category of multiplication algorithms by evaluation and interpolation. This makes it possible to work in the so-called “transformed model” for several other operations besides multiplication.

3.Various coefficient rings

3.1.Finite fields

is the finite field

with

elements, then its multiplicative group is cyclic of order

. Whenever

, it follows that the main theorem 1 applies for any primitive element

of this group.

Usually,

is given as the quotient

for some monic and irreducible polynomial

of degree

. In that case, a multiplication in

amounts to

ring operations in

. An inversion in

requires an extended gcd computation in

and gives rise to

operations in

. Using Kronecker multiplication, we can also take

. Using these estimates, Theorem 1 implies:

Applying the general purpose algorithm from [CK91], two polynomials of degree

over

can be multiplied in time

. Alternatively, we may lift the multiplicants to polynomials in

, use Kronecker multiplication and reduce modulo

. As long as

, this yields the better complexity bound

. Theorem 1 therefore implies:

Remark 5. If

then it is always possible to build an algebraic extension of suitable degree

in order to apply the corollary. Such constructions are classical, see for instance [GG02, Chapter 14]. We need to have

, so

should be taken of the order

, which also corresponds to the additional overhead induced by this method.

Remark 6. Under the generalized Riemann hypothesis, a primitive element in

can be constructed in polynomial time [BS91]. If

is odd, then

is a primive element if and only if

. In

, the smallest

such that

is a primitive element satisfies

3.2.Integer coefficients

3.2.1.Big prime algorithm

One approach for the multiplication

of polynomials with integer coefficients is to reduce the problem modulo a suitable prime number

. This prime number should be sufficiently large such that

can be read off from

and such that

admits elements of order

Let

denote the maximal bit-length of the coefficients of

and similarly for

and

. Since

Remark 8. Let

denote the

-th prime number. The prime number theorem implies that

. Cramér's conjecture [Cra36] states that

the conjecture implies that the smallest prime number

with

satisfies

. Using a polynomial time primality test [AKS04], it follows that this number can be computed by brute force in time

. In addition, in order to satisfy the complexity bound it suffices to tabulates prime numbers of sizes 2, 4, 8, 16, etc.

3.2.2.Chinese remaindering

In our algorithm and Theorem 1, we regard the computation of a prime number

as a precomputation. This is reasonable if

is not too large. Now the quantity

usually remains reasonably small. Hence, our assumption that

is not too large only gets violated if

becomes large. In that case, we will rather use Chinese remaindering. We first compute

prime numbers

with

Each

contains a primitive root of unity

of order

. We next proceed as before, with

and

such that

for each

. Indeed, the fact that each

is invertible implies that

is invertible.

We will say that

form a reduced sequence of prime moduli with order

and capacity

, whenever

and

. We then have the following refinement of Corollary 7:

3.3.Floating point coefficients

An important kind of sparse polynomials are power series in several variables, truncated by total degree. Such series are often used in long term integration of dynamical systems [MB96, MB04], in which case their coefficients are floating point numbers rather than integers. Assume therefore that

and

are polynomials with floating coefficients with a precision of

bits.

Let

be the maximal exponent of the coefficients of

. For a so called discrepancy

, fixed by the user, we let

be the integer polynomial with

for the sup-norm on the coefficients. If all coefficients of

have a similar order of magnitude, in the sense that the minimal exponent of the coefficients is at least

, then we actually have

. Applying a similar decomposition to

, we may compute the product

using the algorithm from section 2 and convert the resulting coefficients back into floating point format.

Usually, the coefficients

of a univariate power series

are approximately in a geometric progression

. In that case, the coefficients of the power series

with

are approximately of the same order of magnitude. In the multivariate case, the coefficients still have a geometric increase on diagonals

, but the parameter

depends on the diagonal. After a suitable change of variables

, the coefficients in a big zone near the main diagonal become of approximately the same order of magnitude. However, the discrepancy usually needs to be chosen proportional to the total truncation degree in order to ensure sufficient accuracy elsewhere.

3.4.Rational coefficients

Let us now consider the case when

. Let

and

denote the least common multiples of the denominators of the coefficients of

resp.

. One obvious way to compute

is to set

, and compute

using one of the methods from section 3.2. This approach works well in many cases (e.g. when

and

are truncations of exponential generating series). Unfortunately, this approach is deemed to be very slow if the size of

is much larger than the size of any of the coefficients of

An alternative, more heuristic approach is the following. Let

be an increasing sequence of prime numbers with

and such that each

is relatively prime to the denominators of each of the coefficients of

and

. For each

, we may then multiply

and

using the algorithm from section 2. For

, we may recover

using Chinese remaindering and attempt to reconstruct

from

using rational number reconstruction [GG02, Chapter 5]. If this yields the same result for a given

and

, then the reconstructed

is likely to be correct at those stages.

Of course, if we have an a priori bound on the bit sizes of the coefficients of

, then we may directly take a sufficient number of primes

such that

can be reconstructed from its reduction modulo

3.5.Algebraic coefficients

Let

be an algebraic number field. For some algebraic integer

, we may write

. Let

be the monic polynomial of minimal degree

with

. Given a prime number

, the polynomial

induces an algebraic extension

, where

. Reduction modulo

of a sparse polynomial

then yields a sparse polynomial

. We have seen in section 3.1 how to multiply sparse polynomials over the finite field

. Choosing one or more sufficiently large prime numbers

, we may thus apply the same approaches as in section 3.2 in order to multiply sparse polynomials over

. Using the techniques from section 3.4, we next deal with the case of sparse polynomials over

4.Fast products of power series

4.1.Total degree

is the set of polynomials of total degree

. Given

, the aim of this section is to describe efficient algorithms for the computation of

. We will follow and extend the strategy described in [LS03].

Remark 10. The results of this section generalize [vdH02a] to so called pondered total degrees

with

, but for the sake of simplicity, we will stick to ordinary total degrees.

4.2.Projective coordinates

Inversely, for any

, there exists a unique

with

. The transformation

is an injective morphism of

-algebras. Consequently, given

, we will compute the truncated product

using

4.3.Multiplication by evaluation and interpolation

Let

be an element of

of sufficiently high order

. Taking

as above, the construction in section 2.3 yields a

-linear and invertible evaluation mapping

4.4.Complexity analysis

Proof. The transforms

and

require a negligible amount of time. The computation of the evaluation points

only involves

products in

, when exploiting the fact that

is an initial segment. The computation of

and

requires

vectorial operations in

. The computation of

can be done using

general operations in

. Recovering

again requires

vectorial operations in

, as well as

divisions in

In the case when

, the assumption

, with

, guarantees that the coefficients of the result

can be reconstructed from their reductions modulo

. Combining this observation with Chinese remaindering, we obtain:

5.Implementation and timings

We have implemented the fast series product of the previous section within the C++ library multimix of Mathemagix [vdH+02b]. We report on timings for when

, with

, on a 2.4 GHz Intel(R) Core(TM)2 Duo platform. Recall that

is the number of the variables and

the truncation order. Timings are given in milliseconds. The line naive corresponds to the naive multiplication, that performs all the two by two monomial products, while the line fast stands for the algorithm of the previous section. We have added the size

of the support of the series, together with the cost

of our univariate multiplication in size

for comparison. An empty cell corresponds to a computation that needed more than 10 minutes. The following tables demonstrate that the new fast algorithms are relevant to practice and that the theoretical soflty linear asymptotic cost can really be observed.

Bibliography

[AKS04]: M. Agrawal, N. Kayal, and N. Saxena. Primes is in p. Annals of Mathematics, 160(2):781–793, 2004.
[Ber]: D. Bernstein. The transposition principle. Available from
http://cr.yp.to/transposition.html.
[BLS03]: A. Bostan, G. Lecerf, and É. Schost. Tellegen's principle into practice. In Proceedings of ISSAC 2003, pages 37–44. ACM, 2003.
[BM74]: A. Borodin and R.T. Moenck. Fast modular transforms. Journal of Computer and System Sciences, 8:366–386, 1974.
[Bor56]: J. L. Bordewijk. Inter-reciprocity applied to electrical networks. Applied Scientific Research B: Electrophysics, Acoustics, Optics, Mathematical Methods, 6:1–74, 1956.
[BS91]: Johannes Buchmann and Victor Shoup. Constructing nonresidues in finite fields and the extended riemann hypothesis. In STOC '91: Proceedings of the twenty-third annual ACM symposium on Theory of computing, pages 72–79, New York, NY, USA, 1991. ACM.
[CK91]: D.G. Cantor and E. Kaltofen. On fast multiplication of polynomials over arbitrary algebras. Acta Informatica, 28:693–701, 1991.
[CKL89]: J. Canny, E. Kaltofen, and Y. Lakshman. Solving systems of non-linear polynomial equations faster. In Proc. ISSAC '89, pages 121–128, Portland, Oregon, A.C.M., New York, 1989. ACM Press.
[Cra36]: H. Cramér. On the order of magnitude of the difference between consecutive prime numbers. Acta Arithmetica, 2:23–46, 1936.
[Für07]: M. Fürer. Faster integer multiplication. In Proceedings of the Thirty-Ninth ACM Symposium on Theory of Computing (STOC 2007), pages 57–66, San Diego, California, 2007.
[GG02]: J. von zur Gathen and J. Gerhard. Modern Computer Algebra. Cambridge University Press, 2-nd edition, 2002.
[LS03]: G. Lecerf and É. Schost. Fast multivariate power series multiplication in characteristic zero. SADIO Electronic Journal on Informatics and Operations Research, 5(1):1–10, September 2003.
[MB72]: R.T. Moenck and A. Borodin. Fast modular transforms via division. In Thirteenth annual IEEE symposium on switching and automata theory, pages 90–96, Univ. Maryland, College Park, Md., 1972.
[MB96]: K. Makino and M. Berz. Remainder differential algebras and their applications. In M. Berz, C. Bischof, G. Corliss, and A. Griewank, editors, Computational differentiation: techniques, applications and tools, pages 63–74, SIAM, Philadelphia, 1996.
[MB04]: K. Makino and M. Berz. Suppression of the wrapping effect by Taylor model-based validated integrators. Technical Report MSU Report MSUHEP 40910, Michigan State University, 2004.
[Sch77]: A. Schönhage. Schnelle Multiplikation von Polynomen über Körpern der Charakteristik 2. Acta Informatica, 7:395–398, 1977.
[SS71]: A. Schönhage and V. Strassen. Schnelle Multiplikation grosser Zahlen. Computing, 7:281–292, 1971.
[Str73]: V. Strassen. Die Berechnungskomplexität von elementarsymmetrischen Funktionen und von Interpolationskoeffizienten. Numer. Math., 20:238–251, 1973.
[vdH02a]: J. van der Hoeven. Relax, but don't be too lazy. JSC, 34:479–542, 2002.
[vdH+02b]: J. van der Hoeven et al. Mathemagix, 2002. http://www.mathemagix.org.

1.Introduction

2.Sparse polynomial multiplication

2.1.General setting

2.2.Evaluation, interpolation and transposition

2.3.General multiplication algorithm

3.Various coefficient rings

3.1.Finite fields

3.2.Integer coefficients

3.2.1.Big prime algorithm

3.2.2.Chinese remaindering

3.3.Floating point coefficients

3.4.Rational coefficients

3.5.Algebraic coefficients

4.Fast products of power series

4.1.Total degree

4.2.Projective coordinates

4.3.Multiplication by evaluation and interpolation

4.4.Complexity analysis

5.Implementation and timings

Bibliography