On the bit-complexity ofsparse polynomial multiplication

 Joris van der Hoeven

 Laboratoire de Mathématiques UMR 8628 CNRS Université Paris-Sud 91405 Orsay Cedex France

 Email: joris@texmacs.org

 Web: http://www.math.u-psud.fr/~vdhoeven

 Grégoire Lecerf

 Laboratoire de Mathématiques UMR 8100 CNRS Université de Versailles 45, avenue des États-Unis 78035 Versailles Cedex France

 Email: Gregoire.Lecerf@math.uvsq.fr

 Web: http://www.math.uvsq.fr/~lecerf

 January 5, 2015

 In this paper, we present fast algorithms for the product of two multivariate polynomials in sparse representation. The bit complexity of our algorithms are studied in detail for various types of coefficients, and we derive new complexity results for the power series multiplication in many variables. Our algorithms are implemented and freely available within the Mathemagix software. We show that their theoretical costs are well-reflected in practice. Keywords: sparse multiplication, power series, multi-point evaluation, algorithm A.M.S. subject classification: 68W30, 12-04, 30B10, 42-04, 11Y05

## 1.Introduction

It is classical [SS71, Sch77, CK91, Für07] that the product of two integers, or two univariate polynomials over any ring, can be performed in softly linear time for the usual dense representations. More precisely, two integers of bit-size at most can be multiplied in time on a Turing machine, and the product of two univariate polynomials can be done with arithmetic operations in the coefficient ring.

Multivariate polynomials often admit many zero coefficients, so a sparse representation is usually preferred over a dense one: each monomial is a pair made of a coefficient and an exponent written in the dense binary representation. A natural problem is whether the product of two sparse multivariate polynomials in can be computed in softly linear time. We will assume to be given a subset of size that contains the support of . We also let be the minimal numbers with .

For coefficient fields of characteristic zero, it is proved in [CKL89] that can be computed using operations over the ground field. This algorithm uses fast evaluation and interpolation at suitable points built from prime numbers. Unfortunately, the method hides an expensive growth of intermediate integers involved in the linear transformations, which prevents the algorithm from being softly linear in terms of bit-complexity.

In this paper, we turn our attention to various concrete coefficient rings for which it makes sense to study the problem in terms of bit-complexity. For these rings, we will present multiplication algorithms which admit softly linear bit-complexities, under the mild assumption that . Our approach is similar to [CKL89], but relies on a different kind of evaluation points. Furthermore, finite fields form an important special case for our method.

Let us briefly outline the structure of this paper. In section 2, we start with a presentation of the main algorithm over an arbitrary effective algebra with elements of sufficiently high order. In section 3, we treat various specific coefficient rings. In section 4 we give an application to multivariate power series multiplication. In the last section 5, we report on timings with our Mathemagix implementation [vdH+02b].

## 2.Sparse polynomial multiplication

### 2.1.General setting

Let be an effective algebra over an effective field , i.e. all algebra and field operations can be performed by algorithm.

We will denote by the cost for multiplying two univariate polynomials of degree over in terms of the number of arithmetic operations in . Similarly, we denote by the time needed to multiply two integers of bit-size at most . One can take [CK91] and [Für07], where represents the iterated logarithm of . Throughout the paper, we will assume that and are increasing. We also assume that and .

Given a multivariate polynomial and an index , we denote and let be the coefficient of in . The support of is defined by and we denote by its cardinality.

In the sparse representation, the polynomial is stored as a sequence of exponent-coefficient pairs . Natural numbers are represented by their sequences of binary digits. The bit-size of an exponent is , where . We let be the bit-size of .

In this and the next section, we are interested in the multiplication of two sparse polynomials . We assume given a finite set , such that . We will write for its size and for its bit-size. We also denote and . For each , we introduce , which sharply bounds the partial degree in for each monomial in . We denote .

### 2.2.Evaluation, interpolation and transposition

Given pairwise distinct points and , let be the linear map which sends to , with . In the canonical basis, this map corresponds to left multiplication by the generalized Vandermonde matrix

The computation of and its inverse (if ) correspond to the problems of multi-point evaluation and interpolation of a polynomial. Using binary splitting, it is classical [MB72, Str73, BM74] that both problems can be solved in time . Notice that the algorithms only require vectorial operations in (additions, subtractions and multiplications with elements in ).

Our main algorithm relies on the efficient computations of the transpositions of and . The map corresponds to left multiplication by

By the transposition principle [Bor56, Ber], the operations and can again be computed in time .

There is an efficient direct approach for the computation of [BLS03]. Given a vector with entries , the entries of are identical to the first coefficients of the power series

The numerator and denominator of this rational function can be computed using binary splitting. If , then this requires vectorial operations in [GG02, Theorem 10.10]. The truncated division of the numerator and denominator at order requires vectorial operations in . If , then we cut the sum into parts of size , and obtain the complexity bound .

Inversely, assume that we wish to recover from , when . For simplicity, we assume that the are non-zero (this will be the case in the sequel). Setting , and , we notice that for all . Hence, the computation of the reduces to two multi-point evaluations of and at and divisions. This amounts to a total of vectorial operations in and divisions in .

### 2.3.General multiplication algorithm

Let be a new variable. We introduce the vector spaces

Given , let . The Kronecker isomorphism , is the unique -linear map with

It corresponds to the evaluation at , so that .

Assume now that we are given an element of multiplicative order at least and consider the following evaluation map

We propose to compute though the equality .

Given , let be the matrix of restricted to the space of polynomials with support included in . Setting , we have

Taking resp. , this allows us to compute and using our algorithm for transposed multi-point evaluation from section 2.2. We obtain using one Hadamard product . Taking , the points are pairwise distinct, since the are smaller than the order of . Hence is invertible and we recover from using transposed multi-point interpolation.

Theorem 1. Given two polynomials and in and an element of order at least d, then the product can be computed using products in , inversions in , products in , and vectorial operations in .

Proof. By classical binary powering, the computation of the sequence takes operations in because each does appear in the entries of . Then the computation of all the for (resp. and ) requires (resp. and ) products in . Using the complexity results from section 2.2, we may compute and using vectorial operations in . We deduce using more multiplications in . Again using the results from section 2.2, we retrieve the coefficients after further vectorial operations in and divisions in . Adding up the various contributions, we obtain the theorem.

Remark 2. Similar to FFT multiplication, our algorithm falls into the general category of multiplication algorithms by evaluation and interpolation. This makes it possible to work in the so-called “transformed model” for several other operations besides multiplication.

## 3.Various coefficient rings

### 3.1.Finite fields

If is the finite field with elements, then its multiplicative group is cyclic of order . Whenever , it follows that the main theorem 1 applies for any primitive element of this group.

Usually, is given as the quotient for some monic and irreducible polynomial of degree . In that case, a multiplication in amounts to ring operations in . An inversion in requires an extended gcd computation in and gives rise to operations in . Using Kronecker multiplication, we can also take . Using these estimates, Theorem 1 implies:

Corollary 3. Assume . Given two polynomials and in , the product can be computed using

ring operations in and inversions in .

Applying the general purpose algorithm from [CK91], two polynomials of degree over can be multiplied in time . Alternatively, we may lift the multiplicants to polynomials in , use Kronecker multiplication and reduce modulo . As long as , this yields the better complexity bound . Theorem 1 therefore implies:

Corollary 4. Assume and . Given two polynomials and in , the product can be computed in time

Remark 5. If then it is always possible to build an algebraic extension of suitable degree in order to apply the corollary. Such constructions are classical, see for instance [GG02, Chapter 14]. We need to have , so should be taken of the order , which also corresponds to the additional overhead induced by this method.

Remark 6. Under the generalized Riemann hypothesis, a primitive element in can be constructed in polynomial time [BS91]. If is odd, then is a primive element if and only if . In , the smallest such that is a primitive element satisfies .

### 3.2.Integer coefficients

#### 3.2.1.Big prime algorithm

One approach for the multiplication of polynomials with integer coefficients is to reduce the problem modulo a suitable prime number . This prime number should be sufficiently large such that can be read off from and such that admits elements of order .

Let denote the maximal bit-length of the coefficients of and similarly for and . Since

we have

It therefore suffices to take . Corollary 4 now implies:

Corollary 7. Given and a prime number , we can compute in time

Remark 8. Let denote the -th prime number. The prime number theorem implies that . Cramér's conjecture [Cra36] states that

This conjecture is supported by numerical evidence. Setting

the conjecture implies that the smallest prime number with satisfies . Using a polynomial time primality test [AKS04], it follows that this number can be computed by brute force in time . In addition, in order to satisfy the complexity bound it suffices to tabulates prime numbers of sizes 2, 4, 8, 16, etc.

#### 3.2.2.Chinese remaindering

In our algorithm and Theorem 1, we regard the computation of a prime number as a precomputation. This is reasonable if is not too large. Now the quantity usually remains reasonably small. Hence, our assumption that is not too large only gets violated if becomes large. In that case, we will rather use Chinese remaindering. We first compute prime numbers with

Each contains a primitive root of unity of order . We next proceed as before, with and such that for each . Indeed, the fact that each is invertible implies that is invertible.

We will say that form a reduced sequence of prime moduli with order and capacity , whenever , , and . We then have the following refinement of Corollary 7:

Corollary 9. Given and a reduced sequence of prime moduli with order and capacity , we can compute in time

### 3.3.Floating point coefficients

An important kind of sparse polynomials are power series in several variables, truncated by total degree. Such series are often used in long term integration of dynamical systems [MB96, MB04], in which case their coefficients are floating point numbers rather than integers. Assume therefore that and are polynomials with floating coefficients with a precision of bits.

Let be the maximal exponent of the coefficients of . For a so called discrepancy , fixed by the user, we let be the integer polynomial with

for all . We have and

for the sup-norm on the coefficients. If all coefficients of have a similar order of magnitude, in the sense that the minimal exponent of the coefficients is at least , then we actually have . Applying a similar decomposition to , we may compute the product

using the algorithm from section 2 and convert the resulting coefficients back into floating point format.

Usually, the coefficients of a univariate power series are approximately in a geometric progression . In that case, the coefficients of the power series with are approximately of the same order of magnitude. In the multivariate case, the coefficients still have a geometric increase on diagonals , but the parameter depends on the diagonal. After a suitable change of variables , the coefficients in a big zone near the main diagonal become of approximately the same order of magnitude. However, the discrepancy usually needs to be chosen proportional to the total truncation degree in order to ensure sufficient accuracy elsewhere.

### 3.4.Rational coefficients

Let us now consider the case when . Let and denote the least common multiples of the denominators of the coefficients of resp. . One obvious way to compute is to set , , and compute using one of the methods from section 3.2. This approach works well in many cases (e.g. when and are truncations of exponential generating series). Unfortunately, this approach is deemed to be very slow if the size of or is much larger than the size of any of the coefficients of .

An alternative, more heuristic approach is the following. Let be an increasing sequence of prime numbers with and such that each is relatively prime to the denominators of each of the coefficients of and . For each , we may then multiply and using the algorithm from section 2. For , we may recover using Chinese remaindering and attempt to reconstruct from using rational number reconstruction [GG02, Chapter 5]. If this yields the same result for a given and , then the reconstructed is likely to be correct at those stages.

Of course, if we have an a priori bound on the bit sizes of the coefficients of , then we may directly take a sufficient number of primes such that can be reconstructed from its reduction modulo .

### 3.5.Algebraic coefficients

Let be an algebraic number field. For some algebraic integer , we may write . Let be the monic polynomial of minimal degree with . Given a prime number , the polynomial induces an algebraic extension of , where . Reduction modulo of a sparse polynomial then yields a sparse polynomial . We have seen in section 3.1 how to multiply sparse polynomials over the finite field . Choosing one or more sufficiently large prime numbers , we may thus apply the same approaches as in section 3.2 in order to multiply sparse polynomials over . Using the techniques from section 3.4, we next deal with the case of sparse polynomials over .

## 4.Fast products of power series

### 4.1.Total degree

Given , let . The total degree of a polynomial is defined by

Given a subset , we define the restriction of to by

For , we define initial segments of by

Then

is the set of polynomials of total degree . Given , the aim of this section is to describe efficient algorithms for the computation of . We will follow and extend the strategy described in [LS03].

Remark 10. The results of this section generalize [vdH02a] to so called pondered total degrees with , but for the sake of simplicity, we will stick to ordinary total degrees.

### 4.2.Projective coordinates

Given a polynomial , we define its projective transform by

If , then , where

Inversely, for any , there exists a unique with . The transformation is an injective morphism of -algebras. Consequently, given , we will compute the truncated product using

Given a polynomial and , let

If , then , with

### 4.3.Multiplication by evaluation and interpolation

Let be an element of of sufficiently high order . Taking as above, the construction in section 2.3 yields a -linear and invertible evaluation mapping

such that for all with , we have

 (1)

This map extends to using

Given and , the relation (1) yields

In particular, if , then

Since is invertible, this yields an efficient way to compute .

### 4.4.Complexity analysis

The number of coefficients of a truncated series in is given by

The size of is smaller by a factor between and :

Theorem 11. Given and an element of order at least , we can compute using inversions in , general operations in , and vectorial operations in .

Proof. The transforms and require a negligible amount of time. The computation of the evaluation points only involves products in , when exploiting the fact that is an initial segment. The computation of and requires vectorial operations in . The computation of can be done using general operations in . Recovering again requires vectorial operations in , as well as divisions in .

Corollary 12. Given , where is a prime number with , we can compute in time .

In the case when , the assumption , with , guarantees that the coefficients of the result can be reconstructed from their reductions modulo . Combining this observation with Chinese remaindering, we obtain:

Corollary 13. Given and a reduced sequence of prime moduli with order and capacity , we can compute in time

## 5.Implementation and timings

We have implemented the fast series product of the previous section within the C++ library multimix of Mathemagix [vdH+02b]. We report on timings for when , with , on a 2.4 GHz Intel(R) Core(TM)2 Duo platform. Recall that is the number of the variables and the truncation order. Timings are given in milliseconds. The line naive corresponds to the naive multiplication, that performs all the two by two monomial products, while the line fast stands for the algorithm of the previous section. We have added the size of the support of the series, together with the cost of our univariate multiplication in size for comparison. An empty cell corresponds to a computation that needed more than 10 minutes. The following tables demonstrate that the new fast algorithms are relevant to practice and that the theoretical soflty linear asymptotic cost can really be observed.

 12 22 42 82 162 naive 0.4 3.6 41 578 10119 fast 3 12 66 322 1509 78 253 903 3403 13203 0.1 0.4 1.6 7.4 28

Table 1. Series product with 2 variables

 12 22 42 naive 50 4630 fast 172 2345 45702 1365 12650 148995 3 28 902

Table 2. Series product with 4 variables

 7 12 17 22 27 naive 25 5346 fast 159 3775 30773 168873 612310 924 12376 74613 296010 906192 1.6 28 243 886 2887

Table 3. Series product with 6 variables

## Bibliography

[AKS04] M. Agrawal, N. Kayal, and N. Saxena. Primes is in p. Annals of Mathematics, 160(2):781–793, 2004.

[Ber] D. Bernstein. The transposition principle. Available from

http://cr.yp.to/transposition.html.

[BLS03] A. Bostan, G. Lecerf, and É. Schost. Tellegen's principle into practice. In Proceedings of ISSAC 2003, pages 37–44. ACM, 2003.

[BM74] A. Borodin and R.T. Moenck. Fast modular transforms. Journal of Computer and System Sciences, 8:366–386, 1974.

[Bor56] J. L. Bordewijk. Inter-reciprocity applied to electrical networks. Applied Scientific Research B: Electrophysics, Acoustics, Optics, Mathematical Methods, 6:1–74, 1956.

[BS91] Johannes Buchmann and Victor Shoup. Constructing nonresidues in finite fields and the extended riemann hypothesis. In STOC '91: Proceedings of the twenty-third annual ACM symposium on Theory of computing, pages 72–79, New York, NY, USA, 1991. ACM.

[CK91] D.G. Cantor and E. Kaltofen. On fast multiplication of polynomials over arbitrary algebras. Acta Informatica, 28:693–701, 1991.

[CKL89] J. Canny, E. Kaltofen, and Y. Lakshman. Solving systems of non-linear polynomial equations faster. In Proc. ISSAC '89, pages 121–128, Portland, Oregon, A.C.M., New York, 1989. ACM Press.

[Cra36] H. Cramér. On the order of magnitude of the difference between consecutive prime numbers. Acta Arithmetica, 2:23–46, 1936.

[Für07] M. Fürer. Faster integer multiplication. In Proceedings of the Thirty-Ninth ACM Symposium on Theory of Computing (STOC 2007), pages 57–66, San Diego, California, 2007.

[GG02] J. von zur Gathen and J. Gerhard. Modern Computer Algebra. Cambridge University Press, 2-nd edition, 2002.

[LS03] G. Lecerf and É. Schost. Fast multivariate power series multiplication in characteristic zero. SADIO Electronic Journal on Informatics and Operations Research, 5(1):1–10, September 2003.

[MB72] R.T. Moenck and A. Borodin. Fast modular transforms via division. In Thirteenth annual IEEE symposium on switching and automata theory, pages 90–96, Univ. Maryland, College Park, Md., 1972.

[MB96] K. Makino and M. Berz. Remainder differential algebras and their applications. In M. Berz, C. Bischof, G. Corliss, and A. Griewank, editors, Computational differentiation: techniques, applications and tools, pages 63–74, SIAM, Philadelphia, 1996.

[MB04] K. Makino and M. Berz. Suppression of the wrapping effect by Taylor model-based validated integrators. Technical Report MSU Report MSUHEP 40910, Michigan State University, 2004.

[Sch77] A. Schönhage. Schnelle Multiplikation von Polynomen über Körpern der Charakteristik 2. Acta Informatica, 7:395–398, 1977.

[SS71] A. Schönhage and V. Strassen. Schnelle Multiplikation grosser Zahlen. Computing, 7:281–292, 1971.

[Str73] V. Strassen. Die Berechnungskomplexität von elementarsymmetrischen Funktionen und von Interpolationskoeffizienten. Numer. Math., 20:238–251, 1973.

[vdH02a] J. van der Hoeven. Relax, but don't be too lazy. JSC, 34:479–542, 2002.

[vdH+02b] J. van der Hoeven et al. Mathemagix, 2002. http://www.mathemagix.org.