Sparse polynomial multiplication

Laboratoire de Mathématiques, UMR 8628 CNRS, Université Paris-Sud, 91405 Orsay Cedex, France

Laboratoire de Mathématiques, UMR 8100 CNRS, Université de Versailles, 45, avenue des États-Unis, 78035 Versailles Cedex, France

Abstract. In this paper, we present fast algorithms for the product of two multivariate polynomials in sparse representation. The bit complexity of our algorithms is studied in detail for various types of coefficients, and we derive new complexity results for the multiplication of power series in many variables. Our algorithms are implemented and freely available within the Mathemagix software. We show that their theoretical costs are well reflected in practice.

1. Introduction

It is classical [SS71, Sch77, CK91, Für07] that the product of two integers, or of two univariate polynomials over any ring, can be performed in softly linear time for the usual dense representations. More precisely, two integers of bit size at most $n$ can be multiplied in time $\mathsf{I}(n) = n (\log n)^{O(1)}$ on a Turing machine, and the product of two univariate polynomials of degree at most $n$ can be computed with $\mathsf{M}(n) = n (\log n)^{O(1)}$ arithmetic operations in the coefficient ring.

Multivariate polynomials often admit many zero coefficients, so a sparse representation is usually preferred over a dense one: each monomial is stored as a pair made of a coefficient and an exponent written in the dense binary representation. A natural problem is whether the product $R = P Q$ of two sparse multivariate polynomials $P, Q$ in $z_1, \ldots, z_n$ can be computed in softly linear time. We will assume given a finite subset $S$ of $\mathbb{N}^n$ that contains the support of $R$. We also let $d_1, \ldots, d_n \in \mathbb{N}$ be the minimal numbers with $S \subseteq \{0, \ldots, d_1 - 1\} \times \cdots \times \{0, \ldots, d_n - 1\}$.

For coefficient fields of characteristic zero, it is proved in [CKL89] that $R$ can be computed using a softly linear number of operations over the ground field, in terms of the size of $S$. This algorithm uses fast evaluation and interpolation at suitable points built from prime numbers. Unfortunately, the method hides an expensive growth of the intermediate integers involved in the linear transformations, which prevents the algorithm from being softly linear in terms of bit complexity.

In this paper, we turn our attention to various concrete coefficient rings for which it makes sense to study the problem in terms of bit complexity. For these rings, we present multiplication algorithms which admit softly linear bit complexities, under mild assumptions on the bit sizes of the exponents. Our approach is similar to [CKL89], but relies on a different kind of evaluation points. Furthermore, finite fields form an important special case for our method.

Let us briefly outline the structure of this paper. In section 2, we start with a presentation of the main algorithm over an arbitrary effective algebra $\mathbb{A}$ with elements of sufficiently high order. In section 3, we treat various specific coefficient rings. In section 4, we give an application to multivariate power series multiplication. In the last section, we report on timings with our implementation in Mathemagix [Mmx].

2. Sparse polynomial multiplication

2.1. General setting

Let $\mathbb{A}$ be an effective algebra over an effective field $\mathbb{K}$, i.e. all algebra and field operations can be performed by algorithm. We denote by $\mathsf{M}(n) = \mathsf{M}_{\mathbb{A}}(n)$ the cost of multiplying two univariate polynomials of degree $n$ over $\mathbb{A}$, in terms of the number of arithmetic operations in $\mathbb{A}$. Similarly, we denote by $\mathsf{I}(n)$ the time needed to multiply two integers of bit size at most $n$. One can take $\mathsf{M}(n) = O(n \log n \log \log n)$ [CK91] and $\mathsf{I}(n) = O(n \log n \, 2^{\log^* n})$ [Für07], where $\log^*$ represents the iterated logarithm of $n$. Throughout the paper, we assume that $\mathsf{M}(n)/n$ and $\mathsf{I}(n)/n$ are increasing. We also assume that $\mathsf{M}(O(n)) = O(\mathsf{M}(n))$ and $\mathsf{I}(O(n)) = O(\mathsf{I}(n))$.
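For later comparison, the following minimal Python sketch implements the naive sparse product, which performs all $s_P s_Q$ pairwise monomial products; the dictionary representation (exponent tuples mapped to coefficients) mirrors the sparse representation defined below, and the function name is ours.

    def mul_naive(P, Q):
        """Naive sparse product: all pairwise monomial products.

        P, Q: dicts mapping exponent tuples (i_1, ..., i_n) to nonzero
        coefficients.  Costs s_P * s_Q coefficient multiplications."""
        R = {}
        for i, p in P.items():
            for j, q in Q.items():
                k = tuple(a + b for a, b in zip(i, j))
                c = R.get(k, 0) + p * q
                if c:
                    R[k] = c
                else:              # exact cancellation: drop the term
                    R.pop(k, None)
        return R

    # Example: P = 1 + z1*z2 and Q = z1 - z2 in two variables.
    P = {(0, 0): 1, (1, 1): 1}
    Q = {(1, 0): 1, (0, 1): -1}
    print(mul_naive(P, Q))   # {(1,0): 1, (0,1): -1, (2,1): 1, (1,2): -1}

The algorithms of this paper replace this quadratic behaviour by evaluation and interpolation at suitable points, at a softly linear cost.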
Given a multivariate polynomial $P \in \mathbb{A}[z] = \mathbb{A}[z_1, \ldots, z_n]$ and an index $i \in \mathbb{N}^n$, we write $z^i = z_1^{i_1} \cdots z_n^{i_n}$ and let $P_i$ denote the coefficient of $z^i$ in $P$. The support of $P$ is defined by $\operatorname{supp} P := \{ i \in \mathbb{N}^n : P_i \neq 0 \}$, and we denote by $s_P := |\operatorname{supp} P|$ its cardinality. In the sparse representation, the polynomial $P$ is stored as the sequence of exponent-coefficient pairs $(i, P_i)_{i \in \operatorname{supp} P}$. Natural numbers are represented by their sequences of binary digits. The bit size of an exponent $i \in \mathbb{N}^n$ is $l_i := l_{i_1} + \cdots + l_{i_n}$, where $l_k := \lceil \log_2 (k+1) \rceil$. We let $e_P := \sum_{i \in \operatorname{supp} P} l_i$ be the bit size of the exponents of $P$.

In this and the next section, we are interested in the multiplication $R = P Q$ of two sparse polynomials $P, Q \in \mathbb{A}[z]$. We assume given a finite set $S \subseteq \mathbb{N}^n$ such that $\operatorname{supp} R \subseteq S$. We will write $s_R := |S|$ for its size and $e_R := \sum_{i \in S} l_i$ for its bit size. We also denote $s := s_P + s_Q + s_R$ and $e := e_P + e_Q + e_R$. For each $j \in \{1, \ldots, n\}$, we introduce $d_j := \max \{ i_j : i \in S \} + 1$, which sharply bounds the partial degree in $z_j$ of any monomial with exponent in $S$. We denote $d := d_1 \cdots d_n > 0$.

2.2. Evaluation, interpolation and transposition

Given pairwise distinct points $\omega_1, \ldots, \omega_t \in \mathbb{A}$ and $s \in \mathbb{N}$ (in this subsection, $s$ and $t$ are generic integers), let $\mathcal{E} : \mathbb{A}^s \to \mathbb{A}^t$ be the linear map which sends $(a_1, \ldots, a_s)$ to $(A(\omega_1), \ldots, A(\omega_t))$, with $A := a_1 + a_2 u + \cdots + a_s u^{s-1}$. In the canonical basis, this map corresponds to left multiplication by the generalized Vandermonde matrix

$$V_{\omega_1, \ldots, \omega_t} = \begin{pmatrix} 1 & \omega_1 & \omega_1^2 & \cdots & \omega_1^{s-1} \\ 1 & \omega_2 & \omega_2^2 & \cdots & \omega_2^{s-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & \omega_t & \omega_t^2 & \cdots & \omega_t^{s-1} \end{pmatrix}.$$

The computation of $\mathcal{E}$ and of its inverse $\mathcal{E}^{-1}$ (if $t = s$) correspond to the problems of multi-point evaluation and interpolation of a polynomial. Using binary splitting, it is classical [MB72, Str73, BM74] that both problems can be solved in time $O(\lceil t/s \rceil \, \mathsf{M}(s) \log s)$. Notice that the algorithms only require vectorial operations in $\mathbb{A}$ (additions, subtractions and multiplications with elements in $\mathbb{A}$).

Our main algorithm relies on the efficient computation of the transpositions $\mathcal{E}^{\top}, (\mathcal{E}^{-1})^{\top} : (\mathbb{A}^t)^* \to (\mathbb{A}^s)^*$ of $\mathcal{E}$ and $\mathcal{E}^{-1}$. The map $\mathcal{E}^{\top}$ corresponds to left multiplication by

$$V_{\omega_1, \ldots, \omega_t}^{\top} = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ \omega_1 & \omega_2 & \cdots & \omega_t \\ \vdots & \vdots & & \vdots \\ \omega_1^{s-1} & \omega_2^{s-1} & \cdots & \omega_t^{s-1} \end{pmatrix}.$$

By the transposition principle [Bor56, Ber], the operations $\mathcal{E}^{\top}$ and $(\mathcal{E}^{-1})^{\top}$ can again be computed in time $O(\lceil t/s \rceil \, \mathsf{M}(s) \log s)$. There is an efficient direct approach for the computation of $\mathcal{E}^{\top}$ [BLS03]. Given a vector $a \in \mathbb{A}^t$ with entries $a_1, \ldots, a_t$, the entries $b_1, \ldots, b_s$ of $\mathcal{E}^{\top}(a)$ are identical to the first $s$ coefficients of the power series

$$\sum_{i=1}^{t} \frac{a_i}{1 - \omega_i u}.$$

The numerator and denominator of this rational function can be computed using binary splitting. If $t \leq s$, then this requires $O(\mathsf{M}(s) \log s)$ vectorial operations in $\mathbb{A}$, and the truncated division of the numerator by the denominator at order $s$ requires $O(\mathsf{M}(s))$ further vectorial operations in $\mathbb{A}$. If $t > s$, then we cut the sum into $\lceil t/s \rceil$ parts of size at most $s$, and obtain the complexity bound $O(\lceil t/s \rceil \, \mathsf{M}(s) \log s)$.

Inversely, assume that we wish to recover $a_1, \ldots, a_s$ from $b_1, \ldots, b_s$, when $t = s$. For simplicity, we assume that the $\omega_i$ are non-zero (this will be the case in the sequel). Setting $B := b_1 + b_2 u + \cdots + b_s u^{s-1}$, $D := (1 - \omega_1 u) \cdots (1 - \omega_s u)$ and $C := B D \bmod u^s$, we notice that $C(\omega_i^{-1}) = -a_i \, (u D)'(\omega_i^{-1})$ for all $i$. Hence, the computation of the $a_i$ reduces to two multi-point evaluations of $C$ and $(u D)'$ at $\omega_1^{-1}, \ldots, \omega_s^{-1}$, followed by $s$ divisions. This amounts to a total of $O(\mathsf{M}(s) \log s)$ vectorial operations in $\mathbb{A}$ and $s$ divisions in $\mathbb{A}$.

2.3. General multiplication algorithm

Let $u$ be a new variable. We introduce the vector spaces

$$\mathbb{A}_d[z] := \{ A \in \mathbb{A}[z] : \deg_{z_j} A < d_j, \ j = 1, \ldots, n \}, \qquad \mathbb{A}_d[u] := \{ B \in \mathbb{A}[u] : \deg B < d \}.$$

Given $i \in \mathbb{N}^n$, let $\phi(i) := i_1 + i_2 d_1 + \cdots + i_n d_1 \cdots d_{n-1}$. The Kronecker substitution $\kappa_d : \mathbb{A}_d[z] \to \mathbb{A}_d[u]$ is the unique $\mathbb{A}$-linear map with

$$\kappa_d(z_1^{i_1} \cdots z_n^{i_n}) = u^{\phi(i)}, \qquad 0 \leq i_1 < d_1, \ldots, 0 \leq i_n < d_n.$$

It corresponds to the evaluation at $(u, u^{d_1}, \ldots, u^{d_1 \cdots d_{n-1}})$, so that $\kappa_d(R) = \kappa_d(P) \, \kappa_d(Q)$ whenever $P$, $Q$ and $R = P Q$ all belong to $\mathbb{A}_d[z]$. Assume now that we are given an element $\omega \in \mathbb{A}$ of multiplicative order at least $d$ and consider the following evaluation map:

$$\mathcal{E} : \mathbb{A}_d[z] \to \mathbb{A}^{s_R}, \qquad A \mapsto (\kappa_d(A)(1), \kappa_d(A)(\omega), \ldots, \kappa_d(A)(\omega^{s_R - 1})).$$

We propose to compute $R$ through the equality $\mathcal{E}(R) = \mathcal{E}(P) \, \mathcal{E}(Q)$, where the product is taken entrywise. Given $\{i_1, \ldots, i_t\} \subseteq \{0, \ldots, d_1 - 1\} \times \cdots \times \{0, \ldots, d_n - 1\}$, let $V_{\omega, \{i_1, \ldots, i_t\}}$ be the matrix of $\mathcal{E}$ restricted to the space of polynomials with support included in $\{i_1, \ldots, i_t\}$.
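In code, the mixed-radix index $\phi(i)$ and the points $\omega^{\phi(i)}$ are straightforward to compute; they are exactly the $\omega_k$ appearing in the matrix displayed next. A minimal Python sketch over $\mathbb{Z}/p\mathbb{Z}$ (function names ours):

    def phi(i, d):
        """Mixed-radix (Kronecker) encoding:
        phi(i) = i_1 + i_2*d_1 + ... + i_n*d_1*...*d_{n-1}."""
        e, w = 0, 1
        for ij, dj in zip(i, d):
            assert 0 <= ij < dj
            e += ij * w
            w *= dj
        return e

    def eval_points(support, d, omega, p):
        """The points omega^{phi(i)} mod p, one per exponent in the support."""
        return [pow(omega, phi(i, d), p) for i in support]

    # Example: box d = (3, 4); the exponent (2, 1) maps to 2 + 1*3 = 5.
    print(phi((2, 1), (3, 4)))   # 5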
Setting $\omega_k := \omega^{\phi(i_k)}$, we have

$$V_{\omega, \{i_1, \ldots, i_t\}} = V_{\omega_1, \ldots, \omega_t}^{\top} = \begin{pmatrix} 1 & \cdots & 1 \\ \omega_1 & \cdots & \omega_t \\ \vdots & & \vdots \\ \omega_1^{s_R - 1} & \cdots & \omega_t^{s_R - 1} \end{pmatrix}.$$

Taking $\{i_1, \ldots, i_t\} := \operatorname{supp} P$ resp. $\operatorname{supp} Q$, this allows us to compute $\mathcal{E}(P)$ and $\mathcal{E}(Q)$ using our algorithm for transposed multi-point evaluation from section 2.2. We obtain $\mathcal{E}(R)$ using one Hadamard product $\mathcal{E}(R) = \mathcal{E}(P) \, \mathcal{E}(Q)$. Taking $\{i_1, \ldots, i_t\} := S$, the points $\omega^{\phi(i_1)}, \ldots, \omega^{\phi(i_{s_R})}$ are pairwise distinct, since the $\phi(i_k)$ are smaller than the order of $\omega$. Hence $V_{\omega, S}$ is invertible and we recover $R$ from $\mathcal{E}(R)$ using transposed multi-point interpolation.

Theorem 1. Given two polynomials $P$ and $Q$ in $\mathbb{A}[z_1, \ldots, z_n]$ and an element $\omega \in \mathbb{A}$ of order at least $d$, the product $R = P Q$ can be computed using $O(e + \log d)$ products in $\mathbb{A}$, $s_R$ inversions in $\mathbb{A}$, $s_R$ further products in $\mathbb{A}$, and $O\big(\tfrac{s}{s_R} \, \mathsf{M}(s_R) \log s_R\big)$ vectorial operations in $\mathbb{A}$.

Proof. By classical binary powering, the computation of the sequence $\omega^{2^0}, \omega^{2^1}, \ldots, \omega^{2^{\lceil \log_2 d \rceil}}$ takes $O(\log d)$ operations in $\mathbb{A}$. Since each $\phi(i)$ is at most $d - 1$, the computation of all the $\omega^{\phi(i)}$ for $i \in \operatorname{supp} P$ (resp. $\operatorname{supp} Q$ and $S$) then requires $O(e_P)$ (resp. $O(e_Q)$ and $O(e_R)$) further products in $\mathbb{A}$. Using the complexity results from section 2.2, we may compute $\mathcal{E}(P)$ and $\mathcal{E}(Q)$ using $O((\lceil s_P/s_R \rceil + \lceil s_Q/s_R \rceil) \, \mathsf{M}(s_R) \log s_R)$ vectorial operations in $\mathbb{A}$. We deduce $\mathcal{E}(R)$ using $s_R$ more multiplications in $\mathbb{A}$. Again using the results from section 2.2, we retrieve the coefficients $R_i$ with $i \in S$ after $O(\mathsf{M}(s_R) \log s_R)$ further vectorial operations in $\mathbb{A}$ and $s_R$ divisions in $\mathbb{A}$. Adding up the various contributions, we obtain the theorem.

Remark 2. Similarly to FFT multiplication, our algorithm falls into the general category of multiplication algorithms by evaluation and interpolation. This makes it possible to work in the so-called "transformed model" for several other operations besides multiplication.

3. Various coefficient rings

3.1. Finite fields

If $\mathbb{A}$ is the finite field $\mathbb{F}_{p^k}$ with $p^k$ elements, then its multiplicative group is cyclic of order $p^k - 1$. Whenever $p^k - 1 \geq d$, it follows that Theorem 1 applies for any primitive element $\omega$ of this group. Usually, $\mathbb{F}_{p^k}$ is given as the quotient $\mathbb{F}_p[u]/(G(u))$ for some monic and irreducible polynomial $G$ of degree $k$. In that case, a multiplication in $\mathbb{A}$ amounts to $O(\mathsf{M}_{\mathbb{F}_p}(k))$ ring operations in $\mathbb{F}_p$. An inversion in $\mathbb{A}$ requires an extended gcd computation in $\mathbb{F}_p[u]$ and gives rise to $O(\mathsf{M}_{\mathbb{F}_p}(k) \log k)$ operations in $\mathbb{F}_p$ [GG02]. Using Kronecker multiplication, we can also take $\mathsf{M}_{\mathbb{A}}(n) = O(\mathsf{M}_{\mathbb{F}_p}(n k))$. Using these estimates, Theorem 1 implies:

Corollary 3. Assume $p^k - 1 \geq d$. Given two polynomials $P$ and $Q$ in $\mathbb{F}_{p^k}[z_1, \ldots, z_n]$, the product $R = P Q$ can be computed using

$$O\Big( \tfrac{s}{s_R} \, \mathsf{M}_{\mathbb{F}_p}(s_R k) \log s_R + (s_R \log k + e) \, \mathsf{M}_{\mathbb{F}_p}(k) \Big)$$

ring operations in $\mathbb{F}_p$ and $O(s_R)$ inversions in $\mathbb{F}_p$.

Applying the general purpose algorithm from [CK91], two polynomials of degree $n$ over $\mathbb{F}_p$ can be multiplied in time $O(\mathsf{I}(\log p) \, n \log n \log \log n)$. Alternatively, we may lift the multiplicands to polynomials in $\mathbb{Z}[u]$, use Kronecker multiplication and reduce modulo $p$. As long as $\log n = O(\log p)$, this yields the better complexity bound $O(\mathsf{I}(n \log p))$. Theorem 1 therefore implies:

Corollary 4. Assume $p^k - 1 \geq d$ and $\log (s_R k) = O(\log p)$. Given two polynomials $P$ and $Q$ in $\mathbb{F}_{p^k}[z_1, \ldots, z_n]$, the product $R = P Q$ can be computed in time

$$O\Big( \tfrac{s}{s_R} \, \mathsf{I}(s_R k \log p) \log s_R + (s_R \log k + e) \, \mathsf{I}(k \log p) + s_R \, \mathsf{I}(\log p) \log \log p \Big).$$

Remark 5. If $p^k - 1 < d$, then it is always possible to build an algebraic extension $\mathbb{F}_{p^{k l}}$ of suitable degree $l$ in order to apply the corollary. Such constructions are classical; see for instance [GG02]. We need $p^{k l} - 1 \geq d$, so $l$ should be taken of the order $\log_{p^k} d$, which also corresponds to the additional overhead induced by this method.

Remark 6. Under the generalized Riemann hypothesis, a primitive element of $\mathbb{F}_{p^k}$ can be constructed in polynomial time [BS91]. If $p$ is odd and the prime factorization of $p - 1$ is known, then $\lambda$ is a primitive element of $\mathbb{F}_p$ if and only if $\lambda^{(p-1)/q} \neq 1$ for every prime divisor $q$ of $p - 1$; in particular, any primitive element satisfies $\lambda^{(p-1)/2} = -1$. In $\mathbb{F}_p$, the smallest $\lambda \in \mathbb{N}$ such that $\lambda \bmod p$ is a primitive element satisfies $\lambda = O((\log p)^6)$ under the generalized Riemann hypothesis.
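To illustrate this remark, here is a simple, unoptimized Python sketch that finds the smallest primitive element modulo an odd prime $p$, by factoring $p - 1$ with trial division and applying the criterion above; the function names are ours, and this is only practical for word-size $p$.

    def prime_divisors(m):
        """Prime divisors of m by trial division (fine for moderate m)."""
        ds, q = [], 2
        while q * q <= m:
            if m % q == 0:
                ds.append(q)
                while m % q == 0:
                    m //= q
            q += 1
        if m > 1:
            ds.append(m)
        return ds

    def primitive_root(p):
        """Smallest primitive element of F_p, for p an odd prime:
        lambda is primitive iff lambda^((p-1)/q) != 1 for all prime q | p-1."""
        qs = prime_divisors(p - 1)
        lam = 2
        while any(pow(lam, (p - 1) // q, p) == 1 for q in qs):
            lam += 1
        return lam

    print(primitive_root(101))   # 2, whose order modulo 101 is 100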
3.2. Integer coefficients

3.2.1. Big prime algorithm

One approach for the multiplication of polynomials with integer coefficients is to reduce the problem modulo a suitable prime number $p$. This prime number should be sufficiently large such that $R$ can be read off from $R \bmod p$, and such that $\mathbb{F}_p$ admits elements of order at least $d$. Let $l_P := \max_{i \in \operatorname{supp} P} l_{|P_i|}$ denote the maximal bit length of the coefficients of $P$, and similarly for $Q$ and $R$. Since

$$\max_i |R_i| \leq \min(s_P, s_Q) \max_i |P_i| \max_i |Q_i|,$$

we have

$$l_R \leq l := l_P + l_Q + \lceil \log_2 \min(s_P, s_Q) \rceil.$$

It therefore suffices to take $p \geq \max(2^{l+1}, d)$. Corollary 4 now implies:

Corollary 7. Given $P, Q \in \mathbb{Z}[z_1, \ldots, z_n]$ and a prime number $p \geq \max(2^{l+1}, d)$, we can compute $R = P Q$ in time

$$O\Big( \tfrac{s}{s_R} \, \mathsf{I}(s_R \log p) \log s_R + e \, \mathsf{I}(\log p) + s_R \, \mathsf{I}(\log p) \log \log p \Big).$$

Remark 8. Let $p_n$ denote the $n$-th prime number. The prime number theorem implies that $p_n \sim n \log n$. Cramér's conjecture [Cra36] states that

$$\limsup_{n \to \infty} \frac{p_{n+1} - p_n}{(\log p_n)^2} = 1.$$

This conjecture is supported by numerical evidence. Setting $N := \max(2^{l+1}, d)$, the conjecture implies that the smallest prime number $p \geq N$ satisfies $p = N + O((\log N)^2)$. Using a polynomial time primality test [AKS04], it follows that this number can be computed by brute force in time $(\log N)^{O(1)}$. In addition, in order to satisfy the complexity bound, it suffices to tabulate prime numbers of sizes 2, 4, 8, 16, etc.

3.2.2. Chinese remaindering

In our algorithm and Theorem 1, we regard the computation of a prime number $p \geq N = \max(2^{l+1}, d)$ as a precomputation. This is reasonable as long as $p$ is not too large. Now the quantity $d$ usually remains reasonably small, so our assumption that $p$ is not too large only gets violated if $l_P + l_Q$ becomes large. In that case, we will rather use Chinese remaindering. We first compute prime numbers $p_1 < \cdots < p_q$ with

$$p_1 \geq d, \qquad p_1 \cdots p_q \geq 2^{l+1}.$$

Each $\mathbb{F}_{p_j}$ contains a primitive root of unity $\omega_j$ of order at least $d$. We next proceed as before, with $p := p_1 \cdots p_q$ and $\omega \in \mathbb{Z}/p\mathbb{Z} \cong \mathbb{Z}/p_1\mathbb{Z} \times \cdots \times \mathbb{Z}/p_q\mathbb{Z}$ such that $\omega \bmod p_j = \omega_j$ for each $j$. Indeed, the fact that each $V_{\omega, S} \bmod p_j = V_{\omega_j, S}$ is invertible implies that $V_{\omega, S}$ is invertible.

We will say that $p_1 < \cdots < p_q$ form a reduced sequence of prime moduli with order $d$ and capacity $N$ whenever $p_1 \geq d$, $p_1 \cdots p_{q-1} < N \leq p_1 \cdots p_q$ and $\log p_q = O(\log d)$. We then have the following refinement of Corollary 7:

Corollary 9. Given $P, Q \in \mathbb{Z}[z_1, \ldots, z_n]$ and a reduced sequence $p_1 < \cdots < p_q$ of prime moduli with order $d$ and capacity $2^{l+1}$, we can compute $R = P Q$ in time

$$O\Big( \frac{s \, l \log s_R \, \mathsf{I}(s_R \log d)}{s_R \log d} + \frac{e \, l \, \mathsf{I}(\log d)}{\log d} + s_R \, \mathsf{I}(l) \log l \Big).$$

3.3. Floating point coefficients

An important kind of sparse polynomials are power series in several variables, truncated by total degree. Such series are often used in the long term integration of dynamical systems [MB96, MB04], in which case their coefficients are floating point numbers rather than integers. Assume therefore that $P$ and $Q$ are polynomials with floating point coefficients of precision $\mu$ bits. Let $e_P$ be the maximal exponent of the coefficients of $P$. For a so-called discrepancy $\delta \in \mathbb{N}$, fixed by the user, we let $\tilde{P}$ be the integer polynomial with

$$\tilde{P}_i = \lfloor P_i \, 2^{\mu + \delta - e_P} \rceil$$

for all $i$, where $\lfloor \cdot \rceil$ denotes rounding to the nearest integer. We have $l_{\tilde{P}} \leq \mu + \delta$ and

$$|P - \tilde{P} \, 2^{e_P - \mu - \delta}| \leq 2^{e_P - \mu - \delta - 1}$$

for the sup-norm on the coefficients. If all coefficients of $P$ have a similar order of magnitude, in the sense that the minimal exponent of the coefficients of $P$ is at least $e_P - \delta$, then every coefficient of $\tilde{P} \, 2^{e_P - \mu - \delta}$ even approximates the corresponding coefficient of $P$ with relative error at most $2^{-\mu}$. Applying a similar decomposition to $Q$, we may compute the approximate product

$$P Q \approx \tilde{P} \tilde{Q} \, 2^{e_P + e_Q - 2\mu - 2\delta}$$

using the algorithms from section 3.2 and convert the resulting coefficients back into floating point format.

Usually, the coefficients $P_i$ of a univariate power series $P$ are approximately in a geometric progression, $\log_2 |P_i| \approx \alpha + \beta i$. In that case, the coefficients of the power series $P(\sigma z)$ with $\sigma := 2^{-\beta}$ are approximately of the same order of magnitude.
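Before turning to the multivariate case, here is a minimal Python sketch of the above fixed-point conversion; the names to_fixed and from_fixed are ours, and math.frexp is used to read off binary exponents of the coefficients.

    import math

    def to_fixed(P, mu, delta):
        """Scale float coefficients to integers:
        P_i -> round(P_i * 2^(mu + delta - eP)), so that
        P ~ Ptilde * 2^(eP - mu - delta) coefficientwise."""
        eP = max(math.frexp(c)[1] for c in P.values())   # frexp: c = m * 2^e
        Pt = {i: round(c * 2.0 ** (mu + delta - eP)) for i, c in P.items()}
        return {i: c for i, c in Pt.items() if c}, eP

    def from_fixed(Rt, shift):
        """Convert integer coefficients back to floats, times 2^shift."""
        return {i: c * 2.0 ** shift for i, c in Rt.items()}

    P = {(0,): 1.0, (1,): 0.5, (2,): 0.25}
    Pt, eP = to_fixed(P, mu=24, delta=4)
    Qt, eQ = Pt, eP                      # take Q := P for the example
    # Integer product (e.g. mul_naive from the earlier sketch), then rescale:
    # R ~ from_fixed(mul_naive(Pt, Qt), eP + eQ - 2 * (24 + 4))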
In the multivariate case, the coefficients still increase geometrically on diagonals $(i_1, \ldots, i_n) = (k_1 i, \ldots, k_n i)$, say $\log_2 |P_{k_1 i, \ldots, k_n i}| \approx \alpha + \beta_{k_1, \ldots, k_n} i$, but the parameter $\beta_{k_1, \ldots, k_n}$ depends on the diagonal. After a suitable change of variables $z_j \mapsto \sigma_j z_j$, the coefficients in a large zone around the main diagonal become of approximately the same order of magnitude. However, the discrepancy $\delta$ usually needs to be chosen proportional to the total truncation degree in order to ensure sufficient accuracy elsewhere.

3.4. Rational coefficients

Let us now consider the case when $P, Q \in \mathbb{Q}[z_1, \ldots, z_n]$. Let $q_P$ and $q_Q$ denote the least common multiples of the denominators of the coefficients of $P$ resp. $Q$. One obvious way to compute $R = P Q$ is to set $\tilde{P} := P q_P$ and $\tilde{Q} := Q q_Q$, and to compute $\tilde{P} \tilde{Q}$ using one of the methods from section 3.2. This approach works well in many cases (e.g. when $P$ and $Q$ are truncations of exponential generating series). Unfortunately, it becomes very slow whenever the size of $q_P$ or $q_Q$ is much larger than the size of any of the coefficients of $R$.

An alternative, more heuristic approach is the following. Let $p_1 < p_2 < \cdots$ be an increasing sequence of prime numbers with $p_1 \geq d$ and such that each $p_j$ is relatively prime to the denominators of the coefficients of $P$ and $Q$. For each $j$, we may then multiply $P \bmod p_j$ and $Q \bmod p_j$ using the algorithm from section 2.3. For $k = 1, 2, \ldots$, we may recover $R \bmod p_1 \cdots p_k$ using Chinese remaindering and attempt to reconstruct $R$ from $R \bmod p_1 \cdots p_k$ using rational number reconstruction [GG02, Chapter 5]. If this yields the same result for a given $k$ and for $k + 1$, then the reconstructed $R$ is likely to be correct. Of course, if we have an a priori bound on the bit sizes of the coefficients of $R$, then we may directly take a sufficient number of primes $p_1 < \cdots < p_q$ such that $R$ can be reconstructed from its reduction modulo $p_1 \cdots p_q$.

3.5. Algebraic coefficients

Let $\mathbb{K} = \mathbb{Q}(\alpha)$ be an algebraic number field. For a suitable algebraic integer $\alpha$, we may write $\mathbb{K} = \mathbb{Q}[\alpha]$. Let $G \in \mathbb{Z}[u]$ be the monic polynomial of minimal degree with $G(\alpha) = 0$. Given a prime number $p$ such that $G$ remains irreducible modulo $p$, the polynomial $G$ induces an algebraic extension $\mathbb{F}_p[\hat{\alpha}]$ of $\mathbb{F}_p$, where $G(\hat{\alpha}) = 0$. Reduction modulo $p$ of a sparse polynomial $P \in \mathbb{Z}[\alpha][z]$ then yields a sparse polynomial $\hat{P} \in \mathbb{F}_p[\hat{\alpha}][z]$. We have seen in section 3.1 how to multiply sparse polynomials over the finite field $\mathbb{F}_p[\hat{\alpha}]$. Choosing one or more sufficiently large prime numbers $p$, we may thus apply the same approaches as in section 3.2 in order to multiply sparse polynomials over $\mathbb{Z}[\alpha]$. Using the techniques from section 3.4, we may finally deal with the case of sparse polynomials over $\mathbb{K} = \mathbb{Q}[\alpha]$.

4. Fast products of power series

4.1. Total degree

Given $i \in \mathbb{N}^n$, let $|i| := i_1 + \cdots + i_n$. The total degree of a polynomial $P \in \mathbb{A}[z]$ is defined by

$$\deg P := \max \{ |i| : P_i \neq 0 \}.$$

Given a subset $I \subseteq \mathbb{N}^n$, we define the restriction $P_I$ of $P$ to $I$ by

$$P_I := \sum_{i \in I} P_i z^i.$$

For $d \in \mathbb{N}$, we define initial segments $I_d$ of $\mathbb{N}^n$ by

$$I_d := \{ i \in \mathbb{N}^n : |i| < d \}.$$

Then

$$\mathbb{A}[z]_{I_d} := \{ P \in \mathbb{A}[z] : \operatorname{supp} P \subseteq I_d \} = \{ P_{I_d} : P \in \mathbb{A}[z] \}$$

is the set of polynomials of total degree $< d$. Given $P, Q \in \mathbb{A}[z]_{I_d}$, the aim of this section is to describe efficient algorithms for the computation of $(P Q)_{I_d}$. We will follow and extend the strategy described in [LS03].

Remark 10. The results of this section generalize to so-called pondered total degrees $\lambda_1 i_1 + \cdots + \lambda_n i_n$ with $\lambda_1, \ldots, \lambda_n > 0$, but for the sake of simplicity, we will stick to ordinary total degrees.

4.2. Projective coordinates

Given a polynomial $P \in \mathbb{A}[z]$, we define its projective transform $\pi(P) \in \mathbb{A}[z]$ by

$$\pi(P)(z_1, \ldots, z_n) := P(z_1 z_n, \ldots, z_{n-1} z_n, z_n).$$

If $\operatorname{supp} P \subseteq I$, then $\operatorname{supp} \pi(P) \subseteq J$, where

$$J := \{ (i_1, \ldots, i_{n-1}, i_1 + \cdots + i_n) : i \in I \}.$$

Inversely, for any $P \in \mathbb{A}[z]_{J_d}$, where $J_d := \{ (i_1, \ldots, i_{n-1}, i_1 + \cdots + i_n) : i \in I_d \}$, there exists a unique $\pi^{-1}(P) \in \mathbb{A}[z]_{I_d}$ with $P = \pi(\pi^{-1}(P))$. The transformation $\pi$ is an injective morphism of $\mathbb{A}$-algebras. Consequently, given $P, Q \in \mathbb{A}[z]_{I_d}$, we will compute the truncated product $(P Q)_{I_d}$ using

$$(P Q)_{I_d} = \pi^{-1}\big( (\pi(P) \, \pi(Q))_{J_d} \big).$$

Given a polynomial $P \in \mathbb{A}[z]$ and $j \in \mathbb{N}$, let

$$P_j := \sum_{i_1, \ldots, i_{n-1}} P_{i_1, \ldots, i_{n-1}, j} \, z_1^{i_1} \cdots z_{n-1}^{i_{n-1}} \in \mathbb{A}[z_1, \ldots, z_{n-1}],$$

so that $P = \sum_j P_j z_n^j$.
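On the sparse exponent representation, $\pi$ simply acts by $(i_1, \ldots, i_n) \mapsto (i_1, \ldots, i_{n-1}, i_1 + \cdots + i_n)$, so that the degree in $z_n$ of $\pi(P)$ records the total degree of $P$. A small Python sketch (function names ours):

    def proj(P):
        """pi(P)(z1,...,zn) = P(z1*zn, ..., z_{n-1}*zn, zn) on exponent dicts."""
        return {i[:-1] + (sum(i),): c for i, c in P.items()}

    def proj_inv(P):
        """Inverse of proj on its image: i_n = |i| - (i_1 + ... + i_{n-1})."""
        return {i[:-1] + (i[-1] - sum(i[:-1]),): c for i, c in P.items()}

    # Example: P = z1 + z2^2 in n = 2 variables.
    P = {(1, 0): 1, (0, 2): 1}
    assert proj_inv(proj(P)) == P
    print(proj(P))   # {(1, 1): 1, (0, 2): 1}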
If $\operatorname{supp} P \subseteq J_d$, then $\operatorname{supp} P_j \subseteq I'_d$ for all $j$, where

$$I'_d := \{ i \in \mathbb{N}^{n-1} : i_1 + \cdots + i_{n-1} < d \}$$

denotes the corresponding initial segment for $n - 1$ variables.

4.3. Multiplication by evaluation and interpolation

Let $\omega$ be an element of $\mathbb{A}$ of sufficiently high order, at least $d^{n-1}$. Taking $S := I'_d$, the construction in section 2.3 yields an $\mathbb{A}$-linear and invertible evaluation mapping

$$\mathcal{E} : \mathbb{A}[z_1, \ldots, z_{n-1}]_{I'_d} \to \mathbb{A}^{s'}, \qquad s' := |I'_d|,$$

such that for all $P, Q \in \mathbb{A}[z_1, \ldots, z_{n-1}]_{I'_d}$ with $P Q \in \mathbb{A}[z_1, \ldots, z_{n-1}]_{I'_d}$, we have

$$\mathcal{E}(P Q) = \mathcal{E}(P) \, \mathcal{E}(Q). \qquad (1)$$

This map extends to $\mathbb{A}[z_1, \ldots, z_{n-1}]_{I'_d}[z_n]$ by setting

$$\mathcal{E}(P_0 + \cdots + P_{d-1} z_n^{d-1}) := \mathcal{E}(P_0) + \cdots + \mathcal{E}(P_{d-1}) \, z_n^{d-1} \in \mathbb{A}^{s'}[z_n].$$

Given $P, Q \in \mathbb{A}[z_1, \ldots, z_{n-1}]_{I'_d}[z_n]$ and $k < d$, the relation (1) yields

$$\mathcal{E}((P Q)_k) = \mathcal{E}(P_0) \, \mathcal{E}(Q_k) + \cdots + \mathcal{E}(P_k) \, \mathcal{E}(Q_0).$$

In particular, if $R := P Q \bmod z_n^d$, then

$$\mathcal{E}(R) = \mathcal{E}(P) \, \mathcal{E}(Q) \bmod z_n^d.$$

Since $\mathcal{E}$ is invertible, this yields an efficient way to compute $R$.

4.4. Complexity analysis

The number of coefficients of a truncated series in $\mathbb{A}[z]_{I_d}$ is given by

$$|I_d| = \binom{n + d - 1}{n},$$

and the size $s' = |I'_d|$ of $I'_d$ is smaller by a factor between $1$ and $d$:

$$|I_d| = \frac{n + d - 1}{n} \, |I'_d|.$$

Theorem 11. Given $P, Q \in \mathbb{A}[z]_{I_d}$ and an element $\omega \in \mathbb{A}$ of order at least $d^{n-1}$, we can compute $(P Q)_{I_d}$ using $s'$ inversions in $\mathbb{A}$, $O(s' \, \mathsf{M}(d))$ further products in $\mathbb{A}$, and $O(d \, \mathsf{M}(s') \log s')$ vectorial operations in $\mathbb{A}$.

Proof. The transforms $\pi$ and $\pi^{-1}$ require a negligible amount of time. The computation of the evaluation points $\omega^{\phi(i)}$ with $i \in I'_d$ only involves $O(s')$ products in $\mathbb{A}$, when exploiting the fact that $I'_d$ is an initial segment. The computation of the $\mathcal{E}(\pi(P)_j)$ and $\mathcal{E}(\pi(Q)_j)$ for $0 \leq j < d$ requires $O(d \, \mathsf{M}(s') \log s')$ vectorial operations in $\mathbb{A}$. The computation of $\mathcal{E}(\pi(P)) \, \mathcal{E}(\pi(Q)) \bmod z_n^d$ can be done using $O(s' \, \mathsf{M}(d))$ general operations in $\mathbb{A}$. Recovering $R$ again requires $O(d \, \mathsf{M}(s') \log s')$ vectorial operations in $\mathbb{A}$, as well as $s'$ inversions and $O(s' d)$ further products in $\mathbb{A}$.

Corollary 12. Given $P, Q \in \mathbb{F}_p[z]_{I_d}$, where $p$ is a prime number with $p > d^{n-1}$, we can compute $(P Q)_{I_d}$ in time

$$O\big( \mathsf{I}(s' d \log p) \log s' + s' d \, \mathsf{I}(\log p) \log \log p \big).$$

In the case when $P, Q \in \mathbb{Z}[z]_{I_d}$, the assumption $p \geq \max(2^{l+1}, d^{n-1})$, with $l := l_P + l_Q + \lceil \log_2 s' \rceil$, guarantees that the coefficients of the result can be reconstructed from their reductions modulo $p$. Combining this observation with Chinese remaindering, as in section 3.2.2, we obtain:

Corollary 13. Given $P, Q \in \mathbb{Z}[z]_{I_d}$ and a reduced sequence $p_1 < \cdots < p_q$ of prime moduli with order $d^{n-1}$ and capacity $2^{l+1}$, we can compute $(P Q)_{I_d}$ in time

$$O\Big( \frac{\mathsf{I}(s' d \, n \log d) \, l \log s'}{n \log d} + \frac{s' d \, l \, \mathsf{I}(n \log d) \log (n \log d)}{n \log d} + |I_d| \, \mathsf{I}(l) \log l \Big).$$

5. Implementation and timings

We have implemented the fast series product of the previous section within the Mathemagix software [Mmx]. We report on timings for $\mathbb{A} = \mathbb{Z}/p\mathbb{Z}$, for a fixed prime number $p$, on a 2.4 GHz Intel(R) Core(TM)2 Duo platform. Recall that $n$ is the number of variables and $d$ the truncation order. Timings are given in milliseconds. The row "naive" corresponds to the naive multiplication, which performs all the pairwise products of monomials [vdH02], while the row "fast" stands for the algorithm of the previous section. We also display the size $|I'_d|$ of the support of the series, together with the cost $\mathsf{M}(|I'_d|)$ of our univariate multiplication in size $|I'_d|$, for comparison. An empty cell corresponds to a computation that needed more than 10 minutes. The tables demonstrate that the new fast algorithm is relevant in practice and that the theoretical asymptotic cost can really be observed.

Table 1. Series product with 2 variables (rows: $d$, naive, fast, $|I'_d|$, $\mathsf{M}(|I'_d|)$).

Table 2. Series product with 4 variables (rows: $d$, naive, fast, $|I'_d|$, $\mathsf{M}(|I'_d|)$).

Table 3. Series product with 6 variables (rows: $d$, naive, fast, $|I'_d|$, $\mathsf{M}(|I'_d|)$).

Bibliography

[AKS04] M. Agrawal, N. Kayal, and N. Saxena. PRIMES is in P. Annals of Mathematics, 160(2):781-793, 2004.

[Ber] D. Bernstein. The transposition principle. Available from the author's web page.

[BLS03] A. Bostan, G. Lecerf, and É. Schost. Tellegen's principle into practice. In Proceedings of ISSAC 2003, pages 37-44. ACM, 2003.

[BM74] A. Borodin and R. T. Moenck. Fast modular transforms. Journal of Computer and System Sciences, 8:366-386, 1974.

[Bor56] J. L. Bordewijk. Inter-reciprocity applied to electrical networks. Applied Scientific Research B, 6:1-74, 1956.
[BS91] J. Buchmann and V. Shoup. Constructing nonresidues in finite fields and the extended Riemann hypothesis. In Proceedings of the 23rd Annual ACM Symposium on Theory of Computing (STOC '91), pages 72-79, New York, NY, USA, 1991. ACM.

[CK91] D. G. Cantor and E. Kaltofen. On fast multiplication of polynomials over arbitrary algebras. Acta Informatica, 28:693-701, 1991.

[CKL89] J. Canny, E. Kaltofen, and Y. Lakshman. Solving systems of non-linear polynomial equations faster. In Proceedings of ISSAC '89, pages 121-128, Portland, Oregon, 1989. ACM Press.

[Cra36] H. Cramér. On the order of magnitude of the difference between consecutive prime numbers. Acta Arithmetica, 2:23-46, 1936.

[Für07] M. Fürer. Faster integer multiplication. In Proceedings of the 39th Annual ACM Symposium on Theory of Computing (STOC 2007), pages 57-66, San Diego, California, 2007.

[GG02] J. von zur Gathen and J. Gerhard. Modern Computer Algebra. Cambridge University Press, 2nd edition, 2002.

[LS03] G. Lecerf and É. Schost. Fast multivariate power series multiplication in characteristic zero. SADIO Electronic Journal on Informatics and Operations Research, 5(1):1-10, September 2003.

[MB72] R. T. Moenck and A. Borodin. Fast modular transforms via division. In Proceedings of the 13th Annual Symposium on Switching and Automata Theory, pages 90-96, Univ. Maryland, College Park, Md., 1972.

[MB96] K. Makino and M. Berz. Remainder differential algebras and their applications. In M. Berz, C. Bischof, G. Corliss, and A. Griewank, editors, Computational Differentiation: Techniques, Applications and Tools, pages 63-74. SIAM, Philadelphia, 1996.

[MB04] K. Makino and M. Berz. Suppression of the wrapping effect by Taylor model-based validated integrators. Technical Report MSUHEP 40910, Michigan State University, 2004.

[Sch77] A. Schönhage. Schnelle Multiplikation von Polynomen über Körpern der Charakteristik 2. Acta Informatica, 7:395-398, 1977.

[SS71] A. Schönhage and V. Strassen. Schnelle Multiplikation grosser Zahlen. Computing, 7:281-292, 1971.

[Str73] V. Strassen. Die Berechnungskomplexität von elementarsymmetrischen Funktionen und von Interpolationskoeffizienten. Numerische Mathematik, 20:238-251, 1973.

[vdH02] J. van der Hoeven. Relax, but don't be too lazy. Journal of Symbolic Computation, 34:479-542, 2002.

[Mmx] J. van der Hoeven et al. Mathemagix, 2002. http://www.mathemagix.org.