Polynomial and series multiplication

Laboratoire d'Informatique, UMR 7161 CNRS, École polytechnique, 91128 Palaiseau Cedex, France

Laboratoire de Mathématiques, UMR 8100 CNRS, Université de Versailles, 45, avenue des États-Unis, 78035 Versailles Cedex, France

This work has been partly supported by the French project, and by the grant of the Région Ile-de-France.

Abstract. In this paper we present various algorithms for multiplying multivariate polynomials and series. All algorithms have been implemented in the C++ libraries of the Mathemagix system. We describe naive and softly optimal variants for various types of coefficients and supports and compare their relative performances. For the first time we are able to observe the benefit of non-naive arithmetic for multivariate polynomials and power series, which might lead to speed-ups in several areas of symbolic and numeric computation. For the sparse representation, we present new softly linear algorithms for the product whenever the destination support is known, together with a detailed bit-complexity analysis for the usual coefficient types. As an application, we are able to count the number of absolutely irreducible factors of a multivariate polynomial with a cost that is essentially quadratic in the number of integral points in the convex hull of the support of the given polynomial. We report on examples that were previously out of reach.

It is classical that the product of two integers, or of two univariate polynomials over any ring, can be performed in softly linear time for the usual dense representations. More precisely, two integers of bit-size at most n can be multiplied in time I(n) on a Turing machine, and the product of two univariate polynomials of degrees at most n can be computed with M(n) arithmetic operations in the coefficient ring, where both I(n) and M(n) are of the form n (log n)^{O(1)}. These algorithms are widely implemented in computer algebra systems and turn out to perform well even for problems of medium size. Concerning multivariate polynomials and series, less effort has been dedicated to such fast algorithms and implementations. One of the difficulties is that polynomials and series behave differently according to their support. In this paper we propose several algorithms that cover a wide range of situations.

Representations of multivariate polynomials and series, together with their respective efficiencies, have been discussed since the early ages of computer algebra; for historical references we refer the reader to the classical literature. The representation is an important issue which conditions the performance in an intrinsic way. It is customary to distinguish three main types of representations: dense, sparse, and functional. A dense representation is made of a compact description of the support of the polynomial and the sequence of its coefficients. The main example concerns block supports: it suffices to store the coordinates of two opposite vertices. In a dense representation all the coefficients of the considered support are stored, even if they are zero. In fact, if a polynomial has only a few non-zero terms in its bounding block, we shall prefer to use a sparse representation, which stores only the sequence of the non-zero terms as pairs of monomials and coefficients. Finally, a functional representation stores a function that can produce values of the polynomial at any given points. This can be a pure blackbox (which means that its internal structure is not supposed to be known) or a specific data structure such as a straight-line program (see Chapter 4 of for instance).
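To fix ideas, the following minimal C++ sketch shows the two main data layouts discussed above (the functional representation is omitted). The type and member names are ours and are chosen for this illustration only; they do not reflect the actual classes of our implementation.

#include <cstdint>
#include <vector>

// Dense representation with block support {0,...,d1-1} x ... x {0,...,dn-1}:
// the block is described by d1,...,dn and every coefficient of the block is
// stored, zeros included, following a fixed order on the exponents.
struct DenseBlockPolynomial {
    std::vector<uint32_t> d;      // block sizes d1,...,dn
    std::vector<int64_t>  coeff;  // d1*...*dn coefficients
};

// Sparse representation: only the non-zero terms are stored, as pairs made of
// an exponent vector and a coefficient, sorted with respect to the same order.
struct SparseTerm {
    std::vector<uint32_t> exponent;  // e1,...,en
    int64_t               coeff;     // non-zero coefficient
};
struct SparsePolynomial {
    uint32_t                nvars;   // number of variables
    std::vector<SparseTerm> terms;   // sorted non-zero terms
};

int main() {
    // 1 + 3*z1*z2^2 in two variables, in the sparse representation.
    SparsePolynomial p{2, {{{0, 0}, 1}, {{1, 2}, 3}}};
    return p.terms.size() == 2 ? 0 : 1;
}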
For dense representations with block supports, it is classical that the algorithms used for the univariate case can be naturally extended: the naive algorithm, Karatsuba's algorithm, and even fast Fourier transforms can be applied recursively in each variable, with good performance. Another classical approach is the Kronecker substitution which reduces the multivariate product to one variable only. We refer the reader to classical books such as. When the number of the variables is fixed and the partial degrees tend to infinity, these techniques lead to softly linear costs. For sparse representations, the naive school book algorithm, that performs all the pairwise term products, is implemented in all the computer algebra systems and several other dedicated libraries. It is a matter of constant improvements in terms of data structures, memory management, vectorization and parallelization (see for some comparisons between several implementations available in 2003). Multivariate dichotomic approaches have not been discussed much in the literature. Let us for instance consider Karatsuba's algorithm (see8> for details). With one variable only, the usual way the product is performed begins with splitting and into (z)+z*P(z)> and (z)+z*Q(z)>, respectively, where and are half of the degrees of and . Then we compute recursively Q>, Q>, and +P)*(Q+Q)>, and perform suitable linear combinations to recover the result. This approach is efficient in the dense block case because the sizes of the input are correctly divided in each recursive call of the product. In the sparse case this phenomenon hardly occurs, and it is commonly admitted that this approach is useless (see for instance3> or for a cost analysis). Nevertheless block versions have been suggested to be useful with several variables in6.3.3>, and refined in6>, but never tested in practice. Further improvements are under progress in.\ In the sparse case, the product can be decomposed into two subproblems: (1) determine the support of from those of and , (2) compute the coefficients of. These are independent in terms of complexity and applications. The computation of the support is the most expensive part, that can be seen as a special case of an even more difficult problem called . This is a cornerstone in higher level routines such as the greatest common divisor: to compute the of two polynomials in the sparse representation it is interesting to specialize all the variables but one at several points, compute as many univariate , and interpolate the result without a precise idea of the support (see for instance). We are not dealing with this problem in this paper, but most of the literature in this topic hides fast algorithms for the sparse product of polynomials as soon as the destination support is known, as explained in the next paragraphs. For coefficient fields of characteristic zero, it is proved in that the product of two polynomials in sparse representation can be computed in softly linear time in terms of operations over the ground field, once the destination support is known. This algorithm uses fast evaluation and interpolation at suitable points built from prime numbers. Unfortunately, the method hides an expensive growth of intermediate integers involved in the linear transformations, which prevents the algorithm from being softly linear in terms of bit-complexity. Indeed this algorithm was essentially a subroutine of the sparse interpolation algorithm of, with the suitable set of evaluation points borrowed from. 
For more references on sparse interpolation in characteristic zero we refer the reader to. For coefficients in a finite field, Grigoriev, Karpinski and Singer designed a specific sparse interpolation algorithm in, which was then improved in. These algorithms are based on special point-sets for evaluation and interpolation, built from a primitive element of the multiplicative subgroup of the ground field. As in a fast product might have been be deduced from this work, but to the best of our knowledge this has never been done until now. The main contributions of this paper are practical algorithms for faster computations with multivariate polynomials and series. In Sections and we describe naive algorithms for the dense and sparse representations of polynomials, we recall the Kronecker substitution technique, and discuss bit-complexities with regards to practical performances. We show that our implementations are competitive with the best other software, and we discuss thresholds between sparse and dense representations. Section is devoted to naive algorithms for power series. In Section, we turn to the sparse case. Assuming the destination support to be known, we will focus on the computation of the coefficients. Our approach is similar to, but relies on a different kind of evaluation points, which first appeared in. The fast product from, which only applies in characteristic zero, suffers from the swell of the intermediate integers. In contrast, our method is primarily designed for finite fields. For integer coefficients we either rely on large primes or the multi-modular methods to deduce new bit-complexity bounds. Section is devoted to the bit-complexity for the most frequently encountered coefficient rings. Of course, our assumption that the support of the product is known is very strong: in many cases, it can be feared that the computation of this support is actually the hardest part of the multiplication problem. Nevertheless, the computation of the support is negligible in many cases: <\enumerate> The coefficients of the polynomials are very large: the support can be computed with the naive algorithms, whereas the coefficients are deduced with the fast ones. A variant is when we need to compute the products of many pairs of polynomials with the same supports. The destination support can be deduced by a simple geometric argument. Amajor example concerns dense formal power series, truncated by total degree. In Section we adapt the method of to our new evaluation-interpolation scheme. The series product of applies in characteristic zero only and behaves badly in terms of bit-complexity. Our approach again starts with coefficient fields of positive characteristic and leads to much better bit-costs and useful implementations. When searching for factors of a multivariate polynomial , the destination support is precisely the support of . In Section we present a new algorithm for counting the number of absolutely irreducible factors of . We will prove a new complexity bound in terms of the size of the integral hull of the support of, and report on examples that were previously out of reach. In a future work, we hope to extend our method into a full algorithm for factoring . Our fast product can be expected to admit several other applications, such as polynomial system solving, following, but we have not tried. Most of the algorithms presented in this paper have been implemented in the library of the free computer algebra system (revision 4741, available from >). 
Up to our knowledge, this is the first implementation of a sparse multivariate multiplication algorithm with a softly linear asymptotic time complexity. In this section we recall several classical algorithms for computations with dense multivariate polynomials, using the so called ``block representation''. We will also analyze the additional bit-complexity due to operations on the exponents. The algorithms of this section do not depend on the type of the coefficients. We let> be an effective ring, which means that all ring operations can be performed by algorithm. We will denote by (n)> the cost for multiplying two univariate polynomials of degree, in terms of the number of arithmetic operations in >. Similarly, we denote by (n)> the time needed to multiply two integers of bit-size at most . One can take(n)=O(n*log n*log log n)> and (n)=O(n*log n*2> n>)>, where >> represents the iterated logarithm of. Throughout the paper, we will assume that (n)/n> and (n)/n> are increasing. We also assume that (O(n))\O((n))> and (O(n))\O((n))>. Any polynomial in [z,\,z]> is made of a sum of terms, with each term composed of a coefficient and an exponent seen as a vector in >. For an exponent ,\,e)>\\>, the monomial >*\*z>> will be written >. For any \>, we let> denote the coefficient of > in . The of is defined by \:P\0}>. A is a subset of > of the form {0,1,\,d-1}>, with ,\,d\\>. Given a polynomial \[z,\,z]>, its is the smallest block of the form <\equation*> dsupp(P)={0,1,\,d-1} with dsupp(P)>. In other words, assuming \0>, we have => P+1>> for ,n>. We will denote by =d*\*d> the cardinality of . In the dense of , we store the > and all the coefficients corresponding to the monomials of >. We order the exponents in the , so that <\eqnarray*> >\x>\x>\x>>|>|j,(e=f\\\e=f\e\f).>>>> In this way, the -th exponent ,\,e)=exponent(i,P)> of is defined by <\equation*> i=e+e*d+e*d*d+\+e*d*d\d. Conversely, we will write and call the of in . Notice that the index has values from to -1>. The coefficient of the exponent of index will be written ) or , according to the context. In the cost analysis of the algorithms below, we shall take into account the number of operations in > but also the bit-cost involved by the arithmetic operations with the exponents. Let and be the two polynomials that we want to multiply. Their product can be computed naively by performing the pairwise products of the terms of and as follows: <\algorithm> >> Set 0> For from to -1> do: <\enumerate-alpha> For from to -1> do: <\enumerate-roman> index(exponent(i,P)+*exponent(j,Q),R)>; Add to ; <\proposition> Assuming the block representation, the product of and can be computed using *d> operations in > plus (log d)+d*d*log d)> bit-operations. <\proof> Before entering Algorithm we compute all the > and discard all the variables > such that =1>. This takes no more that (log d))> bit-operations. Then we compute *d>, *d*d>, >, *d*\*d>, as well as *(d-1)>, *d*(d-1)>, >, *d*\*d*(d-1)>, and *(d-1)>, *d*(d-1)>, >, *d*\*d*(d-1)> in (log d))> bit-operations. For computing efficiently the index at step(i) we make use of the enumeration strategy presented in Lemma below. In fact, for each , we compute incrementally in the outer loop. Then for each we also obtain index(exponent(i,P)*exponent(j,Q),R)> incrementally during the inner loop. The conclusion again follows from Lemma. Notice that for small coefficients (in the field with two elements for instance), the bit-cost caused by the manipulation of the exponents is not negligible. 
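For illustration, here is a straightforward C++ version of this naive block product. It recomputes the destination index of each term pair directly from the exponent vectors, instead of using the incremental enumeration strategy mentioned in the proof above and detailed in the next subsection, and it works over plain machine integers with no overflow control; it is only meant to make the indexing explicit and is not the routine of our implementation.

#include <cstdint>
#include <vector>

// Dense block polynomial: coefficients stored so that the exponent
// (e1,...,en) has index e1 + e2*d1 + e3*d1*d2 + ... (reverse lexicographic order).
struct Block {
    std::vector<uint32_t> d;      // block sizes d1,...,dn
    std::vector<int64_t>  c;      // d1*...*dn coefficients
};

// Naive product of two block polynomials P and Q; the destination block has
// sizes dR_k = dP_k + dQ_k - 1.
Block naive_block_mul(const Block& P, const Block& Q) {
    size_t n = P.d.size();
    Block R;
    R.d.resize(n);
    size_t sR = 1;
    for (size_t k = 0; k < n; ++k) { R.d[k] = P.d[k] + Q.d[k] - 1; sR *= R.d[k]; }
    R.c.assign(sR, 0);
    std::vector<uint32_t> e(n), f(n);             // exponents of the current terms
    for (size_t i = 0; i < P.c.size(); ++i) {
        if (P.c[i] == 0) continue;
        size_t t = i;                             // decode the index i into e
        for (size_t k = 0; k < n; ++k) { e[k] = t % P.d[k]; t /= P.d[k]; }
        for (size_t j = 0; j < Q.c.size(); ++j) {
            if (Q.c[j] == 0) continue;
            size_t u = j;                         // decode the index j into f
            for (size_t k = 0; k < n; ++k) { f[k] = u % Q.d[k]; u /= Q.d[k]; }
            size_t idx = 0, stride = 1;           // re-encode e+f as an index in R
            for (size_t k = 0; k < n; ++k) { idx += (e[k] + f[k]) * stride; stride *= R.d[k]; }
            R.c[idx] += P.c[i] * Q.c[j];
        }
    }
    return R;
}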
This naive algorithm must thus be programmed carefully to be efficient with many variables in small degree. \ For running efficiently over all the monomials of the source and the destination supports we use the following subroutine: <\algorithm> supp(P)>, supp(Q)>, and the index of in . =exponent(index(f,Q)+1,Q)>, and ,R)>. <\enumerate> Let \f>; Let 1>; For from to do: <\enumerate> if =1> then continue; if \d-1> then return ,\,f,f+1,f,\,f)> \ and \d>; \0>; \d*(d-1)>; Return an error. The algorithm raises an error if, and only if, is the last exponent of . The proof of correctness is straightforward, hence is left to the reader. In fact, we are interested in the total cost spent in the successive calls of this routine to enumerate the indices of all the exponents of in , for any fixed exponent of : <\lemma> Assume that \2> for all , and let be an exponent of P. If *d>, *d*d>,..., *d*\*d> and *(d-1)>, *d*(d-1)>, >, *d*\*d*(d-1)> are given, and if is known, then the indices in of the exponents of > can be enumerated in increasing order with *log>)> bit-operations. <\proof> Let be the number of the > equal to . Each step of the loop of Algorithm takes if =1> or )> bit-operations otherwise. Let >,\,d>>> be the subsequence of )> which are not . When running over all the successive exponents of , this loop takes )> bit-operations for at most > exponents, and )> bit-operations for at most /d>> exponents, and )> bit-operations for at most /(d>*d>)> exponents, etc. Since the >\2> for all , this amounts to )*d)> operations. Since the \2> for all , the conclusion follows from )>. Let us briefly recall the Kronecker substitution. For computing , the Kronecker substitution we need is defined as follows: <\eqnarray*> >: [z,\,z]>|>|[x]>>||>|>,xd>,\,x\d>).>>>> It suffices to compute >(P)> and >(Q)>, perform their product, and recover by <\equation*> R=K>(K>(P)*K>(Q)). <\proposition> Assuming the block representation, the product can be computed using (d)> operations in > plus (log d)+(d*+d)*log>)> bit-operations. <\proof> As for the naive approach we start with computing all the > and we discard all the variables > such that =1>. Then we compute *d>, *d*d>,>, *d*\*d> and *(d-1)>, *d*(d-1)>,>, *d*\*d*(d-1)> and *(d-1)>, *d*(d-1)>,>, *d*\*d*(d-1)>. This takes (log d))> bit-operations. Thanks to Lemma, this allows to deduce >(P)> and >(Q)> with +d)log d)> bit-operations. Thanks to the reverse lexicographic ordering, the inverse >> is for free. <\remark> A similar complexity can be obtained using evaluation-interpolation methods, such as the fast Fourier transform or Schönhage-Strassen's variant. For instance, assuming that > has sufficiently many >-th roots of unity, we have (n)=O(n*log n)> using FFT-multiplication. In the multivariate case, the multiplication of and essentially reduces to /d> fast Fourier transforms of size > with respect to each of the variables >. This amounts to a total number of +\+log d)*d=O(d*log d)> operations in >. Over the integers, namely when =\>, one can further apply the Kronecker substitution to reduce to the multiplication of two large integers. For any integer we write =\log (\|a\|+1)\>> for its , and denote by =max l>> the maximal bit-length of the coefficients of (and similarly for and ). Since <\equation*> max\|R\|\min(d,d)*max\|P\|*max\|Q\|, we have <\equation*> h\h\h+h+l,d)>. The coefficients of thus have bit-length at most . We will be able to recover them (with their signs) from an approximation modulo >. 
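The sign recovery just mentioned is the usual balanced residue trick: if the packing width b is taken large enough so that every coefficient c of R satisfies |c| < 2^(b-1), then c is determined by its residue modulo 2^b. A minimal sketch, for widths b of at most 63 bits (the bound 63 and the function name are choices made for this illustration only), is the following.

#include <cstdint>

// Recover a signed coefficient c with |c| < 2^(b-1) from its residue
// r = c mod 2^b, where 0 <= r < 2^b and 1 <= b <= 63.
int64_t signed_from_residue(uint64_t r, unsigned b) {
    uint64_t half = uint64_t(1) << (b - 1);        // 2^(b-1)
    if (r < half) return int64_t(r);               // non-negative coefficient
    return -int64_t((uint64_t(1) << b) - r);       // negative coefficient r - 2^b
}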
The substitution works as follows: <\eqnarray*> ,h>: [z,\,z]>|>|>>||>|>(P)(2).>>>> One thus computes ,h>(P)> and ,h>(Q)>, does the integer product, and recovers <\equation*> R=K,h>(K,h>(P)*K,h>(Q)). <\corollary> With the above dense representation, the product of times in [z,\,z]> takes (h*d)(log d)+(d*+d)*log>>)>> bit-operations. <\proof> The evaluation at > takes linear time thanks to the binary representation of the integers being used. The result thus follows from the previous proposition. <\remark> In a similar way, we may use Kronecker substitution for the multiplication of polynomials with modular coefficients in =\/p*\>, \ {2,3,\}>>. Indeed, we first map \[z,\z]> to polynomials in ,p-1}[z,\,z]>\[z,\,z]>>, multiply them as integer polynomials, and finally reduce modulo . In this paper we will often illustrate the performances of our implementation for /p>, with 2>. Timings are measured in seconds or milliseconds, using one core of an X5450 at 3.0GHz running and 5.0.0. The symbol > in the timing tables means that the time needed is very high, and not relevant. In Tables and we multiply dense polynomials and with =d=d>, for > We observe that the Kronecker substitution is a very good strategy: it involves less operations on the exponents, and fully benefits from the performances of . <\big-table||||||,d>>|||||>|>>|||||616>>|>>|||||>|>, >>|||600>|000>|600>>|>>|||241>|281>|761>>>>>> Block polynomial product with 2 variables (in milliseconds) <\big-table||||||>, >, >>|||||>|>>||390>|643>|>>|>>>|>>||||208>|983>>|, >>>|000>|000>|000>|000>|096000>>|>>|859>|319>|039>|019679>|461759>>>>>> Block polynomial product with 3 variables (in milliseconds) In this section, we will study the naive multiplication of multivariate polynomials using a sparse representation. Our naive implementation involves an additional dichotomy for increasing its cache efficiency. In this paper, the of a polynomial \[z,\,z]> consists of a sequence of exponent-coefficient pairs)>\\\\>. This sequence is sorted according to the reverse lexicographic order on the exponents, already used for the block representation. Natural numbers in the exponents are represented by their sequences of binary digits. The total size of an exponent \> is written =n+l>>. We let =maxsupp P> l> for the maximum size of the exponents of , and =\|supp P\|> for the number of non-zero terms of >>. Comparing or adding two exponents and takes +l)> bit-operations. Therefore reading the coefficient of a given exponent in costs +l)*log s)> bit-operations by a dichotomic search. Adding and can be done with +s)> additions and copies in > plus +l)max(s,s))> bit-operations. Now consider the following algorithm for the computation of : <\algorithm> <\enumerate> If =0> then return . If =1> then return *Q)supp Q>>, where is the only exponent of . Split into > and > with respective sizes s/2> and -h>. Compute =P*Q> and =P*Q> recursively. Return +R>. <\proposition> Assuming the sparse representation of polynomials, the product can be computed using *s*log min(s,s))> operations in >, plus +l)s*s*log min(s,s))> bit-operations. <\proof> We can assume that \s> from the outset. The number of operations in> is clear because the depth of the recurrence is )>. Addition of exponents only appears in the leaves of the recurrence. The total cost of step2 amounts to +l)s*s)>. The maximum bit-size of the exponents of the polynomials in> and > in step4 never exceeds +l)>, which yields the claimed bound. 
<\remark> The logarithmic factor ,s)> tends to be quite pessimistic in practice. Especially in the case when \s*s>, the observed complexity is rather *s)>. Moreover, the constant corresponding to this complexity is quite small, due to the cache efficiency of the dichotomic approach. <\remark> For very large input polynomials it is useful to implemented an additional dichotomy on in order to ensure that fits in the cache, most of the time. In the next tables we provide timings for /p>, with . For Table we pick up =s> random monomials in the block ,10 000}>, with random coefficients. Here the exponents are ``packed'' into a single machine word if possible. This is a classical useful trick for when the coefficients are small that was already used in. For and the exponents are packed into a 64bits word. But for , the packing requires bits, and thus we operate directly on the vector representation. <\big-table||||||||||,s>>||||||||280>>|>>||||||||>|>>||||||||>|>||||||||500>>>>>> Sparse polynomial product (packed exponents, in milliseconds) In the following table we give the timings of same computations with 13: <\big-table||||||,s>>||||||||280>>|>>||||||||208>>|>>||||||||258>>|>||||||||636>>>>>> Sparse polynomial product with 13 (in milliseconds) These timings confirm that our implementation of the naive algorithms is already competitive. In the next table we consider polynomials with increasing size in afixed bounding hypercube ,d-1}>>. The row ``density'' indicates the ratio of non-zero terms with exponent in ,d-1}>. <\big-table||||||>, >|>, >>|%>||>|%>||>|%>||>|%>||244>>|%>||037>>|%>||980>>|||>>>>> Sparse polynomial product (in milliseconds) and comparison with Kronecker multiplication from Tables and. In Table, we see that the Kronecker substitution is faster from 5% of density in the first case and 3% in the second case. Roughly speaking, in order to determine which algorithm is faster one needs to compare *s> to (d)>. The efficiency of the Kronecker substitution relies on the underlying univariate arithmetic. Let us also consider the following example with =\>, borrowed from: <\eqnarray*> ||+z+z+z)>>|||+z+z+z)+1.>>>> This example has been very recently used in as a benchmark to compare the new implementation that will be available in 14 to other software: in their Table1, we see that 14 takes 2.26s, which is faster than , , , and . We could not reproduce their computation, because 14 is not available yet, but the platform we use is essentially the same, so that we could compare to our timings. Firstly the Kronecker substitution takes 3.5s, which is already faster than all other software they tested. The drawback of the Kronecker substitution is the memory consumption, but the direct call of the naive product takes 377s: the coefficients of the product have at most 83 bits, and the overhead involved by is important. Yet with Chinese remaindering this time drops to 16s (see for instance Section5 of for a general description of this classical technique). In this situation we can chose the moduli for the Chinese remaindering such that we can add several products of the preimage in one 64 bit-word before reduction. This classical trick leads to 8s when using primes of 27 bits. Finally the gap can be further lowered: by taking only two numbers that are coprime such as > and -1>>, for which the remainders are very cheap, the multiplication time is reduced to 4s. We shall consider multivariate series truncated in total degree. 
Such a series to order> is represented by the sequence of its homogeneous components up to degree -1>. For each component we use a sparse representation. Let and be two series to order>. The homogeneous component of degree in is written >. The naive algorithm for computing their product works as follows: <\algorithm> <\enumerate> Initialize \\\H-1>\0>. For from to -1> do For from to -1-i> do \H+F*G>.> The number of non-zero terms in is denoted by =s>+\+s-1>>>. The maximum size of the exponents of > is represented by =max{0,\,\-1}>l>>. <\proposition> With the above sparse representation, the product can be done with *s**log min(s,s))> operations in >, plus +l)s*s*log min(s,s))> bit-operations. <\proof> By Proposition, the total number of operations in > is in <\equation*> O-1>s>*s>*log min(s>>,s>)=O(s*s**log min(s,s)), and the total bit-cost follows in the same way. <\remark> Proposition is pessimistic in many cases: this cost bound is merely the one of the corresponding polynomial product discarding truncation. In the next subsection we are to take care of the truncation in terms of the dense size. Let => represent the number of the monomials of degree with variables, and let ,n>=h+\+h-1,n>=-1|n>> be the number of the monomials of degree at most -1>. We shall consider the product in the densest situation, when >=s>=h> for all . <\proposition> Assuming the sparse representation of polynomials, the product up till order > takes ,2*n> log s,n>)> operations in >, plus >*s,2*n>*log s,n>)>> bit-operations. <\proof> The result follows as in proposition thanks to the following identity: <\equation> -1>h*h=s,2*n>. This identity is already given in6> for a similar purpose. We briefly recall its proof for completeness. Let =h*h>, let (t)=0>c*t> for the generating series, and let also (t)=0>h*t>. On one hand, we have (t)=H(t)>. On the other hand, (t)=(1-t)>. It follows that (t)=H(t)>, and that =h>, whence(). Finally the bit-cost follows in the same way with > and > being bounded by >)>. If =2>, then =n+1> and =2*n+1>, so the naive sparse product is softly optimal when grows to infinity. If =n> and is large, then <\eqnarray*> ,n>>||\(log 4)*n>>|,2*n>>||\(log >)*n.>>>> In particular, we observe that the naive algorithm has a subquadratic complexity in this case. In general, for fixed and for when > tends to infinity, the cost is quadratic since the ratio <\eqnarray*> ,2*n>*|s,n>>>|-1)\\|(n+\-1)\\>*|(2*n)!>>|>>> tends to *n>/4>. Remark that we do not propose a specific dense representation for the series. This could be possible as we did for polynomials and gain in memory space (but not so much with respect to the ``packed exponent'' technique). However one can not expect more because there is no straightforward generalization of the fast algorithms for series to several variables such as the Kronecker substitution. In Table we report on timings for /p>, with . Comparing with Tables and, we observe that, for small , Kronecker substitution quickly becomes the most efficient strategy. However, it computes much more and its memory consumption is much higher. For small , one could also benefit from the truncated Fourier transform, as explained in Section6 of. In higher dimensions, say and order =20>, the Kronecker substitution is hopeless: the size of the univariate polynomials to be multiplied is \4.1\10>. 
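Before turning to the fast realization of this map and of its inverse, the following C++ sketch shows it in its most direct, quadratic-time form over Z/pZ: every term is first sent to its Kronecker exponent, and the resulting univariate polynomial is evaluated at the successive powers of ω. The helper names and the choice of a word-size prime p are ours; the softly linear version replaces the inner loops by the transposed multi-point evaluation and interpolation described in this section.

#include <cstdint>
#include <vector>

typedef unsigned long long u64;
typedef __uint128_t u128;                        // GCC/Clang extension

static u64 mulmod(u64 a, u64 b, u64 p) { return (u64)((u128)a * b % p); }
static u64 powmod(u64 a, u64 e, u64 p) {
    u64 r = 1;
    for (; e; e >>= 1, a = mulmod(a, a, p)) if (e & 1) r = mulmod(r, a, p);
    return r;
}

// A sparse polynomial given through its Kronecker substitution: each term is a
// pair (k, c) with k = e1 + e2*d1 + e3*d1*d2 + ... and c its coefficient mod p.
struct Term { u64 k; u64 c; };

// Evaluate A at the t points 1, w, w^2, ..., w^(t-1), i.e. compute the values
// of the Kronecker image of A at the successive powers of w.  This direct loop
// costs O(|A| * t) operations in Z/pZ.
std::vector<u64> evaluate_at_powers(const std::vector<Term>& A, u64 w, u64 t, u64 p) {
    std::vector<u64> beta(A.size()), pw(A.size(), 1), val(t, 0);
    for (size_t e = 0; e < A.size(); ++e) beta[e] = powmod(w, A[e].k, p);
    for (u64 j = 0; j < t; ++j) {
        u64 s = 0;
        for (size_t e = 0; e < A.size(); ++e) {
            s = (s + mulmod(A[e].c, pw[e], p)) % p;
            pw[e] = mulmod(pw[e], beta[e], p);   // pw[e] = beta[e]^(j+1) for the next point
        }
        val[j] = s;
    }
    return val;
}

// The values of R = P*Q at the same points are obtained by a Hadamard product;
// recovering the coefficients of R on its known support X then amounts to a
// transposed interpolation, which is omitted from this sketch.
std::vector<u64> values_of_product(const std::vector<Term>& P, const std::vector<Term>& Q,
                                   u64 w, u64 t, u64 p) {
    std::vector<u64> vP = evaluate_at_powers(P, w, t, p);
    std::vector<u64> vQ = evaluate_at_powers(Q, w, t, p);
    std::vector<u64> vR(t);
    for (u64 j = 0; j < t; ++j) vR[j] = mulmod(vP[j], vQ[j], p);
    return vR;
}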
<\big-table||>>|||||>|>|||||>|,2>>>||||240>|880>>|>||||775>|>>>|,n>>>||540>|480>|560>|>>>|>||772>|>>>|>>>|>>>>|,6>>>|005>|100>|>>|>>|>>>>>>> Dense series product (with packed exponents, in milliseconds) Let > be an effective algebra over an effective field >, all algebra and field operations can be performed by algorithm. Let and still be two multivariate polynomials in [z,\,z]> we want to multiply, and let be their product. We assume we are given a subset > of size > that contains the support of . We let ,\,d\\> be the minimal numbers with {0,\,d-1}>. With no loss of generality we can assume that \2> for all . To analyze the algorithms of this section we shall use the quantities =supp P>l>, =supp Q>l> and >=X>l>. We also introduce =s+s+s> and =e+e+e>, and =d*\*d>. Since the support of the product is now given, we will neglect the bit-cost due to computations on the exponents. Given pairwise distinct points ,\,\\\>> and \>, let \\> be the linear map which sends ,\,a)> to ),\,A(\))>, with *u+\+a>. In the canonical basis, this map corresponds to left multiplication by the generalized Vandermonde matrix <\equation*> V,\,\>=|>|>|>>||>|>|>>|>|>||>>||>|>|>>>>>. The computation of and its inverse 1>> (if ) correspond to the problems of multi-point evaluation and interpolation of a polynomial. Using binary splitting, it is classical that both problems can be solved in time t/s\*(s)*log s)>>; see also10> for a complete description. Notice that the algorithms only require vectorial operations in > (additions, subtractions and multiplications with elements in >). The algorithms of this section rely on the efficient computations of the transpositions >,(E1>)>:(\)>\(\)>> of and 1>>. The map >> corresponds to left multiplication by <\equation*> V,\,\>>=||>|>|>|>|>|>>|>|>||>>|>|>|>|>>>>>. By the transposition principle, the operations >> and )>> can again be computed in time t/s\*(s)*log s)>. There is an efficient direct approach for the computation of >>. Given a vector (\)>> with entries ,\,a>, the entries ,\,b> of >(a)> are identical to the first coefficients of the power series <\equation*> t>|1-\*u>. The numerator and denominator of this rational function can be computed using the classical divide and conquer technique, as described in. If s>, then this requires (s)*log s)> vectorial operations in>. The truncated division of the numerator and denominator at order requires (s))> vectorial operations in >. If s>, then we cut the sum into t/s\> parts of size > s>, and obtain the complexity bound t/s\*(s)*log s)>. Inversely, assume that we wish to recover ,\,a> from ,\,b>, when . For simplicity, we assume that the > are non-zero (this will be the case in the sequel). Setting *u+\+b>, *u)*\*(1-\*u)> and , we notice that 1>)=\a*(u*D)(\1>)> for all . Hence, the computation of the > reduces to two multi-point evaluations of and > at 1>,\,\1>> and divisions. This amounts to a total of (s)*log s)> vectorial operations in > and divisions in >. The Kronecker substitution >>, introduced in Section, sends any monomial >*\*z>> to >, where +i*d+\+i*d*\*d>. It defines an isomorphism between polynomials with supports in and univariate polynomials of degrees at most -1>, so that >(R)=\>(P)*\>(Q)>. Assume now that we are given an element \> of multiplicative order at least > and consider the following evaluation map <\eqnarray*> :>[z]>|>|X>>>>||>|>(A)(1),\>(A)(\),\,\>(A)(\-1>)).>>>> We propose to compute though the equality (R)=\(P)*\(Q)>. 
Given ,\,i}>\{0,\,d-1}>, let ,Y,\>> be the matrix of > restricted to the space of polynomials with support included in . Setting =index(i,X)>, we have <\equation*> V,Y,\>\V,\>,\,\>>>=||>|>|>>|>>|>|>>>|>|>||>>|-1)*k>>|-1)*k>>|>|-1)*k>>>>>>*. Taking , , this allows us to compute (P)> and (Q)> using our algorithm for transposed multi-point evaluation from the preceding subsection. We obtain (R)> using one Hadamard product (P)*\(Q)>. Taking , the points >,\,\>>> are pairwise distinct, since the > are smaller than the order of >. Hence ,X,\>> is invertible and we recover from (R)> using transposed multi-point interpolation. <\proposition> Given two polynomials and in [z,\,z]> and an element \> of order at least >, then the product can be computed using: <\itemize> )> products in > that only depend on , and ; )> inversions in >, )> products in >, and |s>*(s)*log s> vectorial operations in >. <\proof> By classical binary powering, the computation of the sequence >,\,\\d>> takes )> operations in > because each -1> does appear in the entries of . Then the computation of all the > for supp P> (resp. and ) requires )> (resp. )> and )>) products in >. Using the complexity results from Section, we may compute (P)> and (Q)> using s/s\+\s/s\)*(s)*log s)> vectorial operations in>. We deduce (R)> using )> more multiplications in >. Again using the results from Section, we retrieve the coefficients > after (s)*log s)> further vectorial operations in> and )> divisions in >. Adding up the various contributions, we obtain the theorem. For when the supports of and and also are fixed all the necessary powers of > can be shared and seen as a pretreatment, so that each product can be done in softly linear time. This situation occurs in the algorithm for counting the number of absolutely irreducible factors of a given polynomial, that we study in Section. Similar to FFT multiplication, our algorithm falls into the general category of multiplication algorithms by evaluation and interpolation. This makes it possible to work in the so-called ``transformed model'' for several other operations besides multiplication. In the rest of this section we describe how to implement the present algorithm for the usual coefficient rings and fields. We analyze the bit-cost in each case. If >> is the finite field >> with > elements, then its multiplicative group is cyclic of order -1>. Whenever -1\d>, Proposition applies for any primitive element > of this group. We assume that >> is given as the quotient [u]/G(u)> for some monic and irreducible polynomial of degree . <\corollary> Assume -1\d>, and assume given a primitive element > of>>>. Given >>[z,\,z]>, the product can be computed using <\itemize> *(k))> ring operations in > that only depend on , and; |s>>*(s*k)*log s+s*(k)*log k> ring operations in > and )> inversions in >. <\proof> A multiplication in >>> amounts to (k))> ring operations in >. An inversion in >>> requires an extended computation in [u]> and gives rise to (k)*log k)> ring operations in > and one inversion: this can be achieved with the fast Euclidean algorithm11>, with using pseudo-divisions instead of divisions. Using the Kronecker substitution, one product in >>[u]> in size takes (n*k))> operations in >. The conclusion thus follows from Proposition. Applying the general purpose algorithm from, two polynomials of degree over> can be multiplied in time (log p)*n*log n*log log n)>. Alternatively, we may lift the multiplicands to polynomials in [u]>, use Kronecker multiplication and reduce modulo. 
As long as , this yields the better complexity bound (n*log p))>. Corollary therefore further implies: <\corollary> Assume -1\d> and *k)=O(log p)>, and assume given a primitive element > of >>>. Given two polynomials and in >>[z,\,z]>, the product can be computed using <\itemize> *(k*log p))> bit-operations that only depend on , and ; |s>>*(s*k*log p)*log s+s*(k*log p)*log k*+s (log p) log log p> bit-operations. If -1\d> then it is always possible to build an algebraic extension of suitable degree in order to apply the corollary. Such constructions are classical, see for instance . We need to have -1\d>, so should be taken of the order > d>, which also corresponds to the additional overhead induced by this method. If > d>> exceeds )> and , then we notice that the softly linear cost is lost. This situation may occur for instance for polynomials over>. In practice, the determination of the primitive element > is a precomputation that can be done fast with randomized algorithms. Theoretically speaking, assuming the generalized Riemann hypothesis, and given the prime factorization of -1>, aprimitive element in >>> can be constructed in polynomial time1, >. Let =max l\|>> denote the maximal bit-size of the coefficients of and similarly for and . Since <\equation*> max\|R\|\min(s,s)*max\|P\|*max\|Q\|, we have <\equation*> h\h\h+h+l,s)>. One approach for the multiplication of polynomials with integer coefficients is to reduce the problem modulo a suitable prime number . This prime number should be sufficiently large such that can be read off from and such that> admits elements of order at least >. It suffices to take max(2,d)>, so Corollary now implies: <\corollary> Given [z,\,z]>, a prime number max(2,d)> and a primitive element > of >>, we can compute with <\itemize> *(log p))> bit-operations that only depend on , , and ; |s>>*(s*log p)*log s+s (log p) log log p> bit-operations. Let > denote the -th prime number. The prime number theorem implies that \n*log n>. Cramér's conjecture states that <\equation*> limsup\>-p|(log p)>=1. This conjecture is supported by numerical evidence, which is sufficient for our practical purposes. Setting ,d)>, the conjecture implies that the smallest prime number with N> satisfies N)>. Using a polynomial time primality test, it follows that this number can be computed by brute force in time >. In practice, in order to satisfy the complexity bound it suffices to tabulate prime numbers of sizes 2, 4, 8, 16, etc. In our algorithm and Corollary, we regard the computation of a prime number N=,d)>> as a precomputation. This is reasonable if is not too large. Now the quantity > usually remains reasonably small. Hence, our assumption that is not too large only gets violated if +h> becomes large. In that case, we will rather use Chinese remaindering. We first compute h/log d)> prime numbers \\\p> with <\eqnarray*> >|>|,>>|*\*p>|>|.>>>> Such a sequence is said to be a with order> and capacity >, if, in addition, we have that )>, where \p>. In fact Bertrand's postulate12, Theorem1.3> ensures us that there exists > between +1> and >, then one can take > between +1> and >, etc, so that =O(log d+r)>>. We stop this construction with \p\2> and \p\2>>, hence with )>>. This proves that such reduced sequences actually exist. Of course Bertrand's postulate is pessimistic, and in practice all the > are very close to >. Each >> contains a primitive root of unity > of order at least >. We next proceed as before, but with *\*p> and \\/p*\> such that mod p=\> for each. 
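For completeness, here is a minimal C++ sketch of rational number reconstruction by the extended Euclidean algorithm, in its textbook form for a word-size modulus m; the bounds N and D on the numerator and denominator are left to the caller, and a realistic implementation would of course operate on multiple-precision integers. The function and type names are ours.

#include <optional>

// Given u with 0 <= u < m, try to find a fraction a/b with a = b*u (mod m),
// |a| <= N and 0 < b <= D, by running the extended Euclidean algorithm on
// (m, u) and stopping as soon as the remainder drops to N or below.  When
// 2*N*D < m, such a fraction is unique if it exists.
struct Fraction { long long num, den; };
std::optional<Fraction> rational_reconstruction(long long u, long long m,
                                                long long N, long long D) {
    long long r0 = m, r1 = u;        // remainders, with r_i = s_i*m + t_i*u
    long long t0 = 0, t1 = 1;        // cofactors of u
    while (r1 > N) {
        long long q = r0 / r1;
        long long r2 = r0 - q * r1;  r0 = r1;  r1 = r2;
        long long t2 = t0 - q * t1;  t0 = t1;  t1 = t2;
    }
    long long den = (t1 < 0) ? -t1 : t1;   // normalize to a positive denominator
    long long num = (t1 < 0) ? -r1 : r1;
    if (den == 0 || den > D) return std::nullopt;
    return Fraction{num, den};
}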
Indeed, even though /p*\> is not a field, the fact that each ,X,\> mod p=V,X,\>> is invertible implies that ,X,\>> is invertible, which is sufficient for our purposes. <\corollary> Given [z,\,z]>, a reduced sequence \\\p> of prime moduli with order > and capacity > and an element \\/p*\*p*\> of order at least>, we can compute with <\itemize> \*(log p)> bit-operations that only depend on , , and ; |s>*(s*log p)*log s+s*(log p)*log log p> bit-operations. <\proof> This follows from Proposition, following the proofs of Corollaries and, . Whenever =O(h)> we have that . Therefore, for fixed supports of and , and fixed , this method allows us to compute several products in softly linear time. Remark that for moderate sizes of the coefficients it is even more interesting to compute the products modulo each > in parallel, and then use Chinese remaindering to reconstruct the result. An important kind of sparse polynomials are power series in several variables, truncated by total degree. Such series are often used in long term integration of dynamical systems, in which case their coefficients are floating point numbers rather than integers. Assume therefore that and are polynomials with floating coefficients with a precision of > bits. Let > be the maximal exponent of the coefficients of . For a so called \\>, fixed by the user, we let > be the integer polynomial with <\equation*> =\P*2+\-\>\ for all . We have >\\+\> and <\equation*> \|P-*2-\-\>\|\2-\-\-1> for the sup-norm on the coefficients. If all coefficients of have a similar order of magnitude, in the sense that the minimal exponent of the coefficients is at least -\>, then we actually have *2-\-\>>. Applying a similar decomposition to , we may compute the product <\equation*> P*Q=**2+\-2*\-\-\> using the preceding algorithms and convert the resulting coefficients back into floating point format. Usually, the coefficients > of a univariate power series are approximately in a geometric progression \\*i>. In that case, the coefficients of the power series *z)> with =\\>> are approximately of the same order of magnitude. In the multivariate case, the coefficients still have a geometric increase on diagonals k*i\,\,\k*i\>\\,\,k>*i>, but the parameter ,\,k>> depends on the diagonal. After a suitable change of variables \\*z>, the coefficients in a big zone near the main diagonal become of approximately the same order of magnitude. However, the discrepancy usually needs to be chosen proportional to the total truncation degree in order to ensure sufficient accuracy elsewhere. Let us now consider the case when >>. Let > and > denote the least common multiples of the denominators of the coefficients of . One obvious way to compute is to set \P*q>, \Q*q>, and compute *> using one of the methods from Section. This approach works well in many cases ( when and are truncations of exponential generating series). Unfortunately, this approach is deemed to be very slow if the size of > or > is much larger than the size of any of the coefficients of . An alternative, more heuristic approach is the following. Let \p\\> be an increasing sequence of prime numbers with \d> and such that each > is relatively prime to the denominators of each of the coefficients of and . For each , we may then multiply > and > using the algorithm from Section. For >, we may recover *\*p> using Chinese remaindering and attempt to reconstruct from *\*p> using rational number reconstruction5>. 
If this yields the same result for a given and, then the reconstructed is likely to be correct at those stages. This strategy is well-suited to probabilistic algorithms, for polynomial factorization, polynomial system solving, etc. Of course, if we have an bound on the bit sizes of the coefficients of , then we may directly take a sufficient number of primes \\\p> such that can be reconstructed from its reduction modulo *\*p>. We illustrate the performances of the algorithms of this section for a prime finite field, which is the central case to optimize. We take /p>, with 2>. If the size of the product is of the order of *s> then the naive algorithm is already softly optimal. If the polynomials are close to being dense then the Kronecker substitution is the most efficient in practice. Here we consider a case which lies in between these two extremes. More precisely, we pick polynomials with terms of total degree at most > and at least -4> with random coefficients in >. The subset can be easily taken as the set of the monomials of degree at most > and at least -8>. In Table we compare the fast algorithm of Section to the naive one of Section and the direct use of the Kronecker substitution.\ <\big-table||>>|||||||||>|>>|||||||317>|457>|855>>|||||||408>|664>|765>|>>>|||||||352>|064>|918>|615>>|>>|||106>|226>|466>|946>|906>|826>|666>>>>>> Polynomial product with 2 variables of two strips from -3> to > (in milliseconds) With 2 (and also with 3 variables) the theoretical asymptotic behaviours are already well reflected. But the fast algorithm only gains for very large sizes. When sharing the same supports in several product the benefit of the fast product can be observed earlier. We shall illustrate this situation in Section, for a similar family of examples. In this section, we show how to build a multiplication algorithm for formal power series on top of the fast polynomial product from the previous section. We will only consider power series which are truncated with respect to the total degree. Given \>, we let +\+i>. The of a polynomial \[z]> is defined by \0}>. Given a subset \>, we define the > of to by <\equation*> P=I>P*z. For \\>, we define the initial segments >> of > by >={i\\:\|i\|\\}>. Then <\equation*> \[z]>>=\[[z]]>>={P\\[z]:supp P\I>}={P>>:P\\[z]} is the set of polynomials of total degree > \>. Given \[z]>>>, the aim of this section is to describe efficient algorithms for the computation of >>>. We will follow and extend the strategy described in. <\remark> The results of this section could be generalized in the same way as in to so called weighted total degrees *i+\+\*i> with ,\,\\0>, but for the sake of simplicity, we will stick to ordinary total degrees. Given a polynomial \[z]>, we define its (P)\\[z]> by <\equation*> \(P)(z,\,z)=P(z*z,\,z*z,z). If I>>, then (P)\J>>, where <\eqnarray*> >>||+i>,\,i-1>+i>,i>):i\I>}.>>>> Inversely, for any \[z]>>>, there exists a unique 1>(P)\\[z]>>> with (\1>(P))>. The transformation > is an injective morphism of >-algebras. Consequently, given \\[z]>>>, we will compute the truncated product >>> using <\equation*> (P*Q)>>=\1>((\(P)*\(Q))>>). Given a polynomial \[z]> and \>, let <\equation*> P=,\,i>P,\,i,j>*z>*\*z>*z\\[z,\,z]. If J>>, then \X>, with <\eqnarray*> ||\:i+\+i\\}.>>>> Let > be an element of > of order at least >. Taking as above, the construction in Section yields a >-linear and invertible evaluation mapping <\equation*> \:\[z]\\, such that for all \[z]> with \[z]>, we have <\equation> \(P*Q)=\(P)*\(Q). 
This map extends to [z][z]> using <\eqnarray*> (P+\+P*z)>||(P)+\+\(P)*z\\[z].>>>> Given \[z]>>> and \>, the relation() yields <\equation*> \((P*Q))=\(P)*\(Q)+\+\(P)*\(Q). In particular, if >>>, then <\equation*> \(R)=\(P)*\(Q) mod z>. Since > is invertible, this yields an efficient way to compute . The number of coefficients of a truncated series in [[z]]>>> is given by <\equation*> \|I>\|=s,n>=-1|n>. The size =\|X\|> of is smaller by a factor between and >: <\eqnarray*> >||-2|n-1>=-1>*\|I>\|.>>>> <\proposition> Given \[[z]]>>> and an element \\> of order at least >, we can compute >>> using *\)> inversions in >, *(\))> ring operations in>, and *(s)*log s)> vectorial operations in >. <\proof> We apply the algorithm described in the previous subsection. The transforms > and 1>> require a negligible amount of time. The computation of the evaluation points >> only involves )> products in>, when exploiting the fact that is an initial segment. The computation of (\(P))> and (\(Q))> requires *(s)*log s)> vectorial operations in >, as recalled in Section. The computation of (\(P))*\(\(Q)) mod z>> can be done using *(\))> ring operations in>. Recovering again requires *(s)*log s)> vectorial operations in >, as well as *\)> inversionsin>. <\remark> Since *\=(s,n>)> by3>, Proposition ensures that power series can be multiplied in softly linear time, when truncating with respect to the total degree. In the finite field case when \>[[z]]>>>, the techniques from Section lead to the following consequence of Proposition. <\corollary> Assume 2>, -1\\> and *k)=O(log p)>, and assume given a primitive element > of >>>. Given \>[[z]]>>>, we can compute >>> using <\equation*> O(\*(s*k*log p)*log s+s*\*((k*log p)*log k+(log p)*log log p)) bit-operations. <\proof> The *\> inversions in >> take *\*((k*log p)*log k+(log p)*log log p))> bit-operations, when using Kronecker substitution. The *(s)*log s)> ring operations in >> amount to *(s*k*log p)*log s)> more bit-operations. Since 2> we have =O(s)>, which implies *(\)=O(\*(s))>. The conclusion thus follows from Proposition. If -1\\>> does not hold, then it is possible to perform the product in an extension of degree log \/log p\>> of >>, so that )-1\\>. Doing so, the cost of the product requires (s*\)> operations in )>>, which further reduces to (s,n>)> by3>. The field extension and the construction of the needed> can be seen as precomputations for they only depend on , >, and. Since ,n>)>, we have therefore obtained a softly optimal uniform complexity bound in the finite field case. Notice that the direct use of Kronecker substitution for multiplying and yields (k*(2*\-1))> operations in >. In terms of the dense size of and , the latter bound becomes of order (2*n!*k*s,n>)>, which is more expensive than the present algorithm. If \[[z]]>>>, then the assumption 2>, with +h+l,n>>>, guarantees that the coefficients of the result can be reconstructed from their reductions modulo . If \\>, and if we are given a prime number [2,2)> and a primitive element > of >>>, where log \/log p\>, then >>> can be computed using (n*h*s,n>)> bit-operations, by Corollary. Otherwise, in the same way we did for polynomials in Section, Chinese remaindering leads to: <\corollary> Assume that 2>, that \\>, and that we are given areduced sequence \\\p> of prime moduli with order > and capacity>, and an element \\/p*\*p*\> of order at least >. Given \\[[z]]>>>, we can compute >>> using <\equation*> >*(s*h)*log s+s*\*(h)*log h)>>>> bit-operations. <\proof> Let \p>. 
Since =O(log p)>, we can use the Kronecker substitution with the algorithm underlying Proposition over /p*\> to perform the truncated product of and with *(s*log p)*log s+s*\*(log p)*log log p)>>> bit-operations. The conclusion thus follows from . Remark that the bound in Corollary is softly optimal, namely (h*s,n>)>, which is much better than a direct use of Kronecker substitution when becomes large. We take /p>, with . The following tables demonstrate that the the theoretical asymptotic costs can really be observed. |>>|||||>|>|||||151>>|>|||121>|248>|122>>>>>|Fast series product (in milliseconds)> When comparing Tables and, we see that our fast product does not compete with the naive algorithm up to order 160 in the bivariate case. For three variables we observe that it outperforms the naive approach at large orders, but that it is still slower than Kronecker substitution (see Table). |>>||||>|||||>|||||>>>|||||>>>>|Series products with 4 variables (in seconds)>>> For 4 variables we see in Table that the Kronecker product is slower than the naive approach (it also consumes a huge quantity of memory). The naive algorithm is fastest for small orders, but our fast algorithm wins for large orders. For 5 and more variables the truncation order can not grow very large, and the naive algorithm becomes subquadratic, so that the threshold for the fast algorithm is much higher: |>>||||||>|||||||>|||||>>|>>|>>>|||||||>>>>|Series products with 5 variables (in seconds)> Let > be a field, and let \[x,\,x,y]=\[x,y]>. In this section we are interested to count the number of the of , number of irreducible factors of over the algebraic closure |\>> of >. Let =deg F> denote the total degree in the variables , and let =deg F>. The of , which we will denote by , is the intersection of > and the convex hull of as a subset of >. The method we are to use is not new, but combined with our fast algorithms of Section, we obtain a new complexity bound, essentially (for fixed values of ) quadratic in terms of the size of S . Besides the support of , we need to introduce <\eqnarray*> >||((\\{(0,\,0)})\\)>>|>||(\\\\{0})>>|>||\S>>|||*+S)\(S+S).>>>> Notice that > consists of the elements S> with \0>. The set > contains the support of <\equation*> \ F=supp F>(e+\+e)*F*x*y, the set > contains the support of <\equation*> \ F=y*F|\y>=supp F>f*F*x*y, and > contains \ F>. The absolute factorization of mainly reduces to linear algebra by considering the following map: <\eqnarray*> |:\[x,y]>\\*[x,y]>>|>|[x,y]>>||>|F-H*\F-\G-\HF,>>>> where [x,y]>> represents the subset of the polynomials with support in > (and similarly for >, >, and ). <\proposition> Assume that > has characteristic 0 or at least *(2*d-1)+1>, that is primitive in , and that the discriminant of in is non-zero. Then the number of the absolutely irreducible factors of equals the dimension of the kernel of >. <\proof> This result is not original, but for a lack of an exact reference in the literature, we provide the reader with a sketch of the proof adapted from the bivariate case. Let ,\,\>> represent the distinct roots of in (x)|\>>. The assumption on the discriminant of ensures that all are simple. Now consider the partial fraction decompositions of and : <\equation*> =y*>|y-\>,=c(x)+*>|y-\>, where > and > belong to (x)|\>> and \(x)>. The fact that (G,H)=0> is equivalent to <\equation*> \=\, which rewrites into: <\with|mode|math> <\equation*> y*>(\)|y-\>+*\(\)|(y-\)>=-y>|(y-\)>. Therefore (\)> must vanish for all . 
In characteristic 0, this implies that the > actually belong to |\>>. If the characteristic is least *(2*d-1)+1> this still holds by the same arguments as in Lemma2.4 of. Let ,\,F> denote the absolutely irreducible factors of , and let |^>=F/F>. By applying classical facts on partial fraction decomposition, such as22.8> orA> for instance, we deduce that is a linear combination of the *\ F>, hence that belongs to the space spanned by the *\ F,*\ F)> over |\>>, for {1,\,r}>. Since *\ F)\S> and *\ F)\S>>, the couples *\ F,*\ F)>> form a basis of the kernel of > over |\>>, which concludes the proof. Proposition was first stated in in the bivariate case for the dense representation of the polynomials, and then in terms of the actual support of in but still for two variables. For several variables and block supports, generalizations have been proposed in2.3> but they require computing the partial derivatives in all the variables separately, which yields more linear equations to be solved than with the present approach. Let us recall that the kernel of> is nothing else than the first De Rham cohomology group of the complementary of the hypersurface defined by (this was pointed out in, we refer the reader to for the details). For a complete history of the algorithms designed for the absolute factorization we refer the reader to the introduction of. In fact, the straightforward resolution of the linear system defined by (G,H)=0> by Gaussian elimination requires -1>*t)> operations in >, where \|+\|S\|> is the number of the unknowns and s> is the number of equations2.11>. Here > is a real number at most 3 such that the product of two matrices of size s> can be done with >)> operations in >. In practice > is close to , so that Gaussian elimination leads to a cost more than quadratic. In, Gao suggested to compute the kernel of > using Wiedemann's algorithm: roughly speaking this reduces to compute the image by > of at most vectors. With a block support, and for when the dimension is fixed, the Kronecker substitution can be used so that a softly quadratic cost can be achieved in this way. In the next subsection we extend this idea for general supports by using the fast polynomial product of Section. The algorithm we propose for computing the number of the absolutely irreducible factors of summarizes as follows: <\algorithm> of . <\enumerate> Compute the integral hull of the support of . Deduce >, >, >, and . Pre-compute all the intermediate data necessary to the evaluation of > by means of the fast polynomial product of Section. Compute the dimension of the kernel of > with the randomized algorithm of4>, all the necessary random values being taken in agiven subset> of >. For simplicity, the following complexity analysis will not take into account the bit-cost involved by operations with the exponents and the supports. <\proposition> Assume that > has characteristic 0 or at least *(2*d-1)+1>, that is primitive in , that the discriminant of in is non-zero, that we are given an element > in > of order at least +1)(2*deg> F+1)>, and that the given set > contains at least > elements. Then Algorithm performs the computation of the integral hull of points of bit-size at most >, plus the computation of , plus (\|2*S\|*)> operations in>. It returns a correct answer with a probability at least *\|2*S\|(\|2*S\|+1)/\|\|>>. <\proof> Since is included in , the assumption on the order of > allows us to apply Proposition. 
In the second step, we thus compute all the necessary powers of > to evaluate >: because the supports are convex, this only amounts to operations in >. Then for any couple \\[x,y]>\\*[x,y]>>, the vector (G,H)> can be computed with (\|2*S\|)> operations. Now, by Theorem3 of, we can chose elements at random in> and compute the dimension of the kernel of > with evaluations of > and (\|2*S\|)> more operations in >. The probability of success being at least *\|2*S\|(\|2*S\|+1)/\|\|>, this concludes the proof. Once is known, the set can be obtained by means of the naive polynomial product with (l\|S\|)> bit-operations by Proposition. When the dimension is fixed and is non-degenerated then \| grows linearly with , whence our algorithm becomes softly quadratic in , in terms of the number of operations in>. This new complexity bound is to be compared to a recent algorithm by Weimann that computes the irreducible factorization of a bivariate polynomial within a cost that grows with >. In practice, the computation of the integral hull is not negligible when the dimension becomes large. The known algorithms for computing the integral hull of start by computing the convex hull. The latter problem is classical and can be solved in softly linear time in dimensions 2 and 3 (see for instance). In higher dimensions, the size of the convex hull grows exponentially in the worst case and it can be the bottleneck of our algorithm. With the fastest known algorithms, the convex hull can be computed in time +f*log s)> where is the number of faces of the hull (improvements are to be found in). In our implementation, we programmed the naive ``gift-wrapping'' method, which turned out to be sufficient for the examples below. In order to enumerate the points with integer coordinates in the convex hull, we implemented a classical subdivision method. This turned out to be sufficient for our purposes. But let us mention that there exist specific and faster algorithms for this task such as in for instance. Discussing these aspects longer would lead us too far from the purposes of the present paper. We chose the following family of examples in two variables, which depends on aparameter >: <\eqnarray*> >>||x+1>+>a*x*y-i>*y+1>+>b*x*y-i>*x\/2-1>y\/2-1>+>c*x*y-i>.>>>> Here ,b,c\\=\/p*\> are taken at random, with 2>>>. In Table below, we indicate the size > of , the size of the matrix >, and the time spent for computing the integral hull. Then we compare the time taken by the Wiedemann method with the naive and the fast polynomial products. As expected we observe a softly cubic cost with the naive product, and asoftly quadratic cost with the fast product. Notice that the supports are rather sparse, so that Kronecker substitution does not compete here. ||||>>||||>|||||>||||096>|425>>||||282>|745>>|>>>>|||767>|527>>|>>>>>>|725\ 926>>|445\1846>>|885\3686>>|765\7366>>>>>>|Counting the number of absolutely irreducible factors of >> (in seconds).> The goal of these examples is less the absolute factorization problem itself than the demonstration of the usefulness of the fast polynomial product on a real application. Let us finally mention that the algorithm with the smallest asymptotic cost, given in, will not gain on our family >>, because it starts with performing a random linear change of the variables. To the best of our knowledge, no other software is able to run the present examples faster. 
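To conclude this section, let us illustrate the linear algebra ingredient used in step 3 of the algorithm. Roughly speaking, Wiedemann's approach extracts a minimal linear recurrence from a scalar sequence of the form u^T D^i v by means of the Berlekamp-Massey algorithm, and the dimension of the kernel is then deduced with high probability as in the randomized method cited above. The following self-contained C++ sketch computes such a sequence and its minimal recurrence over Z/pZ, with a dense matrix standing in for the black-box map; it is only meant to convey the structure of the computation and is not our implementation, in which the map is applied through the fast sparse product of Section 5 and the kernel dimension is obtained as in Kaltofen and Saunders.

#include <cstdint>
#include <vector>

typedef unsigned long long u64;
typedef __uint128_t u128;                        // GCC/Clang extension

static u64 mulmod(u64 a, u64 b, u64 p) { return (u64)((u128)a * b % p); }
static u64 powmod(u64 a, u64 e, u64 p) {
    u64 r = 1;
    for (; e; e >>= 1, a = mulmod(a, a, p)) if (e & 1) r = mulmod(r, a, p);
    return r;
}
static u64 invmod(u64 a, u64 p) { return powmod(a, p - 2, p); }   // p prime

// Berlekamp-Massey over Z/pZ: returns the shortest connection polynomial C
// (with C[0] = 1) such that sum_i C[i]*s[n-i] = 0 for all large enough n.
std::vector<u64> berlekamp_massey(const std::vector<u64>& s, u64 p) {
    std::vector<u64> C{1}, B{1};
    u64 b = 1;                 // last non-zero discrepancy
    size_t L = 0, m = 1;       // current recurrence length, shift since last update
    for (size_t n = 0; n < s.size(); ++n) {
        u64 d = s[n] % p;      // discrepancy of C on s[0..n]
        for (size_t i = 1; i <= L && i < C.size(); ++i)
            d = (d + mulmod(C[i], s[n - i], p)) % p;
        if (d == 0) { ++m; continue; }
        std::vector<u64> T = C;
        u64 coef = mulmod(d, invmod(b, p), p);
        if (C.size() < B.size() + m) C.resize(B.size() + m, 0);
        for (size_t i = 0; i < B.size(); ++i)
            C[i + m] = (C[i + m] + p - mulmod(coef, B[i], p)) % p;
        if (2 * L <= n) { L = n + 1 - L; B = T; b = d; m = 1; }
        else ++m;
    }
    C.resize(L + 1);           // the recurrence has length L
    return C;
}

// Scalar sequence a_i = u^T A^i v for i = 0, ..., 2N-1; a dense N x N matrix
// over Z/pZ stands in here for the black-box map of this section.
std::vector<u64> krylov_sequence(const std::vector<std::vector<u64>>& A,
                                 std::vector<u64> v, const std::vector<u64>& u, u64 p) {
    size_t N = A.size();
    std::vector<u64> seq;
    for (size_t i = 0; i < 2 * N; ++i) {
        u64 dot = 0;
        for (size_t k = 0; k < N; ++k) dot = (dot + mulmod(u[k], v[k], p)) % p;
        seq.push_back(dot);
        std::vector<u64> w(N, 0);                // w = A * v
        for (size_t r = 0; r < N; ++r)
            for (size_t k = 0; k < N; ++k)
                w[r] = (w[r] + mulmod(A[r][k], v[k], p)) % p;
        v = w;
    }
    return seq;   // berlekamp_massey(seq, p) then yields the minimal recurrence
}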
We have presented classical and new algorithms for multiplying polynomials and series in several variables, with a special focus on asymptotic complexity. It turns out that the new algorithms lead to substantial speed-ups in specific situations, but are not competitive in general. Of course, the fast algorithms involve many subalgorithms, which makes them harder to optimize. With an additional implementation effort, some of the thresholds in our tables might become more favourable to the new algorithms. In our implementation, all the variants are available independently of one another, but they can also be combined with given thresholds. This allows the user to fine-tune the software whenever it is known whether the polynomials are rather dense (which occurs, for instance, if a random change of variables has been performed), strictly sparse, etc. In the near future, we hope to extend the present techniques to division and to higher level operations such as the greatest common divisor and polynomial factorization.
<\bibliography|bib|alpha|all>
<\bib-list|GKM+04>
M. Agrawal, N. Kayal, and N. Saxena. PRIMES is in P. , 160(2):781--793, 2004.
A.I. Barvinok. Computing the Ehrhart polynomial of a convex lattice polytope. , 12(1):35--48, 1994.
A.I. Barvinok. A polynomial time algorithm for counting integral points in polyhedra when the dimension is fixed. , 19(4):769--779, 1994.
P. Bürgisser, M. Clausen, and M.A. Shokrollahi. . Springer-Verlag, 1997.
D. Bernstein. The transposition principle. Available from .
A. Bostan, G. Lecerf, and É. Schost. Tellegen's principle into practice. In , pages 37--44. ACM, 2003.
A. Borodin and R.T. Moenck. Fast modular transforms via division. In , pages 90--96, Univ. Maryland, College Park, Md., 1972.
A. Borodin and R.T. Moenck. Fast modular transforms. , 8:366--386, 1974.
J.L. Bordewijk. Inter-reciprocity applied to electrical networks. , 6:1--74, 1956.
D. Bini and V. Pan. . Birkhäuser, 1994.
J. Buchmann and V. Shoup. Constructing nonresidues in finite fields and the extended Riemann hypothesis. In , pages 72--79, New York, NY, USA, 1991. ACM.
M. Ben-Or and P. Tiwari. A deterministic algorithm for sparse multivariate polynomial interpolation. In , pages 301--309, New York, NY, USA, 1988. ACM.
S. Czapor, K. Geddes, and G. Labahn. . Kluwer Academic Publishers, 1992.
D.G. Cantor and E. Kaltofen. On fast multiplication of polynomials over arbitrary algebras. , 28:693--701, 1991.
J. Canny, E. Kaltofen, and Y. Lakshman. Solving systems of non-linear polynomial equations faster. In , pages 121--128, Portland, Oregon, A.C.M., New York, 1989. ACM Press.
G. Chèze and G. Lecerf. Lifting and recombination techniques for absolute factorization. , 23(3):380--420, 2007.
H. Cramér. On the order of magnitude of the difference between consecutive prime numbers. , 2:23--46, 1936.
J.W. Cooley and J.W. Tukey. An algorithm for the machine calculation of complex Fourier series. , 19:297--301, 1965.
J.H. Davenport, Y. Siret, and É. Tournier. Calcul formel : systèmes et algorithmes de manipulations algébriques. Masson, Paris, France, 1987.
R. Fateman. Comparing the speed of programs for sparse polynomial multiplication. , 37(1):4--15, 2003.
M. Fürer. Faster integer multiplication. In , pages 57--66, San Diego, California, 2007.
S. Gao. Factoring multivariate polynomials via partial differential equations. , 72(242):801--822, 2003.
J. von zur Gathen and J. Gerhard. . Cambridge University Press, 2nd edition, 2002.
D.Y. Grigoriev and M. Karpinski. The matching problem for bipartite graphs with polynomially bounded permanents is in NC. In , pages 166--172, 1987.
S. Gao, E. Kaltofen, J. May, Z. Yang, and L. Zhi. Approximate factorization of multivariate polynomials via differential equations. In , pages 167--174, New York, NY, USA, 2004. ACM.
D.Y. Grigoriev, M. Karpinski, and M.F. Singer. Fast parallel algorithms for sparse multivariate polynomial interpolation over finite fields. , 19(6):1059--1063, 1990.
D. Grigoriev, M. Karpinski, and M.F. Singer. Computational complexity of sparse rational interpolation. , 23(1):1--11, 1994.
M. Gastineau and J. Laskar. Development of TRIP: Fast sparse multivariate polynomial multiplication using burst tries. In , LNCS 3992, pages 446--453. Springer, 2006.
S. Gao and V.M. Rodrigues. Irreducibility of polynomials modulo $p$ via Newton polytopes. , 101(1):32--47, 2003.
T. Granlund et al. GMP, the GNU multiple precision arithmetic library, 1991.
S. Garg and É. Schost. Interpolation of polynomials given by straight-line programs. , 410(27-29):2659--2662, 2009.
J. van der Hoeven et al. Mathemagix, 2002.
J. van der Hoeven and G. Lecerf. Divide and conquer products for multivariate polynomial and series. Work in progress, 2010.
J. van der Hoeven. Relax, but don't be too lazy. , 34:479--542, 2002.
J. van der Hoeven. The truncated Fourier transform and applications. In J. Gutierrez, editor, , pages 290--296, Univ. of Cantabria, Santander, Spain, July 4--7, 2004.
J. van der Hoeven. Newton's method and FFT trading. Technical Report 2006-17, Univ. Paris-Sud, 2006. To appear in JSC.
G.H. Hardy and E.M. Wright. . Oxford University Press, 1979.
S.C. Johnson. Sparse polynomial arithmetic. , 8(3):63--71, 1974.
E. Kaltofen and Y.N. Lakshman. Improved sparse multivariate polynomial interpolation algorithms. In , pages 467--474. Springer-Verlag, 1988.
E. Kaltofen, W. Lee, and A.A. Lobo. Early termination in Ben-Or/Tiwari sparse interpolation and a hybrid of Zippel's algorithm. In , pages 192--201, New York, NY, USA, 2000. ACM.
E. Kaltofen, Y.N. Lakshman, and J.-M. Wiley. Modular rational sparse multivariate polynomial interpolation. In , pages 135--139, New York, NY, USA, 1990. ACM.
E. Kaltofen and B.D. Saunders. On Wiedemann's method of solving sparse linear systems. In H.F. Mattson, T. Mora, and T.R.N. Rao, editors, , volume 539 of , pages 29--38. Springer-Verlag, 1991.
E. Kaltofen and B.M. Trager. Computing with polynomials given by black boxes for their evaluations: greatest common divisors, factorization, separation of numerators and denominators. , 9(3):301--320, 1990.
G. Lecerf. Improved dense multivariate polynomial factorization algorithms. , 42(4):477--494, 2007.
G. Lecerf and É. Schost. Fast multivariate power series multiplication in characteristic zero. , 5(1):1--10, September 2003.
J.B. Lasserre and E.S. Zeron. An alternative algorithm for counting lattice points in a convex polytope. , 30(3):597--614, 2005.
K. Makino and M. Berz. Remainder differential algebras and their applications. In M. Berz, C. Bischof, G. Corliss, and A. Griewank, editors, , pages 63--74, SIAM, Philadelphia, 1996.
K. Makino and M. Berz. Suppression of the wrapping effect by Taylor model-based validated integrators. Technical Report MSUHEP 40910, Michigan State University, 2004.
M. Monagan and R. Pearce. Polynomial division using dynamic arrays, heaps, and packed exponent vectors. In , pages 295--315. Springer, 2007.
M. Monagan and R. Pearce. Parallel sparse polynomial multiplication using heaps. In , pages 263--270, New York, NY, USA, 2009. ACM.
M. Monagan and R. Pearce. Sparse polynomial multiplication and division in Maple 14. Available from , 2009.
J. Matoušek and O. Schwarzkopf. Linear optimization queries. In , pages 16--25, New York, NY, USA, 1992. ACM.
G.I. Malaschonok and E.S. Satina. Fast multiplication and sparse structures. , 30(2):105--109, 2004.
F.P. Preparata and M.I. Shamos. . Springer-Verlag, 1985.
A. Schönhage. Schnelle Multiplikation von Polynomen über Körpern der Charakteristik 2. , 7:395--398, 1977.
R. Seidel. Constructing higher-dimensional convex hulls at logarithmic cost per face. In , pages 404--413, New York, NY, USA, 1986. ACM.
D. Shanks. On maximal gaps between successive primes. , 18(88):646--651, 1964.
H. Shaker. Topology and factorization of polynomials. , 104(1):51--59, 2009.
A. Schönhage and V. Strassen. Schnelle Multiplikation grosser Zahlen. , 7:281--292, 1971.
D.R. Stoutemyer. Which polynomial representation is best? In , pages 221--243, 1984.
A. Storjohann. . PhD thesis, ETH, Zürich, Switzerland, 2000.
V. Strassen. Die Berechnungskomplexität von elementarsymmetrischen Funktionen und von Interpolationskoeffizienten. , 20:238--251, 1973.
M. Weimann. Algebraic osculation and factorization of sparse polynomials. Available from , 2009.
M. Weimann. A lifting and recombination algorithm for rational factorization of sparse polynomials. Available from , 2009.
K. Werther. The complexity of sparse polynomial interpolation over finite fields. , 5(2):91--103, 1994.
T. Yan. The geobucket data structure for polynomials. , 25(3):285--293, 1998.