On the complexity of multivariate blockwise polynomial multiplication

Laboratoire d'Informatique, UMR 7161 CNRS
École polytechnique
91128 Palaiseau Cedex, France

This work has been partly supported by a French ANR project and by a grant of the Région Île-de-France.

Abstract

In this article, we study the problem of multiplying two multivariate polynomials which are somewhat but not too sparse, typically like polynomials with convex supports. We design and analyze an algorithm which is based on blockwise decomposition of the input polynomials, and which performs the actual multiplication in an FFT model or some other more general so called "evaluated model". If the input polynomials have total degrees at most $d$, then, under mild assumptions on the coefficient ring, we show that their product can be computed with $\mathcal{O}(s^{1.5337})$ ring operations, where $s$ denotes the number of all the monomials of total degree at most $2d$.

1. Introduction

Let $\mathbb{A}$ be an effective ring, which means that we have a data structure for representing the elements of $\mathbb{A}$ and algorithms for performing the ring operations in $\mathbb{A}$, including the equality test. The complexity for multiplying two univariate polynomials with coefficients in $\mathbb{A}$ and of degree $d$ is rather well understood [11, Chapter 8]. Let us recall that for small $d$, one uses the naive multiplication, as learned at high school, of complexity $\mathcal{O}(d^2)$. For moderate $d$, one uses the Karatsuba multiplication [21], of complexity $\mathcal{O}(d^{\log 3/\log 2})$, or some higher order Toom–Cook scheme [4, 29], of complexity $\mathcal{O}(d^{e})$ with $1 < e < \log 3/\log 2$. For large $d$, one uses FFT [5, 27] and TFT [14, 15] multiplications, both of complexity $\mathcal{O}(d \log d)$ whenever $\mathbb{A}$ has sufficiently many $2^p$-th roots of unity, or the Cantor–Kaltofen multiplication [3] with complexity in $\mathcal{O}(d \log d \log \log d)$ otherwise.

Unfortunately, most of the techniques known in the univariate case do not straightforwardly extend to several variables. Even for the known extensions, the efficiency thresholds are different. For instance, the naive product is generally softly optimal when sparse representations are used, because the size of the output can grow with the product of the sizes of the inputs. In fact, the nature of the input supports plays a major role in the design of the algorithms and the determination of their mutual thresholds.

1.1 Our contributions

In this article, we focus on and analyze an intermediate approach to several variables, which roughly corresponds to an extension of the Karatsuba and Toom–Cook strategies, and which turns out to be more efficient than the naive multiplication, for instance, if the supports of the input polynomials are rather dense in their respective convex hulls.

The main idea in our blockwise polynomial multiplication is to cut the supports in blocks of size $b_1 \times \cdots \times b_n$ and to rewrite all polynomials as "block polynomials" in $z_1^{b_1}, \ldots, z_n^{b_n}$ with "block coefficients" in

$$\mathbb{A}[z]_b = \{ P \in \mathbb{A}[z_1, \ldots, z_n] : \deg_{z_1} P < b_1, \ldots, \deg_{z_n} P < b_n \}.$$

These block coefficients are then manipulated in a suitable "evaluated model", in which the multiplication is intended to be faster than with the direct naive approach. The block polynomials are themselves multiplied in a naive manner. The precise algorithm is described in Section 3, where we also discuss the expected speedup compared to the naive product. A precise complexity analysis is subtle because it depends too much on the nature of the supports. Although the worst case bound turns out to be no better than with the naive approach, we detail, in Section 3.2, a way for quickly determining a good block size, which suits supports that are not too small, not too thin, and rather dense in their convex hull.
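For readers who want to experiment, the following minimal sketch implements the naive sparse product that serves as our baseline throughout; the dictionary-of-exponent-tuples representation and the name `naive_mul` are illustrative choices of ours, not prescribed by the paper.

```python
from itertools import product

def naive_mul(P, Q):
    """Naive sparse product over any coefficient ring.

    P, Q: dicts mapping exponent tuples (monomials) to nonzero
    coefficients.  Performs exactly one coefficient product per pair
    of input terms, i.e. s_P * s_Q ring multiplications."""
    R = {}
    for (i, p), (j, q) in product(P.items(), Q.items()):
        k = tuple(a + b for a, b in zip(i, j))
        c = R.get(k, 0) + p * q
        if c:
            R[k] = c
        else:
            R.pop(k, None)  # remove terms that cancel accidentally
    return R

# Example: (1 + z1*z2) * (1 - z1*z2) over the integers.
print(naive_mul({(0, 0): 1, (1, 1): 1}, {(0, 0): 1, (1, 1): -1}))
```

The quadratic number of coefficient products is what the blockwise approach aims to beat when the supports are dense in their convex hulls.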
Section 4 is devoted to implementation issues. We first design a cache-friendly version of our blockwise product. Then we adapt the blockwise product to multivariate power series truncated in total degree. Finally, we discuss a technique to slightly improve the blockwise approach for small and thin supports.

In Section 5 we analyze the complexity of the blockwise product for two polynomials supported by monomials of degree at most $d$: if $\mathbb{A}$ admits a unit element, then we show that their product can be computed with $\mathcal{O}(s^{1.5337})$ operations in $\mathbb{A}$, where $s$ denotes the number of all the monomials of degree at most $2d$. Notice that the constant hidden behind the latter $\mathcal{O}$ depends neither on $d$ nor on the number $n$ of the variables. With no hypothesis on $\mathbb{A}$, the complexity bound we reach becomes $\mathcal{O}(s^{1.5930})$, by a direct extension of the Karatsuba algorithm. Finally, we prove similar complexity results for truncated power series.

1.2 Related works

Algorithms for multiplying multivariate polynomials and series have been designed since the early ages of computer algebra. Even those based on the naive techniques are a matter of constant improvements in terms of data structures, memory management, vectorization and parallelization [6, 7, 8, 10, 20, 22, 23, 24, 25, 26, 28, 30].

With a single variable, the usual way to multiply two polynomials $P$ and $Q$ of degree $d$ with the Karatsuba algorithm begins with splitting $P$ and $Q$ into $P = P_0 + z^k P_1$ and $Q = Q_0 + z^k Q_1$, where $k = \lceil d/2 \rceil$. Then $P_0 Q_0$, $P_1 Q_1$, and $(P_0 + P_1)(Q_0 + Q_1)$ are computed recursively. This approach has been studied for several variables, but it turns out to be efficient mainly for plain block supports, as previously discussed in [8, Section 3].

For any kind of support, there exist general purpose sparse multiplication algorithms [2, 18]. They feature quasi-linear complexity in terms of the sizes of the supports of the input and of a given superset for the support of the product. Nevertheless the logarithmic overhead of the latter algorithms is important, and alternative approaches, directly based on the truncated Fourier transform, have been designed in [14, 15, 19] for supports which are initial segments of $\mathbb{N}^n$ (i.e. sets of monomials which are complementary to a monomial ideal), with the same order of efficiency as the FFT multiplication for univariate polynomials.

To the best of our knowledge, the blockwise technique was introduced, in a somewhat sketchy manner, in [13], and then refined in [16, Section 6]. In the case of univariate polynomials, its complexity has been analyzed in [12], where important applications are also presented.

2. Conventions and notation

Throughout this article, we use the sparse representation for the polynomials, and the computation tree model with the total complexity point of view [1, Chapter 4] for the complexity analysis of the algorithms. Informally speaking, this means that complexity estimates charge a constant cost for each arithmetic operation and the equality test in $\mathbb{A}$, and that all the constants in $\mathbb{A}$ are thought to be freely at our disposal.

2.1 Block polynomials

Consider a multivariate polynomial $P \in \mathbb{A}[z] = \mathbb{A}[z_1, \ldots, z_n]$, also written as

$$P = \sum_{i \in \mathbb{N}^n} P_i z^i = \sum_{i_1, \ldots, i_n \in \mathbb{N}} P_{i_1, \ldots, i_n} z_1^{i_1} \cdots z_n^{i_n}.$$

We define its support by $\operatorname{supp} P := \{ i \in \mathbb{N}^n : P_i \neq 0 \}$, and denote its cardinality $|\operatorname{supp} P|$ by $s_P$. We interpret $s_P$ as the size of $P$. Given a vector $b = (b_1, \ldots, b_n) \in (\mathbb{N}^{>})^n$ of positive integers, we define the set of block coefficients at order $b$ by

$$\mathbb{A}[z]_b = \{ P \in \mathbb{A}[z_1, \ldots, z_n] : \deg_{z_1} P < b_1, \ldots, \deg_{z_n} P < b_n \}.$$

Any polynomial $P \in \mathbb{A}[z]$ can uniquely be rewritten as a block polynomial $\bar P = \sum_{\bar\imath \in \mathbb{N}^n} \bar P_{\bar\imath}\, \bar z^{\bar\imath} \in \mathbb{A}[z]_b[\bar z] = \mathbb{A}[z]_b[z_1^{b_1}, \ldots, z_n^{b_n}]$, where

$$\bar P_{\bar\imath} = \sum_{0 \le i_1 < b_1, \ldots, 0 \le i_n < b_n} P_{b_1 \bar\imath_1 + i_1, \ldots, b_n \bar\imath_n + i_n}\, z^i.$$

Given $\bar P, \bar Q \in \mathbb{A}[z]_b[\bar z]$, we also notice that $\bar P \bar Q \in \mathbb{A}[z]_{2b-1}[\bar z]$, where $2b - 1 := (2 b_1 - 1, \ldots, 2 b_n - 1)$. We let $|b| = b_1 \cdots b_n$ represent the size of the block.
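As an illustration of these definitions, here is a short sketch of the block decomposition and its inverse; the function names and the dictionary representation are ours, and the overlap handling anticipates that product blocks live in $\mathbb{A}[z]_{2b-1}$.

```python
def block_decompose(P, b):
    """Split a sparse polynomial (dict: exponent tuple -> coefficient)
    into block coefficients: returns a dict mapping each block index
    to its block coefficient, itself a sparse polynomial whose
    exponents satisfy 0 <= i_j < b_j."""
    blocks = {}
    for i, c in P.items():
        outer = tuple(e // s for e, s in zip(i, b))  # index in z-bar
        inner = tuple(e % s for e, s in zip(i, b))   # exponent inside the block
        blocks.setdefault(outer, {})[inner] = c
    return blocks

def block_recompose(blocks, b):
    """Reinterpret a block polynomial as an ordinary polynomial.
    After a product, inner exponents may reach 2*b_j - 2, so
    neighbouring blocks overlap and their contributions are added."""
    P = {}
    for outer, coeff in blocks.items():
        for inner, c in coeff.items():
            i = tuple(o * s + e for o, s, e in zip(outer, b, inner))
            P[i] = P.get(i, 0) + c
    return {i: c for i, c in P.items() if c}
```

Step 5 of the multiplication algorithm of Section 3 is exactly this recomposition, and the additions performed there correspond to the overlaps.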
2.2 Evaluation-interpolation schemes

A univariate evaluation-interpolation scheme in size $d \in \mathbb{N}^{>}$ on $\mathbb{A}$ is the data of:

- an algorithm for computing an evaluation function $\operatorname{Eval}_d : \mathbb{A}[z]_d \to \mathbb{A}^{\varepsilon_d}$, where $\mathbb{A}[z]_d$ is the subset of polynomials in $\mathbb{A}[z]$ of degree less than $d$, and where $\varepsilon_d$ can be seen as the number of evaluation points, and
- an algorithm for computing an interpolation function, written $\operatorname{Eval}_d^{-1} : \mathbb{A}^{\varepsilon_d} \to \mathbb{A}[z]_{2d-1}$, which is not necessarily exactly the inverse of $\operatorname{Eval}_d$,

such that $\operatorname{Eval}_d$ and $\operatorname{Eval}_d^{-1}$ are linear and

$$\operatorname{Eval}_d^{-1} \big( \operatorname{Eval}_d(P) \odot \operatorname{Eval}_d(Q) \big) = P Q$$

holds for all $P$ and $Q$ in $\mathbb{A}[z]_d$, where $\odot$ denotes the entry-wise product in $\mathbb{A}^{\varepsilon_d}$. We write $\mathsf{E}_d$ for a common bound on the complexity of $\operatorname{Eval}_d$ and $\operatorname{Eval}_d^{-1}$, so that two polynomials in $\mathbb{A}[z]_d$ can be multiplied in time $3 \mathsf{E}_d + \varepsilon_d$. For convenience we still write $\operatorname{Eval}_d$ and $\operatorname{Eval}_d^{-1}$ for the extensions of these functions to any $\mathbb{A}^k$ seen as a ring endowed with the entry-wise product.

Example 1. The Karatsuba algorithm for polynomials of degree less than $2$ corresponds to $d = 2$, $\varepsilon_2 = 3$,

$$\operatorname{Eval}_2 : P_0 + P_1 z \mapsto (P_0, P_0 + P_1, P_1),$$
$$\operatorname{Eval}_2^{-1} : (C_0, C_1, C_2) \mapsto C_0 + (C_1 - C_0 - C_2)\, z + C_2 z^2.$$

$\operatorname{Eval}_2$ can be interpreted as the evaluation of a polynomial at the values $0$, $1$, and $\infty$. By induction, the Karatsuba algorithm for polynomials of degree at most $d$ corresponds to $\varepsilon_d = 3\, \varepsilon_{\lceil d/2 \rceil}$ and $\operatorname{Eval}_d : P \mapsto \operatorname{Eval}_{\lceil d/2 \rceil}(\operatorname{Eval}_2(\bar P))$, where $\bar P$ is the block polynomial of $P$ with respect to $b = \lceil d/2 \rceil$, where $\operatorname{Eval}_2$ is applied over the coefficient ring $\mathbb{A}[z]_{\lceil d/2 \rceil}$, and where $\operatorname{Eval}_{\lceil d/2 \rceil}$ actually corresponds to applying its extension entry-wise. This yields $\varepsilon_d = 3^{\lceil \log_2 d \rceil} = \mathcal{O}(d^{\log 3/\log 2})$ and $\mathsf{E}_d = \mathcal{O}(d^{\log 3/\log 2})$.

Univariate evaluation-interpolation schemes can be extended to several variables as follows: if $b = (b_1, \ldots, b_n) \in (\mathbb{N}^{>})^n$ then we define $\varepsilon_b = \varepsilon_{b_1} \cdots \varepsilon_{b_n}$, define the maps

$$\operatorname{Eval}_b : \mathbb{A}[z]_b \to \mathbb{A}^{\varepsilon_b}, \qquad \operatorname{Eval}_b = \operatorname{Eval}_{b,1} \circ \operatorname{Eval}_{b,2} \circ \cdots \circ \operatorname{Eval}_{b,n},$$
$$\operatorname{Eval}_b^{-1} : \mathbb{A}^{\varepsilon_b} \to \mathbb{A}[z]_{2b-1}, \qquad \operatorname{Eval}_b^{-1} = \operatorname{Eval}_{b,n}^{-1} \circ \operatorname{Eval}_{b,n-1}^{-1} \circ \cdots \circ \operatorname{Eval}_{b,1}^{-1},$$

and take

$$\mathsf{E}_b = \varepsilon_b \left( \frac{\mathsf{E}_{b_1}}{b_1} + \cdots + \frac{\mathsf{E}_{b_n}}{b_n} \right), \tag{1}$$

where $\operatorname{Eval}_{b,i}$ and $\operatorname{Eval}_{b,i}^{-1}$ represent the evaluation and interpolation with respect to the variable $z_i$, the remaining variables and the already evaluated coordinates being regarded as scalars:

$$\operatorname{Eval}_{b,i} : \big(\mathbb{A}^{\varepsilon_{b_1} \cdots\, \varepsilon_{b_{i-1}}}\big)[z_i, \ldots, z_n]_{(b_i, \ldots, b_n)} \to \big(\mathbb{A}^{\varepsilon_{b_1} \cdots\, \varepsilon_{b_i}}\big)[z_{i+1}, \ldots, z_n]_{(b_{i+1}, \ldots, b_n)},$$
$$\operatorname{Eval}_{b,i}^{-1} : \big(\mathbb{A}^{\varepsilon_{b_i} \cdots\, \varepsilon_{b_n}}\big)[z_1, \ldots, z_{i-1}]_{(2 b_1 - 1, \ldots, 2 b_{i-1} - 1)} \to \big(\mathbb{A}^{\varepsilon_{b_{i+1}} \cdots\, \varepsilon_{b_n}}\big)[z_1, \ldots, z_i]_{(2 b_1 - 1, \ldots, 2 b_i - 1)}.$$

In the sequel we will only use multivariate evaluation-interpolation schemes constructed in this way.
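The scheme of Example 1 is easily made concrete; in the sketch below the names `eval2`, `uneval2` and `mul2` are ours, and the entries of P and Q may themselves be block coefficients, in line with the recursive use of the scheme.

```python
def eval2(P):
    """Karatsuba evaluation in size d = 2 (epsilon_2 = 3): the pair
    (P0, P1) representing P0 + P1*z is 'evaluated' at 0, 1, infinity."""
    P0, P1 = P
    return (P0, P0 + P1, P1)

def uneval2(C):
    """Karatsuba interpolation: recover a product of degree < 3."""
    C0, C1, C2 = C
    return (C0, C1 - C0 - C2, C2)  # coefficients of 1, z, z^2

def mul2(P, Q):
    """Multiply P0 + P1*z and Q0 + Q1*z with 3 coefficient products."""
    Phat, Qhat = eval2(P), eval2(Q)
    return uneval2(tuple(p * q for p, q in zip(Phat, Qhat)))

print(mul2((1, 2), (3, 4)))  # (1 + 2z)(3 + 4z) = 3 + 10z + 8z^2
```

One checks that the identity $\operatorname{Eval}_2^{-1}(\operatorname{Eval}_2(P) \odot \operatorname{Eval}_2(Q)) = PQ$ holds: the middle value $(P_0 + P_1)(Q_0 + Q_1)$ minus the two extreme products is precisely $P_0 Q_1 + P_1 Q_0$.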
3. The basic algorithm

3.1 Blockwise multiplication

For what follows, we first assume that we have a strategy for picking a suitable evaluation-interpolation scheme for any given block size $b \in (\mathbb{N}^{>})^n$, and that we have a fast way to compute the corresponding functions $\varepsilon_b$ and $\mathsf{E}_b$, at least approximately. We also assume that $\varepsilon_d/d$ and $\mathsf{E}_d/d$ are increasing functions.

Let us first assume that we have fixed a block size $b$. The first version of our algorithm for multiplying two multivariate polynomials $P, Q \in \mathbb{A}[z]$ summarizes as follows:

Algorithm mul(P, Q, b)
Input: $P, Q \in \mathbb{A}[z]$ and $b \in (\mathbb{N}^{>})^n$.
Output: $P Q \in \mathbb{A}[z]$.
1. Rewrite $P$ and $Q$ as block polynomials $\bar P, \bar Q \in \mathbb{A}[z]_b[\bar z]$.
2. Compute $\hat P := \operatorname{Eval}_b(\bar P) := \sum_{\bar\imath} \operatorname{Eval}_b(\bar P_{\bar\imath})\, \bar z^{\bar\imath}$ and $\hat Q := \operatorname{Eval}_b(\bar Q)$.
3. Multiply $\hat R := \hat P \hat Q \in \mathbb{A}^{\varepsilon_b}[\bar z]$ using the naive algorithm.
4. Compute $\bar R := \operatorname{Eval}_b^{-1}(\hat R) := \sum_{\bar\imath} \operatorname{Eval}_b^{-1}(\hat R_{\bar\imath})\, \bar z^{\bar\imath} \in \mathbb{A}[z]_{2b-1}[\bar z]$.
5. Reinterpret $\bar R$ as a polynomial $R \in \mathbb{A}[z]$ and return $R$.

Let $s_{\bar P \bar Q} := |\operatorname{supp} \bar P + \operatorname{supp} \bar Q| \le s_{\bar P}\, s_{\bar Q}$ represent the size of $\bar P \bar Q$ if there is no coefficient of $\bar P \bar Q$ which accidentally vanishes.

Proposition 1. The algorithm mul works correctly as specified, and performs at most

$$\varepsilon_b\, s_{\bar P}\, s_{\bar Q} + \mathsf{E}_b\, (s_{\bar P} + s_{\bar Q} + 2 s_{\bar P \bar Q})$$

operations in $\mathbb{A}$.

Proof. The correctness is clear from the definitions. Step 1 performs no operation in $\mathbb{A}$. Steps 2 and 4 require $\mathsf{E}_b (s_{\bar P} + s_{\bar Q} + s_{\bar P \bar Q})$ operations. Step 3 requires $\varepsilon_b s_{\bar P} s_{\bar Q}$ operations. Step 5 takes at most $(2^\nu - 1)\, |b|\, s_{\bar P \bar Q} \le \mathsf{E}_b\, s_{\bar P \bar Q}$ operations in order to add up overlapping block coefficients, where $\nu := |\{ i \in \{1, \ldots, n\} : b_i > 1 \}|$. □

Unfortunately, without any structural knowledge about the supports of $\bar P$ and $\bar Q$, the quantity $s_{\bar P \bar Q}$ requires a time $\mathcal{O}(s_{\bar P}\, s_{\bar Q})$ to be computed. In the next subsection we explain a heuristic approach to find a good block size.

3.2 Heuristic computation of the block size

In order to complete our multiplication algorithm, we must explain how to set the block size. Computing quickly the block size that minimizes the number of operations in the product of two given polynomials $P$ and $Q$ looks like a difficult problem. In Section 5 we analyze it for asymptotically large simplex supports, but, in this subsection, we describe a heuristic for the general case.

For approximating a good block size we first need to approximate the complexity in Proposition 1 by a function, written $\mathsf{C}_b$, which is intended to be easy to compute in terms of $\varepsilon_b$, $\mathsf{E}_b$, and the input supports. For instance one may opt for the formula

$$\mathsf{C}_b = (\varepsilon_b + 2 \mathsf{E}_b)\, (s_{\bar P} + 1)(s_{\bar Q} + 1).$$

Nevertheless, if $P$ and $Q$ are expected to be not too sparse, then one may assume $s_{\bar P \bar Q} \le 2 (s_{\bar P} + s_{\bar Q})$ and use the formula

$$\mathsf{C}_b = \varepsilon_b\, s_{\bar P}\, s_{\bar Q} + 2 \mathsf{E}_b\, (s_{\bar P} + s_{\bar Q}).$$

Then we begin the approximating process with $b = (1, \ldots, 1)$, and keep doubling the entry of $b$ which reduces $\mathsf{C}_b$ as much as possible. We stop as soon as $\mathsf{C}_b$ cannot be further reduced in this way. Given $b \in (\mathbb{N}^{>})^n$ and $i \in \{1, \ldots, n\}$, we write $b^{[i]}$ for $(b_1, \ldots, b_{i-1}, 2 b_i, b_{i+1}, \ldots, b_n)$. In order to make explicit the dependency in $b$ during the execution of the algorithm, we write $\bar P^b$ instead of $\bar P$.

Algorithm blocksize(P, Q)
Input: $P, Q \in \mathbb{A}[z]$.
Output: a block size $b \in (\mathbb{N}^{>})^n$ for multiplying $P$ and $Q$.
Set $b := (1, \ldots, 1)$.
While $\bar P^b$ and $\bar Q^b$ are supported by at least two monomials repeat:
  Let $i \in \{1, \ldots, n\}$ be such that $\mathsf{C}_{b^{[i]}}$ is minimal.
  If $\mathsf{C}_{b^{[i]}} \ge \mathsf{C}_b$, then return $b$.
  Set $b := b^{[i]}$.

In the computation tree model over $\mathbb{A}$, the algorithm blocksize actually performs no operation in $\mathbb{A}$. Its complexity must be analyzed in terms of bit-operations, which means here computation trees over $\mathbb{Z}/2\mathbb{Z}$. For this purpose we assume that the supports are represented by vectors of monomials, with each monomial being stored as a vector of integers in binary representation.

Proposition 2. If $P$ and $Q$ have total degrees at most $d$, then the algorithm blocksize performs $\mathcal{O}(n (s_P + s_Q) \log d)$ bit-operations, plus $\mathcal{O}(n \log d)$ calls to the function $\mathsf{C}$.

Proof. To run the algorithm blocksize efficiently, we maintain the sets $\operatorname{supp} \bar P^b$ and $\operatorname{supp} \bar Q^b$ throughout the main loop. Since $\operatorname{supp} \bar P^{b^{[i]}}$ can be computed from $\operatorname{supp} \bar P^b$ with $\mathcal{O}(s_{\bar P^b} \log d)$ bit-operations, each iteration requires at most $\mathcal{O}((s_{\bar P^b} + s_{\bar Q^b}) \log d)$ bit-operations. The number of iterations is bounded by $n \log_2 d$. □

Remark that in favorable cases the first few iterations are expected to approximately halve $s_{\bar P^b}$ at each stage. The expected total complexity thus becomes closer to $\mathcal{O}((s_P + s_Q) \log d)$.
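A compact version of this heuristic can be written as follows; the cost callback `C` stands for either formula for $\mathsf{C}_b$ above, and, for brevity, this sketch recomputes the blocked supports from scratch at every step instead of maintaining them incrementally as in the proof of Proposition 2.

```python
def blocked_support(S, b):
    """Support of the block polynomial: block indices hit by S."""
    return {tuple(e // s for e, s in zip(i, b)) for i in S}

def blocksize(P, Q, n, C):
    """Doubling heuristic: start from b = (1, ..., 1) and double the
    coordinate that decreases the cost estimate C(b) the most.

    P, Q: supports (sets of exponent tuples); C: cost estimate."""
    b = (1,) * n
    while len(blocked_support(P, b)) > 1 and len(blocked_support(Q, b)) > 1:
        candidates = [tuple(2 * bj if j == i else bj for j, bj in enumerate(b))
                      for i in range(n)]
        best = min(candidates, key=C)
        if C(best) >= C(b):
            return b
        b = best
    return b
```

Since each accepted step doubles one coordinate of the block size, at most $n \log_2 d$ iterations can occur, in accordance with Proposition 2.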
4. Variations

4.1 Implementation issues

Several problems arise if one tries to implement the basic blockwise multiplication algorithm from Section 3 without any modification:

- The mere storage of $\hat P$, $\hat Q$ and $\hat R$ involves $(s_{\bar P} + s_{\bar Q} + s_{\bar P \bar Q})\, \varepsilon_b$ coefficients in $\mathbb{A}$. This number is usually much larger than $s_P + s_Q + s_{P Q}$.
- Coefficients in $\mathbb{A}^{\varepsilon_b}$ usually will not fit into cache memory, which makes the algorithm highly cache inefficient.

Both drawbacks can be removed by using a recursive multiplication algorithm instead, where the evaluation-interpolation technique is applied sequentially with respect to $z_{i_1}, \ldots, z_{i_k}$ for suitable pairwise distinct $i_1, \ldots, i_k \in \{1, \ldots, n\}$.

Algorithm mul(P, Q, b, i_1, ..., i_k)
Input: $P, Q \in \mathbb{A}[z]$, $b \in (\mathbb{N}^{>})^n$ and pairwise distinct $i_1, \ldots, i_k \in \{1, \ldots, n\}$.
Output: $P Q \in \mathbb{A}[z]$.
1. If $k = 0$ then return $P Q$, computed using the naive algorithm.
2. Let $c := (1, \ldots, 1, b_{i_1}, 1, \ldots, 1)$, with $b_{i_1}$ in position $i_1$, and rewrite $P$ and $Q$ as block polynomials $\bar P, \bar Q \in \mathbb{A}[z]_c[\bar z]$.
3. Compute $\hat P := \operatorname{Eval}_c(\bar P)$ and $\hat Q := \operatorname{Eval}_c(\bar Q)$.
4. For each $j \in \{1, \ldots, \varepsilon_{b_{i_1}}\}$ do
   $\hat R_j := \operatorname{mul}(\hat P_j, \hat Q_j, b, i_2, \ldots, i_k)$, where $\hat P_j$ denotes the polynomial made of the $j$-th coordinates of the evaluated block coefficients of $\hat P$.
5. Compute $\bar R := \operatorname{Eval}_c^{-1}(\hat R) \in \mathbb{A}[z]_{2c-1}[\bar z]$.
6. Reinterpret $\bar R$ as a polynomial $R \in \mathbb{A}[z]$ and return $R$.

It is not hard to check that the improved algorithm has the same complexity as the basic algorithm in terms of the number of operations in $\mathbb{A}$. Let us now examine how to choose $i_1, \ldots, i_k$ in order to increase the performance. Setting $c = (c_1, \ldots, c_n)$ with

$$c_j = \begin{cases} b_j & \text{if } j \in \{i_1, \ldots, i_k\}, \\ 1 & \text{otherwise}, \end{cases}$$

we first choose the set $\{i_1, \ldots, i_k\}$ in such a way that $\gamma\, \varepsilon_c$ constants in $\mathbb{A}$ fit into cache memory, for a certain threshold $\gamma \ge 1$. The idea is that the same coefficients should be reused as much as possible in the inner multiplication, so $\gamma$ should not be taken too small, say $\gamma \ge 64$. For a fixed set $\{i_1, \ldots, i_k\}$ of indexes, we next take $i_1, \ldots, i_k$ such that $b_{i_1} \le \cdots \le b_{i_k}$, thereby minimizing the memory consumption of the algorithm.

4.2 Truncated multiplication

Let $P \in \mathbb{A}[z]$ and $S \subseteq \mathbb{N}^n$. We define the truncation of $P$ to $S$ by $P_{;S} := \sum_{i \in S} P_i z^i$. Notice that $P_{;S} = P$ if, and only if, $P \in \mathbb{A}[z]_{;S} := \{ P \in \mathbb{A}[z] : \operatorname{supp} P \subseteq S \}$. It is sometimes convenient to represent finite sets $S \subseteq \mathbb{N}^n$ by polynomials $\Sigma \in \mathbb{A}[z]$ with $\operatorname{supp} \Sigma = S$, for instance by taking $\Sigma = \sum_{i \in S} u\, z^i$, for a given $u \in \mathbb{A} \setminus \{0\}$. Given $P, Q \in \mathbb{A}[z]$ and a fixed support $S \subseteq \mathbb{N}^n$, the computation of $(P Q)_{;S}$ is called the truncated multiplication of $P$ and $Q$ with respect to $S$. The basic blockwise multiplication algorithm is adapted as follows:

Algorithm mul(P, Q, Σ, b)
Input: $P, Q, \Sigma \in \mathbb{A}[z]$ and $b \in (\mathbb{N}^{>})^n$.
Output: $(P Q)_{;S} \in \mathbb{A}[z]_{;S}$, where $S := \operatorname{supp} \Sigma$.
1. Rewrite $P$, $Q$ and $\Sigma$ as block polynomials $\bar P, \bar Q, \bar\Sigma \in \mathbb{A}[z]_b[\bar z]$.
2. Compute $\hat P := \operatorname{Eval}_b(\bar P)$ and $\hat Q := \operatorname{Eval}_b(\bar Q)$.
3. Multiply $\hat R := (\hat P \hat Q)_{; \operatorname{supp} \bar\Sigma} \in \mathbb{A}^{\varepsilon_b}[\bar z]$ using a naive algorithm.
4. Compute $\bar R := \operatorname{Eval}_b^{-1}(\hat R) \in \mathbb{A}[z]_{2b-1}[\bar z]$.
5. Reinterpret $\bar R$ as a polynomial $R \in \mathbb{A}[z]$ and return $R_{;S}$.

In a similar way as in Proposition 1, we can show that mul(P, Q, Σ, b) requires at most

$$\varepsilon_b\, s_{\bar P}\, s_{\bar Q} + \mathsf{E}_b\, (s_{\bar P} + s_{\bar Q} + 2 s_{\bar\Sigma}) \tag{2}$$

operations in $\mathbb{A}$. However, this bound is very pessimistic since $S$ usually has a special form which allows us to perform the naive truncated multiplication in step 3 in much less than $s_{\bar P}\, s_{\bar Q}$ operations. For instance, in the special and important case of truncated power series at order $d$, we have

$$S_d := \{ i \in \mathbb{N}^n : i_1 + \cdots + i_n \le d - 1 \},$$

and, by [16, Section 6], $|S_d| = \binom{n + d - 1}{n}$. The naive truncated power series multiplication at order $d$ can be performed using only

$$\sum_{m \in S_d} \big| \{ (k, l) \in S_d^2 : k + l = m \} \big| = \binom{d - 1 + 2n}{2n}$$

operations, as shown in [16, Section 6]. Hence, if $S = S_d$ and $b = (\beta, \ldots, \beta)$, so that $\operatorname{supp} \bar P = \operatorname{supp} \bar Q = S_{\lceil d/\beta \rceil}$, then $s_{\bar P}\, s_{\bar Q}$ can be replaced by $\binom{\lceil d/\beta \rceil - 1 + 2n}{2n}$ in our complexity bound (2).

Assuming that we have a way to estimate the number of operations needed in step 3 to compute the truncated product $\hat R$ using the naive algorithm, we can adapt the heuristic of Section 3.2 to determine a candidate block size. Finally let us mention that the additional implementation tricks from Section 4.1 also extend to the truncated case in a straightforward way.

4.3 Separate treatment of the border

If we increase the block size $b$ in mul, then the block coefficients $\bar P_{\bar\imath}, \bar Q_{\bar\imath}$ get sparser and sparser. At a certain point, this makes it pointless to further increase $b$. One final optimization which can often be applied is to decompose $P = P_0 + P_1$ and $Q = Q_0 + Q_1$ in such a way that the multiplication $P_0 Q_0$ can be done using a larger block size than the remaining part $P_0 Q_1 + P_1 Q_0 + P_1 Q_1$ of the multiplication. Instead of providing a full and rather technical algorithm, let us illustrate this idea with the example

$$P = Q = 1 + z_1 + z_1^2 + (1 + z_1)\, z_2 + z_2^2.$$

The naive multiplication performs $36$ products in $\mathbb{A}$. The direct use of the Karatsuba algorithm with $b = (2, 2)$ reduces to the naive product of two polynomials of $3$ terms with coefficients in $\mathbb{A}[z]_{(2,2)}$, which amounts to $3 \times 3 \times 9 = 81$ multiplications in $\mathbb{A}$. If we decompose $P$ and $Q$ into

$$P_0 = Q_0 = (1 + z_1) + (1 + z_1)\, z_2,$$
$$P_1 = Q_1 = z_1^2 + z_2^2,$$

then $P_0 Q_0$ takes $9$ products in $\mathbb{A}$, and the naive multiplication for $P_0 Q_1 + P_1 Q_0 + P_1 Q_1$ uses $20$ products in $\mathbb{A}$. In this example, the separate treatment of the border saves $7$ products in $\mathbb{A}$ out of the $36$ of the naive algorithm, and is faster than the direct use of the blockwise algorithm.
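The combinatorics of Section 4.2 are easy to cross-check numerically; in this sketch the helper names are ours, and the assertion verifies the binomial formula for the cost of the naive truncated product on a small instance.

```python
from math import comb

def simplex(n, d):
    """The set S_d: exponent tuples of total degree at most d - 1."""
    if n == 0:
        return [()]
    return [(i,) + t for i in range(d) for t in simplex(n - 1, d - i)]

def naive_truncated_cost(n, d):
    """Pairs (k, l) in S_d x S_d with k + l in S_d, i.e. the number of
    coefficient products of the naive series product at order d."""
    return comb(d - 1 + 2 * n, 2 * n)

n, d = 3, 5
S = simplex(n, d)
assert len(S) == comb(n + d - 1, n)              # |S_d| = 35
pairs = sum(1 for k in S for l in S if sum(k) + sum(l) <= d - 1)
assert pairs == naive_truncated_cost(n, d)       # 210
print(len(S), pairs)
```

The gap between $|S_d|^2 = 1225$ and $210$ on this tiny example already shows why the bound (2) is pessimistic when the product $s_{\bar P}\, s_{\bar Q}$ is used for step 3.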
5. Uniform complexity analysis

In this section we focus on the specific and important problems of multiplying polynomials $P, Q \in \mathbb{A}[z]$ of total degree at most $d$, and truncated power series at order $d$. Recall from Section 4.2 that $S_d = \{ i \in \mathbb{N}^n : i_1 + \cdots + i_n \le d - 1 \}$. Let

$$\varphi(\alpha) := (1 + \alpha) \log(1 + \alpha) - \alpha \log \alpha,$$
$$\psi_\beta(\alpha) := \frac{\log(2\beta - 1) + 2\, \varphi(\alpha/\beta)}{\varphi(2\alpha)}.$$

5.1 Polynomial product

Let $\alpha_0 > 0$ be the first value of $\alpha$ such that $\psi_1(\alpha) = \psi_4(\alpha)$, and define $\varpi := \psi_1(\alpha_0)$. Numeric computations yield $\alpha_0 \approx 2.1455$ and $\varpi \approx 1.5337$.
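These constants are straightforward to reproduce numerically; the sketch below uses plain floating point and bisection, with $\varphi$ and $\psi_\beta$ taken from the formulas as reconstructed above, whereas the paper relies on certified ball arithmetic.

```python
from math import log

def phi(a):
    return (1 + a) * log(1 + a) - a * log(a)

def psi(beta, a):
    return (log(2 * beta - 1) + 2 * phi(a / beta)) / phi(2 * a)

# psi_1 - psi_4 is negative to the left of alpha_0 and positive to its
# right, so a bisection on [1, 4] localizes the first crossing.
lo, hi = 1.0, 4.0
for _ in range(80):
    mid = (lo + hi) / 2
    if psi(1, mid) < psi(4, mid):
        lo = mid
    else:
        hi = mid
alpha0 = (lo + hi) / 2
print(alpha0, psi(1, alpha0))   # approximately 2.1455 and 1.5337
```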
Theorem 1. Let $\epsilon > 0$, and assume that $\mathbb{A}$ admits evaluation-interpolation schemes with $\varepsilon_d = 2d - 1$ and $\mathsf{E}_d \le \lambda\, d \log(d + 1)$, for all $d \ge 1$, and for some constant $\lambda > 0$. Then two polynomials in $\mathbb{A}[z]$ of total degree at most $d$ can be multiplied with mul using at most $\mathcal{O}(|S_{2d}|^{\varpi + \epsilon})$ operations in $\mathbb{A}$, if the sizes of the blocks $b = (\beta, \ldots, \beta)$ are chosen as follows, with $\alpha := d/n$:

- $\beta = 1$, if $d/n \le \alpha_0$,
- $\beta = 4$, if $\alpha_0 < d/n \le 8$,
- $\beta = \lceil \alpha/2 \rceil$, if $8 < d/n$.

Proof. We introduce $\alpha := d/n$. By Stirling's formula, there exists a constant $K_1$ such that

$$\left| \log \binom{n + d}{n} - n\, \varphi(d/n) \right| \le K_1 \log(n + d), \qquad d, n \ge 1.$$

From $\log |S_d| = \log \binom{n + d - 1}{n} = \log \big( \binom{n + d}{n} \frac{d}{n + d} \big)$, we obtain the existence of a constant $K_2$ such that the following inequalities hold for all $n \ge 1$ and $d \ge 1$:

$$\left| \frac{\log |S_d|}{n} - \varphi(\alpha) \right| \le K_2\, \frac{\log(n + d)}{n}, \qquad \left| \frac{\log |S_{2d}|}{n} - \varphi(2\alpha) \right| \le K_2\, \frac{\log(n + d)}{n}.$$

By Proposition 1, the cost of the multiplication is bounded by $T_1 + T_2$ with

$$T_1 = \varepsilon_b\, s_{\bar P}\, s_{\bar Q} = (2\beta - 1)^n\, |S_\delta|^2,$$
$$T_2 = \mathsf{E}_b\, (s_{\bar P} + s_{\bar Q} + 2 s_{\bar P \bar Q}) \le 2\, \mathsf{E}_b\, \big( |S_\delta| + |S_{2\delta - 1}| \big),$$

where $\delta := \lceil d/\beta \rceil + 1$, since $\operatorname{supp} \bar P \subseteq S_\delta$, $\operatorname{supp} \bar Q \subseteq S_\delta$, and $\operatorname{supp} \bar P + \operatorname{supp} \bar Q \subseteq S_{2\delta - 1}$. With $\bar\alpha := \delta/n$, and using (1), there exists a constant $K_3$ such that:

$$\left| \frac{\log T_1}{n} - \log(2\beta - 1) - 2 \varphi(\bar\alpha) \right| \le K_3\, \frac{\log(n + d)}{n},$$
$$\left| \frac{\log T_2}{n} - \log(2\beta - 1) - \varphi(2\bar\alpha) \right| \le K_3\, \frac{\log(n + d) + \lambda \log(\beta + 1)}{n}.$$

Since $\varphi'(\alpha) = \log(1 + 1/\alpha)$ and since $|\bar\alpha - \alpha/\beta| \le 2/n$, we can bound $|\varphi(\bar\alpha) - \varphi(\alpha/\beta)|$ by $2 \varphi'(2/n)/n \le 2 \log(n + d)/n$, and deduce the existence of another constant $K_4$ such that:

$$\left| \frac{\log T_1}{n} - \log(2\beta - 1) - 2 \varphi(\alpha/\beta) \right| \le K_4\, \frac{\log(n + d)}{n},$$
$$\left| \frac{\log T_2}{n} - \log(2\beta - 1) - \varphi(2\alpha/\beta) \right| \le K_4\, \frac{\log(n + d) + \lambda \log(\beta + 1)}{n}.$$

[Figure 1. Illustration of the curves $\psi_\beta(\alpha)$ for various $\beta$, together with $\Psi(\alpha)$. The original plot is not reproduced here.]

From $\varphi'(\alpha) = \log(1 + 1/\alpha)$ we have that $(\varphi(2\alpha) - 2\varphi(\alpha))' \le 0$, and since $\lim_{\alpha \to 0} (\varphi(2\alpha) - 2\varphi(\alpha)) = 0$ it follows that $\varphi(2\alpha) - 2\varphi(\alpha) \le 0$, and that there exists a constant $K_5$ such that

$$\frac{\log(T_1 + T_2)}{n} \le \log(2\beta - 1) + 2 \varphi(\alpha/\beta) + K_5\, \frac{\log(n + d) + \lambda \log(\beta + 1)}{n}. \tag{3}$$

In order to express the execution time in terms of the output size, we thus have to study the function

$$\Psi(\alpha) := \min_{\beta \in \mathbb{N}^{>}} \psi_\beta(\alpha).$$

The first $\psi_\beta$ are plotted in Figure 1. The right limit of $\psi_1$ when $\alpha$ tends to $0$ is $1$, while the same limit for the other $\psi_\beta$ is $+\infty$. Since $\varphi(2\alpha) > 0$ whenever $\alpha > 0$, the sign of $\psi_\beta - \psi_1$ is the same as the one of

$$\Lambda_\beta(\alpha) := \log(2\beta - 1) + 2 \varphi(\alpha/\beta) - 2 \varphi(\alpha).$$

From $\Lambda_\beta'(\alpha) = (2/\beta) \log(1 + \beta/\alpha) - 2 \log(1 + 1/\alpha) < 0$ and $\lim_{\alpha \to +\infty} \Lambda_\beta(\alpha) = \log(2\beta - 1) - 2 \log \beta < 0$, it follows that there exists a unique zero, written $\tilde\alpha_\beta$, of $\psi_\beta - \psi_1$ for each $\beta \ge 2$. These zeros can be approximated by classical ball arithmetic techniques. We used the numerix package (based on MPFR [9]) of Mathemagix [17] to obtain $\tilde\alpha_2 \approx 2.5$, $\tilde\alpha_3 \approx 2.16$, and $\tilde\alpha_4 = \alpha_0 \approx 2.1455$. Still using ball arithmetic, one checks that $\min(\psi_1(\alpha), \psi_4(\alpha))$ achieves its maximum for $\alpha \le 8$ at $\alpha = \alpha_0$. Given $\alpha \ge 8$, Lemma 1 of the appendix shows that $\psi_{\lceil \alpha/2 \rceil}(\alpha) \le \varpi$.

Finally, we have obtained so far that, taking $\beta = 1$ for $\alpha \le \alpha_0$, $\beta = 4$ for $\alpha_0 < \alpha \le 8$ and $\beta = \lceil \alpha/2 \rceil$ for larger $\alpha$, the inequality $\psi_\beta(\alpha) \le \varpi$ holds for all $\alpha > 0$. Therefore there exists a constant $A$, such that $\frac{\log(T_1 + T_2)}{\log |S_{2d}|} \le \varpi + \epsilon$ for all $n \ge 3$ with $n \ge A \log \log d$, or $n \le 2$ and $d \ge A$. It remains to bound $\frac{\log(T_1 + T_2)}{\log |S_{2d}|}$ for $n \le A \log \log d$ and $d$ sufficiently large. There exists a constant $K_6$ such that:

$$\big| \log |S_d| - n \log d \big| = \left| \log \prod_{i=1}^{n-1} \left( 1 + \frac{i}{d} \right) - \log n! \right| \le K_6\, n \log(n + 1),$$
$$\big| \log |S_{2d}| - n \log d \big| \le K_6\, n \log(n + 1).$$

Therefore, taking $\beta = \lceil \alpha/2 \rceil$, there exists a constant $K_7$ with

$$\log(T_1 + T_2) \le n \log(2\beta - 1) + K_7\, n + \log(\lambda\, n \log d) \le n \log d + K_7\, n + \log \log d + K_7,$$

which implies that $\frac{\log(T_1 + T_2)}{\log |S_{2d}|} \le 1 + \epsilon \le \varpi + \epsilon$ when $d$ is sufficiently large. □

Remark 1. In the proof of Theorem 1, it was convenient for computational purposes to explicitly choose $\beta$ in terms of $\alpha$. However our choice of $\beta$ is suboptimal.
For instance, taking $\beta = \lceil 2.5\, \alpha \rceil$ for $\alpha \ge 4$ is generally better, whenever $\beta$ remains in $\mathcal{O}(d)$. In fact, we think that the algorithm blocksize will do the choice of the block size just as well, and maybe even a little bit better than the latter bounds, when using a cost function $\mathsf{C}$ that is precise enough. A detailed proof of this claim sounds quite technical, so we have not pursued this direction any further in the present article. Nevertheless, the claim is plausible for the following reasons:

- For symmetry reasons, the block size computed by blocksize should usually be of the form $b = (2^{i+1}, \ldots, 2^{i+1}, 2^i, \ldots, 2^i)$ for some $i \in \mathbb{N}$ (and modulo permutation of the indeterminates).
- The complexity of mul for $b$ of the above form is close to what we would obtain by taking $\beta = (b_1 \cdots b_n)^{1/n}$ in the proof of Theorem 1.
- When allowing for real values of $\beta$, the maximum of $\tilde\Psi(\alpha) := \min_{\beta \ge 1,\, \beta \in \mathbb{R}} \psi_\beta(\alpha)$ is now reached at $\tilde\alpha \approx 2.1438$, with $\tilde\Psi(\tilde\alpha) = \psi_{\tilde\beta}(\tilde\alpha) = \tilde\varpi \approx 1.5336$ and $\tilde\beta \approx 3.7866$.

For rings which do not support large evaluation-interpolation schemes we propose the following extension of Theorem 1.

Theorem 2. Let $\epsilon > 0$, and assume that $\mathbb{A}$ admits a unit. Then two polynomials in $\mathbb{A}[z]$ of total degree at most $d$ can be multiplied using at most $\mathcal{O}(|S_{2d}|^{\varpi + \epsilon})$ operations in $\mathbb{A}$.

Proof. We use the Chinese remaindering technique from [3, 16]. It suffices to prove the theorem for $n \ge 2$. The unit of $\mathbb{A}$ is written $1$, and we introduce $\mathbb{A}_p := \mathbb{A}[u]/(u^{2^p} - 1)$ and $\mathbb{A}_q := \mathbb{A}[u]/(u^{3^q} - 1)$. By using the aforementioned TFT algorithm [14, 15], we can multiply $P$ and $Q$ in $\mathbb{A}_p$ (resp. in $\mathbb{A}_q$) via Theorem 1, with $2^p$ (resp. $3^q$) being the next power of $2$ (resp. of $3$) of the required evaluation size, if we do not perform the divisions by $2$ (resp. by $3$). We thus obtain $2^i R$ and $3^j R$, where $R := P Q$. If $u\, 2^i + v\, 3^j = 1$ then we deduce $R$ as $u\, 2^i R + v\, 3^j R$.

Replacing all arithmetic operations in $\mathbb{A}$ by arithmetic operations in $\mathbb{A}_p \oplus \mathbb{A}_q$ gives rise to an additional overhead of $\mathcal{O}(\beta \log \beta \log \log \beta)$ with respect to the complexity analysis from the proof of Theorem 1. More precisely, we have a new constant $K_1$ such that

$$\frac{\log(T_1 + T_2)}{n} \le \log(2\beta - 1) + 2 \varphi(\alpha/\beta) + K_1\, \frac{\log(n + d) + \log \log(\beta + 2)}{n}.$$

The result is thus proved when $n \ge A \log d$ for a sufficiently large constant $A$. It remains to examine the case when $n \le A \log d$. Again, we use a similar proof as at the end of Theorem 1, with new constants $K_2$ and $K_3$ such that

$$\log(T_1 + T_2) \le n \log(d + 2) + K_2\, n + \log d \le \left( 1 + \frac{1}{n} \right) n \log d + K_3\, n \log \log d \le \frac{3}{2}\, n \log d + K_3\, n \log \log d.$$

Hence $\frac{\log(T_1 + T_2)}{\log |S_{2d}|} \le 3/2 + \epsilon \le \varpi + \epsilon$ for $d$ sufficiently large. □

For rings which do not have a unit, we can use a Karatsuba evaluation-interpolation scheme. For this purpose we introduce $\kappa := \log 3/\log 2$ and

$$\tilde\psi_\beta(\alpha) := \frac{\kappa \log \beta + 2 \varphi(\alpha/\beta)}{\varphi(2\alpha)},$$
$$\Lambda(\alpha) := \frac{\log 3}{2} + \varphi(\alpha) - \varphi(2\alpha).$$

Let $\alpha_K$ be the unique positive zero of $\Lambda$, and let $\varpi_K := \sup_{\beta \ge 1} \tilde\psi_\beta(\beta\, \alpha_K)$. Numeric computations lead to $\alpha_K \approx 1.2622$ and $\varpi_K \approx 1.5930$.

Theorem 3. Let $\epsilon > 0$, and let $\mathbb{A}$ be any ring. Then two polynomials in $\mathbb{A}[z]$ of total degree at most $d$ can be multiplied using at most $\mathcal{O}(|S_{2d}|^{\varpi_K + \epsilon})$ operations in $\mathbb{A}$.

Proof. We use the Karatsuba scheme of Example 1, with $\varepsilon_{2^i} = 3^i$ and $\mathsf{E}_{2^i} = \mathcal{O}(3^i)$ for all $i \ge 1$. We follow the proof of Theorem 1 and begin with a new constant $K_1$ such that

$$\left| \frac{\log T_1}{n} - \kappa \log \beta - 2 \varphi(\alpha/\beta) \right| \le K_1\, \frac{\log(n + d)}{n}.$$

In order to express the execution time in terms of the output size, we thus have to study the function

$$\tilde\Psi(\alpha) := \min_{\beta \in 2^{\mathbb{N}}} \tilde\psi_\beta(\alpha).$$

The right limit of $\tilde\psi_1$ when $\alpha$ tends to $0$ is $1$, while the same limit for the other $\tilde\psi_\beta$ is $+\infty$. Since $\varphi(2\alpha) > 0$ whenever $\alpha > 0$, the sign of $\tilde\psi_\beta - \tilde\psi_{\beta'}$ is the same as the one of

$$\tilde\Lambda_{\beta,\beta'}(\alpha) := \kappa \log(\beta/\beta') + 2 \varphi(\alpha/\beta) - 2 \varphi(\alpha/\beta').$$

If $\beta > \beta'$, then $\tilde\Lambda_{\beta,\beta'}(0^+) > 0$ and $\lim_{\alpha \to +\infty} \tilde\Lambda_{\beta,\beta'}(\alpha) = (\kappa - 2) \log(\beta/\beta') < 0$, which implies the existence of a unique zero of $\tilde\psi_\beta - \tilde\psi_{\beta'}$, written $\alpha_{\beta,\beta'}$. In particular, with $\beta' = \beta/2$, we have $\tilde\Lambda_{\beta,\beta/2}(\alpha) = 2 \Lambda(\alpha/\beta)$, so we obtain $\alpha_{\beta,\beta/2} = \beta\, \alpha_K$.
For convenience we let $\alpha_\beta := \alpha_{\beta,\beta/2}$, and display the first few values:

$$\begin{array}{c|cccccc}
\beta & 2 & 4 & 8 & 16 & 32 & 64 \\
\alpha_\beta & 2.52 & 5.05 & 10.10 & 20.20 & 40.39 & 80.78 \\
\tilde\psi_\beta(\alpha_\beta) & 1.550 & 1.578 & 1.589 & 1.592 & 1.593 & 1.593
\end{array}$$

One can verify that $\tilde\Lambda_{2\beta,\beta}(\alpha) = 2 \Lambda(\alpha/(2\beta)) \ge 0$ holds for all $\alpha \le \alpha_{2\beta}$, which implies that $\tilde\psi_{2\beta}(\alpha) \ge \tilde\psi_\beta(\alpha)$ for all $\alpha \le \alpha_{2\beta}$. Therefore $\tilde\Psi(\alpha) = \tilde\psi_\beta(\alpha)$ with $\beta \in 2^{\mathbb{N}}$ such that $\alpha_\beta \le \alpha \le \alpha_{2\beta}$ (with the convention that $\alpha_1 = 0$). Lemma 2 of the appendix provides us with $\tilde\Psi(\alpha) \le \varpi_K$. Therefore there exists a constant $A$, such that $\frac{\log(T_1 + T_2)}{\log |S_{2d}|} \le \varpi_K + \epsilon$ for all $n \ge 2$ with $n \ge A \log d$, or $n = 1$ and $d \ge A$.

It remains to bound $\frac{\log(T_1 + T_2)}{\log |S_{2d}|}$ for $n \le A \log d$ and $d$ sufficiently large, as in the end of the proof of Theorem 1. In this range we take $\beta = 2^j$ with $j := \left\lceil \frac{\log \alpha - \log \alpha_K}{\log 2} \right\rceil$, so that $\alpha/\alpha_K \le \beta \le 2\alpha/\alpha_K$. There exist constants $K_2$ and $K_3$ such that:

$$\log(T_1 + T_2) \le \kappa\, n \log \beta + 2 n \log\left( \left\lceil \frac{d}{\beta} \right\rceil + 1 \right) + K_2\, \log d \le \big( \kappa \log \beta + 2 \varphi(\alpha/\beta) \big)\, n + K_3\, n.$$

Therefore, using Lemma 2, we have $\frac{\log(T_1 + T_2)}{\log |S_{2d}|} \le 1.5915 \le \varpi_K$ whenever $d$ is sufficiently large. □

5.2 Power series product

Theorem 4. Let $\epsilon > 0$, and assume that $\mathbb{A}$ admits evaluation-interpolation schemes with $\varepsilon_d = 2d - 1$ and $\mathsf{E}_d \le \lambda\, d \log(d + 1)$, for all $d \ge 1$, and for some constant $\lambda > 0$. Then two power series in $\mathbb{A}[z]$ truncated at order $d$ can be multiplied with the truncated algorithm mul of Section 4.2 using at most $\mathcal{O}(|S_d|^{\varpi + \epsilon})$ operations in $\mathbb{A}$, if the sizes of the blocks $b = (\beta, \ldots, \beta)$ are chosen as follows:

- $\beta = 1$, if $d/n \le 2\alpha_0$,
- $\beta = 4$, if $2\alpha_0 < d/n \le 16$,
- $\beta = \lceil d/(4n) \rceil$, if $16 < d/n$.

Proof. The proof is similar to the one of Theorem 1. In the present case, for a suitable new constant $K_1$ the following inequalities hold:

$$\left| \frac{\log |S_d|}{n} - \varphi(\alpha) \right| \le K_1\, \frac{\log(n + d)}{n}, \qquad \left| \frac{\log N_d}{n} - 2 \varphi(\alpha/2) \right| \le K_1\, \frac{\log(n + d)}{n},$$

where $N_d := \binom{d - 1 + 2n}{2n}$ denotes the cost of the naive truncated multiplication at order $d$. The cost of the multiplication is bounded by $T_1 + T_2$ with $T_1 = \varepsilon_b\, N_\delta$ and $T_2 = 4\, \mathsf{E}_b\, |S_\delta|$, where $\delta := \lceil d/\beta \rceil + 1$. There exists a new constant $K_2$ such that:

$$\left| \frac{\log T_1}{n} - \log(2\beta - 1) - 2 \varphi\!\left( \frac{\alpha}{2\beta} \right) \right| \le K_2\, \frac{\log(n + d)}{n},$$
$$\left| \frac{\log T_2}{n} - \log(2\beta - 1) - \varphi\!\left( \frac{\alpha}{\beta} \right) \right| \le K_2\, \frac{\log(n + d) + \lambda \log(\beta + 1)}{n}.$$

The present situation is similar to the one of the proof of Theorem 1, with $\alpha/2$ instead of $\alpha$. Therefore, by taking $\beta = 1$ for $\alpha \le 2\alpha_0$, $\beta = 4$ for $2\alpha_0 < \alpha \le 16$ and $\beta = \lceil \alpha/4 \rceil$ for larger $\alpha$, there exists a constant $A$, such that $\frac{\log(T_1 + T_2)}{\log |S_d|} \le \varpi + \epsilon$ for all $n \ge 3$ with $n \ge A \log \log d$, or $n \le 2$ and $d \ge A$. It remains to bound $\frac{\log(T_1 + T_2)}{\log |S_d|}$ for $n \le A \log \log d$ and $d$ sufficiently large. There exists a new constant $K_3$ such that:

$$\big| \log |S_d| - n \log d \big| \le K_3\, n \log(n + 1), \qquad \big| \log N_d - 2 n \log d \big| \le K_3\, n \log(n + 1).$$

Taking $\beta = \lceil \alpha/4 \rceil$, we thus have a constant $K_4$ with

$$\log(T_1 + T_2) \le n \log(2\beta - 1) + K_4\, n + \log(\lambda\, n \log d) \le n \log d + K_4\, n + \log \log d + K_4.$$

We conclude that $\frac{\log(T_1 + T_2)}{\log |S_d|} \le 1 + \epsilon \le \varpi + \epsilon$ for $d$ sufficiently large. □

References

[1] P. Bürgisser, M. Clausen, and M. A. Shokrollahi. Algebraic Complexity Theory. Springer-Verlag, 1997.
[2] J. Canny, E. Kaltofen, and Y. Lakshman. Solving systems of non-linear polynomial equations faster. In Proc. ISSAC '89, pages 121–128, Portland, Oregon, 1989. ACM Press.
[3] D. G. Cantor and E. Kaltofen. On fast multiplication of polynomials over arbitrary algebras. Acta Informatica, 28:693–701, 1991.
[4] S. A. Cook. On the minimum computation time of functions. PhD thesis, Harvard University, 1966.
[5] J. W. Cooley and J. W. Tukey. An algorithm for the machine calculation of complex Fourier series. Math. Comp., 19:297–301, 1965.
[6] S. Czapor, K. Geddes, and G. Labahn. Algorithms for Computer Algebra. Kluwer Academic Publishers, 1992.
[7] J. H. Davenport, Y. Siret, and É. Tournier. Calcul formel : systèmes et algorithmes de manipulations algébriques. Masson, Paris, France, 1987.
[8] R. Fateman. Comparing the speed of programs for sparse polynomial multiplication. SIGSAM Bull., 37(1):4–15, 2003.
[9] L. Fousse, G. Hanrot, V. Lefèvre, P. Pélissier, and P. Zimmermann. MPFR: A multiple-precision binary floating-point library with correct rounding. ACM Trans. Math. Software, 33(2), June 2007. Software available at http://www.mpfr.org.
[10] M. Gastineau and J. Laskar. Development of TRIP: Fast sparse multivariate polynomial multiplication using burst tries. In Computational Science – ICCS 2006, LNCS 3992, pages 446–453. Springer, 2006.
[11] J. von zur Gathen and J. Gerhard. Modern Computer Algebra. Cambridge University Press, 2nd edition, 2003.
[12] G. Hanrot and P. Zimmermann.
A long note on Mulders' short product. J. Symbolic Comput., 37(3):391–401, 2004.
[13] J. van der Hoeven. Relax, but don't be too lazy. J. Symbolic Comput., 34(6):479–542, 2002.
[14] J. van der Hoeven. The truncated Fourier transform and applications. In J. Gutierrez, editor, Proc. ISSAC 2004, pages 290–296, Univ. of Cantabria, Santander, Spain, July 4–7 2004.
[15] J. van der Hoeven. Notes on the truncated Fourier transform. Technical Report 2005-5, Université Paris-Sud, Orsay, France, 2005.
[16] J. van der Hoeven. Newton's method and FFT trading. J. Symbolic Comput., 45(8), 2010.
[17] J. van der Hoeven et al. Mathemagix. Available from http://www.mathemagix.org, 2002.
[18] J. van der Hoeven and G. Lecerf. On the bit-complexity of sparse polynomial multiplication. Technical report, École polytechnique and CNRS, 2010.
[19] J. van der Hoeven and É. Schost. Multi-point evaluation in higher dimensions. Technical report, HAL, 2010.
[20] S. C. Johnson. Sparse polynomial arithmetic. SIGSAM Bull., 8(3):63–71, 1974.
[21] A. Karatsuba and J. Ofman. Multiplication of multidigit numbers on automata. Soviet Physics Doklady, 7:595–596, 1963.
[22] G. I. Malaschonok and E. S. Satina. Fast multiplication and sparse structures. Programming and Computer Software, 30(2):105–109, 2004.
[23] M. Monagan and R. Pearce. Polynomial division using dynamic arrays, heaps, and packed exponent vectors. In Proc. CASC 2007, pages 295–315. Springer, 2007.
[24] M. Monagan and R. Pearce. Parallel sparse polynomial multiplication using heaps. In Proc. ISSAC 2009, pages 263–270, New York, NY, USA, 2009. ACM.
[25] M. Monagan and R. Pearce. Polynomial multiplication and division in Maple 14. Communications in Computer Algebra, 44(4):205–209, 2010.
[26] M. Monagan and R. Pearce. Sparse polynomial pseudo division using a heap. J. Symbolic Comput., 46(7):807–822, 2011.
[27] A. Schönhage and V. Strassen. Schnelle Multiplikation grosser Zahlen. Computing, 7:281–292, 1971.
[28] D. R. Stoutemyer. Which polynomial representation is best? In Proceedings of the 1984 MACSYMA Users' Conference, pages 221–243, 1984.
[29] A. L. Toom. The complexity of a scheme of functional elements realizing the multiplication of integers. Soviet Mathematics Doklady, 4(2):714–716, 1963.
[30] T. Yan. The geobucket data structure for polynomials. J. Symbolic Comput., 25(3):285–293, 1998.

Appendix. Technical lemmas

This appendix contains details on how one can bound the functions $\psi_{\lceil \alpha/2 \rceil}(\alpha)$ and $\tilde\Psi(\alpha)$ used in Section 5. We first prove the bounds in an explicit neighbourhood of infinity and then we rely on certified ball arithmetic for the remaining compact set.

Lemma 1. For all $\alpha \ge 8$ we have $\psi_{\lceil \alpha/2 \rceil}(\alpha) \le \varpi$.

Proof. Let $\theta(\alpha) := \psi_{\lceil \alpha/2 \rceil}(\alpha)$ and $\tilde\theta(\alpha) := \psi_{\alpha/2}(\alpha) = (\log(\alpha - 1) + 2\varphi(2))/\varphi(2\alpha)$. We shall first show that $\tilde\theta'(\alpha) \le 0$ for all $\alpha \ge 8$. Let

$$\hat\theta(\alpha) := (\alpha - 1)\, \tilde\theta'(\alpha)\, \varphi(2\alpha)^2 = \varphi(2\alpha) - 2 \varphi'(2\alpha)\, (\alpha - 1)\, \big( \log(\alpha - 1) + 2 \varphi(2) \big).$$

Since $\hat\theta(8) \le 0$, it suffices to show that $\hat\theta'(\alpha) \le 0$. Since

$$\hat\theta'(\alpha) = -2\, \big( 2 \varphi''(2\alpha)\, (\alpha - 1) + \varphi'(2\alpha) \big) \big( \log(\alpha - 1) + 2 \varphi(2) \big),$$

we let $\bar\theta(\alpha) := 2 \varphi''(2\alpha)(\alpha - 1) + \varphi'(2\alpha)$, and have to prove that $\bar\theta(\alpha) \ge 0$. The latter inequality is correct since $\lim_{\alpha \to +\infty} \bar\theta(\alpha) = 0$, and

$$\bar\theta'(\alpha) = -\frac{5\alpha + 1}{\alpha^2 (2\alpha + 1)^2} \le 0.$$

On the other hand, for $\alpha \ge 16$ we have

$$|\theta(\alpha) - \tilde\theta(\alpha)| \le \frac{\big| \log(2 \lceil \alpha/2 \rceil - 1) - \log(\alpha - 1) \big|}{\varphi(2\alpha)} + \frac{2\, \big| \varphi(\alpha/\lceil \alpha/2 \rceil) - \varphi(2) \big|}{\varphi(2\alpha)} \le \frac{\log \frac{17}{15} + 2 \big( \varphi(2) - \varphi(\tfrac{32}{17}) \big)}{\varphi(32)} \le 0.05.$$

Since $\tilde\theta(\alpha) \le \tilde\theta(16) \le 1.46$, we have shown that $\theta(\alpha) \le 1.51 \le \varpi$ for $\alpha \ge 16$. For $\alpha$ between $8$ and $16$, we appeal to numeric computations to verify that $\theta(\alpha) \le \varpi$ still holds. □

Lemma 2. For all $\alpha > 0$ we have $\tilde\Psi(\alpha) \le \varpi_K$.

Proof. Recall $\kappa = \log 3/\log 2$, and set $c := 2 \varphi(\alpha_K) - \kappa \log \alpha_K \approx 2.7365$. Let

$$\tilde\psi(\mu) := \tilde\psi_{\mu/\alpha_K}(\mu) = \frac{\kappa \log \mu + c}{\varphi(2\mu)}.$$

We shall first show that $\tilde\psi'(\mu) \le 0$ for all $\mu \ge 100$. Let

$$\hat\psi(\mu) := \mu\, \tilde\psi'(\mu)\, \varphi(2\mu)^2 = \kappa\, \varphi(2\mu) - 2 \varphi'(2\mu)\, \mu\, (\kappa \log \mu + c).$$

Since $\hat\psi(100) \le 0$, it suffices to show that $\hat\psi'(\mu) \le 0$. Since $\hat\psi'(\mu) = -2\, (2 \varphi''(2\mu)\, \mu + \varphi'(2\mu)) (\kappa \log \mu + c)$, we let $\bar\psi(\mu) := 2 \varphi''(2\mu)\, \mu + \varphi'(2\mu)$, and have to prove that $\bar\psi(\mu) \ge 0$. The latter inequality is correct since $\lim_{\mu \to +\infty} \bar\psi(\mu) = 0$, and $\bar\psi'(\mu) = -1/(\mu (2\mu + 1)^2) \le 0$. In particular $\tilde\psi(\mu) \le \tilde\psi(100) \le \varpi_K$ for all $\mu \ge 100$.

Let $j := \lfloor (\log \alpha - \log \alpha_K)/\log 2 \rfloor$ and $B := 2^j$. We introduce $\Theta(\alpha) := \tilde\psi_B(\alpha) \ge \tilde\Psi(\alpha)$.
Since $\alpha/(2 \alpha_K) \le B \le \alpha/\alpha_K$, the numerator of $\Theta(\alpha) - \tilde\psi(\alpha)$ equals $2 (\varphi(t\, \alpha_K) - \varphi(\alpha_K)) - \kappa \log t$ with $t := \alpha/(B \alpha_K) \in [1, 2]$; this function of $t$ is convex and vanishes at $t = 1$ and, by the very definition of $\alpha_K$, at $t = 2$, so it is nonpositive, and its absolute value stays below $0.037$. Hence $\Theta(\alpha) \le \tilde\psi(\alpha)$ for all $\alpha \ge 2 \alpha_K$, and, for $\alpha \ge 2^6$,

$$|\Theta(\alpha) - \tilde\psi(\alpha)| \le \frac{0.037}{\varphi(2^7)} \le 0.0062.$$

Since ball arithmetic yields $\tilde\psi(\mu) \le \varpi_K$ for all $0 < \mu \le 100$, we have shown that $\tilde\Psi(\alpha) \le \Theta(\alpha) \le \varpi_K$ for $\alpha \ge 2^6$. For the remaining range, we symbolically computed $\tilde\psi_\beta'(\alpha)$ and straightforwardly lower bounded it by a function $L_\beta(\alpha)$ in the domain $\alpha_K \beta \le \alpha \le 2 \alpha_K \beta$. For each value of $\beta = 2^4, 2^5, \ldots, 2^8$, we used ball arithmetic to show that $L_\beta(\alpha) \ge 0$. For smaller values of $\beta$, we could directly compute that $\tilde\psi_\beta'(\alpha) \ge 0$ whenever $\alpha_K \beta \le \alpha \le 2 \alpha_K \beta$. Finally we computed that $\tilde\psi_\beta(\alpha_K \beta) \le \varpi_K$ for all $\beta = 2, 4, 8, \ldots, 256$, which concludes the proof. □
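To complement the appendix, here is a crude numeric sanity check of Lemma 1; a uniform float scan over a grid is of course only a stand-in for the certified ball arithmetic computations referred to above.

```python
from math import log, ceil

def phi(a):
    return (1 + a) * log(1 + a) - a * log(a)

def psi(beta, a):
    return (log(2 * beta - 1) + 2 * phi(a / beta)) / phi(2 * a)

VARPI = 1.5337

# Scan psi_{ceil(alpha/2)}(alpha) on a grid over [8, 100); Lemma 1 states
# that it stays below varpi (and it keeps decreasing beyond 100).
grid = (8 + k / 100 for k in range(9200))
worst = max(psi(ceil(a / 2), a) for a in grid)
print(worst, worst <= VARPI)   # worst value close to 1.52, check passes
```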