The Truncated Fourier Transform and Applications

Joris van der Hoeven
Dépt. de Mathématiques (Bât. 425), Université Paris-Sud, 91405 Orsay Cedex, France

Keywords: FFT-multiplication, multivariate polynomials, multivariate power series.

\begin{abstract}
In this paper, we present a truncated version of the classical Fast Fourier Transform. When applied to polynomial multiplication, this algorithm has the nice property of eliminating the ``jumps'' in the complexity at powers of two. When applied to the multiplication of multivariate polynomials or truncated multivariate power series, we gain a logarithmic factor with respect to the best previously known algorithms.
\end{abstract}

\section{Introduction}

Let $R$ be an effective ring of constants (i.e. the usual arithmetic operations $+$, $-$ and $\times$ can be carried out by algorithm). If $R$ has a primitive $2^p$-th root of unity with $2^{p-1} < n \leq 2^p$, then the product of two polynomials $A, B \in R[X]$ with $\deg A B < n$ can be computed in time $O(n \log n)$ using the Fast Fourier Transform or FFT [CT65]. If $R$ does not admit a primitive $2^p$-th root of unity, then one needs an additional overhead of $O(\log \log n)$ in order to carry out the multiplication, by artificially adding new roots of unity [SS71, CK91].

Besides the fact that the asymptotic complexity of the FFT involves a large constant factor, another classical drawback is that the complexity function admits important jumps at each power of two. These jumps can be reduced by using $(2^p 3^q)$-th roots of unity for small $q$. They can also be smoothened by decomposing $(n + \varepsilon) \times (n + \varepsilon)$-multiplications as $n \times n$-, $n \times \varepsilon$- and $(n + \varepsilon) \times \varepsilon$-multiplications [Mul00, HaZi04, HaQuZi00, HaQuZi02]. However, these tricks are not very elegant, they are cumbersome to implement, and they do not allow one to completely eliminate the jump problem.

In section 3, we present a new kind of ``Truncated Fourier Transform'' or TFT, which allows for the fast evaluation of a polynomial $A \in R[X]$ in any number $l$ of well-chosen roots of unity. This algorithm coincides with the usual FFT if $l$ is a power of two, but it behaves smoothly for intermediate values.
In section 4, we also show that the inverse operation of interpolation can be carried out with the same complexity (modulo a few additional shifts). The TFT permits to speed up the multiplication of univariate polynomials by a constant factor between $1$ and $2$. In the case of multivariate polynomials, the repeated gain of such a constant factor leads to the gain of a non-trivial asymptotic factor. More precisely, assuming that $R$ admits sufficiently many $2^p$-th roots of unity, we will show in section 5 that the product of two multivariate polynomials $f, g \in R[z_0, \ldots, z_d]$ can be computed in time $O(s \log s)$, where $s$ denotes the expected number of coefficients of the product $f g$. The best previously known algorithm [CKL89], based on sparse polynomial multiplication, has time complexity $O(s \log^2 s)$.

In section 6 we finally give an algorithm for the multiplication of truncated multivariate power series. This algorithm, which has time complexity $O(s \log s)$, again improves the best previously known algorithm [LeSc03] by a logarithmic factor. Moreover, both in the cases of multivariate polynomials and power series, we expect the corresponding constant factor to be better.

\section{The Fast Fourier Transform}

Let $R$ be an effective ring of constants, $n = 2^p$ with $p \in \mathbb{N}$ and $\omega \in R$ a primitive $n$-th root of unity (i.e. $\omega^{n/2} = -1$). The discrete Fast Fourier Transform (FFT) of an $n$-tuple $(a_0, \ldots, a_{n-1}) \in R^n$ (with respect to $\omega$) is the $n$-tuple $(\hat{a}_0, \ldots, \hat{a}_{n-1}) = \mathrm{FFT}_\omega(a) \in R^n$ with
\[ \hat{a}_i = \sum_{j=0}^{n-1} a_j \omega^{i j}. \]
In other words, $\hat{a}_i = A(\omega^i)$, where $A \in R[X]$ denotes the polynomial $A = a_0 + a_1 X + \cdots + a_{n-1} X^{n-1}$.

The FFT can be computed efficiently using binary splitting: writing
\[ (a_0, \ldots, a_{n-1}) = (b_0, c_0, \ldots, b_{n/2-1}, c_{n/2-1}), \]
we recursively compute the Fourier transforms of $(b_0, \ldots, b_{n/2-1})$ and $(c_0, \ldots, c_{n/2-1})$ with respect to $\omega^2$:
\begin{eqnarray*}
\mathrm{FFT}_{\omega^2}(b_0, \ldots, b_{n/2-1}) & = & (\hat{b}_0, \ldots, \hat{b}_{n/2-1}); \\
\mathrm{FFT}_{\omega^2}(c_0, \ldots, c_{n/2-1}) & = & (\hat{c}_0, \ldots, \hat{c}_{n/2-1}).
\end{eqnarray*}
Then we have
\begin{eqnarray*}
\mathrm{FFT}_\omega(a_0, \ldots, a_{n-1}) & = & (\hat{b}_0 + \hat{c}_0, \ldots, \hat{b}_{n/2-1} + \omega^{n/2-1} \hat{c}_{n/2-1}, \\
& & \;\; \hat{b}_0 - \hat{c}_0, \ldots, \hat{b}_{n/2-1} - \omega^{n/2-1} \hat{c}_{n/2-1}).
\end{eqnarray*}
This algorithm requires $\frac{1}{2} n \log_2 n$ multiplications by powers of $\omega$ and $n \log_2 n$ additions (or subtractions).

In practice, it is most efficient to implement an in-place variant of the above algorithm. We will denote by $[i]_p$ the bitwise mirror of $i$ at length $p$ (for instance, $[3]_5 = 24$ and $[11]_5 = 26$). At step $0$, we start with the vector
\[ x^{(0)} = (x^{(0)}_0, \ldots, x^{(0)}_{n-1}) = (a_0, \ldots, a_{n-1}). \]
At step $s \in \{1, \ldots, p\}$, we set
\[ \begin{pmatrix} x^{(s)}_{i m + j} \\ x^{(s)}_{(i+1) m + j} \end{pmatrix}
= \begin{pmatrix} 1 & \omega^{[i]_s m} \\ 1 & -\omega^{[i]_s m} \end{pmatrix}
\begin{pmatrix} x^{(s-1)}_{i m + j} \\ x^{(s-1)}_{(i+1) m + j} \end{pmatrix} \quad (1) \]
for all $i \in \{0, 2, \ldots, n/m - 2\}$ and $j \in \{0, \ldots, m - 1\}$, where $m = 2^{p-s}$. Using induction over $s$, it can easily be seen that
\[ x^{(s)}_{i m + j} = \big( \mathrm{FFT}_{\omega^m}(a_j, a_{m+j}, \ldots, a_{n-m+j}) \big)_{[i]_s} \]
for all $i \in \{0, \ldots, n/m - 1\}$ and $j \in \{0, \ldots, m - 1\}$. In particular,
\begin{eqnarray*}
x^{(p)}_i & = & \hat{a}_{[i]_p}; \\
x^{(p)}_{[i]_p} & = & \hat{a}_i
\end{eqnarray*}
for all $i \in \{0, \ldots, n - 1\}$. This algorithm of ``repeated crossings'' is illustrated in figure 1.

Figure 1. Schematic representation of a Fast Fourier Transform for $n = 16$. The black dots correspond to the $x^{(s)}_i$, the upper row being $(x^{(0)}_0, \ldots, x^{(0)}_{15}) = (a_0, \ldots, a_{15})$ and the lower row $(x^{(4)}_0, \ldots, x^{(4)}_{15}) = (\hat{a}_0, \hat{a}_8, \hat{a}_4, \hat{a}_{12}, \ldots, \hat{a}_{15})$.

A classical application of the FFT is the multiplication of polynomials $A = a_0 + \cdots + a_{n-1} X^{n-1}$ and $B = b_0 + \cdots + b_{n-1} X^{n-1}$. Assuming that $\deg A B < n$, we first evaluate $A$ and $B$ in $1, \omega, \ldots, \omega^{n-1}$ using the FFT:
\begin{eqnarray*}
(A(1), A(\omega), \ldots, A(\omega^{n-1})) & = & \mathrm{FFT}_\omega(a_0, \ldots, a_{n-1}); \\
(B(1), B(\omega), \ldots, B(\omega^{n-1})) & = & \mathrm{FFT}_\omega(b_0, \ldots, b_{n-1}).
\end{eqnarray*}
We next compute the evaluations $(A(1) B(1), \ldots, A(\omega^{n-1}) B(\omega^{n-1}))$ of $A B$ at $1, \omega, \ldots, \omega^{n-1}$. We finally have to recover $A B$ from these values using the inverse FFT. But the inverse FFT with respect to $\omega$ is nothing else than $1/n$ times the direct FFT with respect to $\omega^{-1}$. Indeed, for all $(a_0, \ldots, a_{n-1}) \in R^n$ and all $i \in \{0, \ldots, n - 1\}$, we have
\[ \big( \mathrm{FFT}_{\omega^{-1}}(\mathrm{FFT}_\omega(a)) \big)_i = \sum_{j=0}^{n-1} \sum_{k=0}^{n-1} a_k \omega^{(k - i) j} = n a_i, \quad (2) \]
since
\[ \sum_{j=0}^{n-1} \omega^{(k - i) j} = 0 \]
whenever $k \neq i$. This yields a multiplication algorithm of time complexity $O(n \log n)$ in $R[X]$, when assuming that $R$ admits enough primitive $2^p$-th roots of unity. In the case that $R$ does not, then new roots of unity can be added artificially [SS71, CK91], so as to yield an algorithm of time complexity $O(n \log n \log \log n)$.

\section{The Truncated Fourier Transform}

The algorithm from the previous section has the disadvantage that $n$ needs to be a power of two. If we want to multiply two polynomials $A, B \in R[X]$ such that $n = \deg A B + 1$ is slightly larger than a power of two, then we need to carry out the FFT at precision $\approx 2 n$, thereby losing a factor of $2$. This factor can be reduced using several tricks. For instance, one may decompose the $(n + \varepsilon) \times (n + \varepsilon)$-product into an $n \times n$-product, an $n \times \varepsilon$-product and an $(n + \varepsilon) \times \varepsilon$-product. This is efficient for small $\varepsilon$, but not very good if $\varepsilon \approx n/2$. In the latter case, one may also use an FFT at a precision of the form $3 \cdot 2^k$, by using $3 \times 3$-matrices at one step of the computation.
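The classical FFT multiplication described in section 2 can be sketched in a few lines. The following minimal illustration (not code from the paper) works over the example ring $R = \mathbb{Z}/17\mathbb{Z}$, in which $\omega = 4$ is a primitive $4$th root of unity; the recursion follows the binary splitting above, and the inverse transform uses formula (2).

```python
# Minimal sketch of FFT multiplication over R = Z/17Z (illustration only).
P = 17

def fft(a, omega):
    """FFT of a (length n = 2^p) with respect to the n-th root of unity omega."""
    n = len(a)
    if n == 1:
        return a[:]
    b = fft(a[0::2], omega * omega % P)   # transform of even-index coefficients
    c = fft(a[1::2], omega * omega % P)   # transform of odd-index coefficients
    out = [0] * n
    w = 1
    for i in range(n // 2):
        t = w * c[i] % P
        out[i] = (b[i] + t) % P           # hat a_i       = hat b_i + omega^i hat c_i
        out[i + n // 2] = (b[i] - t) % P  # hat a_{i+n/2} = hat b_i - omega^i hat c_i
        w = w * omega % P
    return out

def fft_multiply(a, b, n, omega):
    """Product of polynomials a, b (coefficient lists), assuming deg ab < n = 2^p."""
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    values = [x * y % P for x, y in zip(fft(a, omega), fft(b, omega))]
    # inverse FFT = 1/n times the direct FFT with respect to omega^{-1}, by (2)
    inv_n, inv_omega = pow(n, -1, P), pow(omega, -1, P)
    return [v * inv_n % P for v in fft(values, inv_omega)]
```

For instance, `fft_multiply([1, 2], [3, 4], 4, 4)` returns `[3, 10, 8, 0]`, the coefficients of $(1 + 2X)(3 + 4X) = 3 + 10X + 8X^2$ modulo $17$. The jump phenomenon is visible here: a product of degree $4$ would already force $n = 8$.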
However, all these tricks of the trade require a large amount of hacking and one always continues to lose a non-trivial factor between $1$ and $2$.

The idea behind the Truncated Fourier Transform is to provide an efficient algorithm for the evaluation of polynomials in any number of well-chosen roots of unity. Moreover, the inverse operation of interpolation can be carried out with the same complexity (modulo a few additional shifts). This technique will eliminate the ``jumps'' in the complexity of multiplication.

So let $n = 2^p$, $l \leq n$ (usually, $l > n/2$) and let $\omega$ be a primitive $n$-th root of unity. Given an $l$-tuple $(a_0, \ldots, a_{l-1})$, we will evaluate the corresponding polynomial $A = a_0 + \cdots + a_{l-1} X^{l-1}$ in $\omega^{[0]_p}, \omega^{[1]_p}, \ldots, \omega^{[l-1]_p}$. We call $(A(\omega^{[0]_p}), \ldots, A(\omega^{[l-1]_p}))$ the Truncated Fourier Transform (TFT) of $(a_0, \ldots, a_{l-1})$.

Now consider the completion of the $l$-tuple $(a_0, \ldots, a_{l-1})$ into an $n$-tuple $(a_0, \ldots, a_{l-1}, 0, \ldots, 0)$. When using the in-place algorithm from the previous section in order to compute $(A(\omega^{[0]_p}), \ldots, A(\omega^{[l-1]_p}))$, we claim that many of the computations of the $x^{(s)}_i$ can actually be skipped (see figure 2). Indeed, at stage $s$, it suffices to compute the vector $(x^{(s)}_0, \ldots, x^{(s)}_{(\lfloor (l-1)/m \rfloor + 1) m - 1})$, where $m = 2^{p-s}$. Besides $x^{(s)}_0, \ldots, x^{(s)}_{l-1}$, we therefore compute at most $m = 2^{p-s}$ additional values. In total, we therefore compute at most $p l + 2^{p-1} + 2^{p-2} + \cdots + 1 \leq p l + n$ values $x^{(s)}_i$. This proves the following result:

Theorem 1. Let $n = 2^p$, $l \leq n$ and let $\omega \in R$ be a primitive $n$-th root of unity in $R$. Then the Truncated Fourier Transform of an $l$-tuple $(a_0, \ldots, a_{l-1})$ with respect to $\omega$ can be computed using at most $l p + n$ additions (or subtractions) and $\lceil (l p + n)/2 \rceil$ multiplications by powers of $\omega$.

Figure 2. Schematic representation of a TFT for $n = 16$ and $l = 11$.

Remark 1. Assume that $R$ admits a privileged primitive $2^p$-th root of unity $\omega_p$ for every $p \in \mathbb{N}$, such that $\omega_{p+1}^2 = \omega_p$ for all $p$. Then the TFT $(\hat{a}_0, \ldots, \hat{a}_{l-1})$ of an $l$-tuple $(a_0, \ldots, a_{l-1})$ with respect to $\omega_p$ with $2^p \geq l$ does not depend on the choice of $p$. We call $(\hat{a}_0, \ldots, \hat{a}_{l-1})$ the TFT of $(a_0, \ldots, a_{l-1})$ with respect to the privileged sequence $(\omega_0, \omega_1, \omega_2, \ldots)$ of roots of unity.
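The defining property of the TFT (though not the fast algorithm of theorem 1) can be captured by a naive reference implementation, which simply evaluates $A$ at the $l$ mirrored powers of $\omega$. The sketch below is illustrative only and again assumes the example ring $\mathbb{Z}/17\mathbb{Z}$, where $\omega = 2$ has order $n = 8 = 2^3$.

```python
# Naive reference TFT over Z/17Z (quadratic time; for specification only).
P = 17
OMEGA, N_LOG = 2, 3   # omega = 2 is a primitive 8th root of unity modulo 17

def mirror(i, p):
    """Bitwise mirror [i]_p of i at length p; e.g. [3]_5 = 24 and [11]_5 = 26."""
    return int(format(i, f'0{p}b')[::-1], 2)

def tft_naive(a, l):
    """TFT of (a_0, ..., a_{l-1}): the values A(omega^{[i]_p}) for i = 0, ..., l-1."""
    out = []
    for i in range(l):
        x = pow(OMEGA, mirror(i, N_LOG), P)
        out.append(sum(c * pow(x, j, P) for j, c in enumerate(a[:l])) % P)
    return out
```

For $l = n$ this is exactly the FFT in bit-reversed order; the point of theorem 1 is that the same $l$ values can be obtained with about $l p + n$ ring operations instead of $O(l^2)$.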
Remark 2. Since the only operations we need for computing the TFT are additions, subtractions and multiplications by powers of $\omega$, the algorithm naturally combines with Schönhage-Strassen's algorithm [SS71] when $\omega$ is a symbolically added root of unity.

Remark 3. If $f_0 = \cdots = f_{l_0 - 1} = 0$, then the TFT of $(f_0, \ldots, f_{l-1})$ can be computed using $O((l - l_0) p + 2 n)$ ring operations, using a similar algorithm as above. More generally, this allows the rapid transformation of ``unions of segments''.

\section{Inverting the Truncated Fourier Transform}

Unfortunately, the inverse TFT cannot be computed using a similar formula as (2). Indeed, starting with the $x^{(p)}_i$, we need to compute an increasing number of values $x^{(s)}_i$ when $s$ decreases. Therefore we will rather invert the algorithm which computes the TFT, but with this difference that we will sometimes need values $x^{(s')}_j$ with $s' > s$ in order to compute $x^{(s)}_i$.

We will use the fact that whenever one value among $x^{(s)}_{i m + j}, x^{(s)}_{(i+1) m + j}$ and one value among $x^{(s-1)}_{i m + j}, x^{(s-1)}_{(i+1) m + j}$ are known in the cross relation (1), then we can deduce the other two values from them using one multiplication by a power of $\omega$ and two ``shifted'' additions or subtractions (i.e. the results may have to be divided by $2$).

More precisely, let us denote $m = 2^{p-s}$ and $l_s = (\lfloor (l - 1)/m \rfloor + 1) m$ at each stage $s$. We use a recursive algorithm which takes the values $x^{(p)}_k, \ldots, x^{(p)}_{l-1}$ and $x^{(s)}_l, \ldots, x^{(s)}_{l_s - 1}$ on input, and which computes $x^{(s)}_k, \ldots, x^{(s)}_{l-1}$. If $k = l$, then we have nothing to do. Otherwise, we distinguish two cases:

\begin{itemize}
\item If $l_{s+1} = l_s$, then we first compute $x^{(s+1)}_k, \ldots, x^{(s+1)}_{k + m/2 - 1}$ from $x^{(p)}_k, \ldots, x^{(p)}_{k + m/2 - 1}$ using repeated crossings. We next deduce $x^{(s)}_i$ and $x^{(s+1)}_{i + m/2}$ from $x^{(s+1)}_i$ and $x^{(s)}_{i + m/2}$ for all $i \in \{l - m/2, \ldots, k + m/2 - 1\}$. Invoking our algorithm recursively, we now obtain $x^{(s+1)}_{k + m/2}, \ldots, x^{(s+1)}_{l-1}$. We finally compute $x^{(s)}_i$ and $x^{(s)}_{i + m/2}$ from $x^{(s+1)}_i$ and $x^{(s+1)}_{i + m/2}$ for $i \in \{k, \ldots, l - m/2 - 1\}$.
\item If $l_{s+1} < l_s$, then we first compute $x^{(s+1)}_i$ from $x^{(s)}_i$ and $x^{(s)}_{i + m/2}$ for $i \in \{l, \ldots, l_{s+1} - 1\}$. Invoking our algorithm recursively, we next compute $x^{(s+1)}_k, \ldots, x^{(s+1)}_{l-1}$. We finally deduce $x^{(s)}_i$ from $x^{(s+1)}_i$ and $x^{(s)}_{i + m/2}$ for all $i \in \{k, \ldots, l - 1\}$.
\end{itemize}

The two cases are illustrated in figures 3 and 4. Since $x^{(0)}_l = \cdots = x^{(0)}_{n-1} = 0$, the application of our algorithm for $s = 0$ and $k = 0$ computes the inverse TFT.
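As a brute-force sanity check (and emphatically not the recursive algorithm above), one can verify that the $l$ values $A(\omega^{[i]_p})$ determine $(a_0, \ldots, a_{l-1})$ uniquely, by solving the corresponding Vandermonde system. The sketch below works over the example ring $\mathbb{Z}/17\mathbb{Z}$ with $\omega = 2$ of order $8$.

```python
# Brute-force inverse TFT over Z/17Z via Gauss-Jordan elimination (sanity check
# only; the recursive algorithm needs just about l*p + n ring operations).
P, OMEGA, N_LOG = 17, 2, 3   # omega = 2 has order 8 = 2^3 modulo 17

def mirror(i, p):
    """Bitwise mirror [i]_p of i at length p."""
    return int(format(i, f'0{p}b')[::-1], 2)

def inverse_tft_naive(values):
    """Recover (a_0, ..., a_{l-1}) from the l values A(omega^{[i]_p})."""
    l = len(values)
    pts = [pow(OMEGA, mirror(i, N_LOG), P) for i in range(l)]
    # augmented Vandermonde system: one row (1, x, ..., x^{l-1} | A(x)) per point x
    m = [[pow(x, j, P) for j in range(l)] + [v] for x, v in zip(pts, values)]
    for col in range(l):                     # Gauss-Jordan elimination modulo P
        piv = next(r for r in range(col, l) if m[r][col])
        m[col], m[piv] = m[piv], m[col]
        inv = pow(m[col][col], -1, P)
        m[col] = [e * inv % P for e in m[col]]
        for r in range(l):
            if r != col and m[r][col]:
                f = m[r][col]
                m[r] = [(e - f * g) % P for e, g in zip(m[r], m[col])]
    return [m[r][l] for r in range(l)]
```

The system is invertible because the evaluation points $\omega^{[0]_p}, \ldots, \omega^{[l-1]_p}$ are pairwise distinct; the content of theorem 2 below is that this inversion can also be done in softly linear time.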
We notice that the values $x^{(s)}_i$ with $i \geq l$ are computed in decreasing order (as a function of $i$) and the values $x^{(s)}_i$ with $i < l$ in increasing order. In other words, the algorithm may be designed in such a way that it remains in place. We have proved:

Theorem 2. Let $n = 2^p$, $l \leq n$ and let $\omega \in R$ be a primitive $n$-th root of unity in $R$. Then the $l$-tuple $(a_0, \ldots, a_{l-1})$ can be recovered from its Truncated Fourier Transform with respect to $\omega$ using at most $l p + n$ shifted additions (or subtractions) and $\lceil (l p + n)/2 \rceil$ multiplications by powers of $\omega$.

Figure 3. Schematic representation of the recursive computation of the inverse TFT for $n = 16$ and $l = 11$. The different images show the progression of the known values $x^{(s)}_i$ (the black dots) during the different computations at stage $s = 0$. Since $l_1 = l_0 = 16$, we fall into the first case of our algorithm and the recursive invocation of the algorithm is done between the third and the fourth image.

Figure 4. Schematic representation of the recursive computation of the inverse TFT for $n = 16$ and $l = 11$ at stage $s = 1$. Since $l_1 = 16$ and $l_2 = 12$, we now fall into the second case of our algorithm and the recursive invocation of the algorithm is done between the third and the fourth image.

Remark 4. Besides shifted additions, subtractions and multiplications by powers of $\omega$, the algorithm essentially computes inverse FFT-transforms of sizes $2^{p_1}, \ldots, 2^{p_k}$ with $2^{p_1} + \cdots + 2^{p_k} \leq n$. Using (2), it is therefore possible to replace all but $O(n)$ shifted additions and subtractions by normal additions and subtractions.

\section{Multiplying multivariate polynomials}

Let $R$ be a ring with a privileged sequence $(\omega_0, \omega_1, \omega_2, \ldots)$ of roots of unity (see remark 1). Given a non-zero multivariate polynomial
\[ f = \sum_{i_0, \ldots, i_d} f_{i_0, \ldots, i_d} z_0^{i_0} \cdots z_d^{i_d} \in R[z_0, \ldots, z_d] \]
in $d + 1 \geq 1$ variables, we define the total degree of $f$ by
\[ \deg f = \max \{ i_0 + \cdots + i_d : f_{i_0, \ldots, i_d} \neq 0 \} \in \mathbb{N}. \]
We let $\deg 0 = -1$. Now let $f, g \in R[z_0, \ldots, z_d]$ be such that $\deg f g < r$. In this section we present an algorithm to compute $f g$, which has a good complexity in terms of the number
\[ s = \# \{ (i_0, \ldots, i_d) \in \mathbb{N}^{d+1} : i_0 + \cdots + i_d < r \} \]
of expected coefficients of $f g$.
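The count $s$ can be checked by brute force; under our reading (an assumption of this illustration, not a formula quoted from the paper), the number of exponent vectors of total degree $< r$ in $d + 1$ variables equals $\binom{r + d}{d + 1}$:

```python
# Counting the expected coefficients: exponent vectors (i_0, ..., i_d) with
# i_0 + ... + i_d < r, versus the closed form binomial(r + d, d + 1).
from itertools import product
from math import comb

def monomials_below(r, d):
    """All exponent tuples (i_0, ..., i_d) of total degree < r."""
    return [t for t in product(range(r), repeat=d + 1) if sum(t) < r]

def s_closed_form(r, d):
    return comb(r + d, d + 1)
```

For example, `len(monomials_below(3, 1)) == s_closed_form(3, 1) == 6`: the six monomials $1, z_0, z_1, z_0^2, z_0 z_1, z_1^2$ of degree $< 3$ in two variables.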
When computing $f g$ using the classical FFT with respect to each of the variables $z_0, \ldots, z_d$ [Pan94], we need a time $O((2 r)^{d+1} \log r)$, which is much bigger than $s$, in general. When using multiplication of sparse polynomials [CKL89], we need a time $O(s \log^2 s)$ with a non-trivial constant factor. Our algorithm is based on the TFT with respect to all variables and we will show that it has a complexity $O(s \log s)$.

Given $f \in R[z_0, \ldots, z_d]$ with $\deg f < r$, the TFT of $f$ with respect to one variable $z_j$ at order $r$ is defined by
\[ \mathrm{TFT}_j(f) = \big( f_{i_0, \ldots, i_{j-1}, \cdot, i_{j+1}, \ldots, i_d}(\omega_p^{[i_j]_p}) \big)_{i_0 + \cdots + i_d < r}, \]
where $2^p \geq r$ and $f_{i_0, \ldots, i_{j-1}, \cdot, i_{j+1}, \ldots, i_d}$ denotes the univariate polynomial in $z_j$ obtained by fixing the other indices. We recall that the result does not depend on the choice of $p$. The TFT of $f$ with respect to all variables $z_0, \ldots, z_d$ is defined by
\[ \mathrm{TFT}(f) = \big( f(\omega_p^{[i_0]_p}, \ldots, \omega_p^{[i_d]_p}) \big)_{i_0 + \cdots + i_d < r}, \]
where $2^p \geq r$ (see figure 5). We have
\[ \mathrm{TFT}(f) = \mathrm{TFT}_0(\cdots \mathrm{TFT}_d(f) \cdots). \]
Given $f, g \in R[z_0, \ldots, z_d]$ with $\deg f g < r$, we will use the formula
\[ f g = \mathrm{TFT}^{-1}(\mathrm{TFT}(f) \, \mathrm{TFT}(g)) \]
in order to compute the product $f g$.

Figure 5. Illustration of the TFT in two variables ($\omega = \omega_p$).

In order to compute $\mathrm{TFT}_j(f)$, say for $j = 0$, we compute the TFT of $(f_{0, i_1, \ldots, i_d}, \ldots, f_{l-1, i_1, \ldots, i_d})$ with $l = r - i_1 - \cdots - i_d$ for all $i_1, \ldots, i_d$ with $i_1 + \cdots + i_d \leq r - 1$ (if $i_1 + \cdots + i_d = r - 1$, then the TFT of $(f_{0, i_1, \ldots, i_d})$ is given by itself, so we have nothing to do). One such computation takes a time $\leq C l \log_2 l$ for some universal constant $C$, by using the root $\omega_p$ with minimal $2^p \geq l$ (so $p$ may vary as a function of $i_1, \ldots, i_d$, but not $C$). The computation of $\mathrm{TFT}_0(f)$ therefore takes a time $T$ with
\[ T \leq C \sum_{l=2}^{r} \binom{r - l + d - 1}{d - 1} \, l \log_2 l. \]
Dividing by $s = \sum_{l=1}^{r} \binom{r - l + d - 1}{d - 1} \, l$, we obtain
\[ \frac{T}{s} \leq C \, \frac{\sum_{l=2}^{r} \binom{r - l + d - 1}{d - 1} \, l \log_2 l}{\sum_{l=1}^{r} \binom{r - l + d - 1}{d - 1} \, l}. \]
If $r \leq d$, then the summand rapidly decreases as a function of $l \geq 2$, so that $T/s = O(r) = O(\log s)$. Consequently, $T = O(s \log s)$, and even $T = O(s)$ for fixed $r$. If $r \geq d$, then for $l = \lambda r$ and $d = \mu r$, Stirling's formula yields an estimate of the form
\[ \log \left( \binom{r - l + d - 1}{d - 1} \, l \right) = \varphi_\mu(\lambda) \, r + O(\log r). \]
It follows that only the first $O(r/d)$ terms in the above sum contribute to the asymptotic behaviour of $T/s$, so that $T/s = O(d \log(r/d)) = O(\log s)$. Again, we find that $T = O(s \log s)$. We have proved:

Theorem 3. Let $R$ be a ring with a privileged sequence $(\omega_0, \omega_1, \omega_2, \ldots)$ of roots of unity. Let $f, g \in R[z_0, \ldots, z_d]$ be polynomials with $\deg f g < r$ and let $s$ be as above. Then the product $f g$ can be computed using $O(s \log s)$ ring operations in $R$.

\section{Multiplying multivariate power series}

Since power series have infinitely many terms, implementing an operation on power series really corresponds to implementing the operation for polynomial approximations at all degrees. As usual, multiplication is a particularly important operation [vdH:relax]. Given $f, g \in R[z_0, \ldots, z_d]$ with $\deg f < r$ and $\deg g < r$, we will show how to compute the truncated product
\[ h = \sum_{i_0 + \cdots + i_d < r} (f g)_{i_0, \ldots, i_d} z_0^{i_0} \cdots z_d^{i_d} \in R[z_0, \ldots, z_d] \]
of $f$ and $g$. The first idea is to use homogeneous coordinates instead of the usual ones:
\begin{eqnarray*}
\bar{f}(z_0, z_1, \ldots, z_d) & = & f(z_0, z_0 z_1, \ldots, z_0 z_d); \\
\bar{g}(z_0, z_1, \ldots, z_d) & = & g(z_0, z_0 z_1, \ldots, z_0 z_d).
\end{eqnarray*}
This transformation takes no time since it corresponds to some re-indexing. We next compute the TFTs $\hat{f}$ and $\hat{g}$ of $\bar{f}$ and $\bar{g}$ with respect to $z_1, \ldots, z_d$ at order $r$:
\begin{eqnarray*}
\hat{f} & = & \mathrm{TFT}_1(\cdots \mathrm{TFT}_d(\bar{f}) \cdots); \\
\hat{g} & = & \mathrm{TFT}_1(\cdots \mathrm{TFT}_d(\bar{g}) \cdots).
\end{eqnarray*}
We next compute the $s_\times$ truncated products $\hat{h}_{\cdot, i_1, \ldots, i_d}(z_0)$ at order $r$ of the obtained univariate polynomials $\hat{f}_{\cdot, i_1, \ldots, i_d}(z_0)$ and $\hat{g}_{\cdot, i_1, \ldots, i_d}(z_0)$. After transforming the results of these multiplications back using
\[ \bar{h} = \mathrm{TFT}_1^{-1}(\cdots \mathrm{TFT}_d^{-1}(\hat{h}) \cdots), \]
we obtain the truncated product $h$ of $f$ and $g$ by
\[ h(z_0, z_1, \ldots, z_d) = \bar{h}(z_0, z_1/z_0, \ldots, z_d/z_0). \]
The total computation time is bounded by $O(s \log s + r s_\times \log r)$, where $s_\times$ denotes the number of truncated univariate products in $z_0$. Using the fact that $r s_\times \log r = O(s \log s)$, we have proved the following theorem:

Theorem 4. Let $R$ be a ring with a privileged sequence $(\omega_0, \omega_1, \omega_2, \ldots)$ of roots of unity. Let $f, g \in R[z_0, \ldots, z_d]$ be polynomials of degrees $\deg f < r$ and $\deg g < r$. Then the truncated product of $f$ and $g$ at degree $< r$ can be computed using $O(s \log s)$ ring operations in $R$.

Remark 5. In practice, if the coefficients $f_{i_0, \ldots, i_d}$ have different growths in $i_0, \ldots, i_d$, then it may be useful to consider truncations along more general degrees of the form
\[ \deg_\lambda f = \max \{ \lambda_0 i_0 + \cdots + \lambda_d i_d : f_{i_0, \ldots, i_d} \neq 0 \}. \]
The ``slicing technique'' from section 6.3.5 in [vdH:relax] may then be used in order to obtain complexity bounds of the same type.

Remark 6. Using remark 3, the polynomial and truncated multiplication algorithms can be used in combination with the strategy of relaxed evaluation [vdH:issac97, vdH:relax] for solving partial differential equations in multivariate power series, at the expense of an additional overhead. A recent technique [vdH:newrelax, vdH:issac03] allows to reduce this overhead even further and it would be interesting to study more precisely what happens in the multivariate case.

\section{Final notes}

The author would like to thank the first referee for his enthusiastic and helpful comments. This referee also implemented the algorithms from sections 3 and 4 and he reports a behaviour which is close to the expected one. In response to some other comments and suggestions, we conclude with the following remarks:

\begin{itemize}
\item The results of the paper may be generalized to characteristic $2$ and general rings $R$ along similar lines as in [CK91]. The crucial remark is that, if $\omega^3 = 1$ and
\[ \begin{pmatrix} b_0 \\ b_1 \\ b_2 \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 \\ 1 & \omega & \omega^2 \\ 1 & \omega^2 & \omega \end{pmatrix} \begin{pmatrix} a_0 \\ a_1 \\ a_2 \end{pmatrix}, \]
then, whenever one value among $a_0, a_1, a_2$ and two values among $b_0, b_1, b_2$ (or vice versa) are known, we may compute the remaining values by using only additions, subtractions, multiplications by powers of $\omega$ and divisions by $3$.
\item Theorem 1 in [CKL89] implies theorem 3 with $O(s \log s)$ replaced by $O(s \log^2 s)$.
\item The technique from [CKL89] is actually more general: let $f, g \in R[z_0, \ldots, z_d]$ and assume that we know
\[ \Delta = \mathrm{supp}\, f + \mathrm{supp}\, g = \{ (i_0 + j_0, \ldots, i_d + j_d) : f_{i_0, \ldots, i_d} \neq 0 \wedge g_{j_0, \ldots, j_d} \neq 0 \}. \]
If $f$ and $g$ are not ``extraordinarily sparse'', then $f g$ may be computed in time $O(\# \Delta \log^2 \# \Delta)$. It would be interesting to prove something similar in our context, so as to examine to which extent we need the density hypothesis. Using remark 3 in a recursive way, we expect that there exists an algorithm of complexity $O(\# \Delta \log \# \Delta + \# \partial \Delta)$, for a suitable definition of the boundary $\partial \Delta$.
\item The terminology of privileged sequences may seem to be an overkill. Indeed, in practice, we rather need a sufficiently large root of unity in order to carry out a given computation.
\end{itemize}
Nevertheless, from a theoretical point of view, this paper suggests that it may be interesting to study ``fractal FFT-transforms'' of power series with convergence radius $> 1$ with respect to a privileged sequence $(\omega_0, \omega_1, \omega_2, \ldots)$.

Two referees pointed us to the recent on-line paper [Bern:FMA], which also contains the idea of evaluating in powers of well-chosen roots of unity in order to multiply polynomials with $\deg f g < l \leq n = 2^p$.

\section*{Bibliography}

[Bern:FMA] D. Bernstein. Fast multiplication and its applications. Available online; see section 4, page 11.

[CK91] D. G. Cantor and E. Kaltofen. On fast multiplication of polynomials over arbitrary algebras. Acta Informatica, 28:693-701, 1991.

[CKL89] J. Canny, E. Kaltofen, and Y. Lakshman. Solving systems of non-linear polynomial equations faster. In Proc. ISSAC '89, pages 121-128, Portland, Oregon, 1989. ACM Press.

[CT65] J. W. Cooley and J. W. Tukey. An algorithm for the machine calculation of complex Fourier series. Math. Comp., 19:297-301, 1965.

[HaQuZi00] G. Hanrot, M. Quercia, and P. Zimmermann. Speeding up the division and square root of power series. Research Report 3973, INRIA, July 2000.

[HaQuZi02] G. Hanrot, M. Quercia, and P. Zimmermann. The middle product algorithm I. Speeding up the division and square root of power series. Accepted for publication in AAECC, 2002.

[HaZi04] G. Hanrot and P. Zimmermann. A long note on Mulders' short product. J. Symbolic Comput., 37(3):391-401, 2004.

[LeSc03] G. Lecerf and É. Schost. Fast multivariate power series multiplication in characteristic zero. SADIO Electronic Journal, 5(1):1-10, September 2003.

[Mul00] T. Mulders. On short multiplication and division. AAECC, 11(1):69-88, 2000.

[Pan94] V. Y. Pan. Simple multivariate polynomial multiplication. J. Symbolic Comput., 18(3):183-186, 1994.

[SS71] A. Schönhage and V. Strassen. Schnelle Multiplikation grosser Zahlen. Computing, 7:281-292, 1971.

[vdH:issac97] J. van der Hoeven. Lazy multiplication of formal power series. In W. W. Küchlin, editor, Proc. ISSAC '97, pages 17-20, Maui, Hawaii, July 1997.

[vdH:relax] J. van der Hoeven. Relax, but don't be too lazy. J. Symbolic Comput., 34:479-542, 2002.

[vdH:newrelax] J. van der Hoeven. New algorithms for relaxed multiplication. Technical Report 2003-44, Université Paris-Sud, Orsay, France, 2003.

[vdH:issac03] J. van der Hoeven. Relaxed multiplication using the middle product. In Manuel Bronstein, editor, Proc. ISSAC '03, pages 143-147, Philadelphia, USA, August 2003.