New algorithms for relaxed multiplication

Joris van der Hoeven
Dépt. de Mathématiques (Bât. 425)
CNRS, Université Paris-Sud
91405 Orsay Cedex, France

Abstract. In previous work, we have introduced the technique of relaxed power series computations. With this technique, it is possible to solve implicit equations almost as quickly as doing the operations which occur in the implicit equation. Here "almost as quickly" means that we need to pay a logarithmic overhead. In this paper, we will show how to reduce this logarithmic factor in the case when the constant ring has sufficiently many $2^p$-th roots of unity.

1. Introduction

Let $\mathbb{C}$ be an effective ring and consider two power series $f = f_0 + f_1 z + \cdots$ and $g = g_0 + g_1 z + \cdots$ in $\mathbb{C}[[z]]$. In this paper we will be concerned with the efficient computation of the first $n$ coefficients of the product $h = f g = h_0 + h_1 z + \cdots$. If the first $n$ coefficients of $f$ and $g$ are known beforehand, then we may use any fast multiplication algorithm for polynomials in order to achieve this goal, such as divide and conquer multiplication [KO63, Knu97], which has a time complexity $O(n^{\log 3/\log 2})$, or FFT multiplication [CT65, SS71, CK91], which has a time complexity $O(n \log n \log\log n)$.

For certain computations, and most importantly the resolution of implicit equations, it is interesting to use so called "relaxed algorithms" [vdH97, vdH02], which output the first $n$ coefficients of $h$ as soon as the first $n$ coefficients of $f$ and $g$ are known, for each $n$. This allows for instance the computation of the exponential $g = \exp f$ of a series $f$ with $f_0 = 0$ using the formula

  $g' = f'\, g$.  (1)

More precisely, this formula shows that the computation of $g$ reduces to one differentiation, one relaxed product and one relaxed integration. Differentiation and relaxed integration being linear in time, it follows that $n$ terms of $g$ can be computed in time $O(R(n))$, where $R(n)$ denotes the time complexity of relaxed multiplication.
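To make the relaxed point of view concrete, here is a minimal Python sketch (our illustration, not code from the paper's implementation) which computes $n$ terms of $g = \exp f$ through formula (1), using a naive quadratic relaxed product for the right hand side:

```python
def relaxed_exp(f, n):
    """Compute g_0, ..., g_{n-1} of g = exp(f), assuming f[0] == 0.
    Extracting the coefficient of z^i in g' = f' g gives
        (i+1) g_{i+1} = sum_{j=0..i} (j+1) f_{j+1} g_{i-j},
    which only involves g_0, ..., g_i: the product f' g is consumed
    in a relaxed fashion, so g can be produced term by term.  The
    inner sum is a naive O(n^2) relaxed multiplication; the point of
    the paper is to replace it by an almost linear-time one."""
    assert f[0] == 0 and len(f) >= n
    g = [1]                                  # g_0 = exp(0) = 1
    for i in range(n - 1):
        s = sum((j + 1) * f[j + 1] * g[i - j] for j in range(i + 1))
        g.append(s / (i + 1))
    return g

# Example: f = z gives the coefficients 1/i! of exp(z):
# relaxed_exp([0, 1, 0, 0, 0], 5) == [1, 1.0, 0.5, 1/6, 1/24]
```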
In [vdH02], we proved the following theorem:

Theorem 1. There exists a relaxed multiplication algorithm of time complexity

  $R(n) = O(M(n) \log n)$

and space complexity $O(n)$.

In this paper, we will improve the time complexity bound in this theorem in the case when $\mathbb{C}$ admits $2^p$-th roots of unity for any $p \in \mathbb{N}$. In section 2, we first reduce this problem to the case of "semi-relaxed multiplication", when one of the arguments is fixed and the other one relaxed. More precisely, let $f$ and $g$ be power series, such that $g$ is known up to order $n$. Then a semi-relaxed multiplication algorithm computes the product $f g$ up to order $n$ and outputs $(f g)_i$ as soon as $f_0, \ldots, f_i$ are known, for all $i < n$. In section 3, we show that the overhead in theorem 1 can be reduced to $O(\log^{\log 3/\log 2 - 1} n)$. In section 4, the technique of section 3 is further improved, so as to yield an $O(e^{2\sqrt{\log 2\,\log\log n}})$ overhead.

In the sequel, we will use the following notations from [vdH02]: we denote by $\mathbb{C}[[z]]_n \cong \mathbb{C}[z]_n \subseteq \mathbb{C}[[z]]$ the set of truncated power series at order $n$, like $f_{;n} = f_0 + \cdots + f_{n-1} z^{n-1}$. Given $f \in \mathbb{C}[[z]]$ and $0 \le i \le j \le n$, we will denote $f_{i;j} = f_i + \cdots + f_{j-1} z^{j-i-1} \in \mathbb{C}[[z]]_{j-i}$.

Remark 1. A preprint [vdH03a] of the present paper was published a few years ago. The current version includes a new section with implementation details, benchmarks and a few notes on how to apply similar ideas in the Karatsuba and Toom-Cook models. Another algorithm for semi-relaxed multiplication, based on the middle product [HQZ04], was published in the meantime [vdH03b].

Remark 2. The exotic form $O(e^{2\sqrt{\log 2\,\log\log n}})$ of the new complexity for relaxed multiplication might surprise the reader. It should be noticed that the time complexity of Toom-Cook's algorithm for polynomial multiplication has a similar form $O(n\, e^{2\sqrt{\log 2\,\log n}})$ [Too63, Coo66, Knu97]. Indeed, whereas our algorithm from section 3 has a Karatsuba-like flavour, the algorithm from section 4 uses a generalized subdivision which is similar to the one used by Toom and Cook. An interesting question is whether even better time complexities can be obtained (in analogy with FFT-multiplication). However, we have not managed so far to reduce the cost of relaxed multiplication to $O(M(n) \log\log n)$ or $O(M(n))$. Nevertheless, it should be noticed that the function $e^{2\sqrt{\log 2\,\log\log n}}$ grows very slowly; in practice, it very much behaves like a constant (see section 5).

Remark 3. The reader may wonder whether further improvements in the complexity of relaxed multiplication are really useful, since the algorithms from [vdH97, vdH02] are already optimal up to a factor $O(\log n)$. In fact, we expect fast algorithms for formal power series to be one of the building bricks for effective analysis [vdH06b]. Therefore, even small improvements in the complexity of relaxed multiplication should lead to global speed-ups for this kind of software.

2. Full and semi-relaxed multiplication

In [vdH97, vdH02], we have stated several fast algorithms for relaxed multiplication. Let us briefly recall some of the main concepts and ideas. For details, we refer to [vdH02]. Throughout this section, $f$ and $g$ are two power series in $\mathbb{C}[[z]]$.

Definition 1. We call

  $P = f_{;n}\, g_{;n}$  (2)

the full product of $f$ and $g$ at order $n$.

Definition 2. We call

  $P = \sum_{i<n} (f g)_i\, z^i$  (3)

the truncated product of $f$ and $g$ at order $n$.

Definition 3. A full (resp. truncated) zealous multiplication algorithm of $f$ and $g$ at order $n$ takes $f_0, \ldots, f_{n-1}$ and $g_0, \ldots, g_{n-1}$ on input and computes $P$ as in (2) (resp. (3)).

Definition 4. A full (resp. truncated) relaxed multiplication algorithm of $f$ and $g$ at order $n$ successively takes the pairs $(f_0, g_0), \ldots, (f_{n-1}, g_{n-1})$ on input and successively computes $P_0, \ldots, P_{2n-2}$ (resp. $P_0, \ldots, P_{n-1}$). Here it is understood that $P_i$ is output as soon as $(f_0, g_0), \ldots, (f_i, g_i)$ are known.

Definition 5. A full (resp. truncated) semi-relaxed multiplication algorithm of $f$ and $g$ takes $g_0, \ldots, g_{n-1}$ and the successive values $f_0, \ldots, f_{n-1}$ on input and successively computes $P_0, \ldots, P_{2n-2}$ (resp. $P_0, \ldots, P_{n-1}$). Here it is understood that $P_i$ is output as soon as $f_0, \ldots, f_i$ are known.

We will denote by $M(n)$, $R(n)$ and $Q(n)$ the time complexities of full zealous, relaxed and semi-relaxed multiplication at order $n$, where it is understood that the ring operations in $\mathbb{C}$ can be performed in time $O(1)$. We notice that full zealous multiplication is equivalent to polynomial multiplication. Hence, classical fast multiplication algorithms can be applied in this case [KO63, Too63, Coo66, CT65, SS71, CK91].
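As an illustration of definition 5, the following naive Python class (ours, for exposition) implements a truncated semi-relaxed multiplication; it runs in time $O(n^2)$ and is exactly the quadratic baseline that the algorithms of sections 3 and 4 improve upon:

```python
class SemiRelaxedProduct:
    """Truncated semi-relaxed multiplication at order n = len(g)
    (definition 5): the argument g is fixed and fully known, the
    coefficients of f arrive one by one, and P_i = (f g)_i is output
    as soon as f_0, ..., f_i are known."""
    def __init__(self, g):
        self.g = g
        self.f = []
    def next(self, fi):
        self.f.append(fi)
        i = len(self.f) - 1
        return sum(self.f[j] * self.g[i - j] for j in range(i + 1))

# p = SemiRelaxedProduct([1, 2, 3, 4])
# [p.next(c) for c in [1, 1, 1, 1]] == [1, 3, 6, 10]
```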
The main idea behind efficient algorithms for relaxed multiplication is to anticipate on future computations. More precisely, the computation of a full product (2) can be represented by an $n \times n$ square with entries $f_i\, g_j$, $0 \le i, j < n$. As soon as $f_0, \ldots, f_i$ and $g_0, \ldots, g_i$ are known, it becomes possible to compute the contributions of the products $f_j\, g_k$ with $j, k \le i$ to $P$, even though the contributions of $f_j\, g_k$ with $j + k > i$ are not yet needed. The next idea is to subdivide the $n \times n$ square into smaller squares, in such a way that the contribution of each small square to $P$ can be computed using a zealous algorithm. Now the contribution of such a small square is of the form $f_{i;i'}\, g_{j;j'}\, z^{i+j}$. Therefore, the requirement $\max(i', j') \le i + j + 1$ suffices to ensure that the resulting algorithm will be relaxed. In the left hand image of figure 1, we have shown the subdivision from the main algorithm of [vdH02], which has time complexity $O(M(n) \log n)$.

Figure 1. Illustration of the facts that (1) a full relaxed $2n \times 2n$ multiplication reduces to one full relaxed $n \times n$ multiplication, two semi-relaxed $n \times n$ multiplications and one zealous $n \times n$ multiplication; (2) a semi-relaxed $2n \times 2n$ multiplication reduces to two semi-relaxed $n \times n$ multiplications and two zealous $n \times n$ multiplications.

There is an alternative interpretation of the left hand image in figure 1: when interpreting the big square as a $2n \times 2n$ multiplication

  $P = f_{;2n}\, g_{;2n}$,

we may regard it as the sum

  $P = P_{0,0} + P_{0,1}\, z^n + P_{1,0}\, z^n + P_{1,1}\, z^{2n}$

of four $n \times n$ multiplications

  $P_{0,0} = f_{;n}\, g_{;n}$
  $P_{0,1} = f_{;n}\, g_{n;2n}$
  $P_{1,0} = f_{n;2n}\, g_{;n}$
  $P_{1,1} = f_{n;2n}\, g_{n;2n}$.

Now $P_{0,0}$ is a relaxed multiplication at order $n$, but $P_{0,1}$ is even semi-relaxed, since $f_0, \ldots, f_{n-1}$ are already known by the time that we need $(P_{0,1})_0$. Similarly, $P_{1,0}$ corresponds to a semi-relaxed product and $P_{1,1}$ to a zealous product. This shows that

  $R(2n) \le R(n) + 2\, Q(n) + M(n)$.

Similarly, we have

  $Q(2n) \le 2\, Q(n) + 2\, M(n)$,

as illustrated in the right-hand image of figure 1. Under suitable regularity hypotheses for $M$ and $Q$, the above relations imply:

Theorem 2.
(a) If $Q(n)/n$ is increasing, then $R(n) = O(Q(n))$.
(b) If $M(n)/n$ is increasing, then $Q(n) = O(M(n) \log n)$.

A consequence of part (a) of the theorem is that it suffices to design fast algorithms for semi-relaxed multiplication in order to obtain fast algorithms for relaxed multiplication. This fact may be reinterpreted by observing that the fast relaxed multiplication algorithm [vdH97, vdH02] actually applies Newton's method in a hidden way. Indeed, since Brent and Kung [BK78], it is well known that Newton's method can also be used in the context of formal power series in order to solve differential or functional equations. One step of Newton's method at order $n$ involves the recursive application of the method at order $\lceil n/2 \rceil$ and the resolution of a linear equation at order $\lfloor n/2 \rfloor$. The resolution of the linear equation corresponds to the computation of the two semi-relaxed products.

3. A new algorithm for fast relaxed multiplication

Assume from now on that $\mathbb{C}$ admits an $n$-th root of unity $\omega_n$ for every power of two $n = 2^p$. Given an element $f \in \mathbb{C}[[z]]_n$, let $\mathrm{FFT}_{\omega_n}(f) \in \mathbb{C}^n$ denote its Fourier transform

  $\mathrm{FFT}_{\omega_n}(f) = (f(1), f(\omega_n), \ldots, f(\omega_n^{n-1}))$

and let $\mathrm{FFT}_{\omega_n}^{-1} : \mathbb{C}^n \to \mathbb{C}[[z]]_n$ be the inverse mapping of $\mathrm{FFT}_{\omega_n}$. It is well known that both $\mathrm{FFT}_{\omega_n}$ and $\mathrm{FFT}_{\omega_n}^{-1}$ can be computed in time $O(n \log n)$. Furthermore, if $f, g \in \mathbb{C}[[z]]_n$ are such that $f g \in \mathbb{C}[[z]]_n$, then

  $f g = \mathrm{FFT}_{\omega_n}^{-1}(\mathrm{FFT}_{\omega_n}(f) \cdot \mathrm{FFT}_{\omega_n}(g))$,

where the product in $\mathbb{C}^n$ is scalar multiplication $(a_0, \ldots, a_{n-1}) \cdot (b_0, \ldots, b_{n-1}) = (a_0 b_0, \ldots, a_{n-1} b_{n-1})$.

Now consider a decomposition $n = n_1 n_2$ with $n_1 = 2^{p_1}$ and $n_2 = 2^{p_2}$. Then a truncated power series $f \in \mathbb{C}[z]_n$ can be rewritten as a series

  $f_{;n_1} + f_{n_1;2n_1}\, y + \cdots + f_{(n_2-1) n_1; n_2 n_1}\, y^{n_2 - 1}$

in $\mathbb{C}[z]_{n_1}[y]_{n_2}$, where $y = z^{n_1}$. This series may again be reinterpreted as a series $\varphi(f) \in \mathbb{C}[z]_{2 n_1}[y]_{n_2}$, and we have

  $f g = \psi(\varphi(f)\, \varphi(g))$,

where $\psi : \mathbb{C}[z]_{2 n_1}[y] \to \mathbb{C}[[z]]$ is the mapping which substitutes $z^{n_1}$ for $y$. Also, the FFT-transform $\mathrm{FFT}_{\omega_{2 n_1}} : \mathbb{C}[z]_{2 n_1} \to \mathbb{C}^{2 n_1}$ may be extended to a mapping

  $\mathrm{FFT}_{\omega_{2 n_1}} : \mathbb{C}[z]_{2 n_1}[y]_k \to \mathbb{C}^{2 n_1}[y]_k$
  $c_0 + \cdots + c_{k-1}\, y^{k-1} \mapsto \mathrm{FFT}_{\omega_{2 n_1}}(c_0) + \cdots + \mathrm{FFT}_{\omega_{2 n_1}}(c_{k-1})\, y^{k-1}$

for each $k$, and similarly for its inverse $\mathrm{FFT}_{\omega_{2 n_1}}^{-1}$. Now the formula

  $f g = \psi(\mathrm{FFT}_{\omega_{2 n_1}}^{-1}(\mathrm{FFT}_{\omega_{2 n_1}}(\varphi(f)) \cdot \mathrm{FFT}_{\omega_{2 n_1}}(\varphi(g))))$

yields a way to compute $f g$ by reusing the Fourier transforms of the "bunches of coefficients" $f_{k n_1;(k+1) n_1}$ and $g_{l n_1;(l+1) n_1}$ many times.

In the context of a semi-relaxed multiplication with fixed argument $g$, the above scheme almost reduces the computation of an $n \times n$ product with coefficients in $\mathbb{C}$ to the computation of an $n_2 \times n_2$ product with coefficients in $\mathbb{C}^{2 n_1}$. The only problem which remains is that we can only compute $\mathrm{FFT}_{\omega_{2 n_1}}(f_{k n_1;(k+1) n_1})$ when $f_{k n_1}, \ldots, f_{(k+1) n_1 - 1}$ are all known. Consequently, the products $f_{k n_1;(k+1) n_1}\, g_{;n_1}$ should be computed apart, using a traditional semi-relaxed multiplication.
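Before doing the bookkeeping precisely, here is a toy Python transcription of this scheme (our illustration: numpy's complex-double FFT plays the role of $\mathrm{FFT}_{\omega_{2 n_1}}$, and the light-region products are done naively instead of by recursive semi-relaxed multiplications):

```python
import numpy as np

def semi_relaxed_product(f_stream, g, n1):
    """Toy version of the scheme of section 3: semi-relaxed product
    h = f g truncated at order n = len(g), with g fixed and the
    coefficients of f arriving one by one (n1 must divide n).

    The bunches g_{l*n1;(l+1)*n1} are FFT-ed once and for all; each
    bunch of f is FFT-ed once, as soon as it is complete, and reused
    against all bunches of g (the dark region of figure 2).  The
    diagonal products f_{k*n1;(k+1)*n1} * g_{;n1} (the light regions)
    are done naively here.  After processing f_i, the entries
    h[0], ..., h[i] are final, which is the semi-relaxed property."""
    n, n2 = len(g), len(g) // n1
    pad = np.zeros(n1)
    G = [np.fft.fft(np.concatenate([g[l * n1:(l + 1) * n1], pad]))
         for l in range(n2)]                  # transforms of g, reused
    h = np.zeros(n, dtype=complex)
    bunch = []                                # current bunch of f
    for i, fi in enumerate(f_stream):         # f_stream yields f_0, f_1, ...
        bunch.append(fi)
        for j in range(min(n1, n - i)):       # light region: f_i * g_{;n1}
            h[i + j] += fi * g[j]
        if len(bunch) == n1:                  # bunch k of f is complete:
            k = i // n1                       # transform it exactly once
            F = np.fft.fft(np.concatenate([bunch, pad]))
            bunch = []
            for l in range(1, n2 - k):        # dark region, offset (k+l)*n1
                block = np.fft.ifft(F * G[l])
                lo = (k + l) * n1
                m = min(2 * n1, n - lo)
                h[lo:lo + m] += block[:m]
        if i == n - 1:
            return h

# Correctness check against a plain convolution:
# f, g = np.random.randn(16), np.random.randn(16)
# np.allclose(semi_relaxed_product(iter(f), g, 4), np.convolve(f, g)[:16])
```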
In other words, we have reduced the computation of a semi-relaxed $n \times n$ product with coefficients in $\mathbb{C}$ to the computation of $n_2$ semi-relaxed $n_1 \times n_1$ products with coefficients in $\mathbb{C}$, one semi-relaxed $n_2 \times (n_2 - 1)$ product with coefficients in $\mathbb{C}^{2 n_1}$ and $3 n_2 - 3$ FFT-transforms of length $2 n_1$. This has been illustrated in figure 2.

Figure 2. New decomposition of a semi-relaxed $n \times n$ multiplication into $n/n_1$ semi-relaxed $n_1 \times n_1$ multiplications (the light regions) and one semi-relaxed $n_2 \times (n_2 - 1)$ multiplication (the dark region) with FFT-ed coefficients in $\mathbb{C}^{2 n_1}$.

In order to obtain an efficient algorithm, we may choose $n_1 = 2^{\lceil p/2 \rceil}$ and $n_2 = 2^{\lfloor p/2 \rfloor}$:

Theorem 3. Assume that $\mathbb{C}$ admits an $n$-th root of unity for each $n = 2^p$. Then there exists a relaxed multiplication algorithm of time complexity $O(n \log^{\log 3/\log 2} n)$ and space complexity $O(n \log n)$.

Proof. In view of section 2, it suffices to consider the case of a semi-relaxed product. Let $T(n)$ denote the time complexity of the above method. Then we observe that

  $T(n) \le n_2\, T(n_1) + 2 n_1\, T(n_2) + O(n_1 n_2 \log n_1)$
  $\phantom{T(n)} \le n_2\, T(n_1) + 2 n_1\, T(n_2) + O(n \log n)$.

Taking $n_1 = 2^{\lceil p/2 \rceil}$, $n_2 = 2^{\lfloor p/2 \rfloor}$ and $U(p) = T(2^p)/2^p$, we obtain

  $U(p) \le U(\lceil p/2 \rceil) + 2\, U(\lfloor p/2 \rfloor) + O(p)$,

from which we deduce that $U(p) = O(p^{\log 3/\log 2})$ and $T(n) = O(n \log^{\log 3/\log 2} n)$. Similarly, the space complexity satisfies the bound

  $S(n) \le S(n_1) + 2 n_1\, S(n_2) + O(n) \le (2 n_1 + 1)\, S(n_1) + O(n)$.

Setting $R(p) = S(2^p)/2^p$, it follows that

  $R(p) \le (2 + 2^{-\lfloor p/2 \rfloor})\, R(\lceil p/2 \rceil) + O(1)$.

Consequently, $R(p) = O(p)$ and $S(n) = O(n \log n)$. □

4. Further improvements of the algorithm

More generally, if $n = n_1 \cdots n_l$ with $n_1 = 2^{p_1}, \ldots, n_l = 2^{p_l}$, then we may reduce the computation of a semi-relaxed $n \times n$ product with coefficients in $\mathbb{C}$ to the computation of

- $n_2 \cdots n_l$ semi-relaxed $n_1 \times n_1$ products over $\mathbb{C}$ of the form $f_{k n_1;(k+1) n_1}\, g_{;n_1}$;
- $3\, n_2 \cdots n_l - 3$ FFT-transforms of length $2 n_1$;
- $n_3 \cdots n_l$ semi-relaxed $n_2 \times (n_2 - 1)$ products over $\mathbb{C}^{2 n_1}$;
- $3\, n_3 \cdots n_l - 3$ FFT-transforms of length $2 n_1 n_2$;
- $n_4 \cdots n_l$ semi-relaxed $n_3 \times (n_3 - 1)$ products over $\mathbb{C}^{2 n_1 n_2}$;
- $\cdots$
- $3\, n_l - 3$ FFT-transforms of length $2 n_1 \cdots n_{l-1}$;
- one semi-relaxed $n_l \times (n_l - 1)$ product over $\mathbb{C}^{2 n_1 \cdots n_{l-1}}$.

This computation is illustrated in figure 3. From the complexity point of view, it leads to the following theorem:

Figure 3. Generalized decomposition of a semi-relaxed $n \times n$ multiplication into $l = 3$ layers.

Theorem 4. Assume that $\mathbb{C}$ admits an $n$-th root of unity for each $n = 2^p$. Then there exists a relaxed multiplication algorithm of time complexity $O(n \log n\, e^{2\sqrt{\log 2\, \log\log n}})$ and space complexity $O(n\, e^{\sqrt{\log 2\, \log\log n}})$.

Proof. In view of theorem 2(a), it suffices to consider the case of a semi-relaxed product. Denoting by $T(n)$ the time complexity of the above method, we have

  $T(n) \le \frac{n}{n_1}\, T(n_1) + \frac{2 n}{n_2}\, T(n_2) + \cdots + \frac{2 n}{n_l}\, T(n_l) + O(l\, n \log n)$.  (4)

Let

  $U(p) = \frac{T(2^p)}{p\, 2^p}$.

Taking $n_1 = \cdots = n_l = 2^p$ in (4), it follows for any $l$ that

  $U(l p) \le 2\, U(p) + O(l)$.  (5)

Applying this relation $k$ times, we obtain

  $U(l^k) \le 2^k\, U(1) + O(2^k l) = O(2^k l)$.  (6)

For a fixed $l$ such that $k = \log p/\log l$ is an integer, we obtain

  $U(p) = O(2^{\log p/\log l}\, l)$.  (7)

The minimum of $2^{\log p/\log l}\, l$ is reached when its derivative cancels. This happens for

  $l = e^{\sqrt{\log 2\, \log p}}$.

Plugging this value into (7), we obtain

  $U(p) = O(e^{2\sqrt{\log 2\, \log p}})$.

Substitution of $p = \log n/\log 2$ finally gives the desired estimate

  $T(n) = O(n \log n\, e^{2\sqrt{\log 2\, \log\log n}})$.  (8)

In order to be painstakingly correct, we notice that we really proved (7) for $p$ of the form $l^{\lceil \log p/\log l \rceil}$ and (8) for $l$ of the form $\lceil e^{\sqrt{\log 2\, \log p}} \rceil$. Of course, we may always replace $p$ and $l$ by larger values which do have this form. Since these replacements only introduce additional constant factors in the complexity bounds, the bound (8) holds for general $n$.
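As a quick sanity check of the optimization step above (a routine verification which we add here; it is not part of the original argument), write $u = \log l$:

  $\log\bigl(2^{\log p/\log l}\, l\bigr) = \frac{\log 2\, \log p}{\log l} + \log l = \frac{\log 2\, \log p}{u} + u$.

The right hand side is minimal when $u^2 = \log 2\, \log p$, i.e. for $l = e^{\sqrt{\log 2\, \log p}}$; both terms are then equal to $\sqrt{\log 2\, \log p}$, so the minimal value of $2^{\log p/\log l}\, l$ is indeed $e^{2\sqrt{\log 2\, \log p}}$.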
As to the space complexity $S(n)$, we have

  $S(n) \le S(n_1) + 2 n_1\, S(n_2) + \cdots + 2 n_1 \cdots n_{l-1}\, S(n_l) + O(n)$.

Let

  $R(p) = \frac{S(2^p)}{2^p}$.

Taking $n_1 = \cdots = n_l = 2^p$, it follows for any $l$ that

  $R(l p) \le (2 + C/2^p)\, R(p) + O(1)$,

for some fixed constant $C$. Applying this bound $k$ times, we obtain

  $R(l^k) \le \left(2 + \frac{C}{2}\right) \left(2 + \frac{C}{2^l}\right) \cdots \left(2 + \frac{C}{2^{l^{k-1}}}\right) (R(1) + O(1))$.

For $l \ge 2$, the product $\prod_{j \ge 0} (1 + C\, 2^{-l^j - 1})$ converges, so this bound simplifies to

  $R(l^k) = O(2^k)$.

Taking $k$ and $l = e^{\sqrt{\log 2\, \log p}}$ as above, it follows that

  $R(p) = O(2^{\log p/\log l}) = O(e^{\sqrt{\log 2\, \log p}})$.

Substitution of $p = \log n/\log 2$ finally gives us the desired estimate

  $S(n) = O(n\, e^{\sqrt{\log 2\, \log\log n}})$

for the space complexity. For similar reasons as above, the bound holds for general $n$. □

5. Implementation details and benchmarks

We implemented the algorithm from section 3 in the Mmxlib library [vdH02b]. Instead of taking $n_1 \approx n_2 \approx \sqrt{n}$, we took $n_2$ small (with $n_2 \in \{4, 8, 16, 32\}$ over the FFT range we considered), and used a naive multiplication algorithm on the FFT-ed blocks. The reason behind this change is that $n$ needs to be reasonably large in order to profit from the better asymptotic complexity of relaxed multiplication. In practice, the optimal choice of $(n_1, n_2)$ is obtained by taking $n_2$ quite small. Moreover, our implementation uses a truncated version of relaxed multiplication [vdH02, section 4.4.2]. In particular, the use of naive multiplication on the FFT-ed blocks allows us to gain a factor of two at the top-level.

For small values of $n_1$, we also replaced FFT transforms by "Karatsuba transforms": given a polynomial $f = f_0 + \cdots + f_{2^k - 1} Z^{2^k - 1}$, we may form a polynomial $\tilde{f}(Z_1, \ldots, Z_k)$ in $k$ variables with coefficients $\tilde{f}_{i_1, \ldots, i_k} = f_{i_1 + 2 i_2 + \cdots + 2^{k-1} i_k}$ for $i_1, \ldots, i_k \in \{0, 1\}$. Then the Karatsuba transform of $f$ is the vector $(\tilde{f}(z_1, \ldots, z_k))_{z_1, \ldots, z_k \in \{0, 1, \infty\}}$ of size $3^k$, where evaluation at infinity picks the leading coefficient: $(a + b Z_i)(\infty) = b$.
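For concreteness, a possible recursive implementation of this transform (our sketch; the actual Mmxlib code is dedicated C++) splits off the variable $Z_1$, which carries the lowest bit of the index:

```python
def karatsuba_transform(f):
    """Karatsuba transform of a polynomial with len(f) = 2^k
    coefficients: the vector of the 3^k values of the associated
    k-variate polynomial at all points of {0, 1, oo}^k, where the
    'value' at oo of a + b*Z is b.  Writing f = a + b*Z1, with a
    (resp. b) the even (resp. odd) indexed coefficients, the values
    at Z1 = 0, 1, oo are a, a + b and b, transformed recursively."""
    if len(f) == 1:
        return [f[0]]
    a, b = f[0::2], f[1::2]
    s = [x + y for x, y in zip(a, b)]
    return (karatsuba_transform(a)      # Z1 = 0
            + karatsuba_transform(s)    # Z1 = 1
            + karatsuba_transform(b))   # Z1 = oo

# karatsuba_transform([f0, f1]) == [f0, f0 + f1, f1].  Multiplying two
# transforms pointwise gives the transform of the product (which has
# degree <= 2 in each variable, hence is determined by its values at
# the 3 points per variable); recovering its coefficients amounts to
# interpolation at {0, 1, oo} in each variable.
```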
We tested both (truncated) relaxed and semi-relaxed multiplication for different types of coefficients on an Intel Xeon based machine. The results of our benchmarks can be found in tables 1 and 2 below. Our benchmarks start at the order where FFT multiplication becomes useful. Notice that working with orders which are powers of two does not give us any significant advantage, because the top-level product on FFT-ed blocks is naive. In table 1, the choice of $n_1$ as a function of $n$ has been optimized for complex double coefficients. No particular optimization effort was made for the coefficient types in table 2, so it might be possible to gain a bit more on those timings.

Table 1. Timings in seconds for the computation of $n$ terms of the exponential of a given series using complex double coefficients. We computed the exponential using both a semi-relaxed and a relaxed product, corresponding to $Q(n)$ and $R(n)$. We also considered the ratios with the timings $M(n)$ for a full FFT-product of two polynomials of degree $< n$.

Table 2. Ratios $Q(n)/M(n)$ and $R(n)/M(n)$ for the computation of $n$ terms of the exponential of a given series using different types of coefficients. In the first two columns, we use $\mathbb{F}_p$ as our ground field, with $p = 3 \cdot 2^{30} + 1$. In the last two columns, we compute with $256$ bit complex floats from the Mpfr library.

Remark 4. It is instructive to compare the efficiencies of relaxed evaluation and Newton's method. For instance, the exponentiation algorithm from [BK78] has a time complexity $\sim 4\, M(n)$. Although this is better from an asymptotic point of view, the ratio $R(n)/M(n)$ rarely reaches $4$ in our tables. Consequently, relaxed algorithms are often better. A similar phenomenon was already observed in [vdH02, figures 4 and 5]. It would be interesting to pursue the comparisons in view of some recent advances concerning Newton's method [BCO+06]; see also [vdH06a, section 5.2.1].

Remark 5. Although the emphasis of this paper is on asymptotic complexity, the idea behind the new algorithms also applies in the Karatsuba and Toom-Cook models. In the latter case, we take $n_1$ small (typically $n_1 \in \{2, 3, 4\}$) and use evaluation (resp. interpolation) for polynomials of degree $n_1 - 1$ (resp. $2 n_1 - 2$) at $2 n_1 - 1$ points. From an asymptotic point of view, this yields $R(n) \sim M(n)$ for relaxed multiplication. Moreover, the approach naturally combines with the generalization of pair/odd decompositions [HZ02], which also yields an optimal bound for truncated multiplications. In fact, we notice that truncated pair/odd Karatsuba multiplication is "essentially relaxed" [vdH02]. On the negative side, these theoretically fast algorithms have bad space complexities and they are difficult to implement. In order to obtain good timings, it seems to be necessary to use dedicated code generation at different (ranges of) orders [Sed01], which can be done using the C++ template mechanism. The current implementation in Mmxlib does not achieve the theoretical time complexity by far, because the recursive function calls suffer from too much overhead.

6. Conclusion

We have shown how to improve the complexity of relaxed multiplication in the case when the coefficient ring admits sufficiently many $2^p$-th roots of unity. The improvement is based on reusing FFT-transforms of pieces of the multiplicands at different levels of the underlying binary splitting algorithm. The new approach has proved to be efficient in practice (see tables 1 and 2). For further studies, it would be interesting to study the price of artificially adding $2^p$-th roots of unity, as in Schönhage-Strassen's algorithm. In practice, we notice that it is often possible, and better, to "cut the coefficients into pieces" and to replace them by polynomials over the complexified doubles $\mathbb{C}$ or over $\mathbb{F}_p$ with $p = 3 \cdot 2^{30} + 1$. However, this approach requires more implementation effort.

Acknowledgement. We would like to thank the third referee for his detailed comments on the proof of theorem 4, which also resulted in slightly sharper bounds.
Bibliography

[BCO+06] A. Bostan, F. Chyzak, F. Ollivier, B. Salvy, É. Schost, and A. Sedoglavic. Fast computation of power series solutions of systems of differential equations. Preprint, April 2006. Submitted, 13 pages.

[BK78] R. P. Brent and H. T. Kung. Fast algorithms for manipulating formal power series. Journal of the ACM, 25:581–595, 1978.

[CK91] D. G. Cantor and E. Kaltofen. On fast multiplication of polynomials over arbitrary algebras. Acta Informatica, 28:693–701, 1991.

[Coo66] S. A. Cook. On the minimum computation time of functions. PhD thesis, Harvard University, 1966.

[CT65] J. W. Cooley and J. W. Tukey. An algorithm for the machine calculation of complex Fourier series. Math. Comp., 19:297–301, 1965.

[HQZ04] G. Hanrot, M. Quercia, and P. Zimmermann. The middle product algorithm I. Speeding up the division and square root of power series. Applicable Algebra in Engineering, Communication and Computing, 14(6):415–438, 2004.

[HZ02] G. Hanrot and P. Zimmermann. A long note on Mulders' short product. Research Report 4654, INRIA, December 2002.

[Knu97] D. E. Knuth. The Art of Computer Programming, volume 2: Seminumerical Algorithms. Addison-Wesley, 3rd edition, 1997.

[KO63] A. Karatsuba and J. Ofman. Multiplication of multidigit numbers on automata. Soviet Physics Doklady, 7:595–596, 1963.

[Sed01] A. Sedoglavic. Méthodes seminumériques en algèbre différentielle ; applications à l'étude des propriétés structurelles de systèmes différentiels algébriques en automatique. PhD thesis, École polytechnique, 2001.

[SS71] A. Schönhage and V. Strassen. Schnelle Multiplikation großer Zahlen. Computing, 7:281–292, 1971.

[Too63] A. L. Toom. The complexity of a scheme of functional elements realizing the multiplication of integers. Soviet Mathematics Doklady, 4(2):714–716, 1963.

[vdH97] J. van der Hoeven. Lazy multiplication of formal power series. In W. W. Küchlin, editor, Proc. ISSAC '97, pages 17–20, Maui, Hawaii, July 1997.

[vdH02] J. van der Hoeven. Relax, but don't be too lazy. Journal of Symbolic Computation, 34:479–542, 2002.

[vdH02b] J. van der Hoeven et al. Mmxlib: the standard library for Mathemagix, 2002.

[vdH03a] J. van der Hoeven. New algorithms for relaxed multiplication. Technical Report 2003-44, Université Paris-Sud, Orsay, France, 2003.

[vdH03b] J. van der Hoeven. Relaxed multiplication using the middle product. In M. Bronstein, editor, Proc. ISSAC '03, pages 143–147, Philadelphia, USA, August 2003.

[vdH06a] J. van der Hoeven. Newton's method and FFT trading. Technical Report 2006-17, Université Paris-Sud, 2006. Submitted to the Journal of Symbolic Computation.

[vdH06b] J. van der Hoeven. On effective analytic continuation. Technical Report 2006-15, Université Paris-Sud, 2006.