> <\body> |<\doc-note> This work has been partly supported by the French ANR-09-JCJC-0098-01 project, and by the 2009-36HD grant of the Région Ile-de-France. ||<\author-address> de Mathématiques ( 425) CNRS, Université Paris-Sud 91405 Orsay Cedex France Email: >|>||> <\abstract> In previous work, we have introduced several fast algorithms for relaxed power series multiplication (also known under the name on-line multiplication) up till agiven order. The fastest currently known algorithm works over an effective base field> with sufficiently many >-th roots of unity and has algebraic time complexity >|)>>. In this note, we will generalize this algorithm to the cases when > is replaced by an effective ring of positive characteristic or by an effective ring of characteristic zero, which is also torsion-free as a >-module and comes with an additional algorithm for partial division by integers. We will also present an asymptotically faster algorithm for relaxed multiplication of -adic numbers. >>>>>>>>>> Let > be an effective (possibly non commutative) ring. That is, we assume data structures for representing the elements of > and algorithms for performing the ring operations , and >. The aim of algebraic complexity theory is to study the cost of basic or more complex algebraic operations over> (such as the multiplication of polynomials or matrices) in terms of the number of operations in >. The algebraic complexity usually does not coincide with the bit complexity, which also takes into account the potential growth of the actual coefficients in >. Nevertheless, understanding the algebraic complexity usually constitutes a first useful step towards understanding the bit complexity. Of course, in the special case when > is a finite field, both complexities coincide up to a constant factor. One of the most central operations is polynomial multiplication. We will denote by >> the number of operations required to multiply two polynomials of degrees n> in >. If > admits primitive >-th roots of unity for all , then we have >=\> using FFT multiplication, which is based on the fast Fourier transform. In general, it has been shown that >=\>. The complexities of most other operations (division, Taylor shift, extended , multipoint evaluation, interpolation, ) can be expressed in terms of>>. Often, the cost of such other operations is simply |~>>|)>=|~>>, where |~>|)>> stands for *>|)>>; see for some classical results along these lines. The complexity of polynomial multiplication is fundamental for studying the cost of operations on formal power series in |]>> up to a given truncation order . Clearly, it is possible to perform the multiplication up to order in time >|)>>: it suffices to multiply the truncated power series at order and truncate the result. Using Newton's method, and assuming that \\>, it is also possible to compute , , up to order in time>|)>>. More generally, it has been shown in that the power series solutions of algebraic differential equations with coefficients in |]>> can be computed up to order in time >|)>>. However, in this case, the ``>'' hides a non trivial constant factor which depends on the size of the equation that one wants to solve. The approach for computations with formal power series makes it possible to solve equations in quasi-optimal time with respect to the sparse size of the equations. The idea is to consider power series \|]>> as streams of coefficients ,f,\> and to require that all operations are performed ``without delay'' on these streams. 
For instance, when multiplying two power series \|]>>, we require that > is computed ,g,\,f,g> are known. Any algorithm which has this property will be called a or algorithm for multiplication. Given a relaxed algorithm for multiplication, it is possible to let the later coefficients ,g,f,g,\> of the input on the known coefficients ,\,h> of the output. For instance, given a power series \> with =0>, we may compute using the formula <\eqnarray*> ||f*g.>>>> Indeed, extraction of the coefficient of > in and f*g> yields <\eqnarray*> >||**g|)>,>>>> and *g|)>> only depends on ,\,g>. More generally, we define an equation of the form <\eqnarray*> ||>>>> to be , if > only depends on ,\,f>. Replacing > by >, we notice that the same terminology applies to systems of equations. In the case of an implicit equation, special rewriting techniques can be implied in order transform the input equation into arecursive equation. Let >> denote the cost of performing a relaxed multiplication up to order . If> is an expression which involves multiplications and other ``linear time'' operations (additions, integrations, etc.), then it follows that() can be solved up to order in time >+t*n|)>>. If we had >=\>|)>>, then this would yield an optimal algorithm for solving() in the sense that the computation of the solution would essentially require the same time as its verification. The naive |)>> algorithm for computing , based on the formula <\eqnarray*> >||*g+f*g+\+f*g,>>>> is clearly relaxed. Unfortunately, FFT multiplication is relaxed, since ,\,h> are computed simultaneously as a function of *g,\,f*g>, in this case. In it was remarked that Karatsuba's |)>> algorithm for multiplying polynomials can be rewritten in a relaxed manner. For small , Karatsuba multiplication and relaxed multiplication thus require the same number of operations. In, an additional fast relaxed algorithm was presented with time complexity <\eqnarray*> >>||>*log n|)>.>>>> We were recently made aware of the fact that a similar algorithm was first published in. However, this early paper was presented in a different context of on-line (relaxed) multiplication of integers (instead of power series), and without the application to the resolution of recursive equations (which is quite crucial from our perspective). An interesting question remained: can the bound() be lowered further, be it by aconstant factor? In, it was first noticed that an approximate factor of two can be gained if one of the multiplicands is known beforehand. For instance, if we want to compute for a known series with =0>, then the coefficients of > are already known in the product *g> in(), so only one of the inputs depends on the output. An algorithm for the computation of is said to be , if > is written to the output as soon as ,\,f> are known, but all coefficients of are known beforehand. We will denote by >> the complexity of semi-relaxed multiplication. We recall from (see also section) that relaxed multiplication reduces to semi-relaxed multiplication <\eqnarray*> >>||>|)>.>>>> The first reduction of() by a non constant factor was published in, and uses the technique of (which has also been used for the multiplication of multivariate polynomials and power series in6.3>, and for speeding up Newton iterations in). 
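For concreteness, the following short Python sketch (our own illustration, with our own function names, written over the rationals so that division by integers is available) implements the naive quadratic relaxed product and solves the recursive equation g = int(f'*g), i.e. g = exp f with f_0 = 0, coefficient by coefficient, showing how later coefficients of the output feed back into the input.

<\verbatim-code>
# Minimal sketch of naive relaxed (on-line) multiplication and of solving a
# recursive equation coefficient by coefficient; illustrative helper names,
# written over the rationals for simplicity.
from fractions import Fraction

def relaxed_product(f, g, n):
    """First n coefficients of h = f*g, where h_k only uses f_0..f_k and
    g_0..g_k (the naive O(n^2) relaxed algorithm)."""
    return [sum(f[i] * g[k - i] for i in range(k + 1)) for k in range(n)]

def exp_series(f, n):
    """Solve g = int(f'*g) with g_0 = 1, i.e. g = exp(f) with f_0 = 0, up to
    order n; g_k only depends on g_0, ..., g_{k-1}, so the equation is
    recursive in the sense above."""
    assert f[0] == 0
    g = [Fraction(1)]
    for k in range(1, n):
        # k*g_k = coefficient of z^(k-1) in f'*g = sum_{j=1..k} j*f_j*g_{k-j}
        g.append(sum(j * f[j] * g[k - j] for j in range(1, k + 1)) / k)
    return g

if __name__ == "__main__":
    n = 8
    f = [Fraction(0), Fraction(1)] + [Fraction(0)] * (n - 2)   # f = z
    e = exp_series(f, n)                                       # exp(z)
    print(e)                                                   # 1, 1, 1/2, 1/6, ...
    print(relaxed_product(e, e, n))                            # exp(2z): 1, 2, 2, 4/3, ...
</verbatim-code>

A semi-relaxed variant would exploit the fact that one of the two multiplicands (here the series f', known beforehand) does not depend on the output; the asymptotically fast algorithms recalled below replace the inner convolutions by blockwise products.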
Under the assumption that > admits primitive >-th roots of unity for all (or at least for all with \n>), we showed that <\eqnarray*> >>||>|)>.>>>> Since this complexity will play a central role in the remainder of this paper, it will be convenient to abbreviate <\eqnarray*> >>||>.>>>> The function >> has slower growth than any strictly positive power of . It will also be convenient to write =\>|)>> whenever =\*>|)>> for all \0>. In particular, it follows that <\eqnarray*> >>||>.>>>> In section, we will recall the main ideas from which lead to the complexity bound(). We recall that the characteristic of a ring > is the integer \> such that the canonical ring homomorphism \\> has kernel /k*\>. If > is torsion-free as a >-module, then we will say that > admits an effective partial division by integers if there exists an algorithm which takes \>> and k*\> on input and which returns the unique \> with on output. The main result of this paper is: <\theorem> Assume that one of the following two holds: <\itemize> > is an effective ring of characteristic zero, which is torsion-free as a >-module, and which admits an effective partial division by integers. > is an effective ring of positive characteristic. Then we have <\eqnarray*> >>||>.>>>> We notice that the theorem holds in particular if > is an effective field. In Section, we will also consider the relaxed multiplication of -adic numbers, with \> and 2>. If we denote by > the bit complexity of multiplying two -bit integers, then essentially provided an algorithm of bit complexity*log n|)>> for the relaxed multiplication of adic numbers at order |)>>. Various algorithms and benchmarks for more general were presented in. It is also well known that =\>>. Let > denote the bit complexity of the relaxed multiplication of two -adic numbers modulo >. In Section, we will prove the following new result: <\theorem> Let \> with 2>. Then we have <\eqnarray*> >||>.>>>> The main idea which allows for the present generalizations is quite straightforward. In our original algorithm from, the presence of sufficiently many primitive >-th roots of unity in > gives rise to an quasi-optimal evaluation-interpolation strategy for the multiplication of polynomials. More precisely, given two polynomials of degrees n>, their FFT-multiplication only requires > evaluation and interpolation points, and both the evaluation and interpolation steps can be performed fast, using only > operations. Now it has recently been shown that quasi-optimal evaluation-interpolation strategies still exist if we evaluate and interpolate at points in geometric progressions instead of roots of unity. This result is the key to our new complexity bounds, although further technical details have to be dealt with in order to make things work for various types of effective rings>. We also notice that the main novelty of concerns the interpolation step. Fast evaluation at geometric progressions was possible before using the so called . For effective rings of small characteristic >, this would actually have been sufficient for the proving the bound(). Our paper is structured as follows. Since the algorithms of were presented in the case when > is an effective field, Section starts with their generalization to more general effective rings >. These generalizations are purely formal and contain no essentially new ideas. In Section, we give a short survey of the algorithm from, but we recommentd the reader to download the original paper from our webpage for full technical details. 
In Section, we prove Theorem in the case when > has characteristic zero. In Section, we turn our attention to the case when > has prime characteristic. If the characteristic is sufficiently large, then we may find sufficiently large geometric progressions in \\> in order to generalize the results from Section. Otherwise, we have to work over \\>> for some sufficiently large . In Section, we complete the prove of Theorem; the case when is a prime power is a refinement of the result from Section. The remaining case is done Chinese remaindering. In Section, we will prove Theorem. Let > be an effective integral domain and let > be its quotient field. Assume that> has characteristic zero. Let > be an effective torsion-free >-module and =\\\>. Elements of > and > are fractions with \> ( \>) and \>>, and the operations > on such fractions are as usual: <\eqnarray*> >||>>|+>||>>|*>||>>>> For \>>, we also have =s/x>. It follows that > and > are effective fields and vector spaces. Moreover, all field operations in > (and all vector space operations in>) can be performed using only > operations in > ( > or >). We will say that > admits an effective partial division, if for every \>> and s*\>, we can compute the unique \> with . From now on, we will assume that this is the case, and we will count any division of the above kind as one operation in >. Given \>, we define <\eqnarray*> >||\:deg P\n|}>.>>>> Given \> and \>, we will denote by >> the number of operations in> and> which are needed in order to compute the product \>. <\lemma> Let > be an effective and commutative integral domain and > an effective torsion-free >-module with an effective partial division. There exists a constant , such that the following holds: for any 0>, \> and \>> such that ,q> are pairwise distinct, we have: <\enumerate-alpha> We may compute ,\,P|)>> from using >> operations in > and >. We may reconstruct from ,\,P|)>> using >> operations in > and>. <\proof> In the case when =\> is a field of characteristic zero and =\=\>, this result was first proven in. More precisely, the conversions can be done using the algorithms , , and in that paper. Examining these algorithms, we observe that general elements in > are only multiplied with elements in > and divided by elements of the set -1>,\,q-1|}>>. In particular, the algorithms can still be applied in the more general case when =\> is a field of characteristic zero and =\> a vector space. If> is only an effective commutative integral domain of characteristic zero and > an effective torsion-free >-module with an effective partial division, then we define the effective field > and the effective vector field> as above, and we may still apply the generalized algorithms for multipoint evaluation and interpolation in >. In particular, both multipoint evaluation and interpolation can still be done using >|)>> operations in> and>, whence >|)>> operations in > and >. If we that the end-results of these algorithms are really in the subspace > of > (or in the subring > of >), then we use the partial division in > to replace their representations in > (or >) by representations in > (or >). <\lemma> Let > be an effective and commutative integral domain and > an effective >module. There exists a constant , such that the following holds: for any 0>>, \>> and \> such that ,q> are pairwise distinct and such that and -1,\,q-1> are all invertible in >, we have: <\enumerate-alpha> We may compute ,\,P|)>> from using >> operations in > and >. 
We may reconstruct from ,\,P|)>> using >> operations in > and>. <\proof> We again use straightforward generalizations of the algorithms in, using the fact that we only divide by elements in the set -1>,\,q-1|}>>. Let > be an effective (possibly non commutative) ring and recall that <\eqnarray*> >||\:deg P\n|}>.>>>> Given a power series \|]>> and j>, we will also use the notations <\eqnarray*> >||*z+\+f*z>>|>||+\+f*z>>|>||*z+f*z+\.>>>> The fast relaxed algorithms from are all based on two main changes of representation: ``blocking'' and the fast Fourier transform. Let us briefly recall these transformations and how to use them for the design of fast algorithms for relaxed multiplication. Given a block size 0>, the first operation of rewrites a power series \|]>> as a series in > with coefficients in > <\eqnarray*> >||b>f*z*y\\|]>.>>>> Given \|]>>, we may then compute using <\eqnarray*> ||*\|)>,>>>> where *\\\|]>> and <\eqnarray*> :\|]>>|>||]>>>|P*y>|>|P*z.>>>> Assume now that \1/2>, that |}>>, and that > admits a primitive -th root of unity=\>. Then the discrete Fourier transform provides us with an isomorphism <\eqnarray*> >:\>|>|>>||>|,P|)>,\,P|)>|)>,>>>> and it is classical that both >> and >> can be computed using > operations in >. The operations >> and >> extend naturally to |]>> <\eqnarray*> >f*y|)>>||FFT>|)>*y.>>>> This allows us to compute using the formula <\eqnarray*> ||>>|)>*FFT>|)>|)>|)>.>>>> The first coefficients of can be computed using at most >+\> operations in>. In formula() the -th coefficient of the right hand side may depend on the >-th coefficients of and . In order to make() suitable for relaxed multiplication, we have to treat the first coefficients of and apart. Indeed, the formula <\eqnarray*> ||*g+f*g+f*g+>>|||>>|)>|)>*FFT>|)>|)>|)>|)>>>>> allows for the relaxed computation of at order using at most <\eqnarray*> >>|>|>+2*k*>+2*b*>+\>>>> operations in >. Similarly, the formula <\eqnarray*> ||*g+\>>|)>|)>*FFT>|)>|)>|)>>>>> allows for the semi-relaxed computation of at order using at most <\eqnarray*> >>|>|>+2*b*>+\>>>> operations in >. For a given expansion order , one may take >, and use the above formula in a recursive manner. This yields11> <\eqnarray*> >>|||)>.>>>> <\remark> Since the block size is chosen as a function of , the above method really describes a relaxed algorithm for computing the product up to an order >> which is specified in advance. In fact, such an algorithm automatically yields a genuine relaxed algorithm with the same complexity (up to a constant factor), by doubling the order >> each time when needed. In the above discussion, we both provided bounds for >> and >>. In fact, there exists a straightforward reduction of relaxed multiplication to semi-relaxed multiplication. First of all, the relaxed multiplication of two power series \|]>> up to order |)>> clearly reduces to the relaxed multiplication of the two polynomials > and > up to order |)>>. Now the formula <\eqnarray*> *g>||*g+f*g+f*g+f*g>>>> shows that a relaxed product of two polynomials > and > of degrees 2*n> reduces to a relaxed product *g> of half the size, two semi-relaxed products *g>, *g>, and one non-relaxed product *g>. Under the assumptions that >/n>> and >/n>> are increasing, a routine calculation thus yields <\eqnarray*> >>||>|)>.>>>> Instead of taking a single block size , we may write n*\*n>, take different block sizes =n\\\b=n*\*n>, and decompose in parts >+f;b>+\+f;b>>, where =+\>. 
For semi-relaxed multiplication, this yields the formula <\eqnarray*> ||>*g+\>>>>>>;b>|)>|)>*FFT>>>|)>|)>|)>>>>> and the complexity bound <\eqnarray*> >>|>|>*>|)>+>*>|)>+\+>*>|)>+\.>>>> For \n\\\n> and \\>> well chosen block sizes (depending on the expansion order ), the recursive application of this technique yields12> <\eqnarray*> >>||>|)>.>>|||>|)>.>>>> Let us now consider the less favourable case when > is an effective ring which does not necessarily contain primitive >-th roots of unity for arbitrarily high . In this section, we will first consider the case when > is torsion-free as a >-module and also admits a partial algorithm for division by integers. Given a block size \> and \\> (say ), we will replace the discrete Fourier transform >> at a >-th primitive root of unity by multipoint evaluation at ,q>. More precisely, we define <\eqnarray*> :\>|>|>>||>|,P,\,P|)>|)>>>>> and the inverse transform :im \\\>. By Lemma, these transforms can both be computed using |)>> operations in >. In a similar way as >> and >>, we extend > and > to power series in . <\theorem> Let > both be an effective ring and an effective torsion-free >-module. Then <\eqnarray*> >>||>*|)>>>|||>.>>>> <\proof> It suffices to prove the complexity bound for semi-relaxed multiplication. We adapt the multiplication algorithm with different block sizes from the end of the previous section, and replace() by <\eqnarray*> ||>*g+\>>>>;b>|)>|)>*\>>|)>|)>|)>.>>>> This leads to the complexity bound <\eqnarray*> >>|>|>*>|)>+>*>|)>+\+>*>|)>+\>|)>.>>>> We now follow the proof of . Denote <\eqnarray*> >||>|)>|p*2>>>>> and take =\=n>. Using that >=\>, this leads to <\eqnarray*> >|>|+\|)>>>>> Applying this relation times, we obtain <\eqnarray*> |)>>|>|*U+\*l*+\+>|)>*log l|)>>>|||*l*log l|)>.>>>> Taking <\eqnarray*> |||>>|||>,>>>> this yields <\eqnarray*> >||*\>|)>.>>>> Re-expressing this bound in terms of >> yields the desired result. Let > now be an effective ring of prime characteristic . For expansion orders p>, the ring > does not necessarily contain distinct points in geometric progression. Therefore, in order to apply Lemma, we will first replace > by a suitable extension, in which we can find sufficiently large geometric progressions. Given , let ||2*log p>|\>> be even such that \n>. Let >\\> be such that the finite field >> is isomorphic to />|)>>. Then the ring <\eqnarray*> >|>|/>|)>>>>> has dimension over > as a vector space, so we have a natural >-linear bijection <\eqnarray*> :\>|>|>>||>|>>> The ring > is an effective ring and one addition or subtraction in > corresponds to additions or subtractions in >. Similarly, one multiplication in > can be done using >|)>> operations in >. In order to multiply two series \|]>> up to order |)>>, the idea is now to rewrite and as series in |]>> with >. If we want to compute the relaxed product, then we also have to treat the first coefficients apart, as we did before for the blocking strategy. More precisely, we will compute the semi-relaxed product using the formula <\eqnarray*> ||*g+\|)>|)>*\|)>|)>|)>,>>>> where we extended > to |]>> in the natural way: <\eqnarray*> *0>f*u|)>>||0>\|)>*u.>>>> From the complexity point of view, we get <\eqnarray*> >>|>|*>|)>+>|)>*\>|)>.>>>> Since > contains a copy of >>, it also contains at least -1\n> points in geometric progression. For the multiplication up till order of two series with coefficients in >, we may thus use the blocking strategy combined with multipoint evaluation and interpolation. 
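The role played by the discrete Fourier transform in the blockwise algorithm is then taken over by evaluation and interpolation at a geometric progression 1, q, q^2, ... The sketch below (our own code; the modulus P = 257, the point q = 3 and the function names are chosen purely for illustration) shows a naive quadratic version of this transform pair over a small prime field; in the actual algorithm these conversions are performed with the quasi-optimal algorithms of Bostan and Schost, over the ring itself or over an extension ring containing a sufficiently long geometric progression.

<\verbatim-code>
# Naive evaluation/interpolation at the geometric progression 1, Q, Q^2, ...
# modulo a prime P; a quadratic stand-in for the quasi-optimal conversions
# used in the paper.  P, Q and all names are illustrative choices.

P = 257   # a prime field large enough for the points below
Q = 3     # an element of large multiplicative order modulo P (a primitive root)

def evaluate(coeffs):
    """Values of the polynomial sum_j coeffs[j]*x^j at x = 1, Q, Q^2, ..."""
    pts = [pow(Q, i, P) for i in range(len(coeffs))]
    return [sum(c * pow(x, j, P) for j, c in enumerate(coeffs)) % P for x in pts]

def interpolate(values):
    """Recover the coefficients from the values at 1, Q, Q^2, ... by naive
    Lagrange interpolation modulo P."""
    n = len(values)
    pts = [pow(Q, i, P) for i in range(n)]
    coeffs = [0] * n
    for i, (xi, yi) in enumerate(zip(pts, values)):
        num, den = [1], 1           # prod_{j != i} (x - x_j) and prod_{j != i} (x_i - x_j)
        for j, xj in enumerate(pts):
            if j != i:
                num = [(a - xj * b) % P for a, b in zip([0] + num, num + [0])]
                den = den * (xi - xj) % P
        scale = yi * pow(den, -1, P) % P
        for k in range(n):
            coeffs[k] = (coeffs[k] + scale * num[k]) % P
    return coeffs

if __name__ == "__main__":
    f = [5, 0, 7, 1]                # a polynomial of degree < 4
    assert interpolate(evaluate(f)) == f
</verbatim-code>

Note that the only divisions performed are by differences of evaluation points q^i - q^j = q^j (q^{i-j} - 1), in line with the requirement above that q and the elements q^i - 1 be invertible.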
<\theorem> Let > be an effective ring of prime characteristic . Then <\eqnarray*> >>||>**log log log n|)>>>>> <\proof> With the notations from above, we may find a primitive -1|)>>-th root of unity in>\\>. We may thus use formula() for the semi-relaxed multiplication of two series in |]>> up till order n>. In a similar way as in the proof of Theorem, we thus get <\eqnarray*> >|)>>||>*|k>|)>.>>>> Using classical fast relaxed multiplication, we also have <\eqnarray*> >|)>>|| k*log log k|)>,>>>> whence() simplifies to <\eqnarray*> >>||>*>|k>*|)>.>>>> Since >/k=\> and >, the result follows. <\remark> In our complexity analysis, we have not taken into account the computation of the polynomial >\\> with >=\/>|)>>. Using a randomized algorithm, such a polynomial can be computed in time |~>*log p|)>>; see . If >, then this is really a precomputation of negligible cost |~> n|)>>. If we insist on computing >> in a deterministic way, then it is better to slightly change our algorithm, and systematically choose to be a prime number minus one. Under this assumption, the cyclotomic polynomial +\+1> is irreducible over >; see. For a fixed , the first prime number with \n> still has size >, by the prime number theorem, and we may compute it in time |~>> using the sieve of Eratosthenes. This again shows that a suitable >> can be precomputed with negligiblecost. Let us now show that the technique from the previous section actually extends to the case when > is an arbitrary effective ring of positive characteristic. We first show that the algorithm still applies when the characteristic of > is a prime power. We then conclude by showing how to apply Chinese remaindering in our setting. <\theorem> Let > be an effective ring of prime power characteristic >. Then <\eqnarray*> >>||>**log log log n|)>.>>>> <\proof> Taking ||2*log p>|\>>>, let >> be as in the previous section and pick a monic polynomial > in /s*\|)>> such that the reduction |)>> of > modulo yields . Then we get a natural commutative diagram <\eqnarray*> /s*\|)>/|)>>|>|/|)>>>|>>||>>>|/>|>||)>/,>>>> where > stands for reduction modulo . In particular, we have an epimorphism <\eqnarray*> :/s*\|)>/|)>>|>|/\\>,>>>> with =>. Now let be an element in >> of order -1>. Then any lift \/s*\|)>/|)>> of with |)>=q> has order at least -1>. Moreover, ,q-2>-1> and are all invertible. Consequently, -1,\,-2>-1> and > do not lie in =>, whence they are invertible as well. It follows that we may still apply multipoint evaluation and interpolation in /|)>> at the sequence ,\,-2>>, whence Theorem generalizes to the presentcase. <\remark> For a fixed prime number , we notice that the complexity bound is uniform in the following sense: there exists a constant such that for all effective rings of characteristic > with |}>>, we have >\K*>**log log log n>. Indeed, the choice of only depends on and , and any operation in >> or />|)>> in the case corresponds to exactly one lifted operation in /s*\|)>/|)>> or /|)>> in the general case. <\theorem> Let > be an effective ring of non zero characteristic . Then <\eqnarray*> >>||>**log log log n|)>>>|||>.>>>> <\proof> We will prove the theorem by induction on the number of prime divisors of . If is a prime power, then we are done. So assume that *s>, where > and > are relatively prime, and let ,k\\> be such that <\eqnarray*> *s+k*s>||>>> Then we may consider the rings <\eqnarray*> >||/s*\>>|>||/s*\.>>>> These rings are effective, when representing their elements by elements of > and transporting the operations from >. 
Of course, the representation of an element of > (or >) is not unique, since we may replace it by for any s*\> (or s*\>). But this is not a problem, since our definition of effective ring did not require unique representability or the existence of an equality test. Now let \|]>> and let ,\> be their projections in |]>>, for >. Consider the relaxed products *\>, for . These products are represented by relaxed series ,h\\|]>> |)>=\*\>, for . By the induction hypotheses, we may compute > and > at order using <\equation*> \>**log log log n|)> operations in >. The linear combination *s*h+k*s*h\\|]>> can still be expanded up till order with the same complexity. We claim that . Indeed, <\eqnarray*> *s*h-k*s*f*g>|>|*s*s*\=>>|*s*h-k*s*f*g>|>|*s*s*\=.>>>> Summing both relations, our claim follows. -adic numbers> Let 1> be an integer, not necessarily a prime number, and denote =,p-1|}>>. We will regard -adic numbers \> as series +a*p+a*p+\> with \\>, and such that the basic ring operations , and > require an additional carry treatment. In order to multiply two relaxed -adic numbers \>, we may rewrite them as series ,\\|]>>, multiply these series =*>, and recover the product \> from the result. Of course, the coefficients of > may exceed , so some relaxed carry handling is required in order to recover from >. We refer to for details. In particular, we prove there that can be computed up to order |)>> using >|)>> ring operations in> of bit size >. Given 0>, let =\:\2|}>>, and consider two power series \|]>>. We will denote by >> ( >>) the bit complexity of multiplying and up to order |)>> using a relaxed ( semi-relaxed) algorithm. <\lemma> We have <\eqnarray*> >>||>***log log log n|)>.>>>> <\proof> Let \|]>> and let be a prime number (in practice, we recommend to take to be a prime number which fits inside one machine word and such that is divisible by a high power of two). Let |2*k*log/log p|\>> be sufficiently large such that \p>. Let ,\/p*\|)>|]>> be the reductions of modulo >. Then may be reconstructed up to order |)>> from the product *>. We thus get <\eqnarray*> >>|>||)>*/p*\>>>>> By Theorem, we have <\eqnarray*> /p*\>>||>**log log log n|)>.>>>> By Remark, this bound is uniform in . Since \n*2>, the result follows. For the above strategy to be efficient, it is important that >. This can be achieved by combining it with the technique of -adic blocking. More precisely, given aadic block size 1>, then any -adic number in > can naturally be considered as a>adic number in >>, and . Assuming that numbers in > are written in base, the conversion is trivial if is a power of two. Otherwise, the conversion involves base conversions and we refer to for more details. In particular, the conversions in both directions up to order |)>> can be done in time **log|)>>. Let > ( >) the complexity of relaxed ( semi-relaxed) multiplication in> up till order |)>>. <\theorem> We have <\eqnarray*> >||>*log p*log|)>>|)>>>|||>.>>>> <\proof> Let |log n/log p|\>>, so that <\eqnarray*> |>|>|>||*log log|)>>>||)>>||*log*log log|)>.>>>> Using the strategy of -adic blocking, a semi-relaxed product in > may then be reduced to one semi-relaxed product in >> and one relaxed multiplication with an integer in ,p-1|}>>. In other words, <\eqnarray*> >|>|*+>||\>|)>+\,>>>> where > stands for the cost of semi-relaxed multiplication of two -adic numbers in > up till order |)>>. 
By 4>, we have <\eqnarray*> >|||)>*log r|)>>>|||*log*log log |)>>>>> By Lemma, we also have <\eqnarray*> >||\>|)>>||**log n**log log log n*\>|)>>>|||**log n**log log log n*\>|)>>>||||)>>*\>|)>>>>> In particular, <\eqnarray*> *>||>||\>|)>|)>,>>>> which completes the proof of the theorem. <\remark> The best previously known bound for relaxed multiplication in > was <\eqnarray*> >|||)>*log n|)>>>|||> n|)>>|p=\>>|> n*log p*log log p|)>>|n=\>>>>>>>>> We thus improved the previous bound by a factor at least, up to sublogarithmic terms. For the moment, we have not implemented any of the new algorithms in practice. Nevertheless, our old implementation of the algorithm from allowed us to gain some insight on the practical usefulness of blockwise relaxed multiplication. Let us briefly discuss the potential impact of the new results for practical purposes. In characteristic zero, our focus on algebraic complexity makes the complexity bounds more or less irrelevant from a practical point of view. In practice, two cases are of particular interest: floating point coefficients (which were already considered in) and integer coefficients (rational coefficients can be dealt with similarly after multiplying by the common denominator). In the case of integer coefficients, it is best to re-encode the integers as polynomials in > for a prime number which fits into a machine word and such that> admits many >th roots of unity (it is also possible to take several primes and use Chinese remaindering). After that, one may again use the old algorithm from. Also, integer coefficients usually grow in size with , so one really should see the power series as abivariate power series in |]>> with a triangular support. One may then want to use TFT-style multiplication in order to gain another constant factor. For large finite fields, it is easy to find large geometric progressions, so the algorithms of this paper can be applied without the need to consider field extensions. Moreover, for finite fields of the form >> with sufficiently large and 1>, it is possible to choose \>, thereby speeding up evaluation and interpolation. For small finite fields of the form >>, it is generally necessary to make the of working in a larger ring with sufficiently large geometric progressions. Of course, instead of the ring extensions considered in Section, we may directly use field extensions of the form >> with k>. In principle, in the semi-relaxed case, it is possible to gain a factor with respect to the fully relaxed case, using the technique from. Unfortunately, the middle product is not always easy to implement. For instance, if we rely on Kronecker substitution for multiplications in >, then we will need to implement an analogue for the middle product. Since we did not use a fast algorithm for middle products for our benchmarks in5>, the timings for the semi-relaxed product in were only about instead of better than the timings for the fully relaxed product. Nevertheless, modulo increased implementation efforts, we stress that a gain should be achievable. So far, we have not investigated the cache friendliness of blockwise relaxed multiplication, and it can be feared that a lot of additional work is required in order to make our algorithms really efficient from this point of view. In order to keep the presentation reasonably simple, we have focussed on the case when > is an effective ring. 
In fact, a more general setting for relaxed multiplication is to consider a bilinear mapping :\\\\\>, where >, > and> are effective >modules, and extend it into a mapping |^>:\|]>\\|]>\\|]>> by |^>=\,g|)>*z>. Under suitable hypothesis, the algorithms in this paper generalize to this setting. The relaxed approach can also be generalized to the case when the coefficients of the power series are operators which commute with monomials > in a non trivial way. More precisely, assume that we have an effective ring homomorphism :\\\> such that a|)>*z> for all \>. For instance, one may take =\|]>> with =z*\/\ z>, so that |)>*z=P+1|)>*z>. Given a commutation rule of this kind, we define a skew multiplication on |]>> by <\eqnarray*> 0>f*z|]>*0>g*z|]>>||0>f* g|)>*z.>>>> If >> denotes the cost of multiplying two polynomials > and > in > with n>, then the classical fast relaxed multiplication algorithm from generalizes and still admits the time complexity >*log n|)>>. However, the blockwise algorithm from this paper does not generalize to this setting, at least not in astraightforward way. <\bibliography|bib|alpha|all> <\bib-list|AHU74> A.Aho, J.Hopcroft, and J.Ullman. . Addison-Wesley, Reading, Massachusetts, 1974. D.Bernstein. Removing redundancy in high precision Newton iteration. Available from , 2000. J.Berthomieu, J.vander Hoeven, and G.Lecerf. Relaxed algorithms for -adic numbers. , 23(3):541--577, 2011. R.P. Brent and H.T. Kung. Fast algorithms for manipulating formal power series. , 25:581--595, 1978. J.Berthomieu and R.Lebreton. Relaxed -adic hensel lifting for algebraic systems. Work in preparation, 2011. L.I. Bluestein. A linear filtering approach to the computation of the discrete fourier transform. , AU-18:451--455, 1970. D.Bini and V.Y. Pan. . Birkhäuser Boston Inc., Boston, MA, 1994. Fundamental algorithms. A.Bostan and É.Schost. Polynomial evaluation and interpolation on special sets of points. , 21(4):420--446, August 2005. Festschrift for the 70th Birthday of Arnold Schönhage. D.G. Cantor and E.Kaltofen. On fast multiplication of polynomials over arbitrary algebras. , 28:693--701, 1991. S.A. Cook. . PhD thesis, Harvard University, 1966. J.W. Cooley and J.W. Tukey. An algorithm for the machine calculation of complex Fourier series. , 19:297--301, 1965. M.J. Fischer and L.J. Stockmeyer. Fast on-line integer multiplication. , 9:67--72, 1974. M.Fürer. Faster integer multiplication. In , pages 57--66, San Diego, California, 2007. J.vonzur Gathen and J.Gerhard. . Cambridge University Press, 2-nd edition, 2002. J.vander Hoeven. Lazy multiplication of formal power series. In W.W. Küchlin, editor, , pages 17--20, Maui, Hawaii, July 1997. J.vander Hoeven. Relax, but don't be too lazy. , 34:479--542, 2002. J.vander Hoeven. Relaxed multiplication using the middle product. In Manuel Bronstein, editor, , pages 143--147, Philadelphia, USA, August 2003. J.vander Hoeven. The truncated Fourier transform and applications. In J.Gutierrez, editor, , pages 290--296, Univ. of Cantabria, Santander, Spain, July 4--7 2004. J.vander Hoeven. Notes on the Truncated Fourier Transform. Technical Report 2005-5, Université Paris-Sud, Orsay, France, 2005. J.vander Hoeven. New algorithms for relaxed multiplication. , 42(8):792--802, 2007. J.vander Hoeven. Relaxed resolution of implicit equations. Technical report, HAL, 2009. . J.vander Hoeven. Newton's method and FFT trading. , 45(8):857--878, 2010. J.vander Hoeven. From implicit to recursive equations. Technical report, HAL, 2011. . A.Karatsuba and J.Ofman. 
Multiplication of multidigit numbers on automata. Soviet Physics Doklady, 7:595--596, 1963. L.R. Rabiner, R.W. Schafer, and C.M. Rader. The chirp z-transform algorithm and its application. Bell System Technical Journal, 48:1249--1292, 1969. A. Sedoglavic. Méthodes seminumériques en algèbre différentielle; applications à l'étude des propriétés structurelles de systèmes différentiels algébriques en automatique. PhD thesis, École polytechnique, 2001. A. Schönhage and V. Strassen. Schnelle Multiplikation grosser Zahlen. Computing, 7:281--292, 1971. A.L. Toom. The complexity of a scheme of functional elements realizing the multiplication of integers. Soviet Mathematics Doklady, 4(2):714--716, 1963.