> <\body> ||<\author-address> CNRS, Département de Mathématiques Bâtiment 425 Université Paris-Sud 91405 Orsay Cedex France ||>|<\doc-note> This work was partially supported by the ANR Gecko project. |||>> <\abstract> Let and be two convergent power series in [[z]]> or [[z]]>, whose first terms are given numerically with a *n>-bit precision for a fixed constant \0>. Assuming that =0>, we will show in this paper that the first coefficients of g> can be computed with a *n>-bit precision in time (n)>. Using Newton iteration, a similar complexity bound holds for power series reversion of . Our method relies on fast multi-point evaluation, which will be recalled and further detailed for numeric polynomials. We also discuss relaxed variants of our algorithm. Let > be an effective ring of coefficients ( we have algorithms for performing the ring operations). Let \[[z]]> be two power series with =0>, so that the composition g\\[[z]]> is well-defined. We are interested in algorithms for fast composition: given ,\,f> and ,\,g>, how much arithmetic operations in > are needed in order to compute ,\,h>? A first efficient general purpose algorithm of time complexity (n)*)> was given in. Here (n)> denotes the complexity for the multiplication of two polynomials of degrees > n> and we have (n)=O(n*log n*log log n)> . In the case when is polynomial or algebraic, then this complexity further reduces to(n)*log n)>. For some very special series , there even exist (n))> algorithms; see for an overview. In positive characteristic 0>, right composition can also be performed in quasi-linear time (n)*log n)>. In this paper, we are interested in efficient algorithms when > is a ring of numbers, such as >, > or a ring of floating point numbers. In that case, we are interested in the bit complexity of the composition, which means that we also have to take into account the bit precision of the underlying integer arithmetic. In particular, we will denote by (p)> the time needed for multiplying two -bit integers. We have (p)=O(p*log p*log log p)> and even (p)=O(p*log p*log> p)>, where >> satisfies >(exp n)=log> n+1>. Ifall coefficients ,\,f>, ,\,g> and ,\,h> correspond to -bit numbers, then we will search for a composition algorithm which runs in quasi-linear time (n*p)=O(n*p*(log (n*p)))>. Throughout the paper, we will assume that (n)/n> is increasing and (O(n))=O((n))>. In section, we start by reviewing multiplication and division of polynomials with numeric coefficients. Multiplication of two polynomials of degrees > n> with -bit coefficients can be done in time (n*p))> using Kronecker's substitution. Division essentially requires a higher complexity (n*(p+n)))>, due to the fact that we may lose bits of precision when dividing by apolynomial of degree . Nevertheless, we will see that a best possible complexity can be achieved in terms of the output precision. In section, we will study multi-point evaluation of a polynomial +\+P*z> at points ,\,z> in the case of numeric coefficients. Adapting the classical binary splitting algorithm of time complexity (n)*log n)> to the numeric context, we will prove the complexity bound (n*(p+n))*log n)>. In the case when , this complexity bound is again non-optimal in many cases. It would be nice if a bound of the form (n*p)*log n)> could be achieved and we will present some ideas in this direction in section. In section, we turn to the problem of numeric power series composition. Under the assumptions and , we first show that the computation of the first coefficients of a convergent power series is essentially equivalent to its evaluation at equally spaced points on a given circle (this fact was already used in, although not stated as explicitly). The composition problem can therefore be reduced to one fast Fourier transform, one multi-point evaluation and one inverse Fourier transform, leading to acomplexity ((n+p))*log (n+p))>. We first prove this result under certain normalization conditions for floating point coefficients. We next extend the result to different types of numeric coefficients (including integers and rationals) and also consider non-convergent series. The good complexity of these algorithms should not come as a full surprise: under the analogy numbers>series, it was already known how to compose power series fast. In the last section, we conclude by giving relaxed versions of our composition algorithm. This allows for the resolution of functional equations for numeric power series, involving composition. Unfortunately, we did not yet achieve a quasi-linear bound, even though some progress has been made on the exponent. The special case of power series reversion may be reduced more directly to composition, using Newton's method. Let be a normed vector space. Given \V> and \\>={c\\:c\0}>, we say that > is an >-approximation> of if -x\|\\>. We will also say that > is an approximation of with >> or an absolute precision of \log \\> bits. For polynomials *z+\+P\\[z]>, we will use the norm \|+\+\|P\|>, and notice that \|P\|*\|Q\|>. For vectors ,\,v)\\>, we will use the norm \|,\,\|v\|)>. Given two polynomials \[z]> with n>, n>, 2> and 2>, take log n\+2>. Then the product can be read off from the integer product )=P(2)*Q(2)>, since the coefficients of the result all fit into bits. This method is called Kronecker multiplication and its cost is bounded by(n*p))>. Clearly, the method generalizes to the case when \[\][z]> or \[\]*[z]*2>>. A direct consequence for the computation with floating point polynomials is the following: <\lemma> Let \[z]> be two polynomials with . Given \> with 2> and 2>, we may compute a p>>-approximation of in time ((p+a+b)*n))>. <\proof> Let and consider polynomials ,B\\[i][z]> with <\eqnarray*> -2*A\|>|>|>|-2*B\|>|>|>>> By assumption, the bit-sizes of the coefficients of > and > are bounded by . Therefore, we may compute the exact product *B> in time (k*n))>, using Kronecker multiplication. We have <\eqnarray*> *A*B\|>|>|*B)\|+\|B*(A-2*A)\|+>>|||*A)*(B-2*B)\|>>||>|*B\|+\|B\|*\|A-2*A\|+>>|||*A\|*\|B-2*B\|>>||>|+2\2p>.>>>> This proves that *A*B> is a p>>-approximation of . One may optionally truncate the mantissas of the coefficients of *A*B> so as to fit into bits. <\remark> In order to increase readability, we will loosely use real and complex numbers as inputs of our algorithms. In practice, such inputs are really floating point numbers whose precisions should be clear from the context. <\lemma> Let ,\,z\\> be such that \|\1> for all and <\equation*> P=(1-z*z)*\*(1-z*z). Let =P1>\\[[z]]> and =\+\+\*z>. Let \> be such that 2> and \|\2>. Given \>, we may compute a p>>-approximation of > in time ((p+a+b)*n))>. <\proof> Assume that we are given an approximation > of >. Then we may compute abetter approximation using the classical Newton iteration <\equation> |~>=\-(\*P-1)*\. If *P-1> is ``small'', then <\equation> |~>*P-1=(\*P-1)-(\*P-1)*\*P=\(\*P-1) is about twice as ``small''. We will apply the Newton iteration for a polynomial > whose first coefficients are good approximations of > and the remaining coefficients less good approximations. Given a polynomial \[z]>, we write <\eqnarray*> >||+\+An/2\>*zn/2\>>>|>||n/2\+1>*zn/2\+1>+\+A*z>>|>||+A.>>>> Assume that -\)\|\2p>> and -\)\|\2q>>, with q>. Setting *P-1)> and *P-1)>, we have <\eqnarray*> ||*P-1)\|=\|((\-\)*P)\|\2>>|||*P-1)\|=\|((\-\)*P)\|\2.>>>> The relation() yields <\equation*> |~>*P-1=\E-2*E*R+O(z), whence <\eqnarray*> |~>*P-1)\|>|>|2*(2+2)\2>>||~>-\)\|>|>||~>*P-1)\|*\|\\|\2.>>>> When starting with =\>, we may take b>. If 4*a+3*b+6> is sufficiently large, then one Newton iteration yields |~>-\)\|\2>. Applying the Newton iteration once more for , we obtain |~>-\)\|\2>. In view of lemma and the assumption (O(n))=O((n))>, the cost of two such Newton iterations is bounded by ((p+a+b)*n)> for a suitable constant . We are now in a position to prove the lemma. Since an increase of by leaves the desired complexity bound unaltered, we may assume without loss of generality that 4*a+3*b+6>. Then the cost of the inversion at order satisfies T(\n/2\)+C*((p+a+b)*n)>. We conclude that C*((p+a+b)*n)+C*((p+a+b)*\n/2\)+\\(2+o(1))*C*((p+a+b)*n)>. <\remark> If only is given, then a constant for the bound \|\2> is not known priori>. Assume that -\\|\2p>> and write *P=1+\>. Then \|\2> and <\equation*> =|P*\>=|1+\>=|1+\+\>+O(z)=|1+\>*1-|1+\>+O(z). Assuming a+3>, it follows that <\equation*> \|\\|\2*\|\\|*(1+\|\\|). We may thus take log (2*\|\\|*(1+\|\\|))\>. Inversely, we have <\equation*> 2\|\\|*(1+\|\\|)\2*\|\\|*(2+\|\\|*\|P\|)\2*\|\\|*\|P\|*(1+o(1)). In other words, our choice of is at most a factor two worse than the optimal value. <\remark> The following alternative algorithm for the approximation of > is a variant of the method described in: <\itemize> Choose 0> sufficiently small and sufficiently large, such that <\equation> \|\*z+\*z+\\|\p>*r|2*N>,\|z\|\r. Evaluate =P1>> at ,r*\> where =\*\/N>>, using one direct FFT for the polynomial and scalar inversions. Let (r*z)> be the polynomial of degree >N> we recover when applying the inverse FFT on (r),\,\(r*\)>. Then() implies (r*z)-\(r*z)\|\2p-1>*r>. Consequently, -\*z-\-\\|\2p>>. Because of () below, we may always take and , which gives a complexity bound of the form (p+n)*n*log n)>. In fact the FFT of a numeric -bit polynomial of degree can be computed in time (p*n))> , which drops the bound further down to(n*(p+n)))>. In practice, we may also start with 1> and double until the computed approximation > of > satisfies the equation =1> up to a sufficient precision. This leads to the same complexity bound ((p+a+b)*n))> as in the lemma. It is not clear which of the two methods is most efficient: Newton's method performs a certain amount of recomputations, whereas the alternative method requires us to work at a sufficiently large degree n> for which() holds. Given power series \[[z]]> and \>[[z]]>, where >={x\\:x\0}>, we will say that and write b> if \|\b> for all coefficients of . This notation applies in particular to polynomials. <\lemma> For > and as in lemma, we may compute a p>>-approximation of > in time (n*(p+n)))>. <\proof> Then (1+z)> and \>>, whence <\eqnarray*> |>|>>|\|>|>|>=\4.>>>> We conclude by applying lemma for and . The following result was first proved in. <\lemma> Let \[z]> be polynomials with 1>, 2*n> and <\equation*> B=(z-z)*\*(z-z) for ,\,z\\> with \|\1> for all . Consider the Euclidean division <\equation*> A=Q*B+R, with n>. Given \>, we may compute a p>>-approximations of and in time(n*(p+n)))>. <\proof> Setting and , we may write <\eqnarray*> ||*(t)=z*(A+A*t+\+A*t)>>|||*(t)=z*(B+B*t+\+B*t).>>>> Setting =/\\[[t]]>, we then have <\eqnarray*> ||*(+*t+\+*t)>>|||>>> Our lemma now follows from lemmas and. <\remark> Assuming that 2>, 2>, 2> and 2>, it is again possible to prove the improved estimate ((p+a+b+q+r)*n))>. However, the proof relies on Schönhage's original method, using the same doubling strategy as in remark. Indeed, when using Newton's method for inverting >, nice bounds 2> and 2> do not necessarily imply a nice bound for \|>, where > is the truncated inverse of >. Using a relaxed division algorithm for />, one may still achieve the improved bound, up to an additional overhead. For applications where Euclidean division is used as a subalgorithm, it is necessary to investigate the effect of small perturbations of the inputs and on the outputs and. <\lemma> Consider two Euclidean divisions <\eqnarray*> ||>|>||*+,>>>> with similar hypotheses as in lemma. Then <\equation*> \|-R\|\2*\|-A\|+2*\|A\|*\|-B\|. <\proof> With the notations of the proof of lemma, let > and |~>> be the truncations of the power series 1>> and |^>1>> at order . Let us first consider the case when =B>, so that <\equation*> -A=(-Q)*B+>-R. Then -Q\|\\|-A\|*\|\\|> and <\equation*> \|-R\|\\|-A\|*(1+\|B\|*\|\\|). Let us next consider the case when =A>. Then <\equation*> |^>>->=|^>-|*|^>>, whence -Q\|\\|A\|*\|-B\|*\|\\|*\||~>\|> and <\equation*> \|-R\|=\|Q*(B-)+*(-Q)\|\\|A\|*\|\\|*\|-B\|*(1+\|\|*\||~>\|). In general, the successive application of the second and the first case yield <\equation*> \|-R\|\(\|-A\|+\|A\|*\|\\|*\|-B\|)*(1+\|\|*\||~>\|) We have also seen in the proof of lemma that \|\2>, \|\4> and |~>\|\4>. <\lemma> Let and be as in lemma, while allowing for the case when 2*n>. Then <\equation*> \|R\|\*\|A\|. <\proof> With the notations from the proof of lemma, we have <\eqnarray*> >|>|1>>>|1>>|>|n>>>|>|>|n\1>,>>>> whence <\equation*> \|Q\|\\|A\|* and <\equation*> \|R\|\\|B\|*\|Q\|+\|A\|\2*+1*\|A\|\*\|A\|. \; Consider a complex polynomial <\equation*> P=P+\+P*z\\[z], which has been normalized so that 1>. Let <\equation*> z,\,z\\ be pairwise distinct points with \|\1> for all . The problem of multi-point evaluation is to find an efficient algorithm for the simultaneous evaluations of at ,\,z>. Ap>>-approximation of )> will also be called a p>>-evaluation> of at >. In order to simplify our exposition, we will assume that > is a power of two; this assumption only affects the complexity analysis by a constant factor. An efficient and classical algorithm for multi-point evaluation relies on binary splitting: let =(z-z)*\*(z-z)>, =(z-z)*\*(z-z)>. Denoting by the remainder of the Euclidean division of by , we compute <\eqnarray*> >||>>|>||>>>> Then <\equation*> P(z)=(z),>|0\i\n/2>>|(z),>|n/2\i\n>>>>> In other words, we have reduced the original problem to two problems of the same type, but of size . For the above algorithm to be fast, the partial products > and > are also computed using binary splitting, but in the opposite direction. In order to avoid recomputations, the partial products are stored in a binary tree of depth n>. At depth , we have > nodes, labeled by polynomials ,\,Q-1>>, where <\equation*> Q=(z-z*i>)*\*(z-z*(i+1)-1>). For instance, for , the tree is given by <\equation*> ||Q|Q>||Q|Q>> We have <\eqnarray*> >||>>|>||*Q>>>> For a given polynomial of degree > n> we may now compute <\eqnarray*> >||>|>||i/2\> mod Q>>>> This second computation can be carried out in place. At the last stage, we obtain <\equation*> P=P(z). More generally, <\equation> P=P mod Q, for all and . <\lemma> Let \[z]> and ,\,z\\> be such that 1>, n> and \|\1> for all . Given \>, we may compute p>>-evaluate at ,\,z> in time (n*(p+n))*log n)>. <\proof> In the algorithm, let and assume that all multiplications() and all euclidean divisions() are computed up to an absolute error> 2q>>. Let > and > denote the results of these approximate computations. We have <\equation*> \|-Q\|\\|\|*\|-Q\|+\|Q\|*\|-Q\|+2q>. It follows by induction that <\eqnarray*> \|>|>|>>>|\|>|>|+1>>>|-Q\|>|>|+2\q>.>>>> From() and lemma, we have <\equation> \|P\|\>>\4. In view of() and lemma, we also have <\equation*> \|-P\|\22+1>*\|i/2\>-Pi/2\>\|+22+1>*\|Pi/2\>\|*\|-Q\|+2. By induction over , we obtain <\equation*> \|-P\|\(h-k)*2. Using our choice of , we conclude that <\equation*> \|-P(z)\|\2p>(0\i\n). Let us now estimate the complexity of the algorithm. In view of() and lemma, we may compute > in time (2*(q+O(2))))=O((2*(q+n)))>. In view of() and lemma, the computation of > takes a time (2*(q+2*n+2)))=O((2*(q+n)))>. For fixed , the computation of all > and > thus takes a time (2*(q+n))*n*2k>)=O((n*(q+n)))>. Since there are n> stages, the time complexity of the complete algorithm is bounded by (n*(q+n))*log n)=O((n*(p+n))*log n)>. If , then the bound from lemma reduces to (n)*log n)>, which is not very satisfactory. Indeed, in the very special case when =\*\*j/n>> for j\n>, we may achieve the complexity (p)*n*log n)>, by using the fast Fourier transform. In general, we may strive for a bound of the form (p*n)*log n)>. We have not yet been able to obtain this complexity, but we will now describe some partial results in this direction. Throughout the section, we assume that . <\proposition> With the notations of proposition, assume that \|=\=\|z\|=1>. Then we may p>>-evaluate at ,\,z> in time (p*log n)*p*n)>. <\proof> Let , and =\*\*j/N>> for all j\N>. Each of the points > is at the distance of at most to one of the points >. We will compute )> using a Taylor series expansion of at >. We first observe that <\equation> (\)\||i!>\*\|P\|\\n. Using fast Fourier transforms of size , we first compute *2D>>-approximations of (\)> for all D> and N>. This can be done in time <\equation*> O((D*log n)*(D*log n)*N*log N)=O((p*log n)*p*n). From () it follows that <\eqnarray*> P(z)-D>(\)|i!>*(z-\)>||D>(\)|i!>*(z-\)>>||>|D>2i>\2\2.>>>> The p>>-evaluation of sums of the form D>(\)|i!>*(z-\)> using Horner's rule takes a time (p*log n)*D)=O((p*log n)*p*n)>. <\proposition> With the notations of proposition, we may p>>-evaluate at ,\,z> in time (n*p*log n))>. <\proof> The proof of the above proposition adapts to the case when \\|z\|\1> for all. Let us now subdivide the unit disk into annuli \\|z\|\1-> and the disk of radius >. For a fixed annulus, we may evaluate at each of the > inside the annulus in time (p*log n)*p*n)>. For 1->, we have <\equation*> V>P*z\*1-\2, provided that n*p*log 2>. Consequently, taking p>, we may evaluate at the remaining points in time (V)*log V)>. Under the condition n*p*log 2>, and up to logarithmic terms, the sum <\equation*> T*(p*log n)*p*n+(n/V)*(V)*log V becomes minimal for *p1/2>)> and *p)>. <\proposition> With the notations of proposition, assume that there exists an \\> with \|\1> and -\\|\1/(2*n)> for all . Then we may p>>-evaluate at ,\,z> in time (n*p*log n))>. <\proof> We will only provide a sketch of the proof. As in the proof of proposition, we first compute the Taylor series expansion of at > up to order . This task can be performed in time (n*(p+p*log n)))> via a division of by )> followed by a Taylor shift. We next evaluate this expansion at -\> for all . According to proposition, this can be done in time (n*p)*log p)>. <\proposition> Let 0> be a fixed constant. With the notations of proposition, assume that -\*\*j/n>\|\c/n> for all . Then we may p>>-evaluate at ,\,z> in time (p+log n)*n*log n)>. <\proof> Again, we will only provide a sketch of the proof. The main idea is to use the binary splitting algorithm from the previous subsection. Let =\*\/n>>. If =\> for all , then we obtain =z>-\>> and the algorithm reduces to a fast Fourier transform. If each > is a slight perturbation of >, then > becomes a slight perturbation of >-\>>. In particular, the Taylor coefficients 1>)> of 1>> only have a polynomial growth )> in . Consequently, the Euclidean division by > accounts for a loss of at most instead of bits of precision. The binary splitting algorithm can therefore be carried out using a fixed precision of n)> bits. Sometimes, it is possible to decompose +\+n>, such that a fast multi-point evaluation algorithm is available for each set of points +\+n>,\,z+\+n-1>}>. This leads to the question of evaluating at n> points. <\proposition> Let +\+P*z\\[z]> and ,\,z\\> be such that n>, 1> and \|\1> for all . Let (m,p)> be the time needed in order to compute p>>-evaluations of polynomials +\+Q*z\\[z]> with 1> at ,\,z>. Then p>>-evaluations of at ,\,z> can be computed in time (m,p)*n/p+(p)*n)>. <\proof> Without loss of generality, we may assume that and are powers of two. We decompose as follows: <\eqnarray*> ||+\*z+\+\*z>>|>||+\+P*z.>>>> We may compute p-2>>-evaluations of the > at ,\,z> in time (m,p)*n/m)>. We may also p-2>>-approximate ,\,z> in time (p)*n*log m/m)=O((p)*n)>, using binary powering. Using Horner's rule, we may finally p>>-evaluate <\equation*> P(z)=\(z)+\(z)*z+\+\(z)*(z) for ,m-1> in time (p)*n/m)=O((p)*n)>. Let +f*z+\> be a convergent power series with radius of convergence \0>. Given \>, we denote <\equation*> \<\|\|\>f\<\|\|\>=supR>\|f(z)\|. Using Cauchy's formula, we have <\equation> \|f\|=*\> t|t>\f\<\|\|\>|R>, For \> with r\R>, it follows that <\equation> \|f*z+f*z+\\|\f\<\|\|\>|1-r/R>*. Let (n,p)> be the time needed to compute p>>-approximations of ,\,f>. Given R>, let (n,p)> be the time needed to p>>-evaluate at *\*k/n>> for {0,\,n-1}>. <\lemma> <\enumerate-alpha> We have (n,p)=(O(p),O(p))+O(n*(p)*log p)>. We have (n,p)=(O(p),O(p))+O((p)*p*log p+n)>. <\proof> We first consider the case when . Let be the smallest power of two with <\equation> N\f\<\|\|\>|1-1/R>|log R>. For all 1>, the bound() implies <\equation> \|f(z)-f-\-f*z\|\2p-1>. The computation of p-log N-2>>-approximations of ,\,f> takes atime (N,p+O(log p))>. The p-1>>-evaluation of +\+f*z> at primitive roots of unity can be done using n/N\> fast Fourier transforms of size . This requires a time (p)*log p)>. If , we thus obtain the bound <\equation*> (n,p)=(O(p),p+O(log p))+O(n*(p)*log p). The general case is reduced to the case via a change of variables >. This requires computations to be carried out with an additional precision of N*log r=O(N)=O(p)> bits, leading to the complexity bound in(). As to (), we again start with the case when . Taking to be the smallest power of two with(), we again obtain() for all 1>. If n>, then we also notice that \|\2p-1>> for all N>. We may p-2>>-evaluate +\+f*z> at the primitive -th roots of unity in time (N,p+2)>. We next retrieve the coefficients of the polynomial+\+f*z> up to precision > 2p-1>> using one inverse fast Fourier transform of size . In the case when , we thus obtain <\equation*> (n,p)=(O(p),p+O(1))+O((p)*p*log p+n). In general, replacing by yields the bound in(). Let us now consider two convergent power series \[[z]]> with =0>, f\<\|\|\>\1> and g\<\|\|\>\1>. Then g> is well-defined and h\<\|\|\>\1>. In fact, the series , and still converge on a compact disc of radius 1>. Let \>> be such that f\<\|\|\>\B>, g\<\|\|\>\B> and h\<\|\|\>\B>. Then() becomes <\equation> \|f*z+f*z+\\|\*RN> and similarly for and . In view of lemma, it is natural to use an evaluation-interpolation scheme for the computation of p>>-approximations of ,\,h>. For a sufficiently large and =\*\/N>>, we evaluate on the -th roots of unity \,\,\> using one direct FFT on ,\,g>. Using the fact that )\|\1> for all and the algorithm for multi-point evaluation from the previous section, we next compute ,h(\)>. Using one inverse FFT, we finally recover the coefficients of the polynomial +\+H*z> with )=h(\)> for all N>. Using the tail bound() for , , and a sufficiently large , the differences -h\|> can be made as small as needed. More precisely, the algorithm goes as follows: <\algorithm|compose> \[[z]]> with =0> and \>, > such that we have bounds f\<\|\|\>\1>, g\<\|\|\>\1> and > f\<\|\|\>\B>, g\<\|\|\>\B> and f\g\<\|\|\>\B> for certain 1> p>>-approximations for g),\,(f\g)> <\body> [Determine auxiliary degree and precision] <\indent> Let 2>> be smallest with p+2+log /log R> and n> Let N*log R-log >, so that q>=N>|1-1/R>> <\body> [Evaluate on roots of unity ,\>] <\indent> Let \\*\/N>> Compute a q>>-approximation ,\,z)\FFT>(g,\,g)> with \|\1> We will show that -g(\)\|\2> for all <\body> [Evaluate on ,g(\)>] <\indent> Let f+\+f*z> Compute a q>>-approximation ,\,v)\(F(z),\,F(z))> We will show that -f(g(\))\|\(2+2*n)**2q>> for all <\body> [Interpolate] <\indent> Compute a >-approximation ,\,u)\FFT1>>(v,\,v)> We will show that -(f\g)\|\(4+2*n)**2q>> for all n> Return ,\,u)> <\theorem> Let \[[z]]> be power series with =0> and g>. Assume that f\<\|\|\>\1> and g\<\|\|\>\1>. Given \>, we may compute p>>-approximations for ,\,h> (as a function of ,\,f> and ,\,g>) in time ((n+p))*log(n+p))>. <\proof> Let us first prove the correctness of the algorithm. The choice of and the tail bound() for imply *z+g*z+\\|\2q>>. This ensures that we indeed have -g(\)\|\2> for all at the end of step 2. For N>, we also have <\eqnarray*> -h(\)\|>|>|-F*(z)\|+\|F(z)-F(g(\))\|+\|F(g(\))-f(g(\))\|>>||>|+\<\|\|\>F\<\|\|\>*\|z-g(\)\|+f\<\|\|\>*RN>|1-1/R>>>||>|q>.>>>> This proves the bound stated at the end of step 3. As to the last bound, let +\+h*z>. Then )-h(\)\|\2q>> and ,\,h)=FFT>1>(H(1),\,H(\))>. Using the fact that >1>(a)\|\\|a\|> for all vectors of length , we obtain <\eqnarray*> -h\|>|>|>1>(H(1)-h(1),\,H(\)-h(\))\|+>>|||>1>(v-h(1),\,v-h(\))\|+>>|||-FFT>(v,\,v)\|>>||>|q>.>>>> This proves the second bound. Our choice of implies the correctness of the algorithm. As to the complexity bound, the FFT transform in step 2 can be done in time <\equation*> O((q)*N*log N)=O((p+O(log N))*N*log N)=O((N*(p+N))*log N). By lemma, the multi-point evaluation in step 3 can be performed in time <\equation*> O((N*(q+N))*log N)=O((N*(p+O(N)))*log N)=O((N*(p+N))*log N). The inverse FFT transform in step 4 can be performed within the same time as a direct FFT transform. This leads to the overall complexity bound <\equation*> O*((N*(p+N))*log N)=O(((n+p))*log(n+p)), since . <\corollary> Let \[[z]]> be convergent power series with =0> and g>. Given \>, we may compute p>>-approximations for ,\,h>(as a function of ,\,f> and ,\,g>) in time ((n+p))*log(n+p))>. <\proof> Let 0> be sufficiently small such that g\<\|\|\>\\\> and f\<\|\|\>\\\>. Consider the series and , so that <\equation*> (f\g)(z)=T*(F\G)(z/R). Let p-n*log min (1,R)+log max(T,1)+1\>. By the theorem, we may compute k>>-approximations ,\,> of G),\,(F\G)> in time ((n+k))*log(n+k))=O(((n+p))*log(n+p))>. Using additional -bit multiplications, we may compute <\equation*> =T*,\,=T*|R> with -h\|\2k>*T/min(1,R)\2p>> for all . This can again be done in time (k))=O(((n+p)))>. <\corollary> Let \[[z]]> be convergent power series with =0>, such that g\\[[z]]> is an ordinary generating function. Then we may compute ,\,h> (as a function of ,\,f> and ,\,g>) in time (n)*log n)>. <\proof> It suffices to take in corollary and round the results. <\corollary> Let \[[z]]> be convergent power series with =0> such that g=\>|n!>*z> is an exponential generating function. Then we may compute ,\,h\\> (as a function of ,\,f> and ,\,g>) in time (n*log n))>. <\proof> Applying corollary for log n!\+2=O(n*log n)>, we obtain )1>>-approximations |~>,\,|~>> of ,\,h/(n-1)!>> in time (n*log n))>. Using additional -bit multiplications of total cost (n log n))=O((n*log n))>, we may compute <\equation*> =0!*|~>,\,=(n-1)!*|~>, with -h\|\> for all . The > are obtained by rounding the > to the nearest integers. <\corollary> Let \[[z]]> be power series with =0> and g>. Let > and> be positive increasing functions with \|\2>> and \|\2>>. Given \>, we may compute p>>-approximations for ,\,h> (as a function of ,\,f> and ,\,g>) in time ((n+u+n*v+p))*log(n+u+n*v+p))>. <\proof> Without loss of generality, we may assume that =g=0> for n>. In the proof of corollary, we may therefore choose -1>>, which yields 2> and 2+n>>. It follows that p-n*log min (1,R)+log max(T,1)+1\\p+u+n*v+O(n)> and we conclude in a similar way as in the proof of corollary. Let \[[z]]> be such that =0> and g>. Given an order \>, we have shown in the previous section how to compute ,\,h> efficiently, as a function of ,\,f> and ,\,g>. Since > only depends on ,\,f> and ,\,g>, we may consider the in which ,\,f> and ,\,g> are given progressively, and where we require each coefficient > to be output as soon as ,\,f> and ,\,g> are known. Similarly, in the , the coefficients ,\,g> are known beforehand, but ,\,f> are given progressively. In that case, we require > to be output as soon as ,\,f> are known. The relaxed and semi-relaxed settings are particularly useful for the resolution of implicit equations involving functional composition. We refer to for more details about relaxed computations with power series. Let 2>>. In this section, we consider the problem of computing the semi-relaxed composition g> up to order . If , then we simply have =f>. For 2>, we denote <\eqnarray*> >||+\+f*z>>|>||+\+f*z,>>>> and similarly for and . Our algorithm relies on the identity <\equation> h=f\g+(f\g)*(g/z)*z+O(z). The first part > of is computed recursively, using one semi-relaxed composition at order2>: <\equation*> h=f\g+O(z). As soon as > is completely known, we compute \g> at order , using one of the algorithms for composition from the previous section. We also compute > at order using binary powering. We are now allowed to recursively compute the second part > of , using <\equation*> h=(f\g)+(f\g)*(g/z)+O(z). This involves one semi-relaxed composition \g> at order and one semi-relaxed multiplication \g)*(g/z)> at order . Assume now that \|\1> for all and g\<\|\|\>\1>. Consider the problem of computing p>>-approximations of ,\,h>. In our algorithm, it will suffice to conduct the various computations with the following precisions: <\itemize> We recursively compute p>>-approximations of \g),\,(f\g)>. We compute p-1>>-approximations of \g),\,(f\g)>. We recursively compute p-n-2>>-approximations of \g),\,(f\g)>. We compute p-\log n\-2>>-approximations of the coefficients of )>. We compute >-approximations of the coefficients of \g)*(g/z))>. Let us show that this indeed enables us to obtain the desired p>>-approximations for ,\,h>. This is clear for the coefficients of >. As to the second half, we have the bounds <\eqnarray*> f\<\|\|\>>|>|\|\n/2>>|f\g\<\|\|\>>|>|f\<\|\|\>g\<\|\|\>>\n/2>>|\g)\|>|>|f\g\<\|\|\>\n/2>>>> and <\eqnarray*> /z)>|>|>>>|/z))\|>|>|.>>>> These bounds justify the extra number of bits needed in the computations of > and \g)> respectively. Let us finally analyze the complexity of the above algorithm. We will denote by (n,p)> the complexity of multiplying two -bit integer polynomials of degrees > n>. Using Kronecker multiplication, we have (n,p)=O((n*p))>. We denote by (n,p)> the cost of asemi-relaxed multiplication of two -bit integer polynomials of degrees > n>. Using the fast relaxed multiplication algorithm from, we have (n,p)=O((n,p)*log n)>. We will denote by (n,p)> and (n,p)> the cost of classical and semi-relaxed composition for , and as described above. By theorem, we have (n,p)=O*(((n+p))*log(n+p))>. The complexity of the semi-relaxed composition satisfies the following recursive bound: <\eqnarray*> (n,p)>|>|(n/2,p+1)+(n/2,p+n+2)+(n,p)+>>|||(n,p+n+2)*log n+(n/2,p+n+2)+O(n*(p+n))>>||>|(n/2,p+n+2)+(n,p)+O((n*(p+n))*log n).>>>> Using theorem, it follows by induction that <\equation> (n,p)\2*(n*2k>,p+O(n))+O(2*((n+p))*log(n+p)). From, we also have <\equation> (n,p)=O((n*(p+n))* n>)). Applying() for \n> and() on (n*2k>,p+O(n))>, we obtain <\eqnarray*> (n,p)>||*((n+p))*log(n+p))+O((n*(n+p))*n*log n))>>|||((n+p))*log (n+p)).>>>> We have proved the following theorem: <\theorem> Let \[[z]]> be power series with =0> and \>. Assume that f\<\|\|\>\1> and g\<\|\|\>\1>. The computation of the semi-relaxed composition g> at order and up to an absolute error > 2p>> can be done in time ((n+p))*log(n+p))>. Let \[[z]]> be as in the previous section and 2>>. Assume also that \0>. In order to compute the relaxed composition g>, we again use the formula(), in combination with <\equation*> f\g=f\g+\g)|g>*g*z+O(z) The first coefficients of g> are still computed recursively, by performing a relaxed composition \g> at order . As soon as > and > are completely known, we may compute the composition \g> at order using the algorithm from section. Differentiation and division by > also yields \g)/g> at order . The product \g)/g)*g> can therefore be computed using one semi-relaxed multiplication. Let log n\+2>. This time, the intermediate computations should be conducted with the following precisions: <\itemize> We recursively compute q>>-approximations of the coefficients of >. We compute >-approximations of the coefficients of >. We compute >-approximations of the coefficients of \g)/g)>. We compute p-2>>-approximations of the coefficients of \g)/g)*g)>. Indeed, we have <\eqnarray*> >|>|2>>>|)1>)>|>|>>|)1>)\|>|>|>>|\g))\|>|>|g)\|\n/2>>|\g)/g)\|>|>|*4\2.>>>> Denoting by (n,p)> the complexity of relaxed composition, we obtain <\eqnarray*> (n,p)>|>|(n/2,p)+(n/2,p+n+2)+(n,q)+>>|||(n/2,q))+(n/2,q)+>>|||(n,p+n+2)*log n+(n/2,p+n+2)+O(n*(p+n))>>||>|(n/2,p)+O(((n+p))*log (n+p))>>>> It follows that <\equation*> (n,p)=O(((n+p))*log (n+p)). <\theorem> Let \[[z]]> be power series with =0>, \0> and \>. Assume that f\<\|\|\>\1> and g\<\|\|\>\1>. The computation of the relaxed composition g> at order and up to an absolute error > 2p>> can be done in time ((n+p))*log(n+p))>. <\remark> From, we have <\equation*> (n,p)=O((n*(p+n))* n>))=O(((n+p))*log (n+p)). Theorem improves on this bound in the frequent case when p>. Unfortunately, we have not yet been able to prove a quasi-linear bound (n,p)=((n+p))>. <\remark> For certain types of functional equations, one may avoid to apply theorem. For instance, power series reversion can directly be reduced to composition using Newton's method. The solutions of certain other equations, such as <\equation*> f(z)=\+f(z*f(z)), can be evaluated efficiently on small disks. Consequently, the Taylor coefficients of can be computed efficiently using lemma. <\bibliography|bib|alpha|all> <\bib-list|AHU74> A.Aho, J.Hopcroft, and J.Ullman. . Addison-Wesley, Reading, Massachusetts, 1974. D.J. Bernstein. Composing power series over a finite ring in essentially linear time. , 26(3):339--341, 1998. R.P. Brent and H.T. Kung. )> algorithms for composition and reversion of power series. In J.F. Traub, editor, , Pittsburg, 1975. Proc. of a symposium on analytic computational complexity held by Carnegie-Mellon University. R.P. Brent and H.T. Kung. Fast algorithms for composition and reversion of multivariate power series. In , pages 149--158, Waterloo, Ontario, Canada, August 1977. University of Waterloo. R.P. Brent and H.T. Kung. Fast algorithms for manipulating formal power series. , 25:581--595, 1978. Alin Bostan, Bruno Salvy, and Éric Schost. Power series composition and change of basis. In J.Rafael Sendra and Laureano González-Vega, editors, , pages 269--276, Linz/Hagenberg, Austria, July 2008. ACM. D.G. Cantor and E.Kaltofen. On fast multiplication of polynomials over arbitrary algebras. , 28:693--701, 1991. J.W. Cooley and J.W. Tukey. An algorithm for the machine calculation of complex Fourier series. , 19:297--301, 1965. M.Fürer. Faster integer multiplication. In , pages 57--66, San Diego, California, 2007. L.Kronecker. Grundzüge einer arithmetischen Theorie der algebraischen Grössen. , 92:1--122, 1882. A.Schönhage. Asymptotically fast algorithms for the numerical multiplication and division of polynomials with complex coefficients. In J.Calmet, editor, , volume 144 of , pages 3--15, Marseille, France, April 1982. Springer. A.Schönhage and V.Strassen. Schnelle Multiplikation grosser Zahlen. , 7:281--292, 1971. J.vander Hoeven. Relax, but don't be too lazy. , 34:479--542, 2002. Joris vander Hoeven. New algorithms for relaxed multiplication. , 42(8):792--802, 2007. <\initial> <\collection> <\references> <\collection> > > > > > > > > > > > > > > > > > > > > > > > > > > > |\>|?>> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > <\auxiliary> <\collection> <\associate|bib> BK78 CT65 SS71 CK91 BK78 vdH:relax BSS08 Ber98b SS71 Fur07 Kron1882 Sch82 AHU74 Sch82 BK77 vdH:relax BK75 Kron1882 Sch82 Sch82 Sch82 vdH:relax AHU74 vdH:relax vdH:newrelax vdH:relax BK75 vdH:relax BK75 vdH:relax BK75 <\associate|toc> |math-font-series||font-shape||1.Introduction> |.>>>>|> |math-font-series||font-shape||2.Fundamental operations> |.>>>>|> |2.1.Multiplication |.>>>>|> > |2.2.Inversion |.>>>>|> > |2.3.Euclidean division |.>>>>|> > |math-font-series||font-shape||3.Multi-point evaluation> |.>>>>|> |3.1.The binary splitting algorithm |.>>>>|> > |3.2.Low precision methods |.>>>>|> > |math-font-series||font-shape||4.Composition of power series> |.>>>>|> |4.1.Evaluation of truncated power series |.>>>>|> > |4.2.Composition of power series |.>>>>|> > |4.3.Variants of the main theorem |.>>>>|> > |math-font-series||font-shape||5.Relaxed composition> |.>>>>|> |5.1.Semi-relaxed composition |.>>>>|> > |5.2.Relaxed composition |.>>>>|> > |math-font-series||font-shape||Bibliography> |.>>>>|>