> <\body> |<\doc-note> Grégoire Lecerf has been supported by the French NODE project. Joris van der Hoeven has been supported by an ERC-2023-ADG grant for the ODELIX project (number 101142171). <\wide-tabular> ||||||| ||LOGO_ERC-FLAG_FP.png>|61.62pt|28.51pt||>>>>> \ ||<\author-affiliation> Laboratoire d'informatique de l'École polytechnique (LIX, UMR 7161) CNRS, École polytechnique, Institut Polytechnique de Paris Bâtiment Alan Turing, CS35003 1, rue Honoré d'Estienne d'Orves 91120 Palaiseau, France |>>||<\author-affiliation> Laboratoire d'informatique de l'École polytechnique (LIX, UMR 7161) CNRS, École polytechnique, Institut Polytechnique de Paris Bâtiment Alan Turing, CS35003 1, rue Honoré d'Estienne d'Orves 91120 Palaiseau, France |>>||>|<\doc-note> > , , and with coefficients in a prime finite field, we present a new algorithm for computing Q> modulo in time close to linear and with an asymptotic complexity smaller than the one of the Kedlaya\UUmans algorithm. As a novelty, our method mostly performs fast floating point Fourier transforms, while previously known ones rely on ad hoc algebraic constructions of finite fields.>> Let > be an effective commutative ring, so that we have algorithms for the ring operations. Given a monic polynomial \> of degree 1> and polynomials and in > of degreeD> the computation of the remainder of Q> in the division by , written Q rem R>>, is called the problem of . Modular composition is a central operation in computer algebra, especially for irreducible polynomial factorization; see for instance. It is still unknown whether modular composition can be achieved in time nearly linear in or not for any ground ring>. In this paper, we prove the following new complexity bound for when > is a finite ring /r*\> and where |)>> is a common abbreviation for *|)>>|)>>. <\theorem> The bit cost of degree modular composition over /r*\> is bounded by <\equation*> D>*. This result improves upon the best previously known bound <\equation*> D|)>|)>|)>>* given in4>, which is derived from an algorithm due to Kedlaya and Umans. The novelty of our approach is a reduction of modular composition to the floating point evaluation of a multivariate polynomial at a special set of points, asocalled. We take advantage of fast Fourier transforms to perform these evaluations. In contrast, previously known methods, in the vein of, rely on number theoretic constructions. Let > denote a cost function that bounds the number of operations in > required to multiply two polynomials of degreeD>> in >. Over any ring >, modular composition can be performed with |)>> operations in > by applying Horner's rule to evaluate |)>> in /|)>>. In 1978, Brent and Kung gave a faster algorithm with cost +*|)>>, that uses the technique. Their algorithm even yielded asub-quadratic cost /2>+*|)>> when combined with fast linear algebra; see185>. Here, the constant> is a real value between 3 and 4 such that the product of a n> matrix by a \n> matrix takes >|)>> operations; one may take \3.250385> according to. At present time, the fastest known modular composition method over any ground field > is due to Neiger, Salvy, Schost, and Villard: it is probabilistic of Las Vegas type and takes an expected number of <\equation*> >|)>,\\1+-1>+-2>> operations in >; see1.1>. Here, the constant > denotes any real value between and such that two n> matrices over a commutative ring can be multiplied with >|)>> ring operations. The current best known value is \2.371552>, so we may take\1.43>>. A major breakthrough for the modular composition problem is due to Kedlaya and Umans in the case where > is a finite field > (and even more generally a finite ring of the form /r*\|)>/|)>> for any integer and > monic). For any fixed real value \0>, they showed that Q rem R> can be computed using >|)>> bit operations. The dependency in > is analyzed in. Recent improvements of the Kedlaya\UUmans approach can be found in, but the dependency in > is not detailed. In it was shown how the knowledge of factorizations of can be exploited to speed up modular composition. An important special case concerns the composition of power series, which corresponds to taking =x>. Recently, Kinoshita and Li showed how to accomplish this task using *log D|)>> operations in >. Our modular composition algorithm will be presented in section: Theorem follows from the sharper complexity bound given in Theorem. As in, the problem will be reduced to the multi-point evaluation of a multivariate polynomial, using Kronecker segmentation. But, instead of relying on finite field arithmetic, we benefit from floating point arithmetic and reduce the multi-point evaluation to a deformation of the fast Fourier transform. The deformed evaluation point set is presented in section. Corresponding evaluation and interpolation algorithms are given in section. The next section gathers prerequisites. For complexity analyses, we will consider bit complexity models such as computation trees or RAM machines. The function > will bound the bit cost for multiplying two integers of bit size m>. We will assume that /m> is nondecreasing. At the end, we will make use of the fast product of, that yields =O>. Numbers will be represented in radix , with a finite number of digits. We will represent fixed point numbers by a signed mantissa and a fixed exponent. More precisely, given aprecision parameter \>, we denote by > the set of complex numbers of the form *2>, where \\|]>> and \1>. We write *2> for the set of complex numbers of the form \>, where \> and \>; in particular, for \*2> we always have \2>. At every stage of our algorithms, the exponent will be specified, so the exponents do not have to be stored or manipulatedexplicitly. In the error analyses of our numerical algorithms, each \*2> is really the approximation of some genuine complex number \\>. So each such comes with an implicit error bound \0>; this is areal number for which we can guarantee that |\|>\\>. A of > is an approximation of> that is rounded towards zero in the sense that\|\|>>. Given \>, the sum can be computed exactly in time >. The product can be computed exactly in time|)>>. <\lemma> Given \> and \p>, we can compute an approximation of in >> with error2-1/2>> in time >. <\proof> Rounding \\> into >> can be done as follows: let > and > be the truncations to the nearest of and into >> and let \u+v*\\\>>; Then we have |\|>\2-1/2>>. <\lemma> Given \> and \p>, we can compute a truncation of in >> \ with error 2+1/2>> in time >. <\proof> Rounding \\> towards zero into >> can be done as follows: let > and> be the truncations of and at precision 2>> and let \u+v*\\\>>; Then we have|\|>\> and |\|>\2+1/2>.> Let > be a power of and let =,\,a-1>|)>> be a vector of complex numbers. We write |\|>\max|\|>,\,-1>|)>> and \\*\/\>>. <\lemma> Given \>, we can compute truncations of ,\,\,\-1>>> in > with error2> in time *|)>|)>>. <\proof> We use4> at precision \+2> in order to obtain first approximations of ,\,\,\-1>> in > with error 2>. We next truncate these results in >, which yields an overall error 2+2\2> thanks to Lemma. We define the fast Fourier transform of > to be <\equation*> >|)>\|)>,P|)>,P|)>,\,P-1>|)>|)>,> where \-1>a*x>. In particular, we have >|)>|\|>\\*|\|>>. <\lemma> Given \> and \\>>, a truncation of >|)>> in *\|)>>> with error \*2> can be computed in time |)>|)>|)>>. <\proof> Thanks to3> an approximation of >|)>> in >*2*\|)>>> can be computed with error \*2> in time |)>|)>|)>>. Consider an entry =+*\>> of >|)>> and let > be the corresponding approximation. Let \x+y*\>, where \x-sign*\*2> and \y-sign*\*2>. Then |\|>\|\|>>, |\|>\|\|>>, and -z|\|>\\*+2|)>\\*2>. We may clearly compute > in time|)>>. Thanks to Lemma, and using |)>> further operations, we may next compute a> of > with error\*2>. <\lemma> When \2>, we have |\|>\>>. <\proof> If =2>, then |\|>=2=4/\>. Otherwise, \4> and we have <\equation*> |\|>=2*sin|\>|)>\2*cos|4>|)>*|\>\>.> Given integers \1,\,\\1>, we define <\eqnarray*> ,\,z|]>,\,\>>|>|\,\,z|]>\deg> P\\,\,deg> P\\|}>,>>>> where > P> denotes the partial degree of in >, for ,n>. In this paper, a dense representation will be used for univariate and multivariate polynomials. This means that a polynomial \,\,z|]>,\,\>> is stored as a vector of size \\*\*\> made of the coefficients of the terms of of partial degree\> in >, for ,n>. Precisely, when , a polynomial -1>P*z> is stored as the vector ,\,P-1>|)>>. When 2>, a polynomial will be regarded as a univariate polynomial in > whose coefficients are polynomials in ,\,z|]>,\,\>>. Given a polynomial ,\,i>P,\,i>*z>*\*z>\\,\,z|]>>, we define <\equation*> \max,\,i>,\,i>|\|>.> Given polynomials \,\,z|]>,\,\>> the sum can be computed exactly in time *p|)>>. The product can be computed exactly in time *\*|)>|)>|)>> using Kronecker substitution; see8>. In addition we have \\>. Given integers \1,\,\\1>, we let \\*\*\> and consider a tuple of points <\eqnarray*> >||,\,\-1>|)>\|)>>.>>>> We define |\|>\max |)>|\|>> and call <\eqnarray*> >>|>|\,\,z|]>\P|)>=\|}>>>>> the of >. We also define the evaluation map <\eqnarray*> >:\,\,z|]>,\,\>>|>|>>>||>||)>\|)>,\,P-1>|)>|)>.>>>> We refer to the computation of >> as the problem of . The computation of >> is called the problem. >> is a bijection for aZariski dense subset of tuples \|)>>>. Whenever this is the case, the points ,\,\-1>> are pairwise distinct, so we have a natural bijection <\eqnarray*> ,\,z|]>/I>>|>|>>>|>>|>||)>.>>>> This allows us to use polynomials in ,\,z|]>,\,\>> as canonical representatives of residue classes in ,\,z|]>/I>>. In particular, given a polynomial \,\,z|]>>, we define >> to be the unique polynomial in ,\,z|]>,\,\>> with >|)>\I>>. For special tuples \|)>>>, called , we will show in this section that >> can be computed efficiently. Assume from now that ,\,\\2>> are powers of two and let \\*\/\>> be the standard primitive >-th root of unity in >. For each ,\-1|}>>, we let <\eqnarray*> >|>|/\>,\/*\|)>>,\,\|)>\\.>>>> We call the tuple \,\,\-1>|)>\|)>>> a . The vanishing ideal >> of this tuple of points is generated by the polynomials <\equation> z>-1,z>-z,z>-z,\,z>-z. A straightforward computation shows that these generators actually form a Gröbner basis for the grevlex monomial ordering. For instance, for i\j\n>, the -polynomial of >-z> and >-z> is <\equation*> >-z|)>*z>->-z|)>*z>=z*z>-z*z>, which reduces to *z-z*z=0>. In particular, the monomials in ,\,z|]>,\,\>> are reduced. Let \>, \,0|)>>, and >\\\|}>>. For =,\,\|)>\\>>, we have <\eqnarray*> >>|>|*\>*\*z*\>-z>*\*z>\I>.>>>> For convenience, we also set <\eqnarray*> >>|>|>>> Given \,\,z|]>,\,2*\>> and a family <\equation*> >|)>\\>>\\,\,z|]>,\,\>>,> we say that >|)>\\>>> is an of modulo >> if <\equation*> \\>A>*F>.> Since >=1>, we have >=P rem I>>. We further define <\equation*> |A|\<\|\|\>>\max\\> >|\|>.> Given \,\,z|]>,\,2*\>> we aim at computing an extended reduction of by>>. As a first step we begin with polynomials in ,\,z|]>+1,\,\+1>>. <\lemma> Let =,\,\|)>\\>. The map <\eqnarray*> +\>*\*z+\>\e\\,\,e\\|}>>|>|>*\*z>\e\\,\,e\\|}>>>|>*\*z>>|>|>*\*z> rem I>>>>> is well defined and one-to-one. <\proof> Since >> is generated by binomial polynomials, the reduction of a monomial by>> is a monomial. The defining equations of >> further imply that ,\,z> are invertible in ,\,z|]>/I>>. The > of a polynomial is the set of its monomials with a non-zero coefficient. We say that the monomials of > are pairwise distinct modulo >> when any two distinct monomials of > have distinct projections in ,\,z|]>/I>>. <\lemma> Let ,\,e>>P,\,e>*z>*\*z>\\,\,z|]>+1,\,\+1,\,\,\>> be such that the monomials of its support are pairwise distinct modulo >>. Let >,\,P>> be defined recursively as follows: >\P>, and <\equation> >\P>-\\\\>A>*F>>, where <\equation*> >\ quo \=\,\,e quo \=\>>P>,\,e>*z rem \>*\*z rem \>\>\,\,z|]>,\,\> for \\\\>. For ,k>, the following properties hold: <\itemize> >\\,\,z|]>+1,\,\+1,\,\,\>>, the monomials of the support of >> are pairwise distinct modulo >>, the coefficients of >> are coefficients of . Letting >\0> for \\\> and >\P>>, the family >|)>\\>> is an extended reduction of . <\proof> The proof is done by induction on . The properties are clear for >>. Let us assume that they hold for k>. We decompose >> into >=L+H*z>> with <\eqnarray*> |>|,\,z|]>+1,\,\+1,\,\,\>>>||>|,\,z|]>+1,\,\+1,1,\,\,\>.>>>> We verify that <\equation*> >=L+\\\\>A>*F>+R> where <\equation*> R\\\\\>A>*z>*\*z>\\,\,z|]>+1,\,\+1,\,\,\>. Since the monomials of the support of are pairwise distinct modulo >>, the supports of and are disjoint. In particular the monomials of the support of >=L+R> are pairwise distinct modulo >>, and the nonzero coefficients of >> are coefficients of .\ Finally unrolling yields <\equation*> \\>\>A>*F>+P>=\\\>A>*F>,> which constitutes the extended reduction of . The reduction of a general \,\,z|]>,\,2*\>> modulo >> can be performed efficiently by the following algorithm. <\specified-algorithm> <\description> ,\,e>P,\,e>*z>*\*z>\\,\,z|]>,\,2*\>>. An extended reduction >|)>\\>\*2|)>,\,z|]>>,\,\>> of . <|specified-algorithm> <\enumerate> For all =,\,\|)>\\> set <\equation*> A>> quo \=\,\,e quo \=\>P,\,e>*z rem \>*\*z rem \> and compute <\equation*> >\\\>>A>*z>*\*z>.> Let >\0> for \\\>, and for from down to do: <\itemize> For all \\\\>, set <\equation*> >\ quo \=\,\,e quo \=\>>P>,\,e>*z rem \>*\*z rem \>>; Compute >\P>-\\\\>A>*F>>>>. \; Let >\P>>, compute >\A>+A>> for \\>, and return >|)>\\>>. <\proposition> Algorithm is correct and runs in time *\*|)>>. <\proof> For all =,\,\|)>\\>, we have >\>\,\,z|]>,\,\>>. Let us examine the particular case where >> restricts to a single term >*z>*\*z>> for some \\>>. By Lemma, the monomials of the support of >> are pairwise distinct modulo>>, so Lemma implies the following properties, for ,n-1>: <\itemize> >\\,\,z|]>+1,\,\+1,\,\,\>>, the coefficients of >> are coefficients of >>, hence of , >|)>\\>> is an extended reduction of >>. Since the operations in step2 are linear with respect to the coefficients of >>, the general case where >> is a sum of -1> terms of the form >*z>*\*z>> satisfies the following properties: <\itemize> >\\*2,\,z|]>+1,\,\+1,\,\,\>>, >|\|>\2-1>, >|)>\\>> is an extended reduction of >>. On the other hand step1 yields <\equation*> \\>A>*F>>+P>. So the correctness proof is completed by noting that <\equation*> \\>A>*F>>+\\>A>*F>=\\>A>*F>, and >|\|>\>|\|>+>|\|>\2>. As for the complexity analysis, step1 takes time *\*|)>>. Step2 runs in time <\equation*> Oi\n>2*\*|)>=O*\*|)>. The cost of step3 does not exceed *\*|)>>. <\example*> Let and -1>*z-1>>. With the notation as in Algorithm, the only non-zero >> is >=z-1>*z-1>>, and we have >=z>*z-1>>. It follows that all the >> are all but >=z-1>>, so we have >=z-1>>. The extended reduction of is <\equation*> P=z-1>*z-1>*F>+z-1>*F>+z-1>*F>. We now turn to the situation where \|)>>> is a sufficiently small perturbation of >. More precisely, we say that > is a when there exist polynomials <\equation*> E>\\,\,z|]>,\,\>> such that <\equation*> >\F>+E>\I>,> for all \\>>, and <\eqnarray*> >|>|-\|\|>,|E|\<\|\|\>>|)>\*\>,>>>> with the convention that >\0> and >\1>. We call and ,\,\> the and the of the spiroid and let <\equation*> |\>\max,n>-1|)>.> By Lemma we have -\>|\|>\4/\> for all k>. Consequently, <\equation*> -\|\|>\>\>\*mink> -\>|\|>,> so the points> are pairwise distinct. We let <\equation*> \>|2*\>,> where \0> is a constant, that will be specified later. We denote by ;p>\*2|)>,\,z|]>>> an approximation of >> with error \*2>, where |)>> will also be defined later. For convenience we always take ;p>\1>. We are to show that > is a spiroid as soon as-\|\|>> is sufficiently small. <\lemma> Let \\>> and let \,\,z|]>,\,\>> be such that |)>=\>. Then we have\|\|>>>. <\proof> The polynomial \P/\>,x/*\|)>>,\,x|)>\\> has degree <\eqnarray*> |>||\>*-1|)>+|\*\>*-1|)>+\+|\*\*\>*-1|)>>>|||*|)>+-*\|)>|)>+\+*\*\|)>-\|)>>>|||-1,>>>> and satisfies |)>=P|)>> for ,\-1>. This means that is the inverse Fourier transform of>, that is *FFT>|)>>, following the notation of section. We deduce that =\|\|>>. The two following technical lemmas will be needed several times in the paper. <\lemma> Let \> and let \,\,z|]>,\,\>> haveT> non-zero terms. Then we have <\equation*> -P|\|>\n*|\>*T**max,,1|)>|~>-1>*.> <\proof> We set \P*b|)>> for > and use the classical inequality <\equation*> -P|\|>=-f|\|>\max>|\|>.> From = P|\ z>*b|)>*-b|)>>, we obtain <\equation*> >|\|>\n*|\>*T**max,,1|)>|\>-1>*.> <\lemma> With the above notation and \0>, we have /\|)>>\2> for all \0> . <\proof> We verify that <\equation*> /\|)>>|)>=n*\*log/\|)>\|log 2>\-n>|\*log 2>\1.> <\lemma> Assume that \1>, \2>, and let \|)>>> be such that -\|\|>\\/|\>>. Given \\>>, there exists a unique \,\,z|]>,\,\>> such that |)>=\>. Moreover, \2*|\|>>. <\proof> Let \\> and define \\-P|)>>, where > stands for the unique polynomial in ,\,z|]>,\,\>> such that |)>\\>. From Lemma we know that |\|>\|\|>>. Using Lemma, we deduce <\equation*> |\|>=|)>-P|)>|\|>\n*|\>*\*|\|>*-\|\|>|)>|\>-1>*-\|\|>\|\|>*||\>>|)>|\>>*2-n>.> Using Lemma and \1>, we obtain |\|>\2*|\|>>, hence |\|>\2*|\|>> for 0>. Consequently, 0>P> is well defined and we have <\equation*> =\=P|)>+\=P|)>+P|)>+\=\=P|)>.> Finally, we verify that <\equation*> \0>|\|>\0>|\|>\|\|>*0>2\2*|\|>.> <\proposition> Assume that \1>, \2>, and let \|)>>> be such that -\|\|>\\>. Then > is a spiroid and we have \n*|\>*2*-\|\|>>. <\proof> From Lemmas and, and using <\equation*> -\|\|>\\=>*\|2*\>\||\>+1>,> we obtain <\eqnarray*> >|)>|\|>>||>|)>-F>|)>|\|>>>||>||\>+1|)>*2*-\|\|>|)>|\>+1|)>>*-\|\|>>>||>||\>+1|)>*2*||\>+1>|)>|\>+1|)>>*-\|\|>>>||>||\>*2*-\|\|>.>>>> Thanks to Lemma, there exists >\\,\,z|]>,\,\>> such that >|)>=-F>|)>> and <\equation*> >|\|>\2*>|)>|\|>\n*|\>*2*-\|\|>,> whence \n*|\>*2*-\|\|>\n*|\>*2*\\*\>>. In the rest of this section, we fix \,\,z|]>,\,2*\>> and assume that \|)>>> is aspiroid. A family <\equation*> >|)>\\>>\\,\,z|]>,\,\>>.> is called an of by the spiroid > if\ <\equation*> \\>A>*G>.> Since >=1>, we have >=A>>. <\proposition> Assume that \\>. Given \,\,z|]>,\,2*\>>, there exists an extended reduction >|)>\\>> of by > with |A|\<\|\|\>>\*|1-\/\>>. <\proof> If then we take >\0> for all \\>. Otherwise, without loss of generality, we may assume that =1>. For-1>, we construct sequences >>, >>|)>\\>> of polynomials as follows. For , we set >\P> and >>\0> for \\>. By induction, we assume that these sequences are defined up to some index for some 0>. We let <\equation*> >\P>-\\>A>>*G>,> and apply Proposition to >> without any rounding, and let >>|)>\\>> be the corresponding extended reduction with |A|\<\|\|\>>\2*>|\|>>. We have <\equation*> >-\\>A>>*F>=0.> It follows that <\eqnarray*> |>|\|>>||>-\\>A>>*G>|\|>>>|||\\>A>>*>-G>|)>|\|>>>||>|*2*>|\|>*\>>||>|>|\|>*|\>.>>>> Hence, <\equation*> >|\|>\|\>|)>,i\0.> In particular 0>A>>> converges to a limit written >>. For \\> we have <\equation*> >|\|>\0>>>|\|>\0>>|\|>*2\2*0>|\>|)>=|1-\/\>.> The proof of Proposition can be turned into an algorithm to compute approximations of extended reductions. For efficiency reasons, /\> is required to be sufficiently small. <\lemma> Assume \4> and \\>. Let \> and <\equation*> p+n-log \+1=O|)>.> Given \,\,z|]>,\,2*\>>, we can compute <\eqnarray*> >|)>\\>>|>|*2|)>,\,z|]>,\,\>>>>>> such that <\equation*> \\>A>*G>|\|>\2>, in time *\*|)>|)>*p|)>>. <\proof> We adapt the proof of Proposition. For-1>>, we construct sequences >\\,\,z|]>,\,2*\>> with >|\|>\1> and >>|)>\\>\*2|)>,\,z|]>,\,\>>> as follows. For , we set >\P> and >>\0> for \\>. By induction, we assume that these sequences have been defined up to for some 0>. We compute <\equation*> >\P>-\\>A>>*G;p>> exactly. Note that >=P>. We further assume by induction that >|\|>\1>. Applying Proposition to >>, let >|)>\\>> be the obtained extended reduction. Then <\equation*> >-\\>A>*F>=0.> Using Lemma, we compute a truncation >>> of >> in *2|)>,\,z|]>> with error 2\\*2>. We obtain <\eqnarray*> |>|\|>>||>-\\>A>>*G;p>|\|>>>|||\\>A>*F>-\\>A>>*G;p>|\|>>>||>|\\>A>*>-G;p>|)>|\|>+\\>>>-A>|)>*G;p>|\|>>>||>|*2*+\*2|)>*>|\|>+2*\*+\*2|)>*\*2>>||>|*\*\*/\+2|)>*>|\|>+2*\*+\*2|)>*\*2>>||>|+1>*>|\|>+2>.\2>)>>>>> Iterating the latter inequality yields <\eqnarray*> >|\|>>|>|+1>|)>+2>*j\i>+1>|)>>>||>|+1>|)>+>|1-2+1>>>>||>|+1>|)>+2.>>>> In particular, >|\|>\1.> We are done with the construction of the sequences >> and>>|)>\\>>. Let 0> be the smallest integer such that <\equation> +1>|)>+2\2, that is <\equation*> |-1>|\>\,> since \4>. Note that <\equation> \.> We take >\j\I>>A>>> and verify that <\eqnarray*> >|\|>>|>|j\I>>|\|>*2>>||>|+1>>+I*2|)>*2)>>>||>|+|)>*2\4> and)>>>||>|.>>>> Finally, we obtain <\eqnarray*> \\>A>*G>|\|>>|>|\\>A>*G;p>|\|>+\\>A>*>-G;p>|)>|\|>>>||>|+1>|)>+2+\*2*\*2 and)>>>||>|+2+1>|)>*2)>>>||>|.\4>)>>>>> As for the complexity, one product >>*G;p>> takes time *\*|)>|)>|)>>, and >|\|>> is computed with error >>. Consequently, the computation of >> from >> runs in time <\equation*> **\*|)>|)>|)>=O*\*|)>|)>|)>.> Each use of Proposition contributes *\*|)>|)>>. Since >, the total cost for computing >> and >>|)>\\>> up to index is *\*|)>|)>*p|)>>. The numerical convergence of the method behind Lemma is linear, which is sufficient for small precisions. For higher precisions, our next faster algorithm benefits from the \Pdivide and conquer\Q paradigm. In order to upper bound the norms of the extended reductions, we introduce the auxiliary notation <\equation*> \\p\10>>||)>*|)>*\*|p/2|\>-1>|)>p\10.>>>>> Note that > is nondecreasing in and that \2> for all \>. <\specified-algorithm> <\description-compact> \,\,z|]>,\,2*\>>. >|)>\\>\*2|)>,\,z|]>,\,\>>>, where p+n-log \+2>, <\equation*> \\>A>*G>|\|>\2>and>|A|\<\|\|\>>\2*\>. \4> and \\>. <|specified-algorithm> <\enumerate> If m\9* \|)>> then return the extended reduction from Lemma. Compute a truncation \\|p/2|\>+3>,\,z|]>> of with error 2|p/2|\>-5/2>> using Lemma. Apply Algorithm recursively to > and let >|)>\\>> denote the result. Compute a truncation \\|p/2|\>+3>,\,z|]>> of <\equation*> 2|p/2|\>+1>*\\>A>*G;p>|)> with error 2|p/2|\>-5/2>>. Apply Algorithm recursively to> and let >|)>\\>> denote the result. Return a truncation of >+2|p/2|\>-1>*A>|)>\\>> with error2>, by using Lemma. <\proposition> Algorithm is correct and runs in time <\equation*> O*\*|)>|)>*|)>|)>. <\proof> If m> then the output is correct in step1 by Lemma. Otherwise 10>, sothat |p/2|\>+3\p> and |p/2|\>+3\p>. Consequently, the algorithm always terminates. In step2, by induction, we have <\equation*> -\\>A>*G>|\|>\2|p/2|\>-3>and>|A|\<\|\|\>>\2.> We verify that <\eqnarray*> \\>A>*G;p>|\|>>|>||\|>+-\\>A>*G>|\|>+\\>A>*>-G;p>|)>|\|>>>||>||p/2|\>-5/2>+2|p/2|\>-3>+\*2*\*2>>||>||p/2|\>-3>*+1+\*2*\*2|p/2|\>+5>|)>>>||>||p/2|\>-3>*+1+2>|)>10>)>>>||>||p/2|\>-1>,\4>)>>>>> so the polynomial > is well defined in step3, and we have <\equation*> -\\>A>*G>|\|>\2|p/2|\>-3>>. Let >> be a truncation of >\A>+2|p/2|\>-1>*A>> with error 2>. On the one hand, <\eqnarray*> ||\\>>*G;p>|\|>>>||>|\\>A>*G;p>-2|p/2|\>-1>*P|\|>>>|||+2|p/2|\>-1>*-\\>A>*G>|\|>+2|p/2|\>-1>*\\>A>*>-G;p>|)>|\|>>>||>||p/2|\>-1>*|p/2|\>-5/2>+2|p/2|\>-3>>+\*2*\*2|)>>>||>|*+2+2-1>|)>10>)>>>||>|.>>>> On the other hand, we obtain <\eqnarray*> >|\|>\>|\|>>>|>|>|\|>+2|p/2|\>-1>*>|\|>>>||>|*|p/2|\>+3>+2|p/2|\>-1>*\|p/2|\>+3>|)>>>||>|*\|p/2|\>+3>*|p/2|\>-1>|)>>>||>|*\,10>)>>>>> which yields <\equation> \\>>*>-G;p>|)>|\|>\\*2*\*2\2+2>>. Combining and we deduce <\eqnarray*> \\>A>*G>|\|>>>|>|\\>>*G;p>|\|>>+\\>>*>-G;p>|)>|\|>+\\>>->|)>*G>|\|>>>>||>|+2+2>+2*\*2*|)>>>||>|+2+2>+2*\*\*2>>||>|*+2+2>+2+3/2>|)>>>||>|.>>>> This concludes the proof of the correctness of the algorithm. Let us now analyze the complexity. By Lemma, step1 contributes <\equation*> O*\*|)>|)>*p|)>. Each product >*G;p>> can be computed in time <\equation*> *\*+log \|)>|)>|)>=O*\*p|)>|)>.> The total cost to obtain > is therefore bounded by <\equation*> **\*p|)>|)>=O*\*p|)>|)>.> Let > be the computation time as a function of . So far, we have shown that <\equation*> \|p/2|\>+3|)>+|p/2|\>+3|)>+O*\*p|)>|)>>, for m>. Unrolling this inequality leads to <\eqnarray*> >||*\*p|)>*log p+||\>**\*m|)>*m|)>>>|||*\*p|)>*log p+*\*p|)>*m+*\*m|)>*m|)>>>|||*\*|)>|)>*|)>|)>.>>>> The following corollary is the main result of this section. <\corollary> Assume \4> and \\>. Given \,\,z|]>,\,2*\>>, an approximation \*2|)>,\,z|]>,\,\>> of P rem I>> with error2> can be computed in time <\equation*> O*\*|)>|)>*|)>|)>. <\proof> Proposition allows us to compute <\equation*> \ \+2>*2|)>,\,z|]>,\,\>> such that |)>|\|>\2> for some I>>. Using Proposition, we deduce that <\equation*> -P rem I>|\|>=|)>|)> rem I>|\|>\|1-\/\>.> Using Lemma, we finally compute an approximation of > in *2|)>,\,z|]>,\,\>> with error2> and verify that <\equation*> +|1-\/\>\2+|1-2>|)>\2.> This section is devoted to multi-point evaluation and interpolation at a spiroid. Evaluation will follow the classical \Pdivide and conquer\Q paradigm with respect to the number of evaluation points. Interpolation will reduce to evaluation. Let \|)>>> be as in the previous section. If =\=\=1> for some0>, then <\equation*> ,\,z|]>,\,\>=\,\,z|]>,\,\>.> Without loss of generality, this allows us to replace > by >> of dimension >\k> with >\|)>,\,|)>|)>> for each. In what follows, we will only work with spiroids > for which \1> whenever 2>. Our algorithm for multi-point evaluation of a polynomial \,\,z|]>,\,\>> at> is based on a generalization of the classical remainder tree technique used for the univariate case10>. If =1>, then an for > consists of a single leaf labeled by ;p>|)>\\>>. If\2> and \2>, then > induces two sequences of points >,\>>> which are defined by <\eqnarray*> >>|>|,\,\,\-2>|)>>>>|>>|>||\>,|\>,\,-1>|\>|)>>.>>>> For , we write >> for the dimension of>>, >,\,\>>>> for its partial degrees, and >> for its degree. The perturbations of the defining polynomials of >> are written >\>>|)>\\>>>>, so we have <\equation*> >>\F>+E>>\I>>> for all \\>>>>. We further define <\eqnarray*> >>|>|>-\>|\|>,|E>|\<\|\|\>>|)>,>>||\>>>|>|,n>>>-1|)>,>>|>>|>|>|2>>*\>>>>.>>>> Note that >\\/2>, >\2*\>, and |\>>\|\>> , for . An for the spiroid > consists of a root that is labeled by the family ;p>|)>\\>> and two children which are recursively the evaluation trees for >> and >>. Since >-\>|\|>\-\|\|>\\>, if -\|\|>\\> then >-\>|\|>\>|)>*2>. So, Proposition ensures that the >>> exists with >>|\|>\n>*|\>>*2*>-\>|\|>>, whenever \1>>. This shows that evaluation trees exist whenever -\|\|>> is sufficiently small. The construction of these trees is postponed to the end of this section. An evaluation tree for > is said to be known \*2> if >|)>\\>> is given with error \*2>, and if the evaluation tree for >> is known with error \>*2> for >. We will see in section that for such an evaluation tree with error \*2>, the ;p>> will be elements of *2,\,z|]>> with \+O>. Our next algorithm performs fast evaluation at > by taking advantage of evaluation trees. <\specified-algorithm> <\description-compact> \,\,z|]>,\,\>> and evaluation trees for >> with error \>*2> for . An approximation of |)>> in *2*\|)>>> with error |)>*2=\*2>, where p-*log \+1>. \4> and \\>. <|specified-algorithm> <\enumerate> If =1> then return an approximation of > in *2*\|)>>> with error |)>*2>, byusing Lemma. Compute a truncation \\,\,z|]>,\,\>> of *z|)>> with error2>. Compute truncations >> and >> of >>> and rem I>>> in *2,\,z|]>,\,\>> with error 2> via Corollary. Apply Algorithm recursively to *P>> and the evaluation sub-trees for>> in order to obtain approximations ,\,v/2-1>|)>\>>*\>|)>/2>> of *P>>|)>> with error \*2>>>, where >\p->+3|)>*log/2|)>+1>. Apply Algorithm recursively to *P>> and the evaluation sub-trees for>> in order to obtain an approximation ,\,w/2-1>|)>\>>*\|)>>/2>> of *P>>|)>> with error \*2>>>, where >\p->+3|)>>*log/2|)>+1>. Return an approximation of *,w,\,v/2-1>,w/2-1>|)>> in *2*\|)>>> with error|)>*2>. <\proposition> Algorithm is correct and takes time <\equation*> O*\*p|)>*|)>*log \|)>, whenever =O>. <\proof> If =1> then the algorithm behaves as expected. Let us now assume that \2>. The correctness mostly relies on the identities <\eqnarray*> |)>>||>>|)>|)>>>||)>>||*z|)> rem I>>|)>/\|)>,>>>> for ,\/2>. Let us verify that computations are done with sufficiently large precisions. By construction, we have -P*z|)>|\|>\2> and |\|>\*z|)>|\|>\1>. Corollary yields >|\|>\2>, >|\|>\2> and <\eqnarray*> >->>|)>|\|>>|>|>>|>- rem I>>|)>|\|>>|>||\|>*2\2.>>>>> Since \\\\/|\>>, Lemma ensures that <\equation> |)>|\>>\2.> In particular, it follows that |)>|\|>\2*\>. Assuming by induction that the claimed properties hold for the recursive call in step4, we obtain <\eqnarray*> |*v-P|)>|\|>>|>|*-2*P>|)>|\|>+>|)>->>|)>|)>|\|>>>||>|*/2|)>*2+/2|)>*2*|)>|\>>>>||>|*2+\*2)>>>||>|*2*+2|)>\2>)>>>||>|*2.>>>> Similarly, for > we obtain <\eqnarray*> |*w-P|)>|\|>>|>|*-2*P>/\|)>|\|>>>|||+>/\|)>- rem I>>|)>/\|)>|\|>>>|||+/\|)>-P|)>|\|>>>||>|*/2|)>*2+/2|)>*2*|)>|\>>+\*2*|)>|\>>>>||>|*2+\*2+\*2)>>>||>|*2*+2+2|)>\2>)>>>||>|*2.>>>> We are done with the correctness. As for the complexity analysis, the computation of |)>,\-1>> and then of >, with > bits of precision in step2 can be done in time *|)>> by Lemma, under the assumption \=O>. Therefore, step2 runs in time*|)>>. In view of Corollary, the complexity ,\,\|)>> of Algorithm satisfies <\eqnarray*> ,\,\|)>>|>|*\*|)>|)>*|)>+2*,\,\,|2>|)>,>>>> for some universal constant 0>. Unrolling this inequality, we obtain <\eqnarray*> ,\,\|)>>|>|*\*p|)>*|)>*log \+\*,\,\|)>.>>>> Unrolling this new inequality, while using the geometric increase of >, we obtain the desired complexity bound. Our interpolation algorithms rely on the evaluation one. We begin with a method dedicated to small precisions. <\lemma> Given \\>>, we can compute \,\,z|]>,\,\>> such that \> and |)>-\|\|>\\*2> in time <\equation*> O*|)>|)>|)>. <\proof> Using Lemma, we compute \*\|)>>> such that <\equation> -FFT>|)>|\|>\\*2> and |\|>\\*|\|>> in time *|)>|)>|)>>. For we take the unique polynomial of ,\,z|]>,\,\>> such that <\equation*> /\>,x/*\|)>>,\,x|)>=\*,\-1>u*x,> so we have |)>=\*FFT>|)>>. Finally yields <\eqnarray*> |)>-\|\|>>||*FFT>|)>-\|\|>>>||>|*2.>>>>> <\lemma> Assume \4>, \\>, and *2\1>. Given an evaluation tree for >> with error \>*2> for , and \\>>, we can compute *2|)>,\,z|]>,\,\>> such that <\equation*> |)>-\|\|>\\*2> in time *\*p|)>*|)>*log \|)>>. <\proof> Let p-*log \+1>. By induction, we construct sequences |)>0>> and|)>0>> such that \*2|)>,\,z|]>> and \*2|)>>>. We set \0> and \\>>. For 0>, we first compute \*2|)>,\,z|]>,\,\>> as the polynomial interpolated from> by inverse FFT as in Lemma, so we have |\|>\\> and |)>-\|\|>\\*2>. Using Algorithm, we then compute an approximation \*2*\|)>>> of |)>> with error <\equation*> |)>*2=\*2.> Using Lemma, we next compute a truncation > of -\> with error2>. Then <\eqnarray*> |)>-\+\|\|>>|>||)>-\|\|>+-\+\|\|>>>||>|**2+2|)>>>||>|*\*2.>>>> Using Lemmas and, we obtain <\eqnarray*> |)>-Q|)>|\|>>|>||\>*\*|\|>*|)>|\>-1>*\>>||>|-3*n+1>)>>>||>|.>>>> It follows that <\eqnarray*> |\|>>|>|-\|\|>>>||>|-Q|)>|\|>+|)>-Q|)>|\|>+|)>-\+\|\|>+--\|)>|\|>>>||>|**2+2+\*2+2|)> and)>>>||>|*+2+2+2|)>*2\1>)>>>||>|,>>>> so the sequence |)>0>> converges to . We define \P+Q> and verify that <\eqnarray*> |)>-\+\|\|>>||j\i>|)>-\+\|)>|\|>>>||>|j\i>|)>-\+\|\|>>>||>|j\i>2*\*2)>>>|||*2*j\i>2>>||>|*2.>>>> On the other hand, we have |\|>\j\i>|\|>\2>. We take minimal such that <\equation*> 2\\*2, so we have >. Let be a truncation of > with error2>. Then <\eqnarray*> |)>-\|\|>>|>||)>-P|)>|\|>+|)>-\+\|\|>+|\|>>>||>||)>|\>>*\*2+\*2+2)>>>||>|*2+\*2+\*2)>>>||>|*2.>>>> Finally, Lemma asserts that each> can be computed in time *p|)>|)>>. By Proposition, each > is obtained in time *\*p|)>*|)>*log \|)>>. The next faster algorithm exploits the usual \Pdivide and conquer\Q paradigm for large precisions . <\specified-algorithm> <\description-compact> Input>\\>> with \2> and evaluation trees for>> with error \>*2> for . *2|)>,\,z|]>,\,\>> such that |)>-\|\|>\\*2>. \4> and \\>. <|specified-algorithm> <\enumerate> If m\*log \+11> then compute \*2|)>,\,z|]>,\,\>> such that |)>-\|\|>\\*2> via Lemma. Return an approximation of > in *2|)>,\,z|]>,\,\>> with error 2>, by using Lemma. Let |p/2|\>> and |*log \+7|)>/2|\>> and let > be a truncation of > in>> with error 2>. Apply Algorithm recursively to > and let \*2|)>,\,z|]>,\,\>> denote the output. Compute an approximation \*2*\|)>>> of |)>> with error\*2>, byapplying Algorithm to *P>, where p-*log \+1.> Let *log \-h-l+5> and compute a truncation \*2|)>>> of-\>> with error 2>. Apply Algorithm recursively to *\> and let \*2|)>,\,z|]>,\,\>>> denote the output. Return an approximation of +2*P> with error 2>. <\proposition> Algorithm is correct and runs in time <\equation*> O*\*p*|)>|)>*log p|)>. <\proof> If m>, then p/2-1\l>, whence p>, which implies the termination of the algorithm. In step1 we obtain <\eqnarray*> |)>-\|\|>>|>||)>-P|)>|\|>+|)>-\|\|>>>||>|*2*|)>|\>>+\*2>>||>|*2+\*2)>>>||>|*2*+1|)>m> and \2>)>>>||>|*2,>>>> so the output is correct when the algorithm exits at step1. After step3 we have <\eqnarray*> |)>-\|\|>>|>|*2>>||)>-\|\|>>|>|*2.>>>> In step4, we note that <\equation> +*log \+7|2>+1\++1=p-1> and deduce <\eqnarray*> |\|>>|>|-\|\|>>>||||)>-\|\|>+|)>-\|\|>+-\|\|>>>||>|*+2|)>+2 and)>>>||>|*+2+2|)>> and \2>)>>>||>|*2>.>>>> Therefore, the recursive call in step4 is valid and gives <\equation> |)>-2*\|\|>\\*2>.> Combining and leads to <\eqnarray*> |)>+2*P|)>-\|\|>>|>||)>-\|\|>+2*|)>-2*\|\|>+-\+\|\|>>>||>|*2+2*\*2+2>>||>|*2*+2*log \-2*l+5>+2|)>>>||>|*2*+2|)>.>>>> Consequently, <\eqnarray*> |)>-\|\|>>|>||)>+2*P|)>-\|\|>+|)>-P|)>-2*P|)>|\|>>>||>|*2*+2|)>+\*2*|)>|\>>>>||>|*2*+2|)>+\*2)>>>||>|*2*+2+2|)>\2>)>>>||>|*2.>>>> Step1 takes time <\equation*> O*\*m|)>*|)>*log \|)>=O*\*m|)>|)>, by Lemma. Steps3 runs in time <\equation*> O*\*p|)>*|)>*log \|)>=O*\*p*m|)>*log p|)>, by Proposition. When m>, the complexity > of our algorithm therefore satisfies <\eqnarray*> >|>|*\*p*m|)>*log p++,>>>> for some universal constant 0>. Unrolling this inequality, we obtain <\eqnarray*> >||*\*p*m|)>*log p+||\>**\*m|)>|)>>>|||*\*p*m|)>*log p|)>,>>>> which concludes the proof. The evaluation tree for > is obtained by induction: we recursively compute evaluation trees for >> and>>, and deduce the one for > as follows. <\specified-algorithm> <\description-compact> An approximation |~>\*2|)>>> of > with error 2> where p-3*log \+*log \>>. An evaluation tree for > with error \*2>.\ \4>, -\|\|>\\>. <|specified-algorithm> <\enumerate> If =1> then return ;p>\1> and an approximation ;p>> of -|~>> with error\*2>. Let \p+*log \>. For , let >\-3*log \>+*log \>> and compute >\>>*2|)>>> with error 2>>>. Recursively compute the evaluation trees >> for>> with error \>*2>>. For each \\>> do: <\enumerate> Compute an approximation |~>\>*\|)>>> of >|~>|)>> with error\*2>>. Apply Algorithm to *|~>>, >>, >>; let >\>*2|)>,\,z|]>> be the returned polynomial, such that >|)>-\>*|~>|\|>\\*2+4>>. Compute an approximation ;p>\*2|)>,\,z|]>> of >+\*>> with error 2>, where p-log \+1>. Set ;p>\1> and return ;p>|)>\\>\*2|)>,\,z|]>>>, and approximations of >>, >> with errors \>*2> and \>*2>. <\proposition> Algorithm is correct and runs in time <\equation*> O**\*p*|)>|)>*log p|)>, provided that \=O>. <\proof> If the algorithm exits in step1, then we have >=z-\>, so the output is correct. Now, let us assume that \2> and note that <\eqnarray*> >>|>| \+*log \-*log \+n+1>>||>|>>>> and <\equation> |~>|\|>\|\|>+-\|\|>+-|~>|\|>\1+\+2\1+\+\*2\1.1.> For ,\/2>, we compute |~>>> as the truncation of |~>> with error 2>-1/2>> using Lemma. Hence, <\eqnarray*> |~>>-\>|\|>>|>||~>>-|~>|\|>+|~>-\>|\|>>>||>|>-1/2>+2>>||>|>>*+2|)>>>>||>|>>.>>>> From Lemma, we may compute a truncation \> of > with error 2> in time *|)>>, provided that \=O>. We compute an approximation > of |~>> with error 2>. It follows that <\eqnarray*> -\>|\|>>|>|-w*|~>|\|>+|~>-\*|~>|\|>+*|~>-\*\|\|>>>||>|+1.1\2+2)>>>||>|*+1.1\2+1|)>>>||>|>>||>|>-5/2>.)>>>>> Using Lemma, we obtain |~>>> as the truncation of > with error 2>-1/2>>. Hence, <\equation*> |~>>-\>|\|>\|~>>-\|\|>+-\>|\|>\2>-5/2>+2>-1/2>\2>>>. For step3.a, we need to verify that <\eqnarray*> |~>|\|>>|>|>|~>|)>|\|>+\*2>>>|||>|~>|)>-F>|)>|\|>+\*2>>>||>||\>+1|)>*2*+\*2|)>|\>+1|)>>*|~>-\|\|>+\*2> and)>>>||>||\>*2*/|\>+1|)>|)>|\>+1|)>>*|~>-\|\|>+\*2>>>||>||\>*2*|~>-\|\|>>+\*2>)>>>||>||\>*2*|~>-\|\|>+-\|\|>|)>>+\*2>>>||>||\>*2*+\|)>+\*2>>>||>|*\*2*+\|)>+\*2>>>||>|*\*2*\+\*2>>>||>|,\4> and \5>)>>>>> so the use of Algorithm is valid in step3.b. Thanks to Proposition and using that>|)>=-F>|)>>>, we obtain <\eqnarray*> ||*>|)>-E>|)>|\|>>>||>|*>|)>-\*|~>|\|>+|~>+F>|~>|)>|\|>+>|~>|)>-F>|)>|\|>>>||>|*\*2+4>+\*2>>>|||+n*|\>+1|)>*2*+\*2|)>|\>+1|)>>*|~>-\|\|> and)>>>||>|*2+\*2+n*|\>*2*/|\>+1|)>|)>|\>+1|)>>*|~>-\|\|>>>||>|*2+\*2+n*|\>*2*|~>-\|\|>)>>>||>|*2+\*2+n*|\>*\*2>>||>|*2*+2+2|)>>>||>|*2.>>>> Lemma yields *>-E>|\|>\\*2>. It follows that <\eqnarray*> ;p>->+E>|)>|\|>>|>|;p>->+\*>|)>|\|>+*>-E>|\|>>>||>|+\*2>>||>|*2*+2|)>>>||>|*2.>>>> This concludes the correctness proof. As for the complexity we note that =O> whenever \=O>. In step3, >|~>|)>> can be computed in time *p*n|)>*log \|)>>, using binary powering. In step3.b, the interpolation of >> takes <\equation*> O*\*p*|)>|)>*log p|)> operations, by Proposition. Overall the complexity |)>> of our algorithm satisfies <\eqnarray*> |)>>|>|/2|)>+C*2**\*p*|)>|)>*log p,>>>> for some universal constant 0>. Unrolling this inequality yields the claimed complexity bound. Consider /r*\|)>> and x+/r*\|)>>, where is a positive integer. Our goal is to compute Q|)> rem R>. We first follow Kedlaya and Umans, by the problem into a multivariate one. For some 1> and powers of two |~>,\,\,\>> with |~>\|~>*\*\*\\D> and *\*\\D>, there exists a unique lift <\equation*> >\/r*\|)>,\,z|]>|~>,\,\,\> such that <\eqnarray*> >|>|>*\*\>,x*\*\>,\,x|)>.>>>> The first degree bound |~>> will be adjusted below to a larger value > that will match the setting of the previous section. We define <\eqnarray*> >|>|*\*\> rem R,i=1,\,n,>>>> so we have <\eqnarray*> Q|)> rem R>||>,\,Q|)> rem R.>>>> In order to compute the right-hand side of(), we further lift the problem into another one with integer coefficients: let >\\,\,z|]>|~>,\,\,\>> and >,\,>\\> be the canonical lifts of >,Q,\,Q> with coefficients in ,r-1|}>> and set <\eqnarray*> >|>|>>>|>|>|>+r*M*x*\*\>,i=1,\,n,>>>> where is a positive integer that will be specified below. Note that ,,\,> are still lifts of >,Q,\,Q>, so <\eqnarray*> >,\,Q|)> rem R>||,\,|)> rem r|)> rem R.>>>> We finally rescale the polynomials, so that we can compute ,\,|)>> using a spiroid: <\eqnarray*> ,\,z|)>>|>|,\,r*M*z|)>>>|>|>|*,i=1,\,n.>>>> Note that <\eqnarray*> ,\,|)>>||,\,|)>>>>> and <\eqnarray*> -x*\*\>|\|>>|>|,i=1,\,n.>>>> We will evaluate <\equation*> ,\,|)> numerically using fixed point arithmetic. Setting deg >, we note that <\eqnarray*> |>||~>+\+\+\-n>>|,\,|)>|)>>|>|>>> Let > be the smallest power of two satisfying \d*|~>>, let \\/*\*\|)>\d*|~>>, and consider the standard primitive >-th root of unity \\*\/\>\\>. We wish to determine ,\,|)>> from its values at ,\,\-1>>. For ,\-1>, we let <\equation*> \|)>,\,|)>|)>,> so we have <\equation> -\|\|>\>. and <\equation*> ,\,|)>|)>=>|)>. Choosing sufficiently large, we will be able to apply Algorithm in order to evaluate> at the spiroid =,\,\-1>|)>>. We are now ready to prove the main theorem of this paper, from which Theorem follows. <\theorem> The cost of degree modular composition over /r*\> is bounded by <\equation*> O D>>*log D*log r*|)>. <\proof> Using binary powering, the computation of ,\,Q> takes time <\equation*> **log D|)>=O D*log r*log log r|)>.> Now we take <\equation*> | D>|\>-1,> whence n>. More precisely, we have <\equation*> D>-1\n\ D>,> >\D>, and +2*n+1>\D>. We distinguish the following cases in order to construct the sequence |~>,\,\,\>: <\itemize> If +n>\D> then we take 0> minimal such that +k>\D>, and then =\=\=2> and =\=\=|~>=2>. Note that n> does exist. Otherwise, +n>\D>. <\itemize> If +2*n>\D> then we take 0> minimal such that +n+k>\D>, and then =\=\=2> and =\=\=|~>=2>. Otherwise we take =2,> =\=\=|~>=2>. In this way we achieve |~>\2*D> and <\equation> |)>,> whence <\equation> =O*D|)>=O*D*log D|)>> and <\equation> =O>. Recall that \>|2*\>>. Now we take \4> and \> minimal such that <\equation*> \\,> so, using, we have <\equation> +log D|)>=O.> With this choice for , the inequality implies <\equation*> -\|\|>\\.> Let > be the smallest power of two that satisfies \r*>, so we have |\|>\2>. We set <\equation*> m-log \+*log \.> From and we obtain that <\eqnarray*> ||+n*log \|)>>>|||**log D|)>>>|||*log r*log D|)>>>>> and that <\equation> D|)>.> Since |\|>=\=|\|>=1>, using Lemma with p-3*log \+*log \=O>, we compute a truncation |~>> of> with error \*2> in time <\eqnarray*> ||*p|)>|)>>>|||*D*log r*log D|)>|)> and)>>>|||*D*log r*log D*|)> and)>>>|||*D*log r*log D*log log r|)>.>>>> Since \=O>, Proposition allows us to compute an evaluation tree for > with error \*2> in time <\eqnarray*> ||**\*p*|)>|)>*log p|)>>>|||*\*p*|)>*log*\*p*n*log \|)>*log p|)>>>|||*D*log D*log r|\> and)>>>|||\|)>*log*\*p|)>*log p||)>>>|||*D*log r*log D*|)>. and)>>>>> Following Proposition, this bound dominates the cost for obtaining an approximation\>*\*2|)>>> of|)>> with error \*2>, where \p-*log \+1>, so we have <\equation*> ,\,|)>|)>-\|\|>\\*2> hence <\equation> ,\,|)>-FFT>|)>|\|>\\*2,> thanks to Lemma. Via Lemma, we compute \*\*2|)>>> such that\ <\equation> >|)>|\|>\\*2> in time *p|)>|)>=O*D*log r*log D*log log r|)>>, using. Finally, we get <\eqnarray*> ,\,|)>|\|>>|>|>|)>|\|>+>|)>-,\,|)>|\|>>>||>|*2+\*2 and)>>>||>|*\*2*log \+2>+\*2*log \-*log \>>>||>|,>>>> so rounding the coefficients of to the nearest integers yields ,\,|)>>. Since <\eqnarray*> >||>>||||)>>>|||*|)> and)>>>|||*log r*log D|)>,>>>> we compute ,\,|)> rem r> in time <\eqnarray*> *|)>|)>>||*\*log r*log D*log*log r*log D|)>|)>>>|||*\*log r*log D*log log r|)>>>|||*D*log r*log D*log log r|)>.)>>>>> The final division of ,\,|)>> by over /r*\> has cost <\eqnarray*> |)>*|)>>||*log \*log r*log log r|)>>>|||*D*log r*log D*log log r|)>. and)>>>>> The total cost of the method is dominated by. <\bibliography|bib|tm-plain|> <\bib-list|18> V.Bhargava, S.Ghosh, Z.Guo, M.KumarC.Umans. Fast multivariate multipoint evaluation over all finite fields. , 221\U232. New York, NY, USA, 2022. IEEE. V.Bhargava, S.Ghosh, M.KumarC.K.Mohapatra. Fast, algebraic multivariate multipoint evaluation in small characteristic and applications. , 70(6):1\U46, 2023. ArticleNo. 42. R.P.BrentH.T.Kung. Fast algorithms for manipulating formal power series. , 25(4):581\U595, 1978. P.Bürgisser, M.ClausenM.A.Shokrollahi. , 315. Springer-Verlag, 1997. J.vonzur GathenJ.Gerhard. . Cambridge University Press, New York, 3rd, 2013. D.HarveyJ.vander Hoeven. Integer multiplication in time >. , 193(2):563\U617, 2021. D.Harvey, J.vander HoevenG.Lecerf. Even faster integer multiplication. Complexity>, 36:1\U30, 2016. J.vander Hoeven. . Scypress, 2020. J.vander HoevenG.Lecerf. Composition modulo powers of polynomials. , ISSAC '17, 445\U452. New York, NY, USA, 2017. ACM. J.vander HoevenG.Lecerf. Modular composition via factorization. Complexity>, 48:36\U68, 2018. J.vander HoevenG.Lecerf. Fast multivariate multi-point evaluation revisited. Complexity>, 56:101405, 2020. J.vander HoevenG.Lecerf. Univariate polynomial factorization over finite fields with large extension degree. , 35:121\U149, 2024. E.KaltofenV.Shoup. Fast polynomial factorization over high algebraic extensions of finite fields. , ISSAC '97, 184\U188. New York, NY, USA, 1997. ACM. K.S.KedlayaC.Umans. Fast polynomial factorization and modular composition. Comput.>, 40(6):1767\U1802, 2011. Y.KinoshitaB.Li. Power series composition in near-linear time. , 2180\U2185. IEEE, 2024. V.Neiger, B.Salvy, É.SchostG.Villard. Faster modular composition. ACM>, 71(2):1\U79, 2023. Article No.11. M.S.PatersonL.J.Stockmeyer. On the number of nonscalar multiplications necessary to evaluate polynomials. Comput.>, 2(1):60\U66, 1973. A.Schönhage. Asymptotically fast algorithms for the numerical muitiplication and division of polynomials with complex coefficients. J.Calmet, , 144, 3\U15. Berlin, Heidelberg, 1982. Springer-Verlag. V.V.Williams, Y.Xu, Z.XuR.Zhou. New bounds for matrix multiplication: from alpha to omega. D.P.Woodruff, , 3792\U3835. Philadelphia, PA 19104 USA, 2024. SIAM. <\initial> <\collection> <\attachments> <\collection> <\associate|bib-bibliography> <\db-entry|+1PlYSeg92FURz2kB|book|TeXmacs:vdH:book> <|db-entry> > <\db-entry|+2Oru6w36yI6|article|vdH:fffact> <|db-entry> G. > <\db-entry|+HCypi2vCoDaGyv|article|vdH:kucomp> <|db-entry> G. > Complexity> <\db-entry|+NuHCznHJqSBdAP|article|KedlayaUmans2011> <|db-entry> C. > Comput.> <\db-entry|+C2mCMTwEO5guVW|article|BrentKung1978> <|db-entry> H. T. > <\db-entry|+JrEjrvbWMxse2i|article|PaSt73> <|db-entry> L. J. > Comput.> <\db-entry|+78JfxnbGAPNu1f|inproceedings|KaltofenShoup1997> <|db-entry> V. > <\db-entry|+2txxTg5dpQ5|inproceedings|WilliamsXuXuZhou2024> <|db-entry> Y. Z. R. > > <\db-entry|+1AmLVKAfE2Q|article|NeigerSalvySchostVillard2023> <|db-entry> B. É. G. > ACM> 11> <\db-entry|+2BGSXcH5nN6|inproceedings|BhargavaGhoshGuoKumarUmans2022> <|db-entry> S. Z. M. C. > <\db-entry|+1w4EeHDHEag|article|BhargavaGhoshKumarMohapatra2023> <|db-entry> S. M. C. K. > No. 42> <\db-entry|+27PHx1kC20iqo7v|inproceedings|vdH:pcomp> <|db-entry> G. > <\db-entry|+NuHCznHJqSBdAo|article|vdH:ffcomp> <|db-entry> G. > Complexity> <\db-entry|+1hn2Wr9rMK1|inproceedings|KinoshitaLi2024> <|db-entry> B. > <\db-entry|+2MBfjYgs1WYemMtm|book|BuClSh1997> <|db-entry> M. M. A. > <\db-entry|+2WheVMn2twB3YIJ|book|GathenGerhard2013> <|db-entry> J. > <\db-entry|+1PsNR9OI1vmeGVuk|article|vdH:nlogn> <|db-entry> J. van der > >> <\db-entry|+1AmLVKAfE2a|article|HaHoLe2016> <|db-entry> J. van der G. > Complexity> <\db-entry|+32w2uTQXGEE|inproceedings|Schonhage1982b> <|db-entry> > > <\references> <\collection> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > <\auxiliary> <\collection> <\associate|bib> TeXmacs:vdH:book vdH:fffact vdH:kucomp KedlayaUmans2011 KedlayaUmans2011 BrentKung1978 PaSt73 KaltofenShoup1997 WilliamsXuXuZhou2024 NeigerSalvySchostVillard2023 NeigerSalvySchostVillard2023 WilliamsXuXuZhou2024 KedlayaUmans2011 vdH:kucomp BhargavaGhoshGuoKumarUmans2022 BhargavaGhoshKumarMohapatra2023 vdH:pcomp vdH:ffcomp KinoshitaLi2024 KedlayaUmans2011 BuClSh1997 GathenGerhard2013 vdH:nlogn HaHoLe2016 Schonhage1982b GathenGerhard2013 GathenGerhard2013 KedlayaUmans2011 <\associate|toc> |math-font-series||font-shape||1.Introduction> |.>>>>|> |1.1.Main result |.>>>>|> > |1.2.Related work |.>>>>|> > |1.3.Overview of the paper |.>>>>|> > |math-font-series||font-shape||2.Prerequisites> |.>>>>|> |2.1.Fixed point arithmetic |.>>>>|> > |2.2.Fast Fourier transform |.>>>>|> > |2.3.Multivariate polynomials |.>>>>|> > |math-font-series||font-shape||3.Spiroid arithmetic> |.>>>>|> |3.1.Regular spiroids |.>>>>|> > |3.2.General spiroids |.>>>>|> > |3.3.Existence of spiroids |.>>>>|> > |3.4.Reduction by spiroids |.>>>>|> > |math-font-series||font-shape||4.Multi-point evaluation and interpolation> |.>>>>|> |4.1.Evaluation |.>>>>|> > |4.2.Interpolation |.>>>>|> > |4.3.Computation of the evaluation tree |.>>>>|> > |math-font-series||font-shape||5.Modular composition> |.>>>>|> |math-font-series||font-shape||Bibliography> |.>>>>|> <\links> <\collection> > > > > > >