CNRS, École polytechnique, Institut Polytechnique de Paris
Laboratoire d'informatique de l'École polytechnique (LIX, UMR 7161)
1, rue Honoré d'Estienne d'Orves
Bâtiment Alan Turing, CS35003
91120 Palaiseau, France

Abstract. In this paper, we propose a carefully optimized "half-gcd" algorithm for polynomials. We achieve a constant speed-up with respect to previous work for the asymptotic time complexity. We also discuss special optimizations that are possible when polynomial multiplication is done using radix two FFTs.

1 Introduction

The computation of greatest common divisors is the key operation to be optimized when implementing a package for rational number arithmetic. For integers of bit-length $n$, one may use Euclid's algorithm, which has a quadratic time complexity $O(n^2)$. Asymptotically faster algorithms were first proposed by Knuth [Kn70] and Schönhage [Sch71], based on earlier ideas by Lehmer [Leh38]. Schönhage's algorithm has a logarithmic overhead $O(\log n)$ with respect to integer multiplication, which is believed to be asymptotically optimal. Many variants and improvements have been proposed since, mainly aiming at faster practical implementations [TY90, SZ04, Licht05, Mol07, BY19, Morain22]. All these subquadratic algorithms are based on a recursive reduction to half of the precision; for this reason it is convenient to regroup them under the name "half-gcd algorithms".

An analogous story can be told about the history of polynomial gcd computations. The polynomial counterpart of Euclid's algorithm was first described by Stevin [Ste1585, page 241]. The first algorithms for polynomial half-gcds are due to Moenck [Moe73] and Brent, Gustavson, and Yun [BGY80]; several variants have been developed since [Str81, TY90, BY19]. We refer to [GaGe13] for a gentle exposition of the polynomial half-gcd algorithm. This exposition also contains a careful analysis of the constant factor in the asymptotic time complexity. More precisely, let $M(d)$ be the complexity to multiply two polynomials of degree $<d$ over an abstract effective field $K$. Then the gcd of two polynomials of degree $\le d$ can be computed using $22\,M(d)\log d$ operations in $K$. This complexity further drops to $10\,M(d)\log d$ if the corresponding Euclidean remainder sequence is "normal" (all quotients in the ordinary Euclidean algorithm are of degree one). The authors declare that they made no particular efforts to optimize these constants and, in [GaGe13], they ask the question how far these constants can be lowered. The main goal of the present paper is to make progress on this question.

One major motivation for optimizing constant factors in time complexity bounds is the development of faster implementations. Now practical algorithms for multiplying large polynomials usually rely on fast Fourier techniques. We will also investigate optimizations that are only possible when multiplications are done in this way. Indeed, this so-called FFT model allows for various specific optimizations that typically help to reduce constant factors. We refer to [Bern08] for a nice survey of such tricks and mention NTL [NTL] as a software library that was early to exploit them in a systematic way.

A particularly favorable case is when the ground field $K$ has primitive $2^k$-th roots of unity for all sufficiently large $k$. We will call this the "binary FFT model". A good example is the finite field $\mathbb{F}_p = \mathbb{Z}/p\mathbb{Z}$ for a prime $p$ of the form $p = c\,2^m + 1$, where $m$ is large. Such finite fields arise naturally after modular reduction, provided that $p$ can be chosen freely. The binary FFT model allows for additional tricks and is particularly useful for dichotomic algorithms such as the half-gcd.
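To fix conventions before the formal development, the following minimal sketch (our own illustration, not the paper's algorithm) shows the classical quadratic Euclidean algorithm over $\mathbb{F}_p$ on dense little-endian coefficient lists; the prime `P` and all helper names are ours.

```python
# Classical Euclidean algorithm for polynomials over F_p (quadratic time).
# A polynomial is a list of coefficients, index i holding that of x^i,
# normalized so that the last entry (the leading coefficient) is nonzero.

P = 7919  # an arbitrary prime modulus, chosen for this example

def normalize(f):
    """Strip zero leading coefficients in place."""
    while f and f[-1] % P == 0:
        f.pop()
    return f

def poly_rem(a, b):
    """Remainder of a by b via naive long division: O(deg a * deg b)."""
    a = a[:]
    inv_lb = pow(b[-1], P - 2, P)        # inverse of b's leading coefficient
    while len(a) >= len(b) and a:
        c = a[-1] * inv_lb % P           # next quotient coefficient
        s = len(a) - len(b)
        for i, bi in enumerate(b):
            a[i + s] = (a[i + s] - c * bi) % P
        normalize(a)
    return a

def euclid_gcd(a, b):
    """Remainder sequence R_0 = a, R_1 = b, R_{k+1} = R_{k-1} rem R_k."""
    while b:
        a, b = b, poly_rem(a, b)
    return a
```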
We refer to section 2 for those results about the binary FFT model that will be required in this paper. Note that large polynomial multiplications over general finite fields are fastest using FFT-based techniques, although one may not necessarily use radix two FFTs; see [vdH:cyclomult, vdH:ffnlogn] and references therein for the fastest current algorithms.

For convenience of the reader, we will first describe our new half-gcd algorithms in the normal case (see section 4). This will allow us to avoid various technical complications and focus on the new optimizations. The first idea behind our new half-gcd is to make the update step after the first recursive call particularly efficient, by using a $2 \times 2$ matrix variant of the middle product algorithm [HaQuZi04]. This leads to an algorithm of time complexity $\frac{\gamma_{2\times2}}{2} M(d) \log d$ in general and $2\,M(d)\log d$ in the FFT model (Proposition 2), where $\gamma_{2\times2}$ denotes the cost overhead of $2\times2$ matrix polynomial multiplication defined in section 2.2. The second idea is to combine this with known optimizations for the (binary) FFT model, such as FFT doubling, FFT caching, and Harvey's variant of the middle product [Har09a]. In the binary FFT model, this allows us to compute half-gcds in time $\frac{4}{3} M(d) \log d$ (Proposition 3 and Corollary 1).

When dropping the normality assumption, the algorithm becomes more technical, but it turns out that the middle product trick can still be generalized. This is explained in section 5 and leads to the bound $\frac{\gamma_{2\times2}+3}{2} M(d) \log d$ for general gcds and $\frac{7}{2} M(d) \log d$ in the FFT model (Theorem 1). In the binary FFT model, special efforts are needed in order to enforce the use of DFTs of power of two lengths. We prove the bound $2\,M(d)\log d$ in this last case (Theorem 2). See Table 1 for a summary of the new constant factors.

                        normal case      general case
  any multiplication    gamma_{2x2}/2    (gamma_{2x2}+3)/2
  FFT model             2                7/2
  binary FFT model      4/3              2

Table 1. Summary of the constant factors (with respect to $M(d)\log d$) in various cases.

It is well known that polynomial gcds have many applications: fast arithmetic on rational functions, fast reconstruction of rational functions [GaGe13, section 5.7], polynomial factorization [CZ81], fast decoding of error-correcting codes [Ber68, Mas69, Dorn87], sparse interpolation [BenOrTiwari1988, Pro1795], etc. Personally, we were mainly motivated by the last application to sparse interpolation. After the introduction of the tangent Graeffe algorithm [vdH:rmodroots, vdH:graeffe-cca], gcd computations have often become one of the main practical bottlenecks. For this application, it is typically possible to work modulo a prime of the form $p = c\,2^m + 1$, which allows us to exploit our optimizations for the binary FFT model.

We made a first implementation of the new algorithm and also programmed the customary extension to the computation of subresultants (see [GaGe13] and also [Lecerf2017] for this extension). Work is in progress on HPC implementations of our algorithms and all required subalgorithms.

A few questions remain for future work. It is likely that the ideas in this paper can be applied to integer gcds as well, except for the optimizations that are specific to the binary FFT model. It would also be nice to have counterparts for the "ternary FFT model", which could be used for fields $K$ of characteristic two (see [vdH:graeffe-cca] for an algorithm of this kind in the case of iterated Graeffe transforms). Finally, in view of Bernstein and Yang's recent ideas from [BY19], as well as Remark 4, one may wonder whether the bounds for the normal case can actually be extended to the general case.

Acknowledgments. We are grateful to Michael for triggering this study and suggesting some early ideas.

2 Preliminaries

2.1 The binary FFT model

The best bounds in this paper will be proved for the so-called "binary FFT model", which requires special assumptions on the ground field $K$. First of all, this model requires $2$ to be invertible in $K$. Secondly, for any polynomial $P \in K[x]$ that occurs during computations, we assume that $K$ has a primitive $n$-th root of unity $\omega_n$ with $n = 2^k \ge \deg P$. For convenience, we also assume that $\omega_m = \omega_n^{n/m}$ whenever $m$ divides $n$.
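Both requirements are easy to satisfy over $\mathbb{F}_p$ with $p = c\,2^m + 1$. The following sketch (our own, with hypothetical helper names) picks such a prime of a prescribed size and a compatible family of roots of unity.

```python
import random

def is_prime(n):
    """Deterministic Miller-Rabin for n < 3.3e24 (fixed witness set)."""
    if n < 2:
        return False
    for q in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        if n % q == 0:
            return n == q
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2; s += 1
    for a in (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37):
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False
    return True

def find_fourier_prime(m, bits=62):
    """A prime p = c * 2^m + 1 of roughly `bits` bits (c odd)."""
    c = (1 << (bits - m)) | 1
    while not is_prime(c * (1 << m) + 1):
        c += 2
    return c * (1 << m) + 1

def primitive_root_of_unity(p, t):
    """A primitive 2^t-th root of unity w in F_p (requires 2^t | p - 1).
    Compatible smaller roots are then w_m = w^(2^t / m) for m | 2^t."""
    assert (p - 1) % (1 << t) == 0 and t >= 1
    while True:
        a = random.randrange(2, p - 1)
        w = pow(a, (p - 1) >> t, p)
        if pow(w, 1 << (t - 1), p) == p - 1:   # w^(2^(t-1)) = -1: primitive
            return w
```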
We measure computational complexity in terms of the number of required field operations in $K$. (Using an easy refinement of the analysis, it turns out that the main half-gcd algorithms in this paper involve only a linear number of divisions, so the bulk of the operations are actually ring operations in $K$.) Given $P, Q \in K[x]$ with $Q \neq 0$, we write $P \operatorname{quo} Q$ and $P \operatorname{rem} Q$ for the quotient and the remainder of the Euclidean division of $P$ by $Q$, respectively.

Let $K[x]_{<n}$ denote the space of polynomials of degree $<n$. Given $P \in K[x]_{<n}$ with $n = 2^k$, we define its discrete Fourier transform $\mathrm{DFT}_{\omega_n}(P) \in K^n$ by

  $\mathrm{DFT}_{\omega_n}(P) := (P(1), P(\omega_n), \ldots, P(\omega_n^{n-1})).$

We will write $F(n)$ for the maximum cost of computing a discrete Fourier transform of order $n$ or the inverse transformation $\mathrm{DFT}_{\omega_n}^{-1} : K^n \to K[x]_{<n}$. It is well known that $F(n) = O(n \log n)$ [CT65]. In what follows, we will always assume that $M(n)/n$ and $F(n)/n$ are non-decreasing (for $n \ge 1$ and $n \ge 2$, respectively). Therefore, we may just as well take $F(n) \asymp n \log n$, but it is instructive to keep the notation $F(n)$ to indicate where we use Fourier transforms.

If $n = 2m \ge 2$ and the discrete Fourier transform of $P \in K[x]_{<n}$ at order $m$ is known, then it costs $F(m) + O(n)$ to compute $\mathrm{DFT}_{\omega_n}(P)$. Indeed, we already know $(P(1), P(\omega_n^2), \ldots, P(\omega_n^{n-2}))$, so it remains to compute $(P(\omega_n), P(\omega_n^3), \ldots, P(\omega_n^{n-1}))$. But this is the DFT of $\tilde P := P(\omega_n x) \operatorname{rem} (x^m - 1)$ at order $m$, and it takes a linear number of operations to compute $\tilde P$ in terms of $P$. We call this trick FFT doubling.

Given $P, Q \in K[x]$ with $P Q \in K[x]_{<n}$, we may use the discrete Fourier transform to compute the product $P Q$ using

  $P Q = \mathrm{DFT}_{\omega_n}^{-1}(\mathrm{DFT}_{\omega_n}(P) \cdot \mathrm{DFT}_{\omega_n}(Q)).$

This is called FFT multiplication. If we fix $P$ of degree $d < n$, then we may consider the multiplication map with $P$ as a linear map $M_P : K[x]_{<n-d} \to K[x]_{<n}$. We have

  $M_P = \mathrm{DFT}_{\omega_n}^{-1} \circ \Delta_{\mathrm{DFT}_{\omega_n}(P)} \circ \mathrm{DFT}_{\omega_n} \circ \iota,$

where $\Delta_v$ stands for componentwise multiplication by $v$ in $K^n$ and $\iota$ for the injection of $K[x]_{<n-d}$ into $K[x]_{<n}$.

Given $P \in K[x]_{<n}$ of degree $d$ and $R \in K[x]_{<n}$, we define their middle product $P \bowtie R$ by

  $P \bowtie R := \sum_{0 \le i < n-d} (P R)_{d+i}\, x^i.$

Let us recall how $P \bowtie R$ can be computed efficiently using FFT multiplication [HaQuZi04]. If $Q \in K[x]_{<n-d}$ and $R = P Q$, then the coefficients of $R$ are obtained from those of $Q$ through multiplication by an $n \times (n-d)$ band matrix with entries $P_{i-j}$; this is precisely the matrix of the map $M_P$. Dually, if $S = P \bowtie R$, then the coefficients of $S$ are obtained from those of $R$ through multiplication by the transpose of the matrix of $M_{\tilde P}$, where $\tilde P := x^d P(x^{-1})$ denotes the reverse of $P$. In other words, $S = M_{\tilde P}^{\top}(R)$, where $M_{\tilde P}^{\top} : K[x]_{<n} \to K[x]_{<n-d}$ stands for the transpose of $M_{\tilde P}$. Transposing the above factorization of $M_{\tilde P}$, while using that $\mathrm{DFT}_{\omega_n}$ is symmetric and that $\mathrm{DFT}_{\omega_n^{-1}} = n \cdot \mathrm{DFT}_{\omega_n}^{-1}$, a short computation yields

  $P \bowtie R = \mathrm{DFT}_{\omega_n}^{-1}(\mathrm{DFT}_{\omega_n}(P) \cdot \mathrm{DFT}_{\omega_n}(R)) \operatorname{quo} x^d.$   (1)

This is the alternative formula from [Har09a] that we will use.

From the complexity perspective, let $M(d)$ be the cost to multiply two polynomials in $K[x]_{<d}$. Using FFT multiplication, we see that $M(n) = 3 F(2n) + O(n) = 6 F(n) + O(n)$ in the binary FFT model. More generally, if $n$ is even, then any two polynomials $P, Q \in K[x]$ with $\deg(PQ) < n$ can be multiplied in time $3 F(n) + O(n)$ using this method. Similarly, the cost $M^{\bowtie}(n)$ to compute a middle product (1) with $\deg P = n$ and $R \in K[x]_{<2n}$ satisfies $M^{\bowtie}(n) = 6 F(n) + O(n) = M(n) + O(n)$. If $n$ is even and $d = \deg P < n$ is general, then $P \bowtie R$ can be computed in time $3 F(n) + O(n)$.

It is important to note that the above results all generalize to the case when the coefficient field $K$ is replaced by the algebra $K^{2\times2}$ of $2 \times 2$ matrices with entries in $K$. For instance, assume that $M, N \in K^{2\times2}[x]$ with $\deg(MN) < n$. Then formula (1) and its analogues allow us to compute $M N$ using $8$ direct DFTs, $4$ inverse DFTs, and $n$ multiplications of $2 \times 2$ matrices in $K^{2\times2}$, for a total cost of $12 F(n) + O(n) = 4\,M(n/2) + O(n)$.
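Before turning to the matrix case in earnest, here is a scalar sketch (ours) of formula (1) over a classic binary FFT prime: a standard iterative radix-2 NTT, plus the middle product computed as a wrapped product whose low coefficients are the only ones polluted by wrap-around.

```python
MOD = 998244353   # = 119 * 2^23 + 1, a well-known binary FFT prime
G = 3             # a primitive root modulo MOD

def dft(a, invert=False):
    """In-place radix-2 NTT of a (len(a) a power of two), natural order."""
    n = len(a)
    j = 0
    for i in range(1, n):                    # bit-reversal permutation
        bit = n >> 1
        while j & bit:
            j ^= bit; bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    length = 2
    while length <= n:
        w = pow(G, (MOD - 1) // length, MOD)
        if invert:
            w = pow(w, MOD - 2, MOD)
        for start in range(0, n, length):
            wn = 1
            for k in range(start, start + length // 2):
                u, v = a[k], a[k + length // 2] * wn % MOD
                a[k], a[k + length // 2] = (u + v) % MOD, (u - v) % MOD
                wn = wn * w % MOD
        length <<= 1
    if invert:
        ninv = pow(n, MOD - 2, MOD)
        for i in range(n):
            a[i] = a[i] * ninv % MOD
    return a

def middle_product(p, r):
    """P ⋈ R for deg P = d and len(r) = n a power of two: the coefficients
    d..n-1 of P*R, via DFT^{-1}(DFT(P).DFT(R)) quo x^d as in formula (1).
    Wrap-around modulo x^n - 1 only affects coefficients below d."""
    n, d = len(r), len(p) - 1
    pa = dft(p + [0] * (n - len(p)))
    ra = dft(r[:])
    prod = dft([x * y % MOD for x, y in zip(pa, ra)], invert=True)
    return prod[d:]
```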
2.2 Other models

Although the main focus in this paper is on the binary FFT model, we will also consider other types of polynomial multiplication. In that case, we will still denote by $M(d)$ and $M^{\bowtie}(d)$ the complexities of multiplying two polynomials in $K[x]_{<d}$ and of computing the middle product $P \bowtie R$ for $P \in K[x]$ of degree $d$ and $R \in K[x]_{<2d-1}$. Moreover, we make the following assumptions on these cost functions:

- We have $M^{\bowtie}(d) \sim M(d)$.
- The product of any $P, Q \in K[x]$ with $\deg(PQ) < 2d$ can be computed in time $M(d)$, and similarly for the middle product $P \bowtie R$ of $P, R \in K[x]$ with $\deg P \le d$ and $\deg R < 2d$.
- The functions $M(d)$, $M(d)/d$, and $M^{\bowtie}(d)/d$ are non-decreasing (for $d \ge 0$, $d \ge 1$, and $d \ge 2$, respectively).

We also make the same assumptions for the analogous cost functions $M_{2\times2}(d)$ and $M^{\bowtie}_{2\times2}(d)$ when taking coefficients in $K^{2\times2}$ instead of $K$. In the binary FFT model we have seen that we may take $M(d)$ and $M^{\bowtie}(d)$ to be of the form $\alpha\, d \log d + \beta\, d$ for suitable constants $\alpha, \beta$, after which the above assumptions are satisfied (if $d$ is not a power of two, then one may use the truncated Fourier transform [vdH:tft], for instance). They are also verified for all other commonly used types of multiplication, such as Karatsuba's and Toom's methods [Kar63, Toom63b], or FFT-based methods for arbitrary radices [Sch77, CK91, vdH:ffnlogn].

In addition, we define $\gamma_{2\times2}$ to be a constant such that $M_{2\times2}(d) \le \gamma_{2\times2}\, M(d) + O(d)$ and $M^{\bowtie}_{2\times2}(d) \le \gamma_{2\times2}\, M^{\bowtie}(d) + O(d)$. We have seen that we may take $\gamma_{2\times2} = 4$ in the binary FFT model; this generalizes to arbitrary FFT models. Using Strassen's method [Str69] for $2 \times 2$ matrix multiplication, we may always take $\gamma_{2\times2} \le 7$. From a practical point of view, we usually have $\gamma_{2\times2} \le 8$.

Let us examine the constant $\gamma_{2\times2}$ a bit more closely in the case of Karatsuba multiplication. The complexity of this method satisfies

  $M(1) = \alpha$
  $M(2d) = 3\,M(d) + \beta\, d$

for certain constants $\alpha$ and $\beta$, which yields $M(2^k) \le (\alpha + \beta)\, 3^k$. Similarly, when using Karatsuba's method to multiply $2 \times 2$ matrix polynomials, the complexity $M_{2\times2}$ satisfies

  $M_{2\times2}(1) = 8\alpha$
  $M_{2\times2}(2d) = 3\,M_{2\times2}(d) + 4\beta\, d,$

which leads to $M_{2\times2}(2^k) \le (8\alpha + 4\beta)\, 3^k$ and

  $\gamma_{2\times2} = \frac{8\alpha + 4\beta}{\alpha + \beta}.$

This analysis can be generalized to general lengths and to the case when we only use Karatsuba's method for degrees above a certain threshold. In the latter case, it is usually favorable to choose the threshold over $K^{2\times2}$ to be slightly lower than the threshold over $K$.

3 Euclidean remainder sequences

3.1 Definition and basic properties

Let $P, Q \in K[x]$ be such that $d := \deg P > \deg Q$. The Euclidean remainder sequence $(R_k)_{0 \le k \le \ell}$ of $(P, Q)$ is defined by

  $R_0 := P$
  $R_1 := Q$
  $R_{k+1} := R_{k-1} \operatorname{rem} R_k.$

The length $\ell$ of the sequence is the smallest index $\ell$ for which $R_\ell = 0$. We write $d_k := \deg R_k$. For $0 \le k < \ell$, we set

  $A_k := \begin{pmatrix} R_k \\ R_{k+1} \end{pmatrix}.$

We also define the sequence of Bezout matrices $(B_k)_{1 \le k < \ell}$ by

  $B_k := \begin{pmatrix} 0 & 1 \\ 1 & -(R_{k-1} \operatorname{quo} R_k) \end{pmatrix},$

so that

  $A_k = B_k\, A_{k-1}$

for $1 \le k \le \ell - 1$. We have $\gcd(P,Q) = R_{\ell-1}$. We regard $B_k$ as a matrix polynomial in $K^{2\times2}[x]$ and say that $(R_k)_{0 \le k \le \ell}$ and $(B_k)_{1 \le k < \ell}$ are normal if $\deg B_k = 1$ for all $k$. This is the case if and only if $\ell = d + 1$ and $d_k = d - k$ for all $k \in \{0, \ldots, d\}$.

For $i \le j$, we also define

  $B_{i,j} := B_j \cdots B_{i+2}\, B_{i+1},$

so that

  $A_j = B_{i,j}\, A_i.$

(We understand that $B_{i,i} = \mathrm{Id}$.) In particular,

  $\begin{pmatrix} R_{\ell-1} \\ 0 \end{pmatrix} = A_{\ell-1} = B_{0,\ell-1}\, A_0 = B_{0,\ell-1} \begin{pmatrix} P \\ Q \end{pmatrix},$

so an extended gcd computation essentially boils down to the computation of the matrix product $B_{0,\ell-1} = B_{\ell-1} \cdots B_1$. In the case of a normal remainder sequence, this is done most efficiently using binary splitting:

  $B_{i,j} = B_{h,j}\, B_{i,h}, \qquad i + 2 \le j, \quad h := \lceil (i+j)/2 \rceil.$

In essence, this is also how the half-gcd algorithm works.
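The following toolkit (our own illustration, with all names ours) makes the objects above concrete over a small prime field: polynomials as little-endian lists, the quotient matrices $B_k$, and the folded products $B_{0,k}$, whose degrees telescope as the lemma below confirms.

```python
Pm = 101  # an arbitrary small prime

def trim(f):
    while f and f[-1] == 0:
        f.pop()
    return f

def padd(a, b):
    n = max(len(a), len(b))
    return trim([((a[i] if i < len(a) else 0) +
                  (b[i] if i < len(b) else 0)) % Pm for i in range(n)])

def pmul(a, b):
    if not a or not b:
        return []
    c = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            c[i + j] = (c[i + j] + x * y) % Pm
    return c

def pdivmod(a, b):
    """Quotient and remainder of a by b (b nonzero, normalized)."""
    a, q = a[:], [0] * max(len(a) - len(b) + 1, 1)
    inv = pow(b[-1], Pm - 2, Pm)
    while len(a) >= len(b) and a:
        c, s = a[-1] * inv % Pm, len(a) - len(b)
        q[s] = c
        for i, y in enumerate(b):
            a[i + s] = (a[i + s] - c * y) % Pm
        trim(a)
    return trim(q), a

def mat_mul(M, N):
    """Product of 2x2 matrices of polynomials."""
    return [[padd(pmul(M[i][0], N[0][j]), pmul(M[i][1], N[1][j]))
             for j in range(2)] for i in range(2)]

def bezout_matrices(p, q):
    """The B_k = [[0,1],[1,-(R_{k-1} quo R_k)]] along the sequence."""
    Bs, a, b = [], p, q
    while b:
        quo, rem = pdivmod(a, b)
        Bs.append([[[], [1]], [[1], [(-c) % Pm for c in quo]]])
        a, b = b, rem
    return Bs

def B0k(Bs, k):
    """B_{0,k} = B_k ... B_1, by folding the product."""
    M = [[[1], []], [[], [1]]]
    for B in Bs[:k]:
        M = mat_mul(B, M)
    return M
```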
Lemma 1. For any $0 \le i \le j < \ell$, we have

  $\deg B_{i,j} = \deg B_{i+1} + \cdots + \deg B_j$
  $\deg B_{i,j} = d_i - d_j.$

In particular, if $(R_k)_{0 \le k \le \ell}$ is normal, then $\deg B_{i,j} = j - i$ and $d_i = d - i$.

Proof. Let us show by induction on $j - i$ that

  $\deg\, (B_{i,j})_{2,2} = \deg B_{i+1} + \cdots + \deg B_j$
  $\deg\, (B_{i,j})_{\rho,\sigma} \le \deg B_{i+1} + \cdots + \deg B_j$

for any $(\rho,\sigma) \in \{1,2\}^2$. This is clear if $j - i \le 1$, so assume that $j - i > 1$ and let $h := \lceil (i+j)/2 \rceil$. Then

  $(B_{i,j})_{2,2} = (B_{h,j})_{2,1}\,(B_{i,h})_{1,2} + (B_{h,j})_{2,2}\,(B_{i,h})_{2,2},$

so the induction hypothesis yields

  $\deg\big((B_{h,j})_{2,2}\,(B_{i,h})_{2,2}\big) = \deg B_{i+1} + \cdots + \deg B_j$
  $\deg\big((B_{h,j})_{2,1}\,(B_{i,h})_{1,2}\big) < \deg B_{i+1} + \cdots + \deg B_j,$

whence $\deg\, (B_{i,j})_{2,2} = \deg B_{i+1} + \cdots + \deg B_j$. In a similar way, the induction hypothesis yields $\deg\, (B_{i,j})_{\rho,\sigma} \le \deg B_{i+1} + \cdots + \deg B_j$ for all $(\rho,\sigma)$. As to the second relation, we note that

  $\deg B_k = \deg (R_{k-1} \operatorname{quo} R_k) = d_{k-1} - d_k,$

whence $\deg B_{i,j} = \deg B_{i+1} + \cdots + \deg B_j = d_i - d_j$. □

Example 1. [Worked example, not fully recoverable from the source: a pair $P, Q$ of degrees $5$ and $4$ whose remainder sequence is normal, displayed together with the remainders $R_2, \ldots, R_5$, the matrices $B_1, \ldots, B_5$, the products $B_{0,k}$, and the relation $A_4 = B_{0,4}\,(P, Q)^{\top}$.]

3.2 Re-indexation of abnormal Euclidean remainder sequences

In the abnormal case, it is convenient to work with an alternative indexation of remainder sequences for which $d^{\circledast}_k \ge d - k$, as in the normal case. Note that we will not consider abnormal remainder sequences until section 5 below, so the reader may safely skip this subsection until there.

Let us now explain our re-indexation in detail. For any $k \in \{0, \ldots, d\}$, let $i(k)$ be the unique index with

  $d_{i(k)} \ge d - k > d_{i(k)+1},$

where we understand that $d_\ell := -\infty$. We then set

  $R^{\circledast}_k := R_{i(k)}, \qquad d^{\circledast}_k := d_{i(k)}, \qquad A^{\circledast}_k := A_{i(k)}.$

Moreover, for any $k \in \{1, \ldots, d\}$, we define

  $B^{\circledast}_k := B_{i(k-1),\, i(k)},$

so that $B^{\circledast}_k = \mathrm{Id}$ whenever $i(k) = i(k-1)$. For $0 \le i \le j \le d$, we also define

  $B^{\circledast}_{i,j} := B^{\circledast}_j \cdots B^{\circledast}_{i+1} = B_{i(i),\, i(j)},$

so that we still have

  $A^{\circledast}_j = B^{\circledast}_{i,j}\, A^{\circledast}_i.$

By construction, for $k \in \{0, \ldots, d\}$, we have

  $d^{\circledast}_k \ge d - k \qquad\text{and}\qquad \deg B^{\circledast}_{0,k} \le k.$

As before, we will sometimes write $B^{\circledast}_k(P,Q)$ instead of $B^{\circledast}_k$ in order to emphasize the dependence on $P$ and $Q$, and similarly for $B^{\circledast}_{i,j}$, etc. Occasionally, when $(P,Q)$ is not clear from the context, we also write $R^{\circledast}_k(P,Q)$, $d^{\circledast}_k(P,Q)$, etc.

Example 2. [Worked example, not fully recoverable from the source: a pair $P, Q$ of degrees $5$ and $4$ whose remainder sequence is abnormal, one quotient having degree two, so that the remainder degrees are $5, 4, 2, 1, 0$. After re-indexation, the original indices $i = 0, 1, 2, 3, 4$ reappear at the positions $k = 0, 1, 3, 4, 5$, with $R^{\circledast}_2 = R_1$ and $B^{\circledast}_2 = \mathrm{Id}$.]

4 The normal case

4.1 Statement of the non-optimized algorithm

Let $P, Q \in K[x]$ be as in the previous section, with remainder sequence $(R_k)_{0 \le k \le \ell}$. We will write $B_k(P,Q)$ for the corresponding $k$-th Bezout matrix $B_k$ in case we wish to make the dependency on $P$ and $Q$ clear. Similarly, we define $B_{i,j}(P,Q) := B_j \cdots B_{i+1}$ and $R_k(P,Q) := (B_{0,k})_{1,1}\, P + (B_{0,k})_{1,2}\, Q$. Given a polynomial $U \in K[x]$ and indices $i \le j$, it will also be convenient to define

  $U_{i;j} := U_i + U_{i+1}\, x + \cdots + U_{j-1}\, x^{j-1-i}$
  $U_{i;} := U_{i;\,\deg U + 1}.$

Here we understand that $U_{i;j} := 0$ whenever $j \le i$.

Lemma 2. Given $0 \le k < \ell$ such that $\deg B_1 = \cdots = \deg B_k = 1$, we have

  $B_{0,k}(P, Q) = B_{0,k}(P_{d-2k;},\, Q_{d-2k;})$
  $A_k(P, Q)_{d-2k;} = A_k(P_{d-2k;},\, Q_{d-2k;}),$

where the truncation in the second relation is understood componentwise.

Proof. We have $\deg B_{0,i} = i$ for all $i \le k$. For $i = 1, \ldots, k$, the relation

  $A_i = B_{0,i}\, A_0$

thus shows that the coefficient $(A_i)_\delta$ of degree $\delta$ in $A_i$ only depends on the coefficients $P_\beta$ and $Q_\beta$ of $P$ and $Q$ with $\beta \ge \delta - i$. In particular, the leading coefficients of $R_{i-1}$ and $R_i$ agree with those of the remainder sequence of $(P_{d-2k;}, Q_{d-2k;})$, and

  $-R_{i-1} \operatorname{quo} R_i = -(R_{i-1})_{d-2k;} \operatorname{quo} (R_i)_{d-2k;} = -R_{i-1}(P_{d-2k;}, Q_{d-2k;}) \operatorname{quo} R_i(P_{d-2k;}, Q_{d-2k;}),$

since $d_{i-1} = d + 1 - i$ and $d_i = d - i$. This shows that $B_i(P,Q) = B_i(P_{d-2k;}, Q_{d-2k;})$. By induction on $i$, we also obtain $B_{0,i}(P,Q) = B_{0,i}(P_{d-2k;}, Q_{d-2k;})$. We conclude by taking $i = k$. □
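A quick numeric check of the truncation lemma, assuming the helpers `pmul`, `pdivmod`, `mat_mul`, `bezout_matrices`, and `B0k` from the previous sketch are in scope. Random dense polynomials modulo a prime are normal with high probability, so the two matrices below should almost surely agree.

```python
import random

def high_part(u, i):
    """U_{i;} : the coefficients i, i+1, ... of U, shifted down by i."""
    return u[max(i, 0):]

random.seed(2)
d, k = 9, 3
Pp = [random.randrange(Pm) for _ in range(d)] + [1]       # degree d
Qq = [random.randrange(Pm) for _ in range(d - 1)] + [1]   # degree d - 1

lhs = B0k(bezout_matrices(Pp, Qq), k)
rhs = B0k(bezout_matrices(high_part(Pp, d - 2 * k),
                          high_part(Qq, d - 2 * k)), k)
print(lhs == rhs)  # expected: True whenever the sequence is normal
```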
The lemma leads to the following recursive algorithm for the computation of $B_{0,k}(P,Q)$:

Algorithm 1
Input: $P, Q \in K[x]$ and $k \le d = \deg P$ with $\deg B_i = 1$ for all $i \le k$
Output: $B_{0,k}(P,Q)$

1. If $k = 1$, then return $\begin{pmatrix} 0 & 1 \\ 1 & -(P \operatorname{quo} Q) \end{pmatrix}$.
2. Let $h := \lceil k/2 \rceil$ and $k' := k - h$.
3. Recursively compute $M := B_{0,h}(P_{d-2h;},\, Q_{d-2h;})$.
4. Compute $(R_{d-2k;},\, S_{d-2k;})^{\top}$, where $(R, S)^{\top} := M\, (P, Q)^{\top}$.
5. Recursively compute $N := B_{0,k'}(R_{d+h-2k;},\, S_{d+h-2k;})$; note that these slices are parts of the slices computed in step 4.
6. Return $N\, M$.

Proposition 1. Algorithm 1 is correct.

Proof. If $k = 1$, then the result follows directly from the definitions. If $k > 1$, then Lemma 2 implies $M = B_{0,h}(P,Q)$, whence $R = R_h$ and $S = R_{h+1}$. For $\tilde k \le k'$, we have $B_{\tilde k}(R, S) = B_{h+\tilde k}(P, Q)$, so $\deg B_{\tilde k}(R, S) = 1$. This allows us to apply Lemma 2 once more, and obtain $N = B_{0,k'}(R, S) = B_{h,k}(P, Q)$. We conclude by noting that $N M = B_{h,k}\, B_{0,h} = B_{0,k}$. □

4.2 Exploiting the middle product

Let us now show how to compute $R_{d-2k;}$ and $S_{d-2k;}$ efficiently in step 4 using a $2\times2$ matrix variant of the middle product. In order to simplify the exposition, we assume that $k$ is a power of two; in subsection 4.4 we will consider arbitrary lengths. We first decompose all polynomials into blocks of degree $< h$. More precisely, let

  $\bar P_i := P_{d-2k+ih;\; d-2k+(i+1)h} \qquad (i = 0, \ldots, 3)$
  $\bar Q_i := Q_{d-2k+ih;\; d-2k+(i+1)h} \qquad (i = 0, \ldots, 3),$

so that

  $P_{d-2k;} = \bar P_0 + \bar P_1\, x^h + \bar P_2\, x^{2h} + \bar P_3\, x^{3h} + P_d\, x^{4h}$
  $Q_{d-2k;} = \bar Q_0 + \bar Q_1\, x^h + \bar Q_2\, x^{2h} + \bar Q_3\, x^{3h},$

and similarly write $\bar R_0, \bar R_1$ and $\bar S_0, \bar S_1$ for the blocks of the slices of $R$ and $S$ that are needed in step 5. Then we observe that

  $\begin{pmatrix} \bar R_0 & \bar R_1 \\ \bar S_0 & \bar S_1 \end{pmatrix} = M \bowtie \begin{pmatrix} \bar P_0 + \bar P_1\, x^h & \bar P_2 + \bar P_3\, x^h \\ \bar Q_0 + \bar Q_1\, x^h & \bar Q_2 + \bar Q_3\, x^h \end{pmatrix},$   (2)

where the left hand matrix has degree $< h$, where $M$ has degree $h$, and where the right hand matrix has degree $< k$; see also Figure 1.

Figure 1. Schematic illustration of the computation of the slices of $R$ and $S$ by taking the middle product of $M$ and the $2\times2$ matrix with entries built from the blocks $\bar P_i$ and $\bar Q_i$. [Diagram not reproduced.]

The individual remaining coefficient can be recovered in linear time using

  $R_\delta = (M_{1,1}\, P + M_{1,2}\, Q)_\delta = \sum_i \big[ (M_{1,1})_i\, P_{\delta - i} + (M_{1,2})_i\, Q_{\delta - i} \big].$   (3)

Before we discuss further optimizations that are specific to the binary FFT model, let us first consider a general multiplication scheme (that satisfies the assumptions from section 2.2), and analyze the complexity of Algorithm 1 with the middle product optimization.

Proposition 2. The cost of Algorithm 1 with the middle product optimization is bounded by

  $\frac{\gamma_{2\times2}}{2}\, M(k)\, \log k + O(M(k)).$

Moreover, for a multiplication method with $M(k) \asymp k^\lambda$ for some $1 < \lambda \le 2$, the cost is

  $\le \frac{\gamma_{2\times2}}{2\,(2^{\lambda-1}-1)}\, M(k) + O(k \log k).$

Proof. Recall that we assumed $k$ to be a power of two. Then the running time of the algorithm satisfies the recurrence inequality

  $T(k) \le 2\, T(k/2) + \gamma_{2\times2}\, M(k/2) + O(k),$

since the middle product (2) and the final product $N M$ together amount to one $2\times2$ matrix middle product and one $2\times2$ matrix product in degree about $k/2$. Unrolling this relation while using the assumption that $M(k)/k$ is non-decreasing yields the first bound. If $M(k) \sim c\, k^\lambda$ for some $c > 0$ and $1 < \lambda \le 2$, then the same relation yields a geometric sum that evaluates to $\frac{\gamma_{2\times2}\, c}{2\,(2^{\lambda-1}-1)}\, k^\lambda$ plus lower order terms. □
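The following sketch (our own) implements Algorithm 1 in its plain form, assuming the helpers from the two previous sketches (`pmul`, `padd`, `pdivmod`, `mat_mul`, `high_part`, `Pm`) are in scope. For simplicity it recomputes the full products $M (P,Q)^{\top}$ instead of only the needed slices; the middle product optimization described above replaces exactly that step.

```python
def halfgcd_normal(p, q, k):
    """B_{0,k}(P, Q) under the assumption deg B_1 = ... = deg B_k = 1.
    Unoptimized version of Algorithm 1: full products, no middle product."""
    d = len(p) - 1
    if k == 1:
        quo, _ = pdivmod(p, q)
        return [[[], [1]], [[1], [(-c) % Pm for c in quo]]]
    h = (k + 1) // 2
    # first recursive call on the top 2h coefficients
    M = halfgcd_normal(high_part(p, d - 2 * h), high_part(q, d - 2 * h), h)
    # (R, S)^T = M * (P, Q)^T; the optimized algorithm only computes slices
    r = padd(pmul(M[0][0], p), pmul(M[0][1], q))
    s = padd(pmul(M[1][0], p), pmul(M[1][1], q))
    # second recursive call on the top 2(k - h) coefficients of (R, S)
    dr = len(r) - 1
    N = halfgcd_normal(high_part(r, dr - 2 * (k - h)),
                       high_part(s, dr - 2 * (k - h)), k - h)
    return mat_mul(N, M)

# Consistency check against the naive product of Bezout matrices:
# print(halfgcd_normal(Pp, Qq, 4) == B0k(bezout_matrices(Pp, Qq), 4))
```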
4.3 Implementation in the binary FFT model

In order to efficiently implement Algorithm 1 with the optimizations from the previous subsection in the binary FFT model, let us again assume that $k$ is a power of two. We essentially have to compute a $2\times2$ matrix middle product in step 4 and a $2\times2$ matrix product in step 6. We will do all these computations using DFTs of length $k$. Let us first consider the middle product (2). The matrix $M$ has degree $h$ and the right hand side matrix has degree $< k$. Consequently, we may apply (1) and compute the middle product using FFT multiplication:

  $\begin{pmatrix} \bar R_0 & \bar R_1 \\ \bar S_0 & \bar S_1 \end{pmatrix} = \mathrm{DFT}_{\omega_k}^{-1}\Big( \mathrm{DFT}_{\omega_k}(M) \cdot \mathrm{DFT}_{\omega_k}\begin{pmatrix} \bar P_0 + \bar P_1 x^h & \bar P_2 + \bar P_3 x^h \\ \bar Q_0 + \bar Q_1 x^h & \bar Q_2 + \bar Q_3 x^h \end{pmatrix} \Big) \operatorname{quo} x^h.$

We also recall that the individual remaining coefficient can be recovered separately using (3). As to the matrix product $N M$, its degree is $k$, which is just one too large to use FFT multiplication at length $k$ directly. Nevertheless, FFT multiplication still allows us to compute $N M$ modulo $x^k - 1$. Then we may simply recover $N M$ by computing the leading coefficient $(N M)_k = N_{k-h}\, M_h$ separately.

Now the FFT model also allows for "FFT caching" when doing the recursive calls: in addition to the Bezout matrix, we return its DFT transform at the end of the algorithm. When computing the DFT transforms of $M$ and $N$ at length $k$, this means that we already know their DFT transforms at length $k/2$, and thereby save half of the work through FFT doubling. Summarizing, this leads to the following algorithm:

Algorithm 2
Input: $P, Q \in K[x]$ and $k \in 2^{\mathbb{N}}$ such that $d = \deg P \le 2k$ and $\deg B_i = 1$ for all $i \le k$
Output: $B_{0,k}(P,Q)$ and $\mathrm{DFT}_{\omega_k}(B_{0,k}(P,Q))$

1. If $k = 1$, then return $\begin{pmatrix} 0 & 1 \\ 1 & -(P \operatorname{quo} Q) \end{pmatrix}$ and its DFT.
2. Let $h := k/2$.
3. Recursively compute $M := B_{0,h}(P_{d-2h;}, Q_{d-2h;})$ and $\mathrm{DFT}_{\omega_h}(M)$.
4. Compute $\widehat M := \mathrm{DFT}_{\omega_k}(M)$ using FFT doubling.
5. Compute the slices $(R_{d+h-2k;}, S_{d+h-2k;})$ of $(R,S)^{\top} = M (P,Q)^{\top}$ using the displayed formula above and (3).
6. Recursively compute $N := B_{0,h}(R_{d+h-2k;}, S_{d+h-2k;})$ and $\mathrm{DFT}_{\omega_h}(N)$.
7. Compute $\widehat N := \mathrm{DFT}_{\omega_k}(N)$ using FFT doubling.
8. Compute $\widehat N \cdot \widehat M$ and $N M = \mathrm{DFT}_{\omega_k}^{-1}(\widehat N \cdot \widehat M) + N_{k-h}\, M_h\, (x^k - 1)$.
9. Return $N M$ and $\widehat N \cdot \widehat M$.

Proposition 3. Algorithm 2 is correct and its cost is bounded by

  $\frac{4}{3}\, M(k)\, \log k + O(M(k)).$

Proof. Since the algorithm is simply an adaptation of Algorithm 1 to the binary FFT model, it is correct by Proposition 1. Let us analyze the costs of the various steps without the recursive calls.

- The cost of the FFT doubling in step 4 is bounded by $2 F(k) + O(k)$ (four matrix entries).
- The cost of step 5 is bounded by $8 F(k) + O(k)$ (four direct and four inverse DFTs of length $k$, plus pointwise products and the linear-time recovery (3)).
- The cost of the FFT doubling in step 7 is again bounded by $2 F(k) + O(k)$.
- The cost of step 8 is bounded by $4 F(k) + O(k)$ (four inverse DFTs of length $k$, plus pointwise products and the leading coefficient fix-up).

The total cost of the top-level of the algorithm is therefore bounded by $16 F(k) + O(k) = \frac{8}{3} M(k) + O(k)$. Consequently, the cost of our algorithm satisfies the recurrence inequality

  $T(k) \le 2\, T(k/2) + 16\, F(k) + C\, k.$

Unrolling this relation while using that $F(n) \asymp n \log n$ yields $T(k) \le 8\, F(k) \log k + O(k \log k) = \frac{4}{3} M(k) \log k + O(M(k))$, as desired. □

4.4 Generalization to arbitrary lengths $k$

Let us now generalize Algorithm 2 to the case when $k$ is not necessarily a power of two.

Algorithm 3
Input: $P, Q \in K[x]$ and $k \in \mathbb{N}$ such that $d = \deg P \le 2k$ and $\deg B_i = 1$ for all $i \le k$
Output: $B_{0,k}(P,Q)$

1. If $k = 0$, then return $\mathrm{Id}$.
2. Let $h \in 2^{\mathbb{N}}$ be maximal such that $h \le k$, and let $k' := k - h$.
3. Compute $M := B_{0,h}(P_{d-2h;}, Q_{d-2h;})$ using Algorithm 2.
4. Compute the slices $(R_{d+h-2k;}, S_{d+h-2k;})$ with $(R, S)^{\top} = M (P, Q)^{\top}$, as in step 4 of Algorithm 1.
5. Recursively compute $N := B_{0,k'}(R_{d+h-2k;}, S_{d+h-2k;})$.
6. Return $N M$.

Proposition 4. Algorithm 3 is correct and its cost is bounded by

  $\frac{4}{3}\, M(k)\, \log k + O(M(k)).$

Proof. The correctness is proved in the same way as for Algorithm 1. For some universal constant $C$, the cost $T(k)$ of the algorithm satisfies the recurrence relation

  $T(k) \le \frac{4}{3}\, M(h) \log h + T(k - h) + C\, M(h).$

Writing $k = k_1 + \cdots + k_r$ with $k_1, \ldots, k_r \in 2^{\mathbb{N}}$ and $k_1 > \cdots > k_r$, it follows that

  $T(k) \le \sum_i \big( \tfrac{4}{3}\, M(k_i) \log k_i + C\, M(k_i) \big)
        \le \sum_i \tfrac{k_i}{k} \big( \tfrac{4}{3}\, M(k) \log k + C\, M(k) \big)
        = \tfrac{4}{3}\, M(k) \log k + O(M(k)),$

where we used our assumption that $M(k)/k$ is non-decreasing. □

Corollary 1. Let $P, Q \in K[x]$ and $k \le d$ be such that $\deg P = d$, $\deg Q < d$, and $\deg B_i = 1$ for all $i \le k$. Then we may compute $B_{0,k}(P,Q)$ in time

  $\frac{4}{3}\, M(k)\, \log k + O(M(k)).$

Proof. Modulo multiplication of $P$ and $Q$ with a suitable power of $x$ (or truncation, using Lemma 2), we may assume without loss of generality that $d \le 2k$. □

Remark 1. Instead of Algorithm 2, we may also use Algorithm 1 with the middle product optimization in step 3. In that case, the complexity bound from Proposition 2 generalizes in a similar way to arbitrary lengths.
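Two of the FFT-model tricks above admit compact sketches (ours), assuming `dft`, `MOD`, and `G` from the NTT sketch in section 2: FFT doubling, and the wrapped product with a leading-coefficient fix-up used for $N M$.

```python
def dft_double(p, dft_n):
    """Order-2n DFT of p (deg p < n) from its order-n DFT, for ~half cost.
    The odd-indexed values form the order-n DFT of p(w*x) rem (x^n - 1)."""
    n = len(dft_n)
    w = pow(G, (MOD - 1) // (2 * n), MOD)   # primitive 2n-th root of unity
    twisted, t = [], 1
    for c in p + [0] * (n - len(p)):
        twisted.append(c * t % MOD)
        t = t * w % MOD
    odd = dft(twisted)
    out = [0] * (2 * n)
    out[0::2], out[1::2] = dft_n, odd
    return out

def wrapped_mul(a, b, n):
    """a*b rem (x^n - 1) via length-n DFTs (n a power of two)."""
    fa = dft(a + [0] * (n - len(a)))
    fb = dft(b + [0] * (n - len(b)))
    return dft([x * y % MOD for x, y in zip(fa, fb)], invert=True)

def mul_degree_n(a, b, n):
    """a*b when deg a + deg b = n exactly: wrapped product plus fix-up,
    since a*b rem (x^n - 1) folds the top coefficient onto degree 0."""
    c = wrapped_mul(a, b, n)
    top = a[-1] * b[-1] % MOD       # (a*b)_n = lc(a) * lc(b)
    c[0] = (c[0] - top) % MOD       # undo the wrap-around of x^n = 1
    return c + [top]
```

One may check that `dft(p + [0]*(2*n - len(p)))` agrees with `dft_double(p, dft(p + [0]*(n - len(p))))` for any `p` of degree below `n`.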
5 The general case

Algorithms 1 and 3 generalize to the abnormal case, modulo several technical adjustments. In this section we describe how to do this.

5.1 Statement of the non-optimized algorithm

Let us first show how to adapt Algorithm 1. Lemma 2 now becomes:

Lemma 3. Given $0 \le k \le d$, we have

  $B^{\circledast}_{0,k}(P, Q) = B^{\circledast}_{0,k}(P_{d-2k;},\, Q_{d-2k;})$
  $A^{\circledast}_k(P, Q)_{d-2k;} = A^{\circledast}_k(P_{d-2k;},\, Q_{d-2k;}).$

Proof. We have $\deg B^{\circledast}_{0,i} \le i$ for $i = 1, \ldots, k$, whence the relation $A^{\circledast}_i = B^{\circledast}_{0,i}\, A^{\circledast}_0$ shows that the coefficient $(A^{\circledast}_i)_\delta$ of degree $\delta$ in $A^{\circledast}_i$ only depends on the coefficients $P_\beta$ and $Q_\beta$ of $P$ and $Q$ with $\beta \ge \delta - i$. We next proceed in a similar way as in the proof of Lemma 2. □

Algorithm 4
Input: $P, Q \in K[x]$ with $\deg Q < d = \deg P$ and $k \le d$
Output: $B^{\circledast}_{0,k}(P,Q)$

1. If $Q = 0$, then return $\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$.
2. If $k = 1$, then return $\begin{pmatrix} 0 & 1 \\ 1 & -(P \operatorname{quo} Q) \end{pmatrix}$.
3. Let $h := \lceil k/2 \rceil$ and $k' := k - h$.
4. Recursively compute $M := B^{\circledast}_{0,h}(P_{d-2h;},\, Q_{d-2h;})$.
5. Let $\delta := h - \deg M$, so that $\deg R = d - h + \delta$ and $\deg S < d - h$, where $(R, S)^{\top} := M (P, Q)^{\top}$.
6. Compute the slices $(R_{d-2k-\delta;},\, S_{d-2k-\delta;})$.
7. If $S_{d-2k-\delta;} = 0$, then return $M$.
8. If $\delta > 0$, then:
   - Compute $D := R_{d-2k-\delta;} \operatorname{quo} S_{d-2k-\delta;}$ and $J := \begin{pmatrix} 0 & 1 \\ 1 & -D \end{pmatrix}$. We claim that $D = R \operatorname{quo} S$.
   - Update $M := J M$ and let $k' := k - \deg M$.
   - Compute the corresponding slices for the updated $(R, S)^{\top} := (S,\, R - D S)^{\top}$.
9. Recursively compute $N := B^{\circledast}_{0,k'}(R_{d+\deg M - 2k;},\, S_{d+\deg M -2k;})$.
10. Return $N M$.

Proposition 5. Algorithm 4 is correct.

Proof. If $Q = 0$, then the result is obvious. If $k = 1$, then the result follows from Lemma 3. Assume from now on that $k > 1$ and $Q \ne 0$. Then Lemma 3 implies $M = B^{\circledast}_{0,h}(P,Q)$, whence $R = R^{\circledast}_h$ and $S$ is the next remainder. We call $\delta = h - \deg M$ the defect after $h$ steps. If $\delta = 0$, then we are in the same situation as in the normal case and we conclude as in the proof of Proposition 1, using Lemma 3 instead of Lemma 2.

Assume therefore that $\delta > 0$, and let $u$ be largest with $d_u \ge d - h$. Then $R = R_u$, $S = R_{u+1}$, and $\deg R = d - h + \delta$. Furthermore, $S_{d-2k-\delta;} \neq 0$ implies $\deg S \ge d - 2k - \delta$, so we computed the $2k' + \delta + 1$ leading terms of $R$ as part of $R_{d-2k-\delta;}$. Now $\deg R - \deg S \le 2k' + \delta$, so we only need these leading coefficients of $R$ and $S$ in order to compute $R \operatorname{quo} S$. This proves our claim that $D = R \operatorname{quo} S$. After step 8, we thus have $R = R_{u+1}$, $S = R_{u+2}$, and $M = B^{\circledast}_{0,\tilde h}(P,Q)$, where $\tilde h := \deg M$, while the computed slices of the updated $(R,S)$ can be retrieved from $D$ and the previously computed slices. Furthermore, the degree bookkeeping allows us to apply Lemma 3 once more, and obtain $N = B^{\circledast}_{0, k - \tilde h}(R, S)$. We conclude that

  $N M = B^{\circledast}_{0, k - \tilde h}(R, S)\; B^{\circledast}_{0, \tilde h}(P, Q) = B^{\circledast}_{\tilde h, k}(P, Q)\; B^{\circledast}_{0, \tilde h}(P, Q) = B^{\circledast}_{0,k}(P, Q).$ □

Remark 2. For efficiency, we chose $h = \lceil k/2 \rceil$ in step 3, but it is interesting to note that the above correctness proof actually works for any choice of $h$ with $0 < h \le k$. We will use this property for our FFT version in section 5.3 below, where we will take $h$ to be a power of two.

5.2 Exploiting the middle product

Contrary to what we did in section 4.3, we do not require $k$ to be a power of two in this subsection. In fact, it is possible to efficiently implement step 6 in general, using middle products. This time, we break up our input and output polynomials into blocks of length $h$, aligned on the indices $d - k + i h$ for $i \in \mathbb{Z}$:

  $\bar P_i := P_{d-k+(i-1)h;\; d-k+ih}, \qquad \bar Q_i := Q_{d-k+(i-1)h;\; d-k+ih},$

and similarly $\bar R_i$ and $\bar S_i$ for the output slices. Then we have (see Figure 2)

  $\begin{pmatrix} \bar R_0 & \bar R_1 \\ \bar S_0 & \bar S_1 \end{pmatrix} = M \bowtie \begin{pmatrix} \bar P_0 + \bar P_1\, x^h & \bar P_2 + \bar P_3\, x^h \\ \bar Q_0 + \bar Q_1\, x^h & \bar Q_2 + \bar Q_3\, x^h \end{pmatrix}.$   (4)

As before, the remaining top coefficient is computed separately using

  $R_\delta = \sum_i \big[ (M_{1,1})_i\, P_{\delta-i} + (M_{1,2})_i\, Q_{\delta-i} \big].$   (5)

Figure 2. Schematic illustration of the matricial middle product in the abnormal case, for even $k = 2h$. [Diagram not reproduced.]

Remark 3. In the continuation of Remark 2, we note that the above formulas again apply for any choice of $h$ with $0 < h \le k$.
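Abnormal steps are easy to observe experimentally. The sketch below (ours, assuming `pdivmod` and `Pm` from the section 3 sketch) lists the quotient degrees along a remainder sequence: by Lemma 1 these telescope to $d_0 - d_k$, so any entry larger than one marks an abnormal step and contributes to the defect $\delta$.

```python
import random

def quotient_degrees(p, q):
    """Degrees of the quotients R_{k-1} quo R_k; their partial sums are
    deg B_{0,k} = d_0 - d_k, so entries > 1 mark abnormal steps."""
    degs, a, b = [], p, q
    while b:
        quo, rem = pdivmod(a, b)
        degs.append(len(quo) - 1)
        a, b = b, rem
    return degs

random.seed(0)
Pp = [random.randrange(Pm) for _ in range(8)] + [1]
Qq = [random.randrange(Pm) for _ in range(7)] + [1]
print(quotient_degrees(Pp, Qq))   # all ones <=> the sequence is normal
```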
Let $\gamma_{2\times2}$ be as in section 2.2. For the multiplication method that we selected, assume that the middle product (4) and the final product $N M$ can each be computed in time $\frac{1}{2}\, \gamma_{2\times2}\, M(h) + O(k)$, where $h$ is as in Algorithm 4. Then we have the following generalization of Proposition 2:

Theorem 1. Let $\gamma_{2\times2} \le 8$ be as above. Then Algorithm 4 with the middle product optimization runs in time at most

  $\frac{\gamma_{2\times2} + 3}{4}\, M(2k)\, \log k + O(M(k) \log k).$

Moreover, for a multiplication method with $M(k) \asymp k^\lambda$ for some $1 < \lambda \le 2$, the cost is

  $\le \frac{\gamma_{2\times2} + 3}{2^{\lambda} - 2}\, M(k) + O(M(k)).$

Proof. The cost of steps 6 and 10 is bounded by $\gamma_{2\times2}\, M(h) + O(k)$, according to the assumption just above this theorem. In step 8, the computation of $D$ can be done in time $c\, M(\delta) + O(\delta)$ with $c \le 4$, using Newton's method for fast power series inversion (see [Bern08, section 6]). The update of $M$ amounts to two products of degrees $\le h$ by $\le \delta$, which can certainly be computed in time $2\, M(h)$. The update of the slices of $(R, S)$ can be computed efficiently using a middle product that takes $M(h)$ additional operations. Altogether, the cost $T(k)$ of the algorithm satisfies

  $T(k) \le T(h) + T(k - h - \delta) + (\gamma_{2\times2} + 3)\, M(h) + 4\, M(\delta) + C\, k$

for some constant $C > 0$. Note also that $h + (k - h - \delta) \le k$ and that $M(k)/k$ is non-decreasing. A routine induction on $k$, using these two facts to absorb the defect terms $4 M(\delta)$ into the main recursion, then shows that

  $T(k) \le \frac{\gamma_{2\times2} + 3}{4}\, M(2k)\, \log k + \Lambda\, k\, \max(\log k, 1)$

for a suitable constant $\Lambda$; we omit the routine details of the induction. This completes the proof of the first bound. We skip the proof of the second one, which is based on similar arguments. □

Remark 4. The constants in the bounds from Theorem 1 are probably not sharp. The point is the following: if the quotients in the Euclidean remainder sequence are all of bounded degree, then the additional cost with respect to the normal case can actually be bounded by $O(M(k))$. If, on the contrary, some of the quotients have exceptionally large degrees, then many of the recursive calls will terminate early at step 7. This will actually make the constants decrease instead of increase. The only problematic case therefore seems to be when many of the quotients have moderately large degrees; it would be interesting to know more about the precise worst case scenario.
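The proof above invokes Newton's method for power series inversion to compute the quotient $D$. Here is a minimal sketch of that classical tool (ours, assuming `pmul` and `Pm` from the section 3 sketch), together with the standard reduction of Euclidean division to series inversion via coefficient reversal.

```python
def series_inverse(f, n):
    """g with f*g = 1 mod x^n, assuming f[0] != 0 (cost O(M(n))).
    Newton iteration: g <- g*(2 - f*g), doubling the precision each step."""
    g = [pow(f[0], Pm - 2, Pm)]
    m = 1
    while m < n:
        m = min(2 * m, n)
        fg = pmul(f[:m], g)[:m]
        t = [(-c) % Pm for c in fg]
        t[0] = (t[0] + 2) % Pm
        g = pmul(g, t)[:m]
    return g

def quo_via_newton(r, s):
    """R quo S through inversion of the reversed divisor:
    rev(Q) = rev(R) * rev(S)^{-1} mod x^n with n = deg R - deg S + 1."""
    n = len(r) - len(s) + 1
    if n <= 0:
        return []
    qrev = pmul(r[::-1][:n], series_inverse(s[::-1], n))[:n]
    qrev += [0] * (n - len(qrev))
    return qrev[::-1]
```

One may check that `quo_via_newton(r, s)` agrees with the quotient returned by `pdivmod(r, s)`.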
5.3 Implementation in the binary FFT model

One important feature of Algorithm 2 is that we use the same length for all our DFTs. If $k$ is a power of two, then we may preserve this property in the abnormal case. Indeed, the final product $N M$ causes no problem, since its degree is still $\le k$. As to the middle product (4), we now have $\deg M \le h$ and the degree of the right hand side is still $< k$. This allows us to apply (1), which yields

  $\begin{pmatrix} \bar R_0 & \bar R_1 \\ \bar S_0 & \bar S_1 \end{pmatrix} = \mathrm{DFT}_{\omega_k}^{-1}\Big( \mathrm{DFT}_{\omega_k}(M) \cdot \mathrm{DFT}_{\omega_k}\begin{pmatrix} \bar P_0 + \bar P_1 x^h & \bar P_2 + \bar P_3 x^h \\ \bar Q_0 + \bar Q_1 x^h & \bar Q_2 + \bar Q_3 x^h \end{pmatrix} \Big) \operatorname{quo} x^{\deg M}.$   (6)

However, there is no reason why $k' = k - \deg M$ should be a power of two for the second recursive call. In order to remedy this problem, we introduce a new parameter $\lambda \ge k$ that we assume to be a power of two and that we will use for the lengths of our DFTs. This requires a minor adjustment of (6):

  $\begin{pmatrix} \bar R_0 & \bar R_1 \\ \bar S_0 & \bar S_1 \end{pmatrix} = \mathrm{DFT}_{\omega_\lambda}^{-1}\Big( \mathrm{DFT}_{\omega_\lambda}(M) \cdot \mathrm{DFT}_{\omega_\lambda}\begin{pmatrix} \bar P_0 + \bar P_1 x^h & \bar P_2 + \bar P_3 x^h \\ \bar Q_0 + \bar Q_1 x^h & \bar Q_2 + \bar Q_3 x^h \end{pmatrix} \Big) \operatorname{quo} x^{\deg M}.$   (7)

We are now in a position to adapt Algorithm 2 to the binary FFT model:

Algorithm 5
Input: $P, Q \in K[x]$ with $k \le \deg P$ and $\lambda \in 2^{\mathbb{N}}$ with $\lambda \ge \deg P$
Output: $B^{\circledast}_{0,k}(P,Q)$ and $\mathrm{DFT}_{\omega_\lambda}(B^{\circledast}_{0,k}(P,Q))$

1. If $Q = 0$, then return $\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ and its DFT.
2. If $k = 1$, then return $\begin{pmatrix} 0 & 1 \\ 1 & -(P \operatorname{quo} Q) \end{pmatrix}$ and its DFT.
3. If $k \le \lambda/2$, then:
   - Recursively compute $B^{\circledast}_{0,k}(P,Q)$ and its DFT at length $\lambda/2$.
   - Return $B^{\circledast}_{0,k}(P,Q)$ and $\mathrm{DFT}_{\omega_\lambda}(B^{\circledast}_{0,k}(P,Q))$, which we compute using FFT doubling.
4. Let $h := \lambda/2$.
5. Recursively compute $M := B^{\circledast}_{0,h}(P_{d-2h;}, Q_{d-2h;})$ and $\mathrm{DFT}_{\omega_{\lambda/2}}(M)$.
6. Compute $\widehat M := \mathrm{DFT}_{\omega_\lambda}(M)$ using FFT doubling.
7. Let $\delta := h - \deg M$, so that $\deg R = d - h + \delta$ and $\deg S < d - h$.
8. Compute the slices of $(R, S)^{\top} = M (P,Q)^{\top}$ using (7) and (5).
9. If the slice of $S$ vanishes, then return $M$ and $\widehat M$.
10. If $\delta > 0$, then:
    - Compute $D = R \operatorname{quo} S$ from the computed slices, as in Algorithm 4, and $J := \begin{pmatrix} 0 & 1 \\ 1 & -D \end{pmatrix}$.
    - Compute $\widehat D := \mathrm{DFT}_{\omega_\lambda}(D)$ and deduce $\widehat J := \mathrm{DFT}_{\omega_\lambda}(J)$.
    - Update $\widehat M := \widehat J\, \widehat M$ and $M := J M$.
    - Compute the slices for the updated $(R, S)^{\top} := (S,\, R - D S)^{\top}$.
11. Recursively compute $N := B^{\circledast}_{0, k - \deg M}(R_{d + \deg M - 2k;}, S_{d + \deg M - 2k;})$ and $\mathrm{DFT}_{\omega_{\lambda/2}}(N)$.
12. Compute $\widehat N := \mathrm{DFT}_{\omega_\lambda}(N)$ using FFT doubling.
13. Compute $\widehat N \cdot \widehat M$ and $N M = \mathrm{DFT}_{\omega_\lambda}^{-1}(\widehat N \cdot \widehat M) + (N M)_\lambda\, (x^\lambda - 1)$, where $(N M)_\lambda$ is the product of the leading coefficients of $N$ and $M$ when $\deg (N M) = \lambda$ and zero otherwise.
14. Return $N M$ and $\widehat N \cdot \widehat M$.

For the complexity analysis, it will be convenient to assume that $F$ satisfies the properties from section 2.2. In particular, the assumption that $F(n)/n$ is non-decreasing implies that $F(n_1) + F(n_2) \le F(n_1 + n_2)$ for all $n_1, n_2$. Conversely, in the binary FFT model, it will be convenient to also assume that $F(2^a n) \le 2^a\, (F(n) + \mu\, a\, n)$ for all $n, a \ge 1$ and some fixed constant $\mu$.

Theorem 2. Algorithm 5 is correct. Moreover, if $\lambda \le 2k$, then its cost is bounded by

  $2\, M(k)\, \log k + O(M(k)).$

Proof. The correctness is proved in a similar way as in Proposition 5, while using Remarks 2 and 3. The total cost of steps 5, 6, 8, 12, and 13 is bounded as in the proof of Proposition 3. As to the update step 10, the computation of $D$ requires $O(M(\delta))$ operations. The computation of $\widehat D$ and $\widehat J$ costs $F(\lambda) + O(\lambda)$, whereas the update $\widehat J\, \widehat M$ takes linear time. Since $\delta = h - \deg M$, the lowest $\delta$ coefficients of the old slices do not matter in the update; using FFT multiplication and the fact that we already know $\widehat M$, the slice update essentially takes one direct and one inverse DFT of length $\lambda$, of total cost $F(\lambda) + O(\lambda)$, while the remaining coefficients can be computed in time $O(M(\delta))$.

If $\lambda/2 < k \le \lambda$, the above analysis shows that the cost $T(k, \lambda)$ satisfies

  $T(k, \lambda) \le \max_{h + k' + \delta \le k} \Big( T\big(\tfrac{\lambda}{2}, \tfrac{\lambda}{2}\big) + T\big(k', \tfrac{\lambda}{2}\big) + c\, F(\lambda) + A\, M(\delta) + C\, \lambda \Big)$

for suitable constants $c$, $A$, and $C$. If $k \le \lambda/2$, then the FFT doubling in step 3 can be done in time $\tfrac{1}{2} F(\lambda) + O(\lambda)$, so

  $T(k, \lambda) \le T\big(k, \tfrac{\lambda}{2}\big) + \tfrac{1}{2}\, F(\lambda) + C\, \lambda,$

by increasing $C$ if necessary. In the first bound, the term $A\, M(\delta)$ pollutes the complexity analysis; our next objective is to show that the defect terms can be absorbed. We define an upper bound $\bar T(k, \lambda)$ for $T(k, \lambda)$ by turning the above two recurrence inequalities into defining equalities (with $\bar T(1, \lambda) := T(1, \lambda)$). Using an easy induction, we note that $\bar T(k, \lambda)$ is increasing in $k$ for fixed $\lambda$.
The key claim is that, for any $k_1 \le k_2 \le \lambda$, we have

  $\bar T(k_1, \lambda) + A\, M(k_2 - k_1) \le \bar T(k_2, \lambda) + 2\, \mu\, \lambda,$

where $\mu$ is the constant from before the statement of this theorem. The claim is proved by induction on the smallest $a$ with $k_2 \le \lambda / 2^a$, by distinguishing the cases $k_1 \le \lambda/2$ and $k_1 > \lambda/2$ and unrolling the defining recurrences for $\bar T$; we omit the routine details. Plugging the claim into the recurrence for $\lambda/2 < k \le \lambda$, while setting $C' := C + 2\mu$, we obtain

  $\bar T(k, \lambda) \le \bar T\big(\tfrac{\lambda}{2}, \tfrac{\lambda}{2}\big) + \bar T\big(\tfrac{\lambda}{2}, \tfrac{\lambda}{2}\big) + c\, F(\lambda) + C'\, \lambda$

for all $\lambda/2 < k \le \lambda$. Unrolling this inequality for $k = \lambda$, we deduce that there exists a constant $C''$ with

  $\bar T(\lambda, \lambda) \le 12\, F(\lambda)\, \log \lambda + C''\, \lambda \log \lambda$

for all $\lambda \in 2^{\mathbb{N}}$. For general $k = k_1 + \cdots + k_r$ with $k_1, \ldots, k_r \in 2^{\mathbb{N}}$ and $k_1 > \cdots > k_r$, combining this bound with the recurrence for $k \le \lambda/2$ and the superadditivity of $F$ yields

  $\bar T(k, \lambda) \le \sum_i \tfrac{k_i}{k}\, \big( 2\, M(k)\, \log k + C''\, k \log k \big) + \tfrac{1}{2}\, F(\lambda) \log \tfrac{\lambda}{k_1} + O(\lambda)
   = 2\, M(k)\, \log k + O\big(M(k) + F(\lambda) + \lambda\big).$

Under the assumption that $\lambda \le 2k$, we have $F(\lambda) + O(\lambda) = O(M(k))$, whence $T(k, \lambda) \le \bar T(k, \lambda) \le 2\, M(k)\, \log k + O(M(k))$. □

Bibliography

[BenOrTiwari1988] M. Ben-Or and P. Tiwari. A deterministic algorithm for sparse multivariate polynomial interpolation. In Proc. STOC '88, pages 301-309. New York, NY, USA, 1988.
[Ber68] E. R. Berlekamp. Algebraic Coding Theory. McGraw-Hill, 1968.
[Bern08] D. J. Bernstein. Fast multiplication and its applications. In Algorithmic Number Theory, pages 325-384. Mathematical Sciences Research Institute Publications. Cambridge University Press, United Kingdom, 2008.
[BY19] D. J. Bernstein and B.-Y. Yang. Fast constant-time gcd computation and modular inversion. IACR Trans. on Cryptographic Hardware and Embedded Systems, 2019(3):340-398, 2019.
[BGY80] R. P. Brent, F. G. Gustavson, and D. Y. Y. Yun. Fast solution of Toeplitz systems of equations and computation of Padé approximants. J. Algorithms, 1(3):259-295, 1980.
[CK91] D. G. Cantor and E. Kaltofen. On fast multiplication of polynomials over arbitrary algebras. Acta Informatica, 28:693-701, 1991.
[CZ81] D. G. Cantor and H. Zassenhaus. A new algorithm for factoring polynomials over finite fields. Math. Comp., 36(154):587-592, 1981.
[CT65] J. W. Cooley and J. W. Tukey. An algorithm for the machine calculation of complex Fourier series. Math. Comp., 19:297-301, 1965.
[Dorn87] J. L. Dornstetter. On the equivalence between Berlekamp's and Euclid's algorithms. IEEE Trans. Inf. Theory, 33:428-431, 1987.
[GaGe13] J. von zur Gathen and J. Gerhard. Modern Computer Algebra. Cambridge University Press, New York, NY, USA, 3rd edition, 2013.
[vdH:rmodroots] B. Grenet, J. van der Hoeven, and G. Lecerf. Randomized root finding over finite fields using tangent Graeffe transforms. In Proc. ISSAC '15, pages 197-204. New York, NY, USA, 2015. ACM.
[HaQuZi04] G. Hanrot, M. Quercia, and P. Zimmermann. The middle product algorithm I. Speeding up the division and square root of power series. AAECC, 14:415-438, 2004.
[Har09a] D. Harvey. Faster algorithms for the square root and reciprocal of power series. Math. Comp., 80:387-394, 2011.
[vdH:cyclomult] D. Harvey and J. van der Hoeven. Faster polynomial multiplication over finite fields using cyclotomic coefficient rings. J. of Complexity, 54, 2019. Article ID 101404, 18 pages.
[vdH:ffnlogn] D. Harvey and J. van der Hoeven. Polynomial multiplication over finite fields in time $O(n \log n)$. Technical report, HAL, 2019.
[vdH:tft] J. van der Hoeven. The truncated Fourier transform and applications. In Proc. ISSAC '04, pages 290-296. Univ. of Cantabria, Santander, Spain, July 2004.
[vdH:graeffe-cca] J. van der Hoeven and M. Monagan. Computing one billion roots using the tangent Graeffe method. ACM SIGSAM Commun. Comput. Algebra, 54(3):65-85, 2021.
[TeXmacs:website] J. van der Hoeven et al. GNU TeXmacs. https://www.texmacs.org, 1998.
[Kar63] A. Karatsuba and J. Ofman. Multiplication of multidigit numbers on automata. Soviet Physics Doklady, 7:595-596, 1963.
[Kn70] D. E. Knuth. The analysis of algorithms. In Actes du congrès international des mathématiciens, volume 3, pages 269-274. Gauthier-Villars, 1971.
[Lecerf2017] G. Lecerf. On the complexity of the Lickteig-Roy subresultant algorithm. J. Symbolic Comput., 2018.
[Leh38] D. H. Lehmer. Euclid's algorithm for large numbers. Amer. Math. Monthly, 45:227-233, 1938.
[Licht05] D. Lichtblau. Half-GCD and fast rational recovery. In Proc. ISSAC '05, pages 231-236. 2005.
[Mas69] J. Massey. Shift-register synthesis and BCH decoding. IEEE Trans. Inf. Theory, 15:122-127, 1969.
[Moe73] R. Moenck. Fast computation of GCDs. In Proc. STOC '73, pages 142-171. New York, 1973. ACM Press.
[Mol07] N. Möller. On Schönhage's algorithm and subquadratic integer gcd computation. Math. Comp., 77(261):589-607, 2008.
[Morain22] F. Morain. Implementing the Thull-Yap algorithm for computing Euclidean remainder sequences. In Proc. ISSAC '22, pages 197-205. 2022.
[Pro1795] R. Prony. Essai expérimental et analytique sur les lois de la dilatabilité des fluides élastiques et sur celles de la force expansive de la vapeur de l'eau et de la vapeur de l'alkool, à différentes températures. J. de l'École Polytechnique, 1:24-76, 1795. Cahier 22.
[Sch71] A. Schönhage. Schnelle Berechnung von Kettenbruchentwicklungen. Acta Informatica, 1(2):139-144, 1971.
[Sch77] A. Schönhage. Schnelle Multiplikation von Polynomen über Körpern der Charakteristik 2. Acta Informatica, 7:395-398, 1977.
[NTL] V. Shoup. NTL: a library for doing number theory. 1996. https://libntl.org.
[SZ04] D. Stehlé and P. Zimmermann. A binary recursive gcd algorithm. In D. Buell, editor, Algorithmic Number Theory (ANTS 2004), pages 411-425. Springer Berlin Heidelberg, 2004.
[Ste1585] S. Stevin. L'Arithmétique. Imprimerie de Christophle Plantin, 1585.
[Str69] V. Strassen. Gaussian elimination is not optimal. Numer. Math., 13:352-356, 1969.
[Str81] V. Strassen. The computational complexity of continued fractions. In Proc. SYMSAC '81, pages 51-67. 1981.
[TY90] K. Thull and C. K. Yap. A unified approach to HGCD algorithms for polynomials and integers. Manuscript.
[Toom63b] A. L. Toom. The complexity of a scheme of functional elements realizing the multiplication of integers. Soviet Mathematics Doklady, 4(2):714-716, 1963.