> <\body> root finding method>||>|<\doc-note> This paper is part of a project that has received funding from the French \PAgence de l'innovation de défense\Q. ||<\author-affiliation> Department of Mathematics Simon Fraser University 8888 University Drive Burnaby, British Columbia V5A 1S6, Canada |<\author-affiliation> CNRS, École polytechnique, Institut Polytechnique de Paris Laboratoire d'informatique de l'École polytechnique (LIX, UMR 7161) 1, rue Honoré d'Estienne d'Orves Bâtiment Alan Turing, CS35003 91120 Palaiseau, France |>>|||<\author-affiliation> Department of Mathematics Simon Fraser University 8888 University Drive Burnaby, British Columbia V5A 1S6, Canada |>>|<\doc-note> > The tangent Graeffe method has been developed for the efficient computation of single roots of polynomials over finite fields with multiplicative groups of smooth order. It is a key ingredient of sparse interpolation using geometric progressions, in the case when blackbox evaluations are comparatively cheap. In this paper, we improve the complexity of the method by a constant factor and we report on a new implementation of the method and a first parallel implementation. > Consider a polynomial function f over a field K given through a black box capable of evaluating f at points in K^n. The problem of sparse interpolation is to recover the representation of f in K[x_1,…,x_n] in its usual form, as a linear combination <\equation> f = Σ_{0≤i<t} c_i x^{e_i} of monomials x^{e_i} = x_1^{e_{i,1}}⋯x_n^{e_{i,n}}. One popular approach to sparse interpolation is to evaluate f at points in a geometric progression. This approach goes back to work of Prony in the eighteenth century and became well known after Ben-Or and Tiwari's seminal paper. It has been widely used in computer algebra, both in theory and in practice; see for a nice survey. More precisely, if a bound T for the number of terms t is known, then we first evaluate f at pairwise distinct points α^0, α^1, α^2, …, where α = (α_1,…,α_n) ∈ K^n and α^k := (α_1^k,…,α_n^k) for all k ∈ ℕ.
The generating function of the evaluations at α^k satisfies the identity <\equation*> Σ_{k∈ℕ} f(α^k) z^k = Σ_{0≤i<t} c_i Σ_{k∈ℕ} (β_i z)^k = Σ_{0≤i<t} c_i/(1 − β_i z) = N(z)/Λ(z), where β_i := α^{e_i}, Λ = (1 − β_1 z)⋯(1 − β_t z), and N ∈ K[z] is of degree < t. The rational function N/Λ can be recovered from f(α^0), f(α^1), …, f(α^{2T−1}) using fast Padé approximation. For well chosen points α, it is often possible to recover the exponents e_i from the values β_i = α^{e_i} ∈ K. If the exponents e_i are known, then the coefficients c_i can also be recovered using a transposed version of fast multipoint interpolation. This leaves us with the question of how to compute the roots β_i of Λ in an efficient way. For practical applications in computer algebra, we usually have K = ℚ, in which case it is most efficient to use a multi-modular strategy. This means that we instead work with coefficients in a finite field K = 𝔽_p, where p is a prime number that we are free to choose. It is well known that polynomial arithmetic over 𝔽_p can be implemented most efficiently using FFTs when the order p−1 of the multiplicative group is smooth. In practice, this prompts us to choose p of the form σ·2^k + 1 for some small σ and such that p fits into a machine word. The traditional way to compute roots of polynomials over finite fields is using Cantor and Zassenhaus' method. In, alternative algorithms were proposed for our case of interest when p−1 is smooth. The fastest algorithm was based on the tangent Graeffe transform and it gains a logarithmic factor with respect to Cantor\UZassenhaus' method. The aim of the present paper is to report on a parallel implementation of this new algorithm and on a few improvements that allow for a further constant speed-up. In section, we start by recalling generalities about the Graeffe transform and the heuristic root finding method based on the tangent Graeffe transform from. In section, we present the main new theoretical improvements, which all rely on optimizations in the FFT-model for fast polynomial arithmetic.
Our contributions are threefold: <\itemize> In the FFT-model, one backward transform out of four can be saved for Graeffe transforms of order two (see section). When composing a large number of Graeffe transforms of order two, FFT caching can be used to gain another constant factor (see section). All optimizations still apply in the TFT model, which can save a factor between one and two, depending on the number of roots (see section). We also indicate how to generalize our methods to Graeffe transforms of general orders in section. Section 4 is devoted to our new implementation. We first implemented a sequential version of the tangent Graeffe method in C, with the optimizations from sections and. We also started experimenting with a parallel implementation in Cilk C. Our sequential implementation confirms the major performance improvement with respect to Cantor\UZassenhaus' algorithm. We also observed the gain of a new factor of two when using the new optimizations. So far, we have achieved a parallel speed-up by a factor of 4.6 on an 8-core machine. Our implementation is freely available at: <\quote-env> The traditional Graeffe transform of a monic polynomial P ∈ F[z] of degree d is the unique monic polynomial G(P) ∈ F[z] of degree d such that <\equation> G(P)(z^2) = (−1)^d P(z) P(−z). If P splits over F into linear factors, P = (z−α_1)⋯(z−α_d), then one has <\equation*> G(P) = (z−α_1^2)⋯(z−α_d^2). More generally, given r ≥ 2, we define the Graeffe transform of order r to be the unique monic polynomial G_r(P) ∈ F[z] of degree d such that <\equation*> G_r(P)(z) = (−1)^d Res_u(P(u), u^r − z). If P = (z−α_1)⋯(z−α_d), then <\equation*> G_r(P) = (z−α_1^r)⋯(z−α_d^r). If ω ∈ F is a primitive r-th root of unity, then we also have <\equation> G_r(P)(z^r) = (−1)^{d(r−1)} P(z) P(ω z)⋯P(ω^{r−1} z). If r = r_1 r_2 with r_1, r_2 ≥ 2, then we finally have <\equation> G_r = G_{r_1} ∘ G_{r_2} = G_{r_2} ∘ G_{r_1}. Let ε be a formal indeterminate with ε^2 = 0. Elements in F[ε]/(ε^2) are called tangent numbers. They are of the form a + b ε with a, b ∈ F and basic arithmetic operations go as follows: <\eqnarray*> (a + b ε) ± (c + d ε) = (a ± c) + (b ± d) ε, (a + b ε) (c + d ε) = a c + (a d + b c) ε. >>>> Now let P ∈ F[z] be a monic polynomial of degree d that splits over F: <\equation*> P = (z−α_1)⋯(z−α_d), where α_1,…,α_d ∈ F are pairwise distinct.
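The tangent-number arithmetic just described is easy to mirror in code. The following minimal Python sketch is our own illustration (not the paper's C implementation): pairs `(a, b)` represent a + bε, and the Fourier prime p = 2^16 + 1 is an illustrative choice.

```python
# Arithmetic on tangent numbers a + b*eps with eps^2 = 0 over F_p,
# represented as pairs (a, b) of residues mod p.  Illustrative sketch only.
P = 65537  # assumption: any prime works; 2^16 + 1 is a convenient Fourier prime

def dual_add(x, y, p=P):
    # (a + b eps) + (c + d eps) = (a + c) + (b + d) eps
    return ((x[0] + y[0]) % p, (x[1] + y[1]) % p)

def dual_mul(x, y, p=P):
    # (a + b eps) * (c + d eps) = a c + (a d + b c) eps, since eps^2 = 0
    a, b = x
    c, d = y
    return (a * c % p, (a * d + b * c) % p)
```

For instance, `dual_mul((2, 3), (5, 7))` yields `(10, 29)`, matching (2 + 3ε)(5 + 7ε) = 10 + 29ε.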
Then the tangent deformation P̃(z) := P(z+ε) satisfies <\equation*> P̃ = P + P' ε = (z − (α_1 − ε))⋯(z − (α_d − ε)). The definitions from the previous subsection readily extend to coefficients in F[ε]/(ε^2) instead of F. Given r ≥ 2, we call G_r(P̃) the tangent Graeffe transform of P of order r. We have <\equation*> G_r(P̃) = (z − (α_1 − ε)^r)⋯(z − (α_d − ε)^r), where <\equation*> (α_k − ε)^r = α_k^r − r α_k^{r−1} ε, k = 1,…,d. Now assume that we have an efficient way to determine the roots α_1^r,…,α_d^r of G_r(P). For some polynomial T ∈ F[z], we may decompose <\equation*> G_r(P̃) = G_r(P) + T ε. For any root α of P, we then have <\eqnarray*> 0 = G_r(P̃)((α − ε)^r) = G_r(P)(α^r) + (T(α^r) − G_r(P)'(α^r) · r α^{r−1}) ε. >>>> Whenever α^r happens to be a single root of G_r(P), it follows that <\equation*> r α^{r−1} = T(α^r)/G_r(P)'(α^r). If α^r ≠ 0, this finally allows us to recover α as <\equation*> α = r α^r/(r α^{r−1}). Assume now that F = 𝔽_p is a finite field, where p is a prime number of the form p = σ·2^k + 1 for some small σ. Assume also that ζ is a primitive element of order p−1 for the multiplicative group of 𝔽_p. Let P = (z−α_1)⋯(z−α_d) ∈ 𝔽_p[z] be as in the previous subsection. The tangent Graeffe method can be used to efficiently compute those roots α_k of P for which α_k^r is a single root of G_r(P). In order to guarantee that there are a sufficient number of such roots, we first replace P(z) by P(z+τ) for a random shift τ ∈ 𝔽_p, and use the following heuristic: <\description> For any subset {α_1,…,α_d} ⊆ 𝔽_p of cardinality d and any r, there exist at least a constant proportion of elements τ ∈ 𝔽_p such that {(α_1−τ)^r,…,(α_d−τ)^r} contains at least a constant proportion of d elements. For a random shift τ ∈ 𝔽_p, the assumption ensures with high probability that G_r(P(z+τ)) has a large proportion of single roots. Now take r to be the largest power of two such that r ≤ (p−1)/(4d) and let s := (p−1)/r. By construction, note that s ≥ 4d. The roots α_1^r,…,α_d^r of G_r(P) are all s-th roots of unity, and therefore lie in the set {1, ζ^r,…,ζ^{(s−1)r}}. We may thus determine them by evaluating G_r(P) at ζ^{r i} for i = 0,…,s−1. Since s = O(d), this can be done efficiently using a discrete Fourier transform.
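As a sanity check of the recovery formula, the following toy Python computation (naive polynomial arithmetic over a small illustrative prime; all names are ours, not the paper's code) applies one tangent Graeffe step of order r = 2 to P = (z−2)(z−3) and recovers both roots:

```python
p = 101  # illustrative prime

def pmul(f, g):  # product of polynomials with tangent-number coefficients mod p
    h = [(0, 0)] * (len(f) + len(g) - 1)
    for i, (a, b) in enumerate(f):
        for j, (c, d) in enumerate(g):
            u, v = h[i + j]
            h[i + j] = ((u + a * c) % p, (v + a * d + b * c) % p)
    return h

# Ptilde = P(z + eps) for P = (z - 2)(z - 3) = z^2 - 5z + 6:
# constant term 6 - 5 eps, z-term -5 + 2 eps, z^2-term 1
Pt = [(6, (-5) % p), ((-5) % p, 2), (1, 0)]
Pm = [((-1) ** i * a % p, (-1) ** i * b % p) for i, (a, b) in enumerate(Pt)]
G = pmul(Pt, Pm)[0::2]          # even part: G(z^2) = Ptilde(z) * Ptilde(-z), deg even
A = [a for a, _ in G]           # A = G_2(P), with roots 2^2 = 4 and 3^2 = 9
B = [b for _, b in G]           # eps-part (the polynomial T)

def ev(f, x):  return sum(c * pow(x, i, p) for i, c in enumerate(f)) % p
def dev(f, x): return sum(i * c * pow(x, i - 1, p) for i, c in enumerate(f) if i) % p

for alpha in (2, 3):            # recover alpha = r * beta * A'(beta)/B(beta), r = 2
    beta = alpha * alpha % p
    assert ev(A, beta) == 0
    rec = 2 * beta * dev(A, beta) * pow(ev(B, beta), p - 2, p) % p
    assert rec == alpha
```

Here A comes out as (z−4)(z−9) and B as 10z−60, so the quotient 2β·A′(β)/B(β) indeed returns 2 and 3.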
Combined with the tangent Graeffe method from the previous subsection, this leads to the following probabilistic algorithm for root finding: <\specified-algorithm> Input: P ∈ 𝔽_p[z] of degree d and only order one factors, p = σ·2^k + 1. Output: the set {α_1,…,α_d} of roots of P. <|specified-algorithm> <\enumerate> If d = 0, then return ∅. Let τ ∈ 𝔽_p be a random shift. Let r = 2^m be largest such that r ≤ (p−1)/(4d) and let s := (p−1)/r. Compute P^* := P(z+τ) ∈ 𝔽_p[z]. Compute P̃ := P^*(z+ε) = P^* + (P^*)' ε ∈ (𝔽_p[ε]/(ε^2))[z]. For i = 1,…,m, set P̃ := G_2(P̃) ∈ (𝔽_p[ε]/(ε^2))[z]. Let ζ ∈ 𝔽_p^* be of order p−1. Write G_r(P̃) = A + B ε and compute A(ζ^{r i}), A'(ζ^{r i}), and B(ζ^{r i}) for i = 0,…,s−1. If P̃(0) = 0, then set S := {τ}, else set S := ∅. For any β ∈ {1, ζ^r,…,ζ^{(s−1)r}} with A(β) = 0 and B(β) ≠ 0, set S := S ∪ {r β A'(β)/B(β) + τ}. Compute Q := ∏_{α∈S} (z−α). Recursively determine the set S' of roots of P/Q. Return S ∪ S'. <\remark> To compute G_2(A + B ε) we may use (A(z) + B(z) ε)(A(−z) + B(−z) ε) = A(z) A(−z) + (A(z) B(−z) + B(z) A(−z)) ε, which requires three polynomial multiplications in 𝔽_p of degree d. In total, the loop in step 6 therefore performs m = O(log p) such multiplications. We discuss how to perform step 8 efficiently in the FFT model in section. <\remark> For practical implementations, one may vary the threshold (p−1)/(4d) for r and the resulting threshold s ≥ 4d for s. For larger values of s, the computations of the DFTs in step 8 get more expensive, but the proportion of single roots goes up, so more roots are determined at each iteration. From an asymptotic complexity perspective, it would be best to take s of order d log d. In practice, we actually preferred to take the threshold s ≥ 2d, because the constant factor of our implementation of step 8 (based on Bluestein's algorithm) is significant with respect to our highly optimized implementation of the tangent Graeffe method. A second reason we prefer s of size O(d) instead of O(d log d) is that the total space used by the algorithm is linear in s. In the future, it would be interesting to further speed up step 8 by investing more time in the implementation of high performance DFTs of general orders s.
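The root-finding procedure above can be prototyped in a few dozen lines. The following Python sketch is our own toy reformulation, not the paper's C implementation: it uses naive quadratic polynomial arithmetic instead of FFTs, and instead of deflating by Q and recursing it simply retries with fresh random shifts until all roots are found. The prime p = 97 (with p − 1 = 2^5·3 smooth), the primitive root 5, and the fixed order r = 8 are illustrative assumptions.

```python
import random

p, g = 97, 5      # toy smooth prime (p - 1 = 2^5 * 3) and a primitive root mod p
m = 3             # number of order-2 Graeffe steps, so r = 2^m = 8
r, s = 8, 12      # s = (p - 1) / r

def pmul(f, h):   # product of polynomials with tangent-number coefficients mod p
    out = [(0, 0)] * (len(f) + len(h) - 1)
    for i, (a, b) in enumerate(f):
        for j, (c, d) in enumerate(h):
            u, v = out[i + j]
            out[i + j] = ((u + a * c) % p, (v + a * d + b * c) % p)
    return out

def graeffe2(F):  # one tangent Graeffe step: root gamma - eps -> gamma^2 - 2*gamma*eps
    Fm = [((-1) ** i * a % p, (-1) ** i * b % p) for i, (a, b) in enumerate(F)]
    G = pmul(F, Fm)[0::2]
    if (len(F) - 1) % 2:                      # odd degree: restore monicity
        G = [(-a % p, -b % p) for a, b in G]
    return G

def ev(c, x):  return sum(a * pow(x, i, p) for i, a in enumerate(c)) % p
def dev(c, x): return sum(i * a * pow(x, i - 1, p) for i, a in enumerate(c) if i) % p

def tangent_graeffe_roots(P):  # P: plain monic coefficients, squarefree, split over F_p
    d, S = len(P) - 1, set()
    while len(S) < d:
        tau = random.randrange(p)
        Pt = [(P[-1], 0)]                     # Horner: Pt = P(z + tau + eps)
        for a in reversed(P[:-1]):
            Pt = pmul(Pt, [(tau, 1), (1, 0)])
            Pt[0] = ((Pt[0][0] + a) % p, Pt[0][1])
        for _ in range(m):
            Pt = graeffe2(Pt)
        A = [a for a, _ in Pt]
        B = [b for _, b in Pt]
        zeta_r = pow(g, r, p)                 # order s; its powers are the r-th powers
        for i in range(s):
            beta = pow(zeta_r, i, p)
            if ev(A, beta) == 0 and dev(A, beta) != 0 and ev(B, beta) != 0:
                gamma = r * beta * dev(A, beta) * pow(ev(B, beta), p - 2, p) % p
                S.add((gamma + tau) % p)      # undo the shift
    return S
```

For example, `tangent_graeffe_roots([52, 1, 34, 1])` (the coefficients of (z−3)(z−10)(z−50) mod 97, constant term first) returns `{3, 10, 50}`.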
<\remark> For the application to sparse interpolation, it is possible to further speed up the top-level iteration, which is the most expensive step. More precisely, for a polynomial with t terms, the idea is to take τ = 0 and ζ of order approximately c·t instead of p−1, for some constant c with 1 ≤ c ≤ 3. This reduces s (and the cost of the top-level iteration) by a factor of roughly d/t. For the recursive calls, we still need to work with a primitive root of unity ζ of order p−1 and random shifts. Assume that n is invertible in F and let ω ∈ F be a primitive n-th root of unity. Consider a polynomial A = a_0 + a_1 z + ⋯ + a_{n−1} z^{n−1} ∈ F[z]. Then the discrete Fourier transform (DFT) of order n of the sequence (a_i)_{0≤i<n} is defined by <\equation*> DFT_ω((a_i)_{0≤i<n}) := (â_k)_{0≤k<n}, â_k := A(ω^k). We will write F(n) for the cost of one discrete Fourier transform in terms of the number of operations in F and assume that n = O(F(n)). For any i ∈ {0,…,n−1}, we have <\equation> DFT_{ω^{−1}}(DFT_ω((a_j)_{0≤j<n}))_i = Σ_{k<n} Σ_{j<n} a_j ω^{(j−i)k} = Σ_{j<n} a_j Σ_{k<n} ω^{(j−i)k} = n a_i, since the inner sum vanishes unless j = i. If n is invertible in F, then it follows that DFT_ω^{−1} = n^{−1} DFT_{ω^{−1}}. The costs of direct and inverse transforms therefore coincide up to O(n) operations. If n = n_1 n_2 is composite, 0 ≤ k_1 < n_1, and 0 ≤ k_2 < n_2, then we have <\eqnarray*> DFT_ω(a)_{k_2 n_1 + k_1} = Σ_{i_1<n_1} Σ_{i_2<n_2} a_{i_1 n_2 + i_2} ω^{(i_1 n_2 + i_2)(k_2 n_1 + k_1)} = Σ_{i_2<n_2} ω^{i_2 k_1} (Σ_{i_1<n_1} a_{i_1 n_2 + i_2} (ω^{n_2})^{i_1 k_1}) (ω^{n_1})^{i_2 k_2} = DFT_{ω^{n_1}}((ω^{i_2 k_1} DFT_{ω^{n_2}}((a_{i_1 n_2 + i_2})_{i_1})_{k_1})_{i_2})_{k_2}. >>>> This shows that a DFT of length n reduces to n_1 transforms of length n_2 plus n_2 transforms of length n_1 plus n multiplications in F: <\equation*> F(n_1 n_2) ≤ n_1 F(n_2) + n_2 F(n_1) + O(n). In particular, if >, then >\r*>>. It is sometimes convenient to apply DFTs directly to polynomials as well; for this reason, we also define DFT_ω(A) := (A(ω^k))_{0≤k<n}. Given two polynomials A, B ∈ F[z] with deg(A B) < n, we may then compute the product using <\eqnarray*> A B = DFT_ω^{−1}(DFT_ω(A) DFT_ω(B)). >>>> In particular, if M(n) denotes the cost of multiplying two polynomials of degree <n, then we obtain M(n) ≤ 3 F(2n) + O(n) ∼ 6 F(n). <\remark> In Algorithm, we note that step 8 comes down to the computation of three DFTs of length s. Since s divides p−1 = σ·2^k, this length is of the form σ·2^t for some t.
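Before continuing, the basic DFT facts above can be checked numerically with a naive O(n²) transform over a toy field (the prime 97 and the element 64 of order 8 are our illustrative choices, not taken from the paper):

```python
p = 97
n, omega = 8, 64          # 64 has multiplicative order 8 modulo 97

def dft(a, w):            # naive order-n DFT over F_p
    return [sum(a[i] * pow(w, i * k, p) for i in range(n)) % p for k in range(n)]

inv_n = pow(n, p - 2, p)
a = [3, 1, 4, 1, 5, 9, 2, 6]
back = [x * inv_n % p for x in dft(dft(a, omega), pow(omega, p - 2, p))]
assert back == a          # DFT_{1/omega} o DFT_omega = n * id

# polynomial product via pointwise multiplication of DFT values
f = [2, 3] + [0] * 6      # 2 + 3z
h = [5, 7] + [0] * 6      # 5 + 7z
vals = [x * y % p for x, y in zip(dft(f, omega), dft(h, omega))]
prod = [x * inv_n % p for x in dft(vals, pow(omega, p - 2, p))]
assert prod[:3] == [10, 29, 21]   # (2 + 3z)(5 + 7z) = 10 + 29z + 21z^2
```

The two assertions check the inversion formula DFT_{ω^{−1}} ∘ DFT_ω = n·id and the multiplication scheme A·B = DFT^{−1}(DFT(A)·DFT(B)).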
In view of this decomposition, we may therefore reduce step 8 to σ DFTs of length 2^t plus 2^t DFTs of length σ. If σ is very small, then we may use a naive implementation for DFTs of length σ. In general, one may use Bluestein's algorithm to reduce the computation of a DFT of length σ into the computation of a cyclic product, which can in turn be computed using FFT-multiplication and three DFTs of length a larger power of two. Let F be a field with a primitive 2n-th root of unity ω. Let P ∈ F[z] be a monic polynomial of degree n. Then the defining relation for G yields <\equation> G(P)(ω^{2k}) = (−1)^n P(ω^k) P(−ω^k), k = 0,…,2n−1. For any k ∈ {0,…,2n−1}, we further note that <\equation> DFT_ω(P(−z))_k = P(−ω^k) = P(ω^{(k+n) rem 2n}) = DFT_ω(P)_{(k+n) rem 2n}, so DFT_ω(P(−z)) can be obtained from DFT_ω(P) using transpositions of elements in F. Concerning the inverse transform, we also note that <\equation*> DFT_ω(G(P)(z^2))_k = G(P)(ω^{2k}) = DFT_{ω^2}(G(P))_k, for k = 0,…,n−1. Plugging this into the above, we conclude that <\equation*> G(P) = DFT_{ω^2}^{−1}(((−1)^n DFT_ω(P)_k DFT_ω(P)_{(k+n) rem 2n})_{0≤k<n}), where the inverse transform yields G(P) rem (z^n − 1), from which the monic polynomial G(P) = z^n + (G(P) rem (z^n − 1)) − 1 is recovered, since z^n ≡ 1 at all evaluation points. This leads to the following algorithm for the computation of G(P): <\specified-algorithm> Input: a monic P ∈ F[z] with deg P = n and a primitive 2n-th root of unity ω ∈ F. Output: G(P). <|specified-algorithm> <\enumerate> Compute (P̂_k)_{0≤k<2n} := DFT_ω(P). For k = 0,…,n−1, set Q̂_k := P̂_{(k+n) rem 2n}. For k = 0,…,n−1, compute Ĝ_k := (−1)^n P̂_k Q̂_k. Return z^n + DFT_{ω^2}^{−1}((Ĝ_k)_{0≤k<n}) − 1. <\proposition> Let ω ∈ F be a primitive 2n-th root of unity in F and assume that n is invertible in F. Given a monic polynomial P ∈ F[z] with deg P = n, we can compute G(P) in time <\equation*> G(n) ∼ 3 F(n). <\proof> We have already explained the correctness of Algorithm. Step 1 requires one forward DFT of length 2n and cost F(2n) = 2 F(n) + O(n). Steps 2 and 3 can be done in linear time O(n). Step 4 requires one inverse DFT of length n and cost F(n) + O(n). The total cost of Algorithm is therefore 3 F(n) + O(n) ∼ 3 F(n). <\remark> In terms of the complexity M(n) of multiplication, we obtain G(n) ∼ (1/2) M(n). This gives an improvement over the previously best known bound (2/3) M(n) that was used in. Note that the best known algorithm for computing squares of polynomials of degree <n is ∼ (2/3) M(n).
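The Graeffe-transform-by-DFT algorithm above can be prototyped with naive transforms. The point is that the transform of P(−z) is read off from that of P by an index permutation, so only one forward and one inverse DFT are needed; an iterated variant with the FFT caching of the next subsection is included for comparison. The toy prime, the explicit handling of the monic leading term, and all names are our own illustrative choices, not the paper's C code.

```python
p = 97  # illustrative toy prime

def dft(a, w, m):    # naive length-m DFT over F_p
    return [sum(a[i] * pow(w, i * k, p) for i in range(m)) % p for k in range(m)]

def idft(A, w, m):   # inverse DFT: DFT at 1/w, scaled by 1/m
    wi, mi = pow(w, p - 2, p), pow(m, p - 2, p)
    return [mi * sum(A[k] * pow(wi, i * k, p) for k in range(m)) % p for i in range(m)]

def graeffe2_fft(P, omega):   # P monic of degree n, omega of order 2n in F_p^*
    n = len(P) - 1
    vals = dft(P + [0] * (n - 1), omega, 2 * n)    # the single forward DFT
    sign = (-1) ** n % p
    # DFT(P(-z))_k is vals[(k + n) rem 2n]: no second forward transform needed
    g = [sign * vals[k] * vals[k + n] % p for k in range(n)]
    H = idft(g, pow(omega, 2, p), n)               # = G(P) rem (z^n - 1)
    H[0] = (H[0] - 1) % p                          # z^n = 1 at the sample points,
    return H + [1]                                 # so restore the monic term

def graeffe2_iter(P, omega, steps):  # iterated version with FFT caching
    n = len(P) - 1
    w2, G = pow(omega, 2, p), P[:]
    sign = (-1) ** n % p
    vals = dft(P + [0] * (n - 1), omega, 2 * n)    # full-length DFT only once
    for _ in range(steps):
        even = [sign * vals[k] * vals[k + n] % p for k in range(n)]
        G = idft(even, w2, n)
        G[0] = (G[0] - 1) % p
        G.append(1)
        # cached: the even half of DFT_{2n}(G) is `even`; only the odd half,
        # a twisted length-n DFT, is recomputed (-1 accounts for the z^n term)
        twisted = [G[i] * pow(omega, i, p) % p for i in range(n)]
        odd = [(x - 1) % p for x in dft(twisted, w2, n)]
        vals = [even[k // 2] if k % 2 == 0 else odd[k // 2] for k in range(2 * n)]
    return G
```

For P = z² − 5z + 6 (roots 2, 3) and ω = 22 of order 4 mod 97, one Graeffe step yields z² − 13z + 36 (roots 4, 9), and two steps yield z² + 35 (roots 16, 81).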
It would be interesting to know whether squares can also be computed in time ∼ (1/2) M(n). In view of the composition identity G_{r_1 r_2} = G_{r_1} ∘ G_{r_2}, Graeffe transforms of power of two orders 2^m can be computed using <\equation> G_{2^m}(P) = (G_2 ∘ ⋯ ∘ G_2)(P), with m compositions of G_2. Now assume that we computed the first Graeffe transform G_2(P) using Algorithm and that we wish to apply a second Graeffe transform to the result. Then we note that <\equation> DFT_{ω^2}(G_2(P))_k = G_2(P)(ω^{2k}) is already known for k = 0,…,n−1. We can use this to accelerate step 1 of the second application of Algorithm. Indeed, in view of the Cooley\UTukey decomposition for n_1 = 2 and n_2 = n, we have <\equation> DFT_ω(G_2(P))_{2k+1} = DFT_{ω^2}((ω^i G_2(P)_i)_{0≤i<n})_k − 1 for k = 0,…,n−1, the term −1 accounting for the monic term z^n, since ω^n = −1. In order to exploit this idea in a recursive fashion, it is useful to modify Algorithm so as to include DFT_{ω^2}(P) in the input and DFT_{ω^2}(G(P)) in the output. This leads to the following algorithm: <\specified-algorithm> Input: a monic P ∈ F[z] with deg P = n, a primitive 2n-th root of unity ω ∈ F, and (P̂_{2k})_{0≤k<n} = DFT_{ω^2}(P). Output: G(P) and DFT_{ω^2}(G(P)). <|specified-algorithm> <\enumerate> Set (P̂_{2k})_{0≤k<n} := the given values. Set (P̂_{2k+1})_{0≤k<n} := DFT_{ω^2}((ω^i P_i)_{0≤i<n}) − 1. For k = 0,…,n−1, set Q̂_k := P̂_{(k+n) rem 2n}. For k = 0,…,n−1, compute Ĝ_k := (−1)^n P̂_k Q̂_k. Return z^n + DFT_{ω^2}^{−1}((Ĝ_k)_{0≤k<n}) − 1 and (Ĝ_k)_{0≤k<n}. <\proposition> Let ω ∈ F be a primitive 2n-th root of unity in F and assume that n is invertible in F. Given a monic polynomial P ∈ F[z] with deg P = n and m ≥ 1, we can compute G_{2^m}(P) in time <\equation*> G_{2^m,ω}(n) ∼ (2m+1) F(n). <\proof> It suffices to compute DFT_{ω^2}(P) and then to apply Algorithm recursively, m times. Every application of Algorithm now takes 2 F(n) + O(n) ∼ 2 F(n) operations in F, whence the claimed complexity bound. <\remark> In, Graeffe transforms of order 2^m were directly computed using the defining relation, using ∼ 4 m F(n) operations in F. The new algorithm is twice as fast for large m. The algorithms from subsections and readily generalize to Graeffe transforms of order r^m for arbitrary r ≥ 2, provided that we have an r n-th root of unity ω ∈ F. For convenience of the reader, we specify the generalization of Algorithm below, together with the resulting complexity bounds.
<\specified-algorithm> Input: a monic P ∈ F[z] with deg P = n, r ≥ 2, a primitive r n-th root of unity ω ∈ F, and (P̂_{r k})_{0≤k<n} = DFT_{ω^r}(P). Output: G_r(P) and DFT_{ω^r}(G_r(P)). <|specified-algorithm> <\enumerate> Set (P̂_{r k})_{0≤k<n} := the given values. For j = 1,…,r−1, set (P̂_{r k+j})_{0≤k<n} := DFT_{ω^r}((ω^{j i} P_i)_{0≤i<n}) + ω^{j n}, the last term accounting for the monic term z^n. For j = 1,…,r−1 and k = 0,…,n−1, set Q̂_{j,k} := P̂_{(k+j n) rem r n}. For k = 0,…,n−1, compute Ĝ_k := (−1)^{n(r−1)} P̂_k Q̂_{1,k} ⋯ Q̂_{r−1,k}. Return z^n + DFT_{ω^r}^{−1}((Ĝ_k)_{0≤k<n}) − 1 and (Ĝ_k)_{0≤k<n}. <\proposition> Let ω ∈ F be a primitive r n-th root of unity in F, where r ≥ 2 is invertible in F. Given a monic polynomial P ∈ F[z] with deg P = n and m ≥ 1, we can compute G_{r^m}(P) in time <\equation*> G_{r^m,ω}(n) ∼ (r m + 1) F(n). <\proof> Straightforward generalization of Proposition. <\corollary> Let ω ∈ F be a primitive (r_1 ⋯ r_ℓ n)-th root of unity in F, where r_1 ≥ 2,…,r_ℓ ≥ 2 are invertible in F. Given a monic polynomial P ∈ F[z] with deg P = n and m_1,…,m_ℓ ∈ ℕ, we can compute G_{r_1^{m_1} ⋯ r_ℓ^{m_ℓ}}(P) in time <\equation*> G_{r_1^{m_1} ⋯ r_ℓ^{m_ℓ},ω}(n) ∼ (r_1 m_1 + ⋯ + r_ℓ m_ℓ + 1) F(n). <\proof> Direct consequence of the composition identity. <\remark> In our application to root finding, we are interested in the efficient computation of Graeffe transforms of high order r^m. In terms of the size m log r of r^m, it is instructive to observe that the \Paverage cost\Q <\equation*> Ḡ_r := G_{r^m,ω}(n)/(m log_2 r) ∼ (r/log_2 r) F(n) is minimal for r = 3. Whenever radix-3 DFTs can be implemented as efficiently as radix-2 DFTs, it can therefore be interesting to use Graeffe transforms of order three. If F = 𝔽_p is a fixed finite field, then DFTs are most efficient for sizes n that divide p−1. For our root finding application, it is often convenient to take p of the form 3·2^k + 1, in which case n should be a power of two or three times a power of two. The truncated Fourier transform was developed for the multiplication of polynomials such that the degree of the product does not have a nice size of this type. It turns out that we may also use it for the efficient computation of Graeffe transforms of polynomials of arbitrary degrees. Moreover, the optimizations from the previous subsections still apply. Let us briefly describe how the truncated Fourier transform can be used for the computation of Graeffe transforms of power of two orders.
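Before turning to the truncated Fourier transform, the order-three product formula G_3(P)(z³) = P(z)·P(ωz)·P(ω²z) can be checked directly with naive arithmetic (the toy prime, the element of order three, and the naming are our own illustrative choices):

```python
p = 97
w3 = 35                     # an element of order 3 in F_97 (35^3 = 1 mod 97)

def pmul(f, g):             # plain polynomial product mod p
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % p
    return out

def graeffe3(P):            # G_3(P)(z^3) = P(z) P(w z) P(w^2 z): roots alpha -> alpha^3
    scale = lambda c: [a * pow(c, i, p) % p for i, a in enumerate(P)]
    return pmul(pmul(P, scale(w3)), scale(pow(w3, 2, p)))[0::3]
```

For P = z² − 5z + 6 (roots 2, 3), `graeffe3` returns the coefficients of (z−8)(z−27); applying it twice cubes the roots again, illustrating G_9 = G_3 ∘ G_3.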
With the notations from subsections and, we assume that n = 2^ℓ is a power of two as well and that we wish to compute the Graeffe transform of a polynomial of degree <t with t ≤ n. Let [k]_rev denote the reversal of a binary number k ∈ {0,…,2n−1} of log_2(2n) bits. For instance, =12> and =40>. Then the truncated Fourier transform of P at order T ≤ 2n is defined by <\equation*> TFT_{ω,T}(P) := (P(ω^{[0]_rev}), P(ω^{[1]_rev}),…, P(ω^{[T−1]_rev})). It has been shown in that TFT_{ω,T}(P) and its inverse can both be computed in time essentially proportional to the truncation order T. For generalizations to arbitrary radices, we refer to. Taking T = 2t, we notice that <\equation*> P(−ω^{[2k]_rev}) = P(ω^{[2k]_rev + n}) = P(ω^{[2k+1]_rev}) for k = 0,…,t−1. This provides us with the required counterpart of the transposition identity for retrieving TFT_{ω,2t}(P(−z)) efficiently from TFT_{ω,2t}(P). The relation between the transform of G(P)(z^2) and that of G(P) also has a natural counterpart: <\equation*> TFT_{ω,2t}(G(P)(z^2))_{2k} = TFT_{ω^2,t}(G(P))_k, for k = 0,…,t−1, and so does the twisted identity used for caching. This leads to the following refinement of Algorithm: <\specified-algorithm> ||>|\> with t\n=2-1>>,>>||>-th root of unity \\>, and |)>k\t>=TFT,t>>>>|>|> and ,t>|)>>>>>>> <|specified-algorithm> <\enumerate> Set |)>k\t>\|)>k\t>> Set |)>k\t>\TFT,t>*z|)>|)>> For ,t-1>, set \> For ,t-1>, compute \*> Return ,t>|)>k\t>|)>> and |)>k\t>> <\proposition> Let ω ∈ F be a primitive 2n-th root of unity in F, where n = 2^ℓ, and assume that n is invertible in F. Given a monic polynomial P ∈ F[z] with deg P ≤ t ≤ n and m ≥ 1, we can compute G_{2^m}(P) in time <\equation*> ,\>\**>. <\proof> Straightforward adaptation of the proof of Proposition, while using. In step 4 of Algorithm, we still need an algorithm to compute the Taylor shift P(z+τ).
If the characteristic of F exceeds d, then it is (not so) well known that this can be reduced to a single polynomial multiplication of degree 2d using the following algorithm: <\specified-algorithm> Input: P ∈ F[z] of degree d < char F and τ ∈ F. Output: P(z+τ). <|specified-algorithm> <\enumerate> L := 0!·P_0 + 1!·P_1 z + ⋯ + d!·P_d z^d L̃ := z^d L(1/z) E := 1 + τ z + (τ^2/2!) z^2 + ⋯ + (τ^d/d!) z^d M̃ := L̃ E rem z^{d+1} M := z^d M̃(1/z) Return (M_0/0!) + (M_1/1!) z + ⋯ + (M_d/d!) z^d It is interesting to observe that Taylor shifts can still be computed in softly linear time in small characteristic, as follows: <\specified-algorithm> Input: P ∈ F[z] of degree d, p = char F > 0 and τ ∈ F. Output: P(z+τ). <|specified-algorithm> <\enumerate> Define z_i := z^{p^i} for i = 0,…,k, where k := ⌊log d/log p⌋ Rewrite P as a polynomial P̄(z_0,…,z_k) ∈ F[z_0,…,z_k] with deg_{z_i} P̄ < p for i = 0,…,k−1 For i = 0,…,k, replace P̄ := P̄(z_0,…,z_{i−1}, z_i + τ^{p^i}, z_{i+1},…,z_k) Return P̄(z, z^p,…,z^{p^k}) We have implemented the tangent Graeffe root finding algorithm (Algorithm) in C with the optimizations presented in section. Our C implementation supports primes of size up to 63 bits. In what follows all complexities count arithmetic operations in 𝔽_p. In Tables and, the input polynomial P of degree d is constructed by choosing d distinct values α_i ∈ 𝔽_p at random and creating P = (z−α_1)⋯(z−α_d). We will use p = 3·29·2^{56}+1, a smooth 63 bit prime. One goal we have is to determine how much faster the Tangent Graeffe (TG) root finding algorithm is in practice when compared with the Cantor\UZassenhaus (CZ) algorithm, which is implemented in many computer algebra systems. In Table we present timings comparing our sequential implementation of the TG algorithm with Magma's implementation of the CZ algorithm. Table 1. Sequential timings in CPU seconds (p = 3·29·2^{56}+1). For polynomials in 𝔽_p[z], Magma uses Shoup's factorization algorithm from.
For our input P, with d distinct linear factors, Shoup uses the Cantor\UZassenhaus equal degree factorization method. Recall that the average complexity of TG is O(M(d)(log p + log d)) and of CZ is O(M(d) log p log d). We also wanted to attempt a parallel implementation. To do this we used the MIT Cilk C compiler from. The Cilk C compiler provides a simple fork-join model of parallelism. Unlike the CZ algorithm, TG has no gcd computations that are hard to parallelize. We present some initial parallel timing data in Table. The timings in Table are sequential timings obtained on a Linux server with an Intel Xeon E5-2660 CPU with 8 cores. In Table the time in column \Pfirst\Q is for the first application of the TG algorithm (steps \U of Algorithm) showing that it obtains about 69% of the roots. The time in column \Ptotal\Q is the total time for the algorithm. Columns step, step, and step report the time spent in steps , , and in Algorithm and do not count time in the recursive call in step. Table 2. Real times in seconds for 1 core (8 cores) and p = 3·29·2^{56}+1. The Magma column timing is for Magma's command. The timings for Magma version V2.25-3 suggest that Magma's CZ implementation involves a subalgorithm with quadratic asymptotic complexity. Indeed it turns out that the author of the code implemented all of the sub-quadratic polynomial arithmetic correctly, as demonstrated by the second set of timings for Magma in column V2.25-5, but inserted the linear factors found into a list using linear insertion! Allan Steel of the Magma group identified and fixed the offending subroutine for Magma version V2.25-5. The timings show that TG is faster than CZ by a factor of 76.6 (=8.43/0.11) to 146.3 (=2809/19.2). To implement the Taylor shift P(z+τ) in step 4, we used the O(M(d)) method from. The better known method presented in is O(M(d) log d).
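The multiplication-based Taylor shift just mentioned reduces P(z+τ) to one convolution of factorial-weighted coefficients. A minimal sketch, assuming char F > deg P and using a naive convolution over a toy prime (the implementation described in the paper instead uses FFT-based multiplication over a 63-bit prime; names are ours):

```python
from math import factorial

p = 97  # illustrative prime; requires p > deg P

def taylor_shift(P, tau):   # coefficients of P(z + tau), constant term first
    d = len(P) - 1
    inv_fact = [pow(factorial(j), p - 2, p) for j in range(d + 1)]
    u = [factorial(i) * P[i] % p for i in range(d + 1)]           # i! * P_i
    v = [pow(tau, k, p) * inv_fact[k] % p for k in range(d + 1)]  # tau^k / k!
    # b_j = (1/j!) * sum_{i >= j} (i! * P_i) * tau^{i-j}/(i-j)!  -- one convolution
    return [inv_fact[j] * sum(u[i] * v[i - j] for i in range(j, d + 1)) % p
            for j in range(d + 1)]
```

For example, shifting z² − 5z + 6 by τ = 1 gives z² − 3z + 2, i.e. `taylor_shift([6, 92, 1], 1) == [2, 94, 1]` mod 97.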
For step 6 we use Algorithm as presented. It has complexity O(M(d) log p). To evaluate A, A' and B in step 8 we used the Bluestein transformation. In step 11 to compute the product Q = ∏_{α∈S} (z−α), for t = #S roots, we used the O(M(t) log t) tree multiplication algorithm. The division P/Q in step 12 is done in O(M(d)) with the fast division. The sequential timings in Tables and show that steps 6, 8 and 11 account for about 91% of the total time. We parallelized these three steps as follows. For step 6, the two forward and two inverse FFTs are done in parallel. We also parallelized our radix-2 FFT by parallelizing recursive calls above a size threshold and the main loop in blocks, as done in. For step 8 there are three applications of Bluestein to compute A(ζ^{r i}), A'(ζ^{r i}) and B(ζ^{r i}). We parallelized these (thereby doubling the overall space used by our implementation). The main computation in the Bluestein transformation is a polynomial multiplication of two polynomials of degree s. The two forward FFTs are done in parallel and the FFTs themselves are parallelized as for step 6. For the product in step 11 we parallelize the two recursive calls in the tree multiplication for large sizes and again, the FFTs are parallelized as for step 6. To improve parallel speedup we also parallelized the polynomial multiplication in step 12 and the computation of the roots in step 10. Although step 10 is cheap in terms of arithmetic complexity, it is relatively expensive because of two inverse computations in 𝔽_p. Because we have not parallelized about 5% of the computation, Amdahl's law limits the parallel speedup we can obtain on 8 cores to a factor of 1/(0.05 + 0.95/8) ≈ 5.9. The best overall parallel speedup we obtained is a factor of 4.6 (=1465/307.7). In Cilk, for each recursive C subroutine we wish to parallelize, one first needs to make a parallel version of that routine. For large recursive calls, the parallel version does them in parallel. For smaller recursive calls, the parallel version needs to call the sequential version of the code to avoid the Cilk process management overhead.
The cutoff for the size of the input for which we need to use the sequential code has to be determined by experiment. <\bibliography|bib|tm-plain|> <\bib-list|31> A. V. Aho, K. Steiglitz and J. D. Ullman. Evaluating polynomials on a fixed set of points. 4:533–539, 1975. M. Ben-Or and P. Tiwari. A deterministic algorithm for sparse multivariate polynomial interpolation. Pages 301–309. ACM Press, 1988. Leo I. Bluestein. A linear filtering approach to the computation of discrete Fourier transform. 18(4):451–455, 1970. A. Bostan, G. Lecerf and É. Schost. Tellegen's principle into practice. Pages 37–44. ACM Press, 2003. R. P. Brent, F. G. Gustavson and D. Y. Y. Yun. Fast solution of Toeplitz systems of equations and computation of Padé approximants. J. Algorithms, 1(3):259–295, 1980. J. Canny, E. Kaltofen and Y. Lakshman. Solving systems of non-linear polynomial equations faster. Pages 121–128. ACM Press, 1989. D. G. Cantor and H. Zassenhaus. A new algorithm for factoring polynomials over finite fields. 36(154):587–592, 1981. A. Díaz and E. Kaltofen. FOXBOX: a system for manipulating symbolic objects in black box representation. Pages 30–37. ACM Press, 1998. T. S. Freeman, G. M. Imirzian, E. Kaltofen and Y. Lakshman. DAGWOOD: a system for manipulating polynomials given by straight-line programs. 14:218–240, 1988. M. Frigo, C. E. Leiserson and R. K. Randall. The implementation of the Cilk-5 multithreaded language. Pages 212–223. ACM, 1998. J. von zur Gathen and J. Gerhard. Modern Computer Algebra. Cambridge University Press, New York, 3rd edition, 2013. B. Grenet, J. van der Hoeven and G. Lecerf. Randomized root finding over finite fields using tangent Graeffe transforms. Pages 197–204. New York, NY, USA, 2015. ACM. B. Grenet, J. van der Hoeven and G. Lecerf. Deterministic root finding over finite fields using Graeffe transforms. 27(3):237–257, 2016. J. van der Hoeven. The truncated Fourier transform and applications. In J. Gutierrez, editor, pages 290–296. Univ. of Cantabria, Santander, Spain, July 4–7 2004. J. van der Hoeven and G. Lecerf. On the bit-complexity of sparse polynomial and series multiplication.
J. Symbolic Comput., 50:227–254, 2013. J. van der Hoeven and G. Lecerf. Sparse polynomial interpolation in practice. 48(3/4):187–191, 2015. J. van der Hoeven et al. GNU TeXmacs. 1998. J. Hu and M. B. Monagan. A fast parallel sparse polynomial GCD algorithm. In S. A. Abramov, E. V. Zima and X.-S. Gao, editors, pages 271–278. ACM, 2016. M. A. Huang and A. J. Rao. Interpolation of sparse multivariate polynomials over large finite fields with applications. Pages 508–517. Philadelphia, PA, USA, 1996. Society for Industrial and Applied Mathematics. M. Javadi and M. Monagan. Parallel sparse polynomial interpolation over finite fields. In M. Moreno Maza and J.-L. Roch, editors, pages 160–168. ACM Press, 2010. E. Kaltofen. Computing with polynomials given by straight-line programs I: greatest common divisors. Pages 131–142. ACM Press, 1985. E. Kaltofen, Y. N. Lakshman and J.-M. Wiley. Modular rational sparse multivariate polynomial interpolation. Pages 135–139. New York, NY, USA, 1990. ACM Press. E. Kaltofen and B. M. Trager. Computing with polynomials given by black boxes for their evaluations: greatest common divisors, factorization, separation of numerators and denominators. J. Symbolic Comput., 9(3):301–320, 1990. E. Kaltofen and L. Yagati. Improved sparse multivariate polynomial interpolation algorithms. Pages 467–474. Springer Verlag, 1988. R. Larrieu. The truncated Fourier transform for mixed radices. Pages 261–268. New York, NY, USA, 2017. ACM. M. Law and M. Monagan. A parallel implementation for polynomial multiplication modulo a prime. Pages 78–86. ACM, 2015. R. Moenck. Fast computation of GCDs. Pages 142–171. New York, 1973. ACM Press. H. Murao and T. Fujise. Modular algorithm for sparse multivariate polynomial interpolation and its parallel implementation. 21:377–396, 1996. R. Prony. Essai expérimental et analytique sur les lois de la dilatabilité des fluides élastiques et sur celles de la force expansive de la vapeur de l'eau et de la vapeur de l'alkool, à différentes températures. J. de l'École Polytechnique Floréal et Plairial, an III, 1(cahier 22):24–76, 1795. D. S. Roche.
What can (and can't) we do with sparse polynomials? In C. Arreche, editor, pages 25–30. ACM Press, 2018. V. Shoup. A new polynomial factorization algorithm and its implementation. J. Symbolic Comput., 20(4):363–397, 1995.