Implementing the tangent Graeffe
root finding method

Joris van der Hoevenabc, Michael Monaganad

a. Department of Mathematics

Simon Fraser University

8888 University Drive

Burnaby, British Columbia

V5A 1S6, Canada

b. CNRS, École polytechnique, Institut Polytechnique de Paris

Laboratoire d'informatique de l'École polytechnique (LIX, UMR 7161)

1, rue Honoré d'Estienne d'Orves

Bâtiment Alan Turing, CS35003

91120 Palaiseau, France

c. Email: vdhoeven@lix.polytechnique.fr

d. Email: mmonagan@cecm.sfu.ca

October 21, 2020

. This paper is part of a project that has received funding from the French “Agence de l'innovation de défense”.

. This article has been written using GNU TeXmacs [17].

The tangent Graeffe method has been developed for the efficient computation of single roots of polynomials over finite fields with multiplicative groups of smooth order. It is a key ingredient of sparse interpolation using geometric progressions, in the case when blackbox evaluations are comparatively cheap. In this paper, we improve the complexity of the method by a constant factor and we report on a new implementation of the method and a first parallel implemenation.

1.Introduction

Consider a polynomial function over a field given through a black box capable of evaluating at points in . The problem of sparse interpolation is to recover the representation of in its usual form, as a linear combination

(1)

of monomials . One popular approach to sparse interpolation is to evaluate at points in a geometric progression. This approach goes back to work of Prony in the eighteen's century [29] and became well known after Ben-Or and Tiwari's seminal paper [2]. It has widely been used in computer algebra, both in theory [6, 19, 21, 22, 23, 24, 28] and in practice [8, 9, 15, 16, 18, 20]; see [30] for a nice survey.

More precisely, if a bound for the number of terms is known, then we first evaluate at pairwise distinct points , where and for all . The generating function of the evaluations at satisfies the identity

where and is of degree . The rational function can be recovered from using fast Padé approximation [5, 27]. For well chosen points , it is often possible to recover the exponents from the values . If the exponents are known, then the coefficients can also be recovered using a transposed version of fast multipoint interpolation [4, 6]. This leaves us with the question how to compute the roots of in an efficient way.

For practical applications in computer algebra, we usually have , in which case it is most efficient to use a multi-modular strategy. This means that we rather work with coefficients in a finite field , where is a prime number that we are free to choose. It is well known that polynomial arithmetic over can be implemented most efficiently using FFTs when the order of the multiplicative group is smooth. In practice, this prompts us to choose of the form for some small and such that fits into a machine word.

The traditional way to compute roots of polynomials over finite fields is using Cantor and Zassenhaus' method [7]. In [12, 13], alternative algorithms were proposed for our case of interest when is smooth. The fastest algorithm was based on the tangent Graeffe transform and it gains a factor with respect to Cantor–Zassenhaus' method. The aim of the present paper is to report on a parallel implementation of this new algorithm and on a few improvements that allow for a further constant speed-up.

In section 2, we start by recalling generalities about the Graeffe transform and the heuristic root finding method based on the tangent Graeffe transform from [12]. In section 3, we present the main new theoretical improvements, which all rely on optimizations in the FFT-model for fast polynomial arithmetic. Our contributions are threefold:

We also indicate how to generalize our methods to Graeffe transforms of general orders in section 3.4.

Section 4 is devoted to our new implementation. We first implemented a sequential version of the tangent Graeffe method in C, with the optimizations from sections 3.2 and 3.3. We also started experimenting with a parallel implementation in Cilk C. Our sequential implementation confirms the major performance improvement with respect to Cantor–Zassenhaus' algorithm. We also observed the gain of a new factor of two when using the new optimizations. So far, we have achieved a parallel speed-up by a factor of 4.6 on an 8-core machine. Our implementation is freely available at:

http://www.cecm.sfu.ca/CAG/code/TangentGraeffe

2.Root finding using the tangent Graeffe transform

2.1.Graeffe transforms

The traditional Graeffe transform of a monic polynomial of degree is the unique monic polynomial of degree such that

(2)

If splits over into linear factors , then one has

More generally, given , we define the Graeffe transform of order to be the unique monic polynomial of degree such that

If , then

If is a primitive -th root of unity in , then we also have

(3)

If , then we finally have

(4)

2.2.Root finding using tangent Graeffe transforms

Let be a formal indeterminate with . Elements in are called tangent numbers. They are of the form with and basic arithmetic operations go as follows:

Now let be a monic polynomial of degree that splits over :

where are pairwise distinct. Then the tangent deformation satisfies

The definitions from the previous subsection readily extend to coefficients in instead of . Given , we call the tangent Graeffe transform of of order . We have

where

Now assume that we have an efficient way to determine the roots of . For some polynomial , we may decompose

For any root of , we then have

Whenever happens to be a single root of , it follows that

If , this finally allows us to recover as

2.3.Heuristic root finding over smooth finite fields

Assume now that is a finite field, where is a prime number of the form for some small . Assume also that be a primitive element of order for the multiplicative group of .

Let be as in the previous subsection. The tangent Graeffe method can be used to efficiently compute those of for which is a single root of . In order to guarantee that there are a sufficient number of such roots, we first replace by for a random shift , and use the following heuristic:

H

For any subset of cardinality and any , there exist at least elements such that contains at least elements.

For a random shift and any , the assumption ensures with probability at least that has at least single roots.

Now take to be the largest power of two such that and let . By construction, note that . The roots of are all -th roots of unity in the set . We may thus determine them by evaluating at for . Since , this can be done efficiently using a discrete Fourier transform. Combined with the tangent Graeffe method from the previous subsection, this leads to the following probabilistic algorithm for root finding:

Algorithm 1

Input: of degree and only order one factors,
Output: the set of roots of

  1. If then return

  2. Let be a random shift

    Let be largest such that and let

  3. Compute

  4. Compute

  5. For , set

  6. Write and compute , , and for

  7. If , then set , else set

  8. Let be of order

    For any with and ,

    let

  9. Compute

  10. Recursively determine the set of roots of

  11. Return

Remark 1. To compute we may use , which requires three polynomial multiplications in of degree . In total, step 5 therefore performs such multiplications. We discuss how to perform step 5 efficiently in the FFT model in section 3.

Remark 2. For practical implementations, one may vary the threshold for and the resulting threshold for . For larger values of , the computations of the DFTs in step 6 get more expensive, but the proportion of single roots goes up, so more roots are determined at each iteration. From an asymptotic complexity perspective, it would be best to take . In practice, we actually preferred to take the lower threshold , because the constant factor of our implementation of step 6 (based on Bluestein's algorithm [3]) is significant with respect to our highly optimized implementation of the tangent Graeffe method. A second reason we prefer of size instead of is that the total space used by the algorithm is linear in . In the future, it would be interesting to further speed up step 6 by investing more time in the implementation of high performance DFTs of general orders .

Remark 3. For the application to sparse interpolation, it is possible to further speed up step 5 for the top-level iteration, which is the most expensive step. More precisely, for a polynomial with terms, the idea is to take and of order instead of for some constant with . This reduces (and the cost of the top-level iteration) by a factor of . For the recursive calls, we still need to work with a primitive root of unity of order and random shifts.

3.Computing Graeffe transforms

3.1.Reminders about discrete Fourier transforms

Assume that is invertible in and let be a primitive -th root of unity. Consider a polynomial . Then the discrete Fourier transform (DFT) of order of the sequence is defined by

We will write for the cost of one discrete Fourier transform in terms of the number of operations in and assume that . For any , we have

(5)

If is invertible in , then it follows that . The costs of direct and inverse transforms therefore coincide up to a factor .

If is composite, , and , then we have

(6)

This shows that a DFT of length reduces to transforms of length plus transforms of length plus multiplications in :

In particular, if , then .

It is sometimes convenient to apply DFTs directly to polynomials as well; for this reason, we also define . Given two polynomials with , we may then compute the product using

In particular, if denotes the cost of multiplying two polynomials of degree , then we obtain .

Remark 4. In Algorithm 1, we note that step 6 comes down to the computation of three DFTs of length . Since is a power of two, this length is of the form for some . In view of (6), we may therefore reduce step 6 to DFTs of length plus DFTs of length . If is very small, then we may use a naive implementation for DFTs of length . In general, one may use Bluestein's algorithm [3] to reduce the computation of a DFT of length into the computation of a product in , which can in turn be computed using FFT-multiplication and three DFTs of length a larger power of two.

3.2.Graeffe transforms of order two

Let be a field with a primitive -th root of unity . Let be a polynomial of degree . Then the relation (2) yields

(7)

For any , we further note that

(8)

so can be obtained from using transpositions of elements in . Concerning the inverse transform, we also note that

for . Plugging this into (7), we conclude that

This leads to the following algorithm for the computation of :

Algorithm 2

Input: with and a primitive -th root of unity
Output:

  1. Compute

  2. For , set

  3. For , compute

  4. Return

Proposition 5. Let be a primitive -th root of unity in and assume that is invertible in . Given a monic polynomial with , we can compute in time

Proof. We have already explained the correctness of Algorithm 2. Step 1 requires one forward DFT of length and cost . Steps 2 and 3 can be done in linear time . Step 4 requires one inverse DFT of length and cost . The total cost of Algorithm 2 is therefore .

Remark 6. In terms of the complexity of multiplication, we obtain This gives a improvement over the previously best known bound that was used in [12]. Note that the best known algorithm for computing squares of polynomials of degree is . It would be interesting to know whether squares can also be computed in time .

3.3.Graeffe transforms of power of two orders

In view of (4), Graeffe transforms of power of two orders can be computed using

(9)

Now assume that we computed the first Graeffe transform using Algorithm 2 and that we wish to apply a second Graeffe transform to the result. Then we note that

(10)

is already known for . We can use this to accelerate step 1 of the second application of Algorithm 2. Indeed, in view of (6) for and , we have

(11)

for . In order to exploit this idea in a recursive fashion, it is useful to modify Algorithm 2 so as to include in the input and in the output. This leads to the following algorithm:

Algorithm 3

Input: with , a primitive -th root of unity ,
and
Output: and

  1. Set

  2. Set

  3. For , set

  4. For , compute

  5. Return and

Proposition 7. Let be a primitive -th root of unity in and assume that is invertible in . Given a monic polynomial with and , we can compute in time

Proof. It suffices to compute and then to apply Algorithm 3 recursively, times. Every application of Algorithm 3 now takes operations in , whence the claimed complexity bound.

Remark 8. In [12], Graeffe transforms of order were directly computed using the formula (9), using operations in . The new algorithm is twice as fast for large .

3.4.Graeffe transforms of arbitrary smooth orders

The algorithms from subsections 3.2 and 3.3 readily generalize to Graeffe transforms of order for arbitrary , provided that we have an -th root of unity . For convenience of the reader, we specified the generalization of Algorithm 3 below, together with the resulting complexity bounds.

Algorithm 4

Input: with , , a primitive -th root of unity ,
and
Output: and

  1. Set

  2. For , set

  3. For and , set

  4. For , compute

  5. Return and

Proposition 9. Let be a primitive -th root of unity in , where is invertible in . Given a monic polynomial with and , we can compute in time

Proof. Straightforward generalization of Proposition 7.

Corollary 10. Let be a primitive -th root of unity in , where are invertible in . Given a monic polynomial with and , we can compute in time

Proof. Direct consequence of (4).

Remark 11. In our application to root finding, we are interested in the efficient computation of Graeffe transforms of high order . In terms of the size of , it is instructive to observe that the “average cost”

is minimal for . Whenever radix-3 DFTs can be implemented as efficiently as radix-2 DFTs, then it follows that it might be interesting to use Graeffe transforms of order three.

3.5.Truncated Fourier transforms

If is a fixed finite field, then DFTs are most efficient for sizes that divide . For our root finding application, it is often convenient to take , in which case should be a power of two or three times a power of two. The truncated Fourier transform was developed for the multiplication of polynomials such that the degree of the product does not have a nice size of this type. It turns out that we may also use it for the efficient computation of Graeffe transforms of polynomials of arbitrary degrees. Moreover, the optimizations from the previous subsections still apply.

Let us briefly describe how the truncated Fourier transform can be used for the computation of Graeffe transforms of power of two orders. With the notations from subsections 3.2 and 3.3, we assume that is a power of two as well and that we wish to compute the Graeffe transform of a polynomial of degree with . Let denote the reversal of a binary number of bits. For instance, and . Then the truncated Fourier of at order is defined by

It has been shown in [14] that and can both be computed in time . For generalizations to arbitrary radices, we refer to [25].

Taking , we notice that

for . This provides us with the required counterpart of (8) for retrieving efficiently from . The relation (10) also has a natural counterpart:

for , and so does (11):

This leads to the following refinement of Algorithm 3:

Algorithm 5

Input: with ,
a primitive -th root of unity , and
Output: and

  1. Set

  2. Set

  3. For , set

  4. For , compute

  5. Return and

Proposition 12. Let be a primitive -th root of unity in , where , and assume that is invertible in . Given a monic polynomial with and , we can compute in time

Proof. Straightforward adaptation of the proof of Proposition 7, while using [14].

3.6.Taylor shifts

In step 3 of Algorithm 1, we still need an algorithm to compute the Taylor shift . If the characteristic of exceeds , then it is (not so) well known [1, Lemma 3] that this can be reduced to a single polynomial multiplication of degree using the following algorithm:

Algorithm 6

Input: of degree and
Output:

  1. Return

It is interesting to observe that Taylor shifts can still be computed in time in small characteristic, as follows:

Algorithm 7

Input: of degree and
Output:

  1. Define for where

  2. Rewrite with for

  3. For , replace

  4. Return

4.Implementation and benchmarks

We have implemented the tangent Graeffe root finding algorithm (Algorithm 1) in C with the optimizations presented in section 3. Our C implementation supports primes of size up to 63 bits. In what follows all complexities count arithmetic operations in .

In Tables 1 and 2, the input polynomial of degree is constructed by choosing distinct values for at random and creating . We will use , a smooth 63 bit prime. For this prime is .

One goal we have is to determine how much faster the Tangent Graeffe (TG) root finding algorithm is in practice when compared with the Cantor-Zassenhaus (CZ) algorithm which is implemented in many computer algebra systems. In Table 1 we present timings comparing our sequential implementation of the TG algorithm with Magma's implementation of the CZ algorithm.

Our serial TG implementation in C Magma CZ timings
total first %roots step 5 step 6 step 9 V2.25-3 V2.25-5
0.11s 0.07s 69.8% 0.04s 0.02s 0.01s 23.22s 8.43
0.22s 0.14s 69.8% 0.09s 0.03s 0.01s 56.58s 18.94
0.48s 0.31s 68.8% 0.18s 0.07s 0.02s 140.76s 44.07
1.00s 0.64s 69.2% 0.38s 0.16s 0.04s 372.22s 103.5
2.11s 1.36s 68.9% 0.78s 0.35s 0.10s 1494.0s 234.2
4.40s 2.85s 69.2% 1.62s 0.74s 0.23s 6108.8s 534.5
9.16s 5.91s 69.2% 3.33s 1.53s 0.51s NA 1219.
19.2s 12.4s 69.2% 6.86s 3.25s 1.13s NA 2809.

Table 1. Sequential timings in CPU seconds ()

For polynomials in , Magma uses Shoup's factorization algorithm from [ 31 ]. For our input , with distinct linear factors, Shoup uses the Cantor–Zassenhaus equal degree factorization method. Recall that the average complexity of TG is and of CZ is . We also wanted to attempt a parallel implementation. To do this we used the MIT Cilk C compiler from [ 10 ]. The Cilk C compiler provides a simple fork-join model of parallelism. Unlike the CZ algo- rithm, TG has no gcd computations that are hard to parallelize. We present some initial parallel timing data in Table 2 .

The timings in Table 1 are sequential timings obtained on a a Linux server with an Intel Xeon E5-2660 CPU with 8 cores. In Table 1 the time in column “first” is for the first application of the TG algorithm (steps 19 of Algorithm 1 ) showing that it obtains about 69% of the roots. The time in column “total” is the total time for the algorithm. Columns step 5 , step 6 , and step 9 report the time spent in steps 5 , 6 , and 9 in Algorithm 1 and do not count time in the recursive call in step 10 .

Our parallel tangent Graeffe implementation in Cilk C
total first step 5 step 6 step 9
18.30s(9.616s) 11.98s(2.938s) 6.64s(1.56s) 3.13s(0.49s) 1.09s(0.29s)
38.69s(12.40s) 25.02s(5.638s) 13.7s(3.03s) 6.62s(1.04s) 2.40s(0.36s)
79.63s(20.16s) 52.00s(11.52s) 28.1s(5.99s) 13.9s(2.15s) 5.32s(0.85s)
166.9s(41.62s) 107.8s(23.25s) 57.6s(11.8s) 28.9s(4.57s) 11.7s(1.71s)
346.0s(76.64s) 223.4s(46.94s) 117.s(23.2s) 60.3s(9.45s) 25.6s(3.54s)
712.7s(155.0s) 459.8s(95.93s) 238.s(46.7s) 125.s(19.17) 55.8s(7.88s)
1465.s(307.7s) 945.0s(194.6s) 481.s(92.9s) 259.s(39.2s) 121.s(16.9s)

Table 2. Real times in seconds for 1 core (8 cores) and

The Magma column timing is for Magma's Factorization command. The timings for Magma version V2.25-3 suggest that Magma's CZ implementation involves a subalgorithm with quadratic asymptotic complexity. Indeed it turns out that the author of the code implemented all of the sub-quadratic polynomial arithmetic correctly, as demonstrated by the second set of timings for Magma in column V2.25-5, but inserted the linear factors found into a list using linear insertion! Allan Steel of the Magma group identified and fixed the offending subroutine for Magma version V2.25-5. The timings show that TG is faster than CZ by a factor of 76.6 (=8.43/0.11) to 146.3 (=2809/19.2).

4.1.Implementation notes

To implement the Taylor shift in step 3, we used the method from [1, Lemma 3]. The better known method presented in [11] is . For step 5 we use Algorithm 3 as presented. It has complexity . To evaluate and in step 6 in we used the Bluestein transformation [3]. In step 9 to compute the product , for roots, we used the tree multiplication algorithm [11]. The division in step 10 is done in with the fast division.

The sequential timings in Tables 1 and 2 show that steps 5, 6 and 9 account for about 91% of the total time. We parallelized these three steps as follows. For step 5, the two forward and two inverse FFTs are done in parallel. We also parallelized our radix 2 FFT by parallelizing recursive calls for size and the main loop in blocks of size as done in [26]. For step 6 there are three applications of Bluestein to compute , and . We parallelized these (thereby doubling the overall space used by our implementation). The main computation in the Bluestein transformation is a polynomial multiplication of two polynomials of degree . The two forward FFTs are done in parallel and the FFTs themselves are parallelized as for step 5. For the product in step 9 we parallelize the two recursive calls in the tree multiplication for large sizes and again, the FFTs are parallelized as for step 5.

To improve parallel speedup we also parallelized the polynomial multiplication in step 3 and the computation of the roots in step 8. Although step 8 is , it is relatively expensive because of two inverse computations in . Because we have not parallelized about 5% of the computation the maximum parallel speedup we can obtain is a factor of . The best overall parallel speedup we obtained is a factor of 4.6=1465/307.7 for .

In Cilk, for each recursive C subroutine we wish to parallelize, one first needs to make a parallel version of that routine. For large recursive calls, the parallel version does them in parallel. For smaller recursive calls, the parallel version needs to call the sequential version of the code to avoid the Cilk process management overhead. The cutoff for the size of the input for which we need to use the sequential code has to be determined by experiment.

Bibliography

[1]

A. V. Aho, K. Steiglitz, and J. D. Ullman. Evaluating polynomials on a fixed set of points. SIAM Journ. of Comp., 4:533–539, 1975.

[2]

M. Ben-Or and P. Tiwari. A deterministic algorithm for sparse multivariate polynomial interpolation. In STOC '88: Proceedings of the twentieth annual ACM symposium on Theory of computing, pages 301–309. ACM Press, 1988.

[3]

Leo I. Bluestein. A linear filtering approach to the computation of discrete Fourier transform. IEEE Transactions on Audio and Electroacoustics, 18(4):451–455, 1970.

[4]

A. Bostan, G. Lecerf, and É. Schost. Tellegen's principle into practice. In ISSAC '03: Proceedings of the 2003 International Symposium on Symbolic and Algebraic Computation, pages 37–44. ACM Press, 2003.

[5]

R. P. Brent, F. G. Gustavson, and D. Y. Y. Yun. Fast solution of Toeplitz systems of equations and computation of Padé approximants. J. Algorithms, 1(3):259–295, 1980.

[6]

J. Canny, E. Kaltofen, and Y. Lakshman. Solving systems of non-linear polynomial equations faster. In Proceedings of the ACM-SIGSAM 1989 International Symposium on Symbolic and Algebraic Computation, pages 121–128. ACM Press, 1989.

[7]

D. G. Cantor and H. Zassenhaus. A new algorithm for factoring polynomials over finite fields. Math. Comp., 36(154):587–592, 1981.

[8]

A. Díaz and E. Kaltofen. FOXFOX: a system for manipulating symbolic objects in black box representation. In ISSAC '98: Proceedings of the 1998 International Symposium on Symbolic and Algebraic Computation, pages 30–37. ACM Press, 1998.

[9]

T. S. Freeman, G. M. Imirzian, E. Kaltofen, and Y. Lakshman. DAGWOOD: a system for manipulating polynomials given by straight-line programs. ACM Trans. Math. Software, 14:218–240, 1988.

[10]

M. Frigo, C.E. Leisorson, and R.K. Randall. The implementation of the Cilk-5 multithreaded language. In Proceedings of PLDI 1998, pages 212–223. ACM, 1998.

[11]

J. von zur Gathen and J. Gerhard. Modern computer algebra. Cambridge University Press, New York, 3rd edition, 2013.

[12]

B. Grenet, J. van der Hoeven, and G. Lecerf. Randomized root finding over finite fields using tangent Graeffe transforms. In Proc. ISSAC '15, pages 197–204. New York, NY, USA, 2015. ACM.

[13]

B. Grenet, J. van der Hoeven, and G. Lecerf. Deterministic root finding over finite fields using Graeffe transforms. AAECC, 27(3):237–257, 2016.

[14]

J. van der Hoeven. The truncated Fourier transform and applications. In J. Gutierrez, editor, Proc. ISSAC 2004, pages 290–296. Univ. of Cantabria, Santander, Spain, July 4–7 2004.

[15]

J. van der Hoeven and G. Lecerf. On the bit-complexity of sparse polynomial and series multiplication. J. Symbolic Comput., 50:227–254, 2013.

[16]

J. van der Hoeven and G. Lecerf. Sparse polynomial interpolation in practice. ACM Commun. Comput. Algebra, 48(3/4):187–191, 2015.

[17]

J. van der Hoeven et al. GNU TeXmacs. https://www.texmacs.org, 1998.

[18]

J. Hu and M. B. Monagan. A fast parallel sparse polynomial GCD algorithm. In S. A. Abramov, E. V. Zima, and X.-S. Gao, editors, Proc. ISSAC '16, pages 271–278. ACM, 2016.

[19]

M. A. Huang and A. J. Rao. Interpolation of sparse multivariate polynomials over large finite fields with applications. In SODA '96: Proceedings of the seventh annual ACM-SIAM symposium on Discrete algorithms, pages 508–517. Philadelphia, PA, USA, 1996. Society for Industrial and Applied Mathematics.

[20]

M. Javadi and M. Monagan. Parallel sparse polynomial interpolation over finite fields. In M. Moreno Maza and J.-L. Roch, editors, PASCO '10: Proceedings of the 4th International Workshop on Parallel and Symbolic Computation, pages 160–168. ACM Press, 2010.

[21]

E. Kaltofen. Computing with polynomials given by straight-line programs I: greatest common divisors. In STOC '85: Proceedings of the Seventeenth Annual ACM Symposium on Theory of Computing, pages 131–142. ACM Press, 1985.

[22]

E. Kaltofen, Y. N. Lakshman, and J.-M. Wiley. Modular rational sparse multivariate polynomial interpolation. In ISSAC '90: Proceedings of the International Symposium on Symbolic and Algebraic Computation, pages 135–139. New York, NY, USA, 1990. ACM Press.

[23]

E. Kaltofen and B. M. Trager. Computing with polynomials given by black boxes for their evaluations: greatest common divisors, factorization, separation of numerators and denominators. J. Symbolic Comput., 9(3):301–320, 1990.

[24]

E. Kaltofen and L. Yagati. Improved sparse multivariate polynomial interpolation algorithms. In ISSAC '88: Proceedings of the International Symposium on Symbolic and Algebraic Computation, pages 467–474. Springer Verlag, 1988.

[25]

R. Larrieu. The truncated Fourier transform for mixed radices. In Proc. ISSAC '17, pages 261–268. New York, NY, USA, 2017. ACM.

[26]

M. Law and M. Monagan. A parallel implementation for polynomial multiplication modulo a prime. In Proc. of PASCO 2015, pages 78–86. ACM, 2015.

[27]

R. Moenck. Fast computation of GCDs. In Proc. of the 5th ACM Annual Symposium on Theory of Computing, pages 142–171. New York, 1973. ACM Press.

[28]

H. Murao and T. Fujise. Modular algorithm for sparse multivariate polynomial interpolation and its parallel implementation. JSC, 21:377–396, 1996.

[29]

R. Prony. Essai expérimental et analytique sur les lois de la dilatabilité des fluides élastiques et sur celles de la force expansive de la vapeur de l'eau et de la vapeur de l'alkool, à différentes températures. J. de l'École Polytechnique Floréal et Plairial, an III, 1(cahier 22):24–76, 1795.

[30]

D. S. Roche. What can (and can't) we do with sparse polynomials? In C. Arreche, editor, ISSAC '18: Proceedings of the 2018 ACM International Symposium on Symbolic and Algebraic Computation, pages 25–30. ACM Press, 2018.

[31]

V. Shoup. A new polynomial factorization and its implementation. J. Symbolic Computation, 20(4):363–397, 1995.