
Consider a finite field whose multiplicative
group has smooth cardinality. We study the problem of computing all
roots of a polynomial that splits over ,
which was one of the bottlenecks for fast sparse interpolation in
practice. We revisit and slightly improve existing algorithms and then
present new randomized ones based on the Graeffe transform. We report on
our implementation in the
F.2.1 [ANALYSIS OF ALGORITHMS AND PROBLEM COMPLEXITY]: Numerical Algorithms and Problems–Computations in finite fields; G.4 [MATHEMATICAL SOFTWARE]: Algorithm design and analysis
Algorithms, Theory
Finite fields, polynomial root finding, algorithm, Mathemagix
Let represent the finite field with elements, where is a prime number, and . Throughout this article, such a field is supposed to be described as a quotient of by a monic irreducible polynomial. Let represent a separable monic polynomial of degree which splits over , which means that all its irreducible factors have degree one and multiplicity one. In this article we are interested in computing all the roots of .
One of our interests in root finding came from the recent design of efficient algorithms to interpolate, into the standard monomial basis, polynomials that are given through evaluation functions. This task is briefly called sparse interpolation, and root finding often turns out to be a bottleneck, as reported in [19]. In fact, in this case, the ground field can be chosen to be with , and where is taken to be much larger than the number of terms to be discovered. In practice, to minimize the size of , so that it fits a machine register, we take as small as possible. A typical example is . We informally refer to such primes as FFT primes.
Root finding over prime finite fields critically occurs during the computation of integer and rational roots of polynomials in , both for dense and lacunary representations. Yet other applications concern cryptography and error correcting codes. Nevertheless, practical root finding has received only moderate attention so far, existing algorithms with good average complexity bounds often being sufficient [20, 21].
In this article, we focus on fast probabilistic root finding algorithms,
targeting primarily FFT prime fields and, more generally, finite fields
whose multiplicative group has smooth cardinality. At a second stage, we
report on practical efficiency of our new algorithms within the
The multiplicative group of is written . In order to simplify the presentation of complexity bounds, we use the softOh notation: means that (we refer the reader to [11, Chapter 25, Section 7] for technical details). The least integer larger or equal to is written . The largest integer smaller or equal to is written . The remainder of in the division by is denoted by .
We write for a function that bounds the total cost of a polynomial product algorithm in terms of the number of ring operations performed independently of the coefficient ring, assuming a unity is available. In other words, two polynomials of degrees at most over such a ring can be multiplied with at most arithmetic operations in . The schoolbook algorithm allows us to take . On the other hand the fastest known algorithm, due to Cantor and Kaltofen [7], provides us with . In order to simplify the cost analysis of our algorithms we make the customary assumption that for all . Notice that this assumption implies the superadditivity of , namely for all and .
For operations in and in finite fields, we are interested in the Turing machine model, with a sufficiently large number of tapes. In short, we use the terms bitcost and bitcomplexity to refer to this model whenever the context is not clear. For randomized algorithms, we endow Turing machines with an additional instruction which generates random bits with a uniform distribution [28].
We write for a function that bounds the bitcost of an algorithm which multiplies two integers of bitsizes at most , viewed in classical binary representation. Recently, the best bound for has been improved to , where represents the iterated logarithm function [16]. Again, we make the customary assumption that for all . We freely use the following classical facts: ring operations in cost and one division or inversion in costs [11, Chapter 11].
For polynomial operations over , we let represent a function that bounds the bitcost of an algorithm that multiplies two polynomials of degrees at most , with the same kind of assumptions as for and . According to [17], we may take . The ring operations in cost at most , and inversions take at most . For convenience, and will respectively denote cost functions for the product and the inverse in .
Let us recall that the gcd of two polynomials of degrees at most over can be computed in time : One can for instance use pseudoremainders in [11, Algorithm 11.4], mutatis mutandis. Given monic polynomials and with and , all the remainders can be computed in time using a subproduct tree [11, Chapter 10]. The inverse problem, called Chinese remaindering, can be solved within a similar cost .
In this article, when needed, we consider that the factorization of and a primitive element of have been precomputed once, and we discard the necessary underlying costs. In practice, if the factorization of is given, then it is straightforward to verify whether a given element is primitive. For known complexity bounds and historical details on these tasks, we refer the reader to [1, 26, 31].
Seminal algorithms for polynomial factorization over finite fields are
classically attributed to Berlekamp [2, 3],
and Cantor and Zassenhaus [8], but central earlier ideas
can be found in works of Gauss, Galois, Arwins, Faddeev and Skopin.
Cantor–Zassenhaus' algorithm is randomized and well suited to
compute roots of polynomials of degree that
split over in average time . Of course, if then an
exhaustive search can be naively performed in time
(the factor can be discarded if a primitive
element of is given, by means of [6,
Proposition 3]), so that the cost of root finding simplifies to . This classical approach is for
instance implemented in the
In Section 2, for the sake of comparison, we first revisit Cantor–Zassenhaus' approach and propose a practical trick to slightly speed it up. We also briefly recall the complexity bound of Mignotte–Schnorr's algorithm [24].
We then design and analyze fast randomized algorithms based on tangent
Graeffe transforms, leading to an important speedup for FFT primes. The
practical efficiency of the new algorithms is discussed on the basis of
implementations in the
Let us finally mention that our present randomized algorithms admit deterministic counterparts which are studied in [14].
In this section we revisit efficient root finding algorithms previously described in the literature, and slightly improve their complexity analysis. The first stream, followed by Berlekamp [3], Cantor and Zassenhaus [8], consists in splitting the input polynomial by computing suitable modular exponentiations and gcds. The second stream, followed by Moenck [25], and Mignotte and Schnorr [24], takes advantage of the smoothness of .
In this subsection we suppose that has the form , with . The following randomized algorithm extends Cantor–Zassenhaus' one, which corresponds to , when is odd. We need a primitive root of unity of order .
Randomized algorithm
The roots of .
If then compute the roots of by calling a fallback root finding algorithm.
Pick up at random of degree at most .
Compute , and set .
For all from to , compute , make it monic, and replace by .
Recursively call the algorithm with those of which are not constant, and return the union of the sets of their roots.
In the classical case when , then we recall that step 3 costs and step 4 takes plus one inversion in . Since the average depth of the recursion is in [9], the total average cost amounts to .
For the case of arbitrary , we propose an informal discussion, so as to justify the interest in the algorithm. If is a root of , then has order dividing , hence is a power of , or zero with a very small probability. Therefore we have . Since is taken at random, we might expect that the values of are distributed uniformly among the powers of (we discard cases when ). In other words, the depth of the recursive calls is expected to be in .
If we further assume that , then step 3 takes approximately for a certain constant , and step 4 costs . Discarding the cost of the inversions, and compared to the case , we achieve an approximate speedup of
Whenever , this speedup is of order . In general, the speedup is maximal if .
Assuming given a primitive element of , Mignotte and Schnorr proposed a general deterministic root finding algorithm in [24], that is efficient when is smooth. For a fair comparison with our new algorithm presented in the next section (and since their article is written in French) we recall their method in a different and concise manner.
Let be integers , such that , and let and . For instance, if the irreducible factorization of is known, then we may take and , , …, and set . In order to split into factors of degrees at most , we may use the following algorithm.
Algorithm
in such that , and the are monic, separable, and divide , for all .
Let , and
compute , for all .
Initialize with the list .
For from down to do
Compute for all pairs in using a subproduct tree.
Initialize with the empty list, and for all from to do
For each pair in do
Compute .
If is not constant, then make it monic and append to .
Let .
Return .
As to the complexity, we can compute the for in time as follows: we first compute , and , and then deduce each by multiplying and .
Step 1 requires time . In step 3.a, since the sum of the degrees of the polynomials in is at most , and since these polynomials are monic, this step can be done in time . The cost of the gcds in step 3.b is by the superadditivity of . We compute all the in time , and then all the and by means of operations in . Deducing all the takes additional products in .
Since the cardinality of cannot exceed , the total number of proper factors in step 3.b is at most . Therefore, we need inversions in order to make all the polynomials in monic.
Notice that the dependence on is good when is smooth. For example, if and , then the cost of root finding via the latter proposition drops to . This is higher than the average bitcost of Cantor–Zassenhaus' algorithm. Nevertheless since the term corresponds to products in , with a small constant hidden in the , Mignotte–Schnorr's algorithm is competitive for small values of .
In the original work of Mignotte and Schnorr [24], the cost of the algorithm was estimated to operations in , if . Our presentation slightly differs by the use of a subproduct tree in step 3.a.
Let us briefly mention that a better bound for splitting was achieved in [10]. Nevertheless, for finding all the roots, the method of [10] does not seem competitive in general.
Moenck's algorithm [25] deals with the special case when . Let be a primitive root of . Let , let be a polynomial whose roots have orders dividing , and let be one of these roots. Then either this order divides , or we have , since . In the latter case, we obtain . The polynomials and have all their roots of order dividing . In this way, the roots of can be computed inductively starting from down to . At the end, we are reduced to finding roots of several polynomials whose orders divide . If is small, then an exhaustive search easily completes the computations. Otherwise we may use a fallback algorithm. Moenck's algorithm summarizes as follows.
Algorithm
in such that, the are monic, separable, and the roots of have orders dividing .
Compute for all .
Initialize with the list .
For from down to do
Compute the remainders of for all triples in using a subproduct tree.
Initialize as an empty list.
For all in do
Compute . If is not constant then make it monic and append to .
Compute . If is not constant then make it monic, and append to .
Replace by .
Return .
Our presentation differs from [25]. We also introduced step 3.a, which yields a sensible theoretical speedup, similarly to Mignotte–Schnorr's algorithm. In fact Moenck's and Mignotte–Schnorr's algorithms are quite similar from the point of view of the successive splittings of , and the intermediate operations. We thus leave out the details on correctness and cost analysis. As an advantage, the logarithms of the successive roots are not needed. Nevertheless computing all the powers of in steps 3.c amounts to a bitcost .
One advantage of Cantor–Zassenhaus' algorithm is its average depth in the recursive calls in . This is to be compared to the iterations of Algorithms 2 and 4. In this section we propose a new kind of root finder based on the tangent Graeffe transform, which takes advantage of a FFT prime field, with an average cost not exceeding the one of Cantor–Zassenhaus.
Classically, the Graeffe transform of a polynomial of degree is the unique polynomial satisfying . If , then . This construction can be extended to higher orders as follows: the generalized Graeffe transform of of order , written , is defined as the resultant
If , then . Equivalently, is the characteristic polynomial of multiplication by in (up to the sign). For our root finding algorithms, the most important case is when is smooth and the order of the generalized Graeffe transform divides .
Let us mention that good complexity bounds for Graeffe transforms for general finite fields can be found in [14].
Introducing a formal parameter with , we define the generalized tangent Graeffe transform of of order as being . For any ring , computations with “tangent numbers” in can be done with constant overhead with respect to computations in (in the FFT model, the overhead is asymptotically limited to a factor of two). Whenever a Graeffe transform preserves the number of distinct simple roots, the tangent Graeffe transform can be used to directly recover the original simple roots from the transformed polynomial, as follows:
In the context of the Graeffe method, the tangent transform is classical (for instance, see [23, 27] for history, references, and use in numerical algorithms). The generalized tangent Graeffe transform can also be seen as the tangent characteristic polynomial of modulo , and this construction is often attributed to Kronecker in algebraic geometry [22].
Let us write . Let , and let , …, be the Graeffe transforms of orders , , …, of . The roots of are to be found among the elements of order . One may then find the roots of among the th roots of the roots of , and, by induction, the roots of are to be found among the th roots of the roots of . This is the starting point of the efficient deterministic algorithm designed in [14]. But in order to make it much more efficient than Cantor–Zassenhaus's algorithm we introduce and analyze two types of randomizations in the next sections. First we avoid computing th roots by combining random variable shifts and tangent transforms. Second we analyze the behaviour for random polynomials, which leads us to a very efficient heuristic algorithm.
Our randomized algorithms can be competitive to Cantor–Zassenhaus' algorithm only for very smooth values of . For simplicity we focus on the important case of an FFT field where , with .
We introduce the parameter . Let be a divisor of , and let . Let denote the roots of . The Graeffe transform of of order equals .
Given a pair of distinct roots and of , we have for all but at most values of . Therefore for all but at most values of , the polynomial has no multiple root. Considering that is taken uniformly at random in , the probability that has multiple roots is at most . This yields the following randomized algorithm.
Randomized algorithm
The roots of .
If then set to . Otherwise set to the largest integer in such that . Compute and .
Pick uniformly at random, and compute , where .
Recursively compute the Graeffe transform of order of , for all .
Compute the list of logarithms of the roots of .
For from down to do
Replace by the concatenation of for .
If has cardinality more than then remove the elements from such that .
If then return .
Compute .
Compute .
Add to if .
Compute .
Compute recursively the roots of .
Return .
Step 3 takes by Proposition 5. Steps 2, 5.a, 5.b, 6, 7, 8, 9 execute in time . Step 4 costs . If then the number of iterations in step 5 is . Otherwise the number of iterations is . Consequently, the total cost of all steps but step 10 is . From the choice of , we have already seen that the degree of equals with probability at least . Writing for the average execution time, we thus have , which implies .
Since the base field supports FFTs it is important to perform all intermediate computations by taking advantage of sharing transforms. This technique was popularized within the NTL library for many algorithms. In addition we notice that the first few iterates of the loop of step 5 can be reduced to direct FFT computations instead of generic multipoint evaluations.
Taking a monic random polynomial of degree which splits over , for a uniform distribution, is equivalent to taking its roots at random in . Let represent the roots of . We examine the behavior of tangent Graeffe transforms of such random polynomials.
Let be the number of monic polynomials of degree which split over and have distinct nonzero roots of order dividing . Such a polynomial uniquely corresponds to the choice of distinct values among , namely the roots, and then of the multiplicities of these roots with sum . Therefore we have . One can check that , which corresponds to choosing a multiplicity in for each element of order dividing , under the constraint that the sum of the multiplicities is exactly . The average number of roots of such polynomials is
The latter formulas are direct consequences of the classical Chu–Vandermonde identity. Assuming that , the average number of distinct roots is at least . When is much greater than then this lower bound is getting close to .
Moreover, assuming that the probability that has at least simple roots is at least .
Setting and assuming that , so that , let be the probability that has less than simple roots. Then the average number of roots is at most , whence . We conclude that .
Let be a divisor of , and consider a polynomial of degree . In Section 3.4, we have shown that for all but at most values of , the Graeffe transform of order of has no multiple root. Taking , this implies that has no multiple roots with probability at least , when picking at random. Now Proposition 10 implies that at least one third of the roots remain simple with probability at least under the weaker condition that and for random . It is an interesting question whether this fact actually holds for any if is randomly chosen:
From now on, assume that is picked at random or that the above heuristic holds and is chosen at random. Taking , we may then find the roots of by computing a few discrete Fourier transforms of length , instead of more expensive multipoint evaluations. Using tangent Graeffe transforms, the simple roots can be lifted back for almost no cost. These considerations lead to the following efficient randomized algorithm.
Randomized algorithm
The roots of .
If then evaluate on all the elements of , and return the roots.
Otherwise set to be the largest integer in such that . Compute and .
Pick a random in , and compute the Graeffe transform of order of with .
Compute , , .
Compute . Add to if .
Compute .
Compute recursively the roots of .
Return .
The cost of step 3 is bounded by (and even by if according to [4, Chapter 1, Section 2]). When computing using tangent Graeffe transforms of order , the cost of step 3 is bounded by . Steps 4 and 5 can be done in time . The average execution time thus satisfies
where is the probability that contains more than elements in step 7. It follows that , whence .
The main bottleneck for Algorithm 11 is step 3. It would be possible to improve our complexity bound if a better algorithm were known to compute Graeffe transforms of order . In the FFT model, the tangent Graeffe transform of a polynomial of degree can be computed in time . In practice, this means that the asymptotic complexity of Algorithm 11 is equivalent to .
In order to reduce the cost of the Graeffe transforms, it is relevant to choose such that for some small , which is to be determined in practice as a function of and . In this way more time is spent in multipoint evaluations. For the multipoint evaluations in step 4, one may use FFTs of size for small values of , or appeal to Bluestein's chirp transform [5].
In this section we report on timings for the algorithms presented above.
We use a platform equipped with an
We use revision 9738 of
Table 1 shows timings for the FFT prime field with . The row
NTL corresponds to the function FindRoots, which
contains an optimization with respect to Algorithm 1 with
: in fact, the polynomials
in step are taken of
degree (see [30] for details). We
implemented the same optimization in our versions of
Cantor–Zassenhaus in
Concerning the deterministic algorithms, let us precise that we take and in Algorithm 2. In addition, in our implementation of Algorithm 4, when and are small, for efficiency reasons, we compute using binary powering instead of simultaneous remainders. These deterministic methods turn out to be quite competitive.
Our Table 2 is similar to Table 1 for the FFT prime field with . Nevertheless NTL does not support this larger prime within its type zz_p. This larger prime starts to reveal the interest in our modified version of Cantor–Zassenhaus with . The speedup measured with the modified version even more increases with larger primes: With , it reaches a factor 1.6 in size .
Using Graeffe transforms as in Algorithm 8 is not very competitive. However, in final, it is satisfactory to observe that our Algorithm 11, exploiting the tangent transform, gains by an order of magnitude over other existing methods in both tables.




Coming back to our initial motivation for sparse polynomial interpolation, let us mention that plugging Algorithm 11 into our implementations reported in [19] leads to quite good speedups. The total time for Example 1 of [19], concerning the product of random sparse polynomials, decreases from 12 s to 7.6 s with the “coefficient ratios” technique. In fact, the time spent in root finding decreases from 50 % to 8 % during the sole determination of the support.
Example 2 of [19] concerns the sparse interpolation of the determinant of the generic matrix. For , the time for the “Kronecker substitution” technique with a prime of 68 bits decreases from 95 s to 43 s. The time elapsed in root finding during the sole determination of the support drops from 70 % to 20 %. Thanks to additional optimizations, the “coefficient ratios” technique now runs this example in s.
Bruno Grenet was partially supported by a LIX–Qualcomm–Carnot postdoctoral fellowship.
E. Bach. Comments on search procedures for primitive roots. Math. Comp., 66:1719–1727, 1997.
E. R. Berlekamp. Factoring polynomials over finite fields. Bell System Tech. J., 46:1853–1859, 1967.
E. R. Berlekamp. Factoring polynomials over large finite fields. Math. Comp., 24:713–735, 1970.
D. Bini and V. Y. Pan. Polynomial and matrix computations. Vol. 1. Fundamental algorithms. Progress in Theoretical Computer Science. Birkhäuser, 1994.
L. I. Bluestein. A linear filtering approach to the computation of discrete Fourier transform. IEEE Transactions on Audio and Electroacoustics, 18(4):451–455, 1970.
A. Bostan and É. Schost. Polynomial evaluation and interpolation on special sets of points. J. Complexity, 21(4):420–446, 2005.
D. G. Cantor and E. Kaltofen. On fast multiplication of polynomials over arbitrary algebras. Acta Infor., 28(7):693–701, 1991.
D. G. Cantor and H. Zassenhaus. A new algorithm for factoring polynomials over finite fields. Math. Comp., 36(154):587–592, 1981.
Ph. Flajolet and J.M. Steyaert. A branching process arising in dynamic hashing, trie searching and polynomial factorization. In M. Nielsen and E. M. Schmidt, editors, Automata, Languages and Programming. Proceedings of the 9th ICALP Symposium, volume 140 of Lecture Notes in Comput. Sci., pages 239–251. Springer Berlin Heidelberg, 1982.
J. von zur Gathen. Factoring polynomials and primitive elements for special primes. Theoret. Comput. Sci., 52(12):77–89, 1987.
J. von zur Gathen and J. Gerhard. Modern computer algebra. Cambridge University Press, 2nd edition, 2003.
GCC, the GNU Compiler Collection. Software available at http://gcc.gnu.org, from 1987.
T. Granlund et al. GMP, the GNU multiple precision arithmetic library, from 1991. Software available at http://gmplib.org.
B. Grenet, J. van der Hoeven, and G. Lecerf. Deterministic root finding over finite fields using Graeffe transforms. Technical report, École polytechnique, 2015.
W. Hart, F. Johansson, and S. Pancratz. FLINT: Fast Library for Number Theory, 2014. Version 2.4.4, http://flintlib.org.
D. Harvey, J. van der Hoeven, and G. Lecerf. Even faster integer multiplication. http://arxiv.org/abs/1407.3360, 2014.
D. Harvey, J. van der Hoeven, and G. Lecerf. Faster polynomial multiplication over finite fields. http://arxiv.org/abs/1407.3361, 2014.
J. van der Hoeven et al. Mathemagix, from 2002. http://www.mathemagix.org.
J. van der Hoeven and G. Lecerf. Sparse polynomial interpolation in practice. ACM Commun. Comput. Algebra, 48(4), 2014. In section "ISSAC 2014 Software Presentations".
E. Kaltofen and V. Shoup. Subquadratictime factoring of polynomials over finite fields. Math. Comp., 67(223):1179–1197, 1998.
K. S. Kedlaya and C. Umans. Fast modular composition in any characteristic. In A. Z. Broder et al., editors, 49th Annual IEEE Symposium on Foundations of Computer Science 2008 (FOCS '08), pages 146–155. IEEE, 2008.
L. Kronecker. Grundzüge einer arithmetischen Theorie der algebraischen Grössen. J. reine angew. Math., 92:1–122, 1882.
G. Malajovich and J. P. Zubelli. Tangent Graeffe iteration. Numer. Math., 89(4):749–782, 2001.
M. Mignotte and C. Schnorr. Calcul déterministe des racines d'un polynôme dans un corps fini. C. R. Acad. Sci. Paris Sér. I Math., 306(12):467–472, 1988.
R. T. Moenck. On the efficiency of algorithms for polynomial factoring. Math. Comp., 31:235–250, 1977.
G. L. Mullen and D. Panario. Handbook of Finite Fields. Discrete Mathematics and Its Applications. Chapman and Hall/CRC, 2013.
V. Pan. Solving a polynomial equation: Some history and recent progress. SIAM Rev., 39(2):187–220, 1997.
C. H. Papadimitriou. Computational Complexity. AddisonWesley, 1994.
L. Rónyai. Factoring polynomials modulo special primes. Combinatorica, 9(2):199–206, 1989.
V. Shoup. A fast deterministic algorithm for factoring polynomials over finite fields of small characteristic. In S. M. Watt, editor, ISSAC '91: Proceedings of the 1991 International Symposium on Symbolic and Algebraic Computation, pages 14–21. ACM Press, 1991.
V. Shoup. Searching for primitive roots in finite fields. Math. Comp., 58:369–380, 1992.
V. Shoup. NTL: A Library for doing Number Theory, 2014. Software, version 8.0.0. http://www.shoup.net/ntl.