Fast modular composition using spiroids

Joris van der Hoevena, Grégoire Lecerfb

Laboratoire d'informatique de l'École polytechnique (LIX, UMR 7161)

CNRS, École polytechnique, Institut Polytechnique de Paris

Bâtiment Alan Turing, CS35003

1, rue Honoré d'Estienne d'Orves

91120 Palaiseau, France

a. Email: vdhoeven@lix.polytechnique.fr

b. Email: lecerf@lix.polytechnique.fr

Preliminary version of December 5, 2025

Grégoire Lecerf has been supported by the French ANR-22-CE48-0016 NODE project. Joris van der Hoeven has been supported by an ERC-2023-ADG grant for the ODELIX project (number 101142171).

Funded by the European Union. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Council Executive Agency. Neither the European Union nor the granting authority can be held responsible for them.

This article has been written using GNU TeXmacs [8].

Given three univariate polynomials , , and with coefficients in a prime finite field, we present a new algorithm for computing modulo in time close to linear and with an asymptotic complexity smaller than that of the Kedlaya–Umans algorithm. As a novelty, our method mostly performs fast floating point Fourier transforms, whereas previously known methods rely on ad hoc algebraic constructions of finite fields.

1. Introduction

Let be an effective commutative ring, so that we have algorithms for the ring operations. Given a monic polynomial of degree and polynomials and in of degree , the computation of the remainder of in the division by , written , is called the problem of modular composition. Modular composition is a central operation in computer algebra, especially for irreducible polynomial factorization; see [12] for instance. It is still unknown whether or not modular composition can be achieved in time nearly linear in for an arbitrary ground ring .
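As an illustration, modular composition can be computed naively via Horner's rule, at the price of one modular product per coefficient of the outer polynomial. The following Python sketch works over Z/pZ with polynomials represented as coefficient lists; all names and the representation are illustrative, not the paper's.

```python
def poly_rem(a, f, p):
    """Remainder of a in the division by the monic polynomial f, mod p."""
    a = a[:]
    n = len(f) - 1
    for i in range(len(a) - 1, n - 1, -1):
        c = a[i] % p
        for j in range(n + 1):
            a[i - n + j] = (a[i - n + j] - c * f[j]) % p
    return a[:n] or [0]

def poly_mul(a, b, p):
    """Naive polynomial product mod p."""
    res = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            res[i + j] = (res[i + j] + x * y) % p
    return res

def modular_composition_horner(g, h, f, p):
    """Compute g(h) rem f over Z/pZ by Horner's rule: res = res*h + c."""
    res = [0]
    for c in reversed(g):
        res = poly_rem(poly_mul(res, h, p), f, p)
        res[0] = (res[0] + c) % p
    return res
```

For instance, composing g = y^2 + 1 with h = x modulo f = x^2 + 1 over Z/5Z returns the zero polynomial, since h^2 + 1 = f.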

1.1. Main result

In this paper, we prove the following new complexity bound for when is a finite ring and where is a common abbreviation for .

Theorem 1. The bit cost of degree modular composition over is bounded by

This result improves upon the best previously known bound

given in [11, Theorem 4], which is derived from an algorithm due to Kedlaya and Umans [14]. The novelty of our approach is a reduction of modular composition to the floating point evaluation of a multivariate polynomial at a special set of points, a so-called spiroid. We take advantage of fast Fourier transforms to perform these evaluations. In contrast, previously known methods, in the vein of [14], rely on number theoretic constructions.

1.2. Related work

Let denote a cost function that bounds the number of operations in required to multiply two polynomials of degree in . Over any ring , modular composition can be performed with operations in by applying Horner's rule to evaluate in . In 1978, Brent and Kung [3] gave a faster algorithm with cost , which uses the baby-step giant-step technique [17]. Their algorithm even yielded a sub-quadratic cost when combined with fast linear algebra; see [13, p. 185]. Here, the constant is a real value between 3 and 4 such that the product of a matrix by a matrix takes operations; one may take according to [19]. At present, the fastest known modular composition method over an arbitrary ground field is due to Neiger, Salvy, Schost, and Villard [16]: it is probabilistic of Las Vegas type and takes an expected number of

operations in ; see [16, Theorem 1.1]. Here, the constant denotes any real value between and such that two matrices over a commutative ring can be multiplied with ring operations. The current best known value is [19], so we may take .
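The baby-step giant-step idea behind Brent and Kung's algorithm can be sketched as follows: precompute the powers h, h^2, ..., h^k modulo f (baby steps), evaluate g chunk by chunk as linear combinations of these powers, and combine the chunks by a Horner scheme in h^k (giant steps). This is a hedged Python sketch with naive polynomial arithmetic mod p; names are illustrative and the fast linear algebra of the actual algorithm is not reproduced.

```python
import math

def _mul(a, b, p):
    """Naive polynomial product mod p."""
    res = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            res[i + j] = (res[i + j] + x * y) % p
    return res

def _rem(a, f, p):
    """Remainder by the monic polynomial f, mod p."""
    a = a[:]
    n = len(f) - 1
    for i in range(len(a) - 1, n - 1, -1):
        c = a[i] % p
        for j in range(n + 1):
            a[i - n + j] = (a[i - n + j] - c * f[j]) % p
    return a[:n] or [0]

def bsgs_modular_composition(g, h, f, p):
    """g(h) rem f with about 2*sqrt(len(g)) modular products instead of
    one per coefficient of g."""
    n = len(f) - 1
    k = max(1, math.isqrt(len(g)))
    pows = [[1]]                       # baby steps: 1, h, ..., h^k rem f
    for _ in range(k):
        pows.append(_rem(_mul(pows[-1], h, p), f, p))
    H = pows[k]                        # giant step: h^k rem f
    res = [0]
    for chunk in reversed([g[i:i + k] for i in range(0, len(g), k)]):
        res = _rem(_mul(res, H, p), f, p)
        for j, c in enumerate(chunk):  # add chunk(h) = sum_j c_j h^j
            for t, coef in enumerate(pows[j]):
                while len(res) <= t:
                    res.append(0)
                res[t] = (res[t] + c * coef) % p
    return (res + [0] * n)[:n]
```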

A major breakthrough for the modular composition problem is due to Kedlaya and Umans [14] in the case where is a finite field (and even more generally a finite ring of the form for any integer and monic). For any fixed real value , they showed that can be computed using bit operations. The dependency in is analyzed in [11]. Recent improvements of the Kedlaya–Umans approach can be found in [1, 2], but the dependency in is not detailed.

In [9, 10] it was shown how the knowledge of factorizations of can be exploited to speed up modular composition. An important special case concerns the composition of power series, which corresponds to taking . Recently, Kinoshita and Li [15] showed how to accomplish this task using operations in .

1.3. Overview of the paper

Our modular composition algorithm will be presented in section 5: Theorem 1 follows from the sharper complexity bound given in Theorem 24. As in [14], the problem will be reduced to the multi-point evaluation of a multivariate polynomial, using Kronecker segmentation. But, instead of relying on finite field arithmetic, we benefit from floating point arithmetic and reduce the multi-point evaluation to a deformation of the fast Fourier transform. The deformed evaluation point set is presented in section 3. Corresponding evaluation and interpolation algorithms are given in section 4. Section 2 gathers the prerequisites.

2. Prerequisites

For complexity analyses, we will consider bit complexity models such as computation trees or RAM machines [4, 5]. The function will bound the bit cost of multiplying two integers of bit size . We will assume that is nondecreasing. At the end, we will make use of the fast product of [6], which yields .

2.1. Fixed point arithmetic

Numbers will be represented in radix , with a finite number of digits. We will represent fixed point numbers by a signed mantissa and a fixed exponent. More precisely, given a precision parameter , we denote by the set of complex numbers of the form , where and . We write for the set of complex numbers of the form , where and ; in particular, for we always have . At every stage of our algorithms, the exponent will be specified, so the exponents do not have to be stored or manipulated explicitly.

In the error analyses of our numerical algorithms, each is really the approximation of some genuine complex number . So each such comes with an implicit error bound ; this is a real number for which we can guarantee that . A truncation of is an approximation of that is rounded towards zero in the sense that . Given , the sum can be computed exactly in time . The product can be computed exactly in time .
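The fixed point representation of this section can be sketched as follows, assuming a representation of each complex number by a pair of integer mantissas with an implicit shared exponent: addition is exact when the exponents agree, multiplication is exact but doubles the precision, and a truncation step rounds towards zero back to precision p. All names are illustrative.

```python
def to_fixed(z, p):
    """Round a complex number to integer mantissas at precision p
    (implicit exponent -p)."""
    return (round(z.real * 2**p), round(z.imag * 2**p))

def fixed_add(x, y):
    """Exact sum: mantissas add when the implicit exponents agree."""
    return (x[0] + y[0], x[1] + y[1])

def fixed_mul(x, y):
    """Exact product of mantissas; the implicit exponent becomes -2p,
    so a truncation is needed to return to precision p."""
    a, b = x
    c, d = y
    return (a * c - b * d, a * d + b * c)

def truncate(x, p):
    """Bring a product back to exponent -p, rounding towards zero."""
    def toward_zero(m):
        q, r = divmod(m, 2**p)          # Python divmod floors, so fix
        return q + 1 if (q < 0 and r) else q
    return (toward_zero(x[0]), toward_zero(x[1]))
```

For instance, squaring 1 + 0.5i at precision 4 gives mantissas (16, 8), exact product (192, 256) at doubled precision, and truncation (12, 16), i.e. 0.75 + i as expected.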

Lemma 2. Given and , we can compute an approximation of in with error in time .

Proof. Rounding into can be done as follows: let and be the truncations to the nearest of and into and let ; then we have .

Lemma 3. Given and , we can compute a truncation of in with error in time .

Proof. Rounding towards zero into can be done as follows: let and be the truncations of and at precision and let ; then we have and

2.2. Fast Fourier transform

Let be a power of and let be a vector of complex numbers. We write and .

Lemma 4. Given , we can compute truncations of in with error in time .

Proof. We use [7, Proposition 4] at precision in order to obtain first approximations of in with error . We next truncate these results in , which yields an overall error thanks to Lemma 3.

We define the fast Fourier transform of to be

where . In particular, we have .
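A minimal radix-2 sketch of such a transform, evaluating a vector at the roots of unity by halving; ordinary floating point stands in for the fixed point arithmetic of section 2.1, and the sign convention of the twiddle factor is an assumption of this illustration.

```python
import cmath

def fft(a):
    """Radix-2 FFT for len(a) a power of two: evaluates the vector a
    at the n-th roots of unity by splitting into even and odd parts."""
    n = len(a)
    if n == 1:
        return a[:]
    even, odd = fft(a[0::2]), fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n)   # twiddle factor
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out
```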

Lemma 5. Given and , a truncation of in with error can be computed in time .

Proof. Thanks to [18, section 3] an approximation of in can be computed with error in time . Consider an entry of and let be the corresponding approximation. Let , where and . Then , , and . We may clearly compute in time . Thanks to Lemma 3, and using further operations, we may next compute a truncation of with error .

Lemma 6. When , we have .

Proof. If , then . Otherwise, and we have

2.3. Multivariate polynomials

Given integers , we define

where denotes the partial degree of in , for . In this paper, a dense representation will be used for univariate and multivariate polynomials. This means that a polynomial is stored as a vector of size made of the coefficients of the terms of of partial degree in , for . Precisely, when , a polynomial is stored as the vector . When , a polynomial will be regarded as a univariate polynomial in whose coefficients are polynomials in .

Given a polynomial , we define

Given polynomials the sum can be computed exactly in time . The product can be computed exactly in time using Kronecker substitution; see [5, Chapter 8]. In addition we have .
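Kronecker substitution can be sketched as follows for two variables: pack the second variable into powers of the first with a stride large enough that product coefficients cannot overlap, multiply the univariate images, and unpack. Integer coefficients and sparse dictionaries are used here for brevity; names are illustrative.

```python
def kronecker_mul_bivariate(A, B):
    """Multiply bivariate integer polynomials given as {(i, j): coeff}
    via the substitution y -> x^s for a sufficiently large stride s."""
    dx = max(i for i, _ in A) + max(i for i, _ in B)  # x-degree of product
    s = dx + 1                                        # packing stride
    def pack(P):
        u = {}
        for (i, j), c in P.items():
            u[i + s * j] = u.get(i + s * j, 0) + c
        return u
    a, b = pack(A), pack(B)
    prod = {}
    for i, ca in a.items():                           # univariate product
        for j, cb in b.items():
            prod[i + j] = prod.get(i + j, 0) + ca * cb
    return {(k % s, k // s): c for k, c in prod.items() if c}
```

For instance, (x + y)(x - y) packs into (x + x^3)(x - x^3) with stride 3 and unpacks into x^2 - y^2.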

3. Spiroid arithmetic

Given integers , we let and consider a tuple of points

We define and call

the vanishing ideal of . We also define the evaluation map

We refer to the computation of as the problem of multi-point evaluation. The computation of is called the interpolation problem. is a bijection for a Zariski dense subset of tuples . Whenever this is the case, the points are pairwise distinct, so we have a natural bijection

This allows us to use polynomials in as canonical representatives of residue classes in . In particular, given a polynomial , we define to be the unique polynomial in with . For special tuples , called spiroids, we will show in this section that can be computed efficiently.

3.1. Regular spiroids

Assume from now on that are powers of two and let be the standard primitive -th root of unity in . For each , we let

We call the tuple a regular spiroid. The vanishing ideal of this tuple of points is generated by the polynomials

(1)

A straightforward computation shows that these generators actually form a Gröbner basis for the grevlex monomial ordering. For instance, for , the -polynomial of and is

which reduces to . In particular, the monomials in are reduced.

Let , , and . For , we have

For convenience, we also set

Given and a family

we say that is an extended reduction of modulo if

Since , we have . We further define

Given we aim at computing an extended reduction of by . As a first step we begin with polynomials in .

Lemma 7. Let . The map

is well defined and one-to-one.

Proof. Since is generated by the binomial polynomials (1), the reduction of a monomial by is a monomial. The defining equations (1) of further imply that are invertible in .

The support of a polynomial is the set of its monomials with a non-zero coefficient. We say that the monomials of are pairwise distinct modulo when any two distinct monomials of have distinct projections in .

Lemma 8. Let be such that the monomials of its support are pairwise distinct modulo . Let be defined recursively as follows: , and

(2)

where

for . For , the following properties hold:

Letting for and , the family is an extended reduction of .

Proof. The proof is done by induction on . The properties are clear for . Let us assume that they hold for . We decompose into with

We verify that

where

Since the monomials of the support of are pairwise distinct modulo , the supports of and are disjoint. In particular the monomials of the support of are pairwise distinct modulo , and the non-zero coefficients of are coefficients of .

Finally unrolling (2) yields

which constitutes the extended reduction of .

The reduction of a general modulo can be performed efficiently by the following algorithm.

Algorithm 1

Input
.

Output

An extended reduction of .

  1. For all set

    and compute

  2. Let for , and for from down to do:

    • For all , set

    • Compute .

  3. Let , compute for , and return .

Proposition 9. Algorithm 1 is correct and runs in time .

Proof. For all , we have . Let us examine the particular case where restricts to a single term for some . By Lemma 7, the monomials of the support of are pairwise distinct modulo , so Lemma 8 implies the following properties, for :

Since the operations in step 2 are linear with respect to the coefficients of , the general case where is a sum of terms of the form satisfies the following properties:

On the other hand step 1 yields

So the correctness proof is completed by noting that

and . As for the complexity analysis, step 1 takes time . Step 2 runs in time

The cost of step 3 does not exceed .

Example. Let and . With the notation as in Algorithm 1, the only non-zero is , and we have . It follows that the are all but , so we have . The extended reduction of is

3.2. General spiroids

We now turn to the situation where is a sufficiently small perturbation of . More precisely, we say that is a spiroid when there exist polynomials

such that

for all , and

with the convention that and . We call and the dimension and the degrees of the spiroid and let

By Lemma 6 we have for all . Consequently,

so the points are pairwise distinct. We let

where is a constant that will be specified later. We denote by an approximation of with error , where will also be defined later. For convenience we always take .

3.3. Existence of spiroids

We now show that is a spiroid as soon as is sufficiently small.

Lemma 10. Let and let be such that . Then we have .

Proof. The polynomial has degree

and satisfies for . This means that is the inverse Fourier transform of , that is , following the notation of section 2.2. We deduce that .

The following two technical lemmas will be needed several times in this paper.

Lemma 11. Let and let have non-zero terms. Then we have

Proof. We set for and use the classical inequality

From , we obtain

Lemma 12. With the above notation and , we have for all .

Proof. We verify that

Lemma 13. Assume that , , and let be such that . Given , there exists a unique such that . Moreover, .

Proof. Let and define , where stands for the unique polynomial in such that . From Lemma 10 we know that . Using Lemma 11, we deduce

Using Lemma 12 and , we obtain , hence for . Consequently, is well defined and we have

Finally, we verify that

Proposition 14. Assume that , , and let be such that . Then is a spiroid and we have .

Proof. From Lemmas 11 and 12, and using

we obtain

Thanks to Lemma 13, there exists such that and

whence .

3.4. Reduction by spiroids

In the rest of this section, we fix and assume that is a spiroid. A family

is called an extended reduction of by the spiroid if

Since , we have .

Proposition 15. Assume that . Given , there exists an extended reduction of by with .

Proof. If then we take for all . Otherwise, without loss of generality, we may assume that . For , we construct sequences , of polynomials as follows. For , we set and for . By induction, we assume that these sequences are defined up to some index for some . We let

and apply Proposition 9 to without any rounding, and let be the corresponding extended reduction with . We have

It follows that

Hence,

In particular converges to a limit written . For we have

The proof of Proposition 15 can be turned into an algorithm to compute approximations of extended reductions. For efficiency reasons, is required to be sufficiently small.

Lemma 16. Assume and . Let and

Given , we can compute

such that

in time .

Proof. We adapt the proof of Proposition 15. For , we construct sequences with and as follows.

For , we set and for . By induction, we assume that these sequences have been defined up to for some . We compute

exactly. Note that . We further assume by induction that . Applying Proposition 9 to , let be the obtained extended reduction. Then

Using Lemma 3, we compute a truncation of in with error . We obtain

(since )

Iterating the latter inequality yields

(3)

In particular, . We are done with the construction of the sequences and . Let be the smallest integer such that

(4)

that is

since . Note that

(5)

We take and verify that

(using (3))
(using and (5))
(6)

Finally, we obtain

(using (3) and (6))
(using (4))
(since )

As for the complexity, one product takes time , and is computed with error . Consequently, the computation of from runs in time

Each use of Proposition 9 contributes . Since , the total cost for computing and up to index is .

The numerical convergence of the method behind Lemma 16 is linear, which is sufficient for small precisions. For higher precisions, our next faster algorithm benefits from the “divide and conquer” paradigm. In order to upper bound the norms of the extended reductions, we introduce the auxiliary notation

Note that is nondecreasing in and that for all .

Algorithm 2

Input
.

Output

, where ,

Assumption
and .

  1. If then return the extended reduction from Lemma 16.

  2. Compute a truncation of with error using Lemma 3. Apply Algorithm 2 recursively to and let denote the result.

  3. Compute a truncation of

    with error . Apply Algorithm 2 recursively to and let denote the result.

  4. Return a truncation of with error , by using Lemma 3.

Proposition 17. Algorithm 2 is correct and runs in time

Proof. If then the output is correct in step 1 by Lemma 16. Otherwise , so that and . Consequently, the algorithm always terminates. In step 2, by induction, we have

We verify that

(since )
(since )

so the polynomial is well defined in step 3, and we have

Let be a truncation of with error . On the one hand,

(since )
(7)

On the other hand, we obtain

(since )

which yields

(8)

Combining (7) and (8) we deduce

This concludes the proof of the correctness of the algorithm. Let us now analyze the complexity. By Lemma 16, step 1 contributes

Each product can be computed in time

The total cost to obtain is therefore bounded by

Let be the computation time as a function of . So far, we have shown that

for . Unrolling this inequality leads to

The following corollary is the main result of this section.

Corollary 18. Assume and . Given , an approximation of with error can be computed in time

Proof. Proposition 17 allows us to compute

such that for some . Using Proposition 15, we deduce that

Using Lemma 2, we finally compute an approximation of in with error and verify that

4. Multi-point evaluation and interpolation

This section is devoted to multi-point evaluation and interpolation at a spiroid. Evaluation will follow the classical “divide and conquer” paradigm with respect to the number of evaluation points. Interpolation will reduce to evaluation.

4.1. Evaluation

Let be as in the previous section. If for some , then

Without loss of generality, this allows us to replace by of dimension with for each . In what follows, we will only work with spiroids for which whenever .

Our algorithm for multi-point evaluation of a polynomial at is based on a generalization of the classical remainder tree technique used for the univariate case [5, Chapter 10]. If , then an evaluation tree for consists of a single leaf labeled by . If and , then induces two sequences of points which are defined by

For , we write for the dimension of , for its partial degrees, and for its degree. The perturbations of the defining polynomials of are written , so we have

for all . We further define

Note that , , and , for .

An evaluation tree for the spiroid consists of a root that is labeled by the family and two children which are recursively the evaluation trees for and . Since , if then . So, Proposition 14 ensures that the exists with , whenever . This shows that evaluation trees exist whenever is sufficiently small. The construction of these trees is postponed to the end of this section.
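For comparison, the classical univariate remainder tree of [5, Chapter 10], which our evaluation trees generalize, can be sketched as follows: at each level the polynomial is reduced modulo the subproducts of the linear factors (x - u) of each half of the point set, so the degree halves along each branch. Exact integer arithmetic, naive products; names are illustrative.

```python
def _mul(a, b):
    """Naive integer polynomial product."""
    res = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            res[i + j] += x * y
    return res

def _rem(a, f):
    """Remainder by a monic polynomial f (list of coefficients)."""
    a = a[:]
    n = len(f) - 1
    for i in range(len(a) - 1, n - 1, -1):
        c = a[i]
        for j in range(n + 1):
            a[i - n + j] -= c * f[j]
    return a[:n] or [0]

def multipoint_eval(P, points):
    """Evaluate P at all points via a remainder tree."""
    if len(points) == 1:
        return [_rem(P, [-points[0], 1])[0]]    # P rem (x - u) = P(u)
    mid = len(points) // 2
    left, right = points[:mid], points[mid:]
    ml, mr = [1], [1]
    for u in left:                              # subproducts of halves
        ml = _mul(ml, [-u, 1])
    for u in right:
        mr = _mul(mr, [-u, 1])
    return multipoint_eval(_rem(P, ml), left) + \
           multipoint_eval(_rem(P, mr), right)
```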

An evaluation tree for is said to be known with error if is given with error , and if the evaluation tree for is known with error for . We will see in section 4.3 that for such an evaluation tree with error , the will be elements of with . Our next algorithm performs fast evaluation at by taking advantage of evaluation trees.

Algorithm 3

Input
and evaluation trees for with error for .

Output

An approximation of in with error , where .

Assumption

and .

  1. If then return an approximation of in with error , by using Lemma 2.

  2. Compute a truncation of with error .

  3. Compute truncations and of and in with error via Corollary 18.

  4. Apply Algorithm 3 recursively to and the evaluation sub-trees for in order to obtain approximations of with error , where .

  5. Apply Algorithm 3 recursively to and the evaluation sub-trees for in order to obtain an approximation of with error , where .

  6. Return an approximation of in with error .

Proposition 19. Algorithm 3 is correct and takes time

whenever .

Proof. If then the algorithm behaves as expected. Let us now assume that . The correctness mostly relies on the identities

for . Let us verify that computations are done with sufficiently large precisions. By construction, we have and . Corollary 18 yields , and

Since , Lemma 12 ensures that

(9)

In particular, it follows that . Assuming by induction that the claimed properties hold for the recursive call in step 4, we obtain

(using (9))
(since )

Similarly, for we obtain

(using (9))
(since )

We are done with the correctness. As for the complexity analysis, the computation of and then of , with bits of precision in step 2 can be done in time by Lemma 4, under the assumption . Therefore, step 2 runs in time . In view of Corollary 18, the complexity of Algorithm 3 satisfies

for some universal constant . Unrolling this inequality, we obtain

Unrolling this new inequality, while using the geometric increase of , we obtain the desired complexity bound.

4.2. Interpolation

Our interpolation algorithms rely on the evaluation algorithm. We begin with a method dedicated to small precisions.

Lemma 20. Given , we can compute such that and in time

Proof. Using Lemma 5, we compute such that

(10)

and in time . For we take the unique polynomial of such that

so we have . Finally (10) yields
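The reduction of interpolation at roots of unity to an inverse FFT, as in the proof above, can be sketched in ordinary floating point as follows; the twiddle sign convention is an assumption of this illustration.

```python
import cmath

def _fft(a, sign):
    """Radix-2 FFT with twiddle exp(sign*2*pi*i*k/n); n a power of two."""
    n = len(a)
    if n == 1:
        return a[:]
    even, odd = _fft(a[0::2], sign), _fft(a[1::2], sign)
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out

def interpolate_at_roots(values):
    """Coefficients of the unique polynomial of degree < n taking the
    given values at the points exp(2*pi*i*k/n): one FFT with the
    opposite twiddle sign, followed by a division by n."""
    n = len(values)
    return [z / n for z in _fft(list(values), -1)]
```

For instance, the polynomial 1 + 2x takes the values 3, 1 + 2i, -1, 1 - 2i at the fourth roots of unity, and these values interpolate back to the coefficients (1, 2, 0, 0).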

Lemma 21. Assume , , and . Given an evaluation tree for with error for , and , we can compute such that

in time .

Proof. Let . By induction, we construct sequences and such that and . We set and . For , we first compute as the polynomial interpolated from by inverse FFT as in Lemma 20, so we have and . Using Algorithm 3, we then compute an approximation of with error

Using Lemma 3, we next compute a truncation of with error . Then

(11)

Using Lemmas 11 and 12, we obtain

(using (9))
(12)

It follows that

(using (11) and (12))
(since )

so the sequence converges to . We define and verify that

(using (11))
(13)

On the other hand, we have . We take minimal such that

so we have . Let be a truncation of with error . Then

(using (13))
(using (9))

Finally, Lemma 20 asserts that each can be computed in time . By Proposition 19, each is obtained in time .

The next faster algorithm exploits the usual “divide and conquer” paradigm for large precisions .

Algorithm 4

Input

with and evaluation trees for with error for .

Output

such that .

Assumption

and .

  1. If then compute such that via Lemma 21. Return an approximation of in with error , by using Lemma 2.

  2. Let and and let be a truncation of in with error . Apply Algorithm 4 recursively to and let denote the output.

  3. Compute an approximation of with error , by applying Algorithm 3 to , where

  4. Let and compute a truncation of with error . Apply Algorithm 4 recursively to and let denote the output.

  5. Return an approximation of with error .

Proposition 22. Algorithm 4 is correct and runs in time

Proof. If , then , whence , which implies the termination of the algorithm. In step 1 we obtain

(using (9))
(since and )

so the output is correct when the algorithm exits at step 1. After step 3 we have

(14)
(15)

In step 4, we note that

(16)

and deduce

(using (14) and (15))
(using (16) and )

Therefore, the recursive call in step 4 is valid and gives

(17)

Combining (15) and (17) leads to

Consequently,

(using (9))
(since )

Step 1 takes time

by Lemma 21. Step 3 runs in time

by Proposition 19. When , the complexity of our algorithm therefore satisfies

for some universal constant . Unrolling this inequality, we obtain

which concludes the proof.

4.3. Computation of the evaluation tree

The evaluation tree for is obtained by induction: we recursively compute evaluation trees for and , and deduce the one for as follows.

Algorithm 5

Input
An approximation of with error where .

Output

An evaluation tree for with error .

Assumption

, .

  1. If then return and an approximation of with error .

  2. Let . For , let and compute with error . Recursively compute the evaluation trees for with error .

  3. For each do:

    1. Compute an approximation of with error .

    2. Apply Algorithm 4 to , , ; let be the returned polynomial, such that .

    3. Compute an approximation of with error , where .

  4. Set and return , and approximations of , with errors and .

Proposition 23. Algorithm 5 is correct and runs in time

provided that .

Proof. If the algorithm exits in step 1, then we have , so the output is correct. Now, let us assume that and note that

(18)

and

(19)

For , we compute as the truncation of with error using Lemma 2. Hence,

(using (18))

From Lemma 4, we may compute a truncation of with error in time , provided that . We compute an approximation of with error . It follows that

(using (19))
(using (18))

Using Lemma 2, we obtain as the truncation of with error . Hence,

For step 3.a, we need to verify that

(by Lemma 11 and (19))
(by Lemma 12)
(since and )

so the use of Algorithm 4 is valid in step 3.b. Thanks to Proposition 22 and using that , we obtain

(by Lemma 11 and (19))
(by Lemma 12)

Lemma 13 yields . It follows that

This concludes the correctness proof. As for the complexity, we note that whenever . In step 3, can be computed in time , using binary powering. In step 3.b, the interpolation of takes

operations, by Proposition 22. Overall the complexity of our algorithm satisfies

for some universal constant . Unrolling this inequality yields the claimed complexity bound.

5. Modular composition

Consider and , where is a positive integer. Our goal is to compute . We first follow Kedlaya and Umans [14] by transforming the problem into a multivariate one. For some and powers of two with and , there exists a unique lift

such that

The first degree bound will be adjusted below to a larger value that will match the setting of the previous section. We define

so we have

(20)
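The Kronecker segmentation step can be sketched as follows: the base-d digits of each exponent of g become the multivariate exponents of its lift G, so that g(x) = G(x, x^d, ..., x^(d^(m-1))). The sparse dictionary representation and names are illustrative, not the paper's.

```python
def segment(g, d, m):
    """Lift a univariate g (coefficient list, len(g) <= d**m) to a
    multivariate G = {(i_1, ..., i_m): coeff} with partial degrees < d,
    by writing each exponent in base d."""
    G = {}
    for e, c in enumerate(g):
        if c:
            idx = []
            for _ in range(m):          # base-d digits of the exponent
                idx.append(e % d)
                e //= d
            G[tuple(idx)] = c
    return G

def unsegment(G, d, m, x):
    """Sanity check: evaluate G at (x, x^d, ..., x^(d^(m-1)))."""
    total = 0
    for idx, c in G.items():
        e = sum(i * d**j for j, i in enumerate(idx))
        total += c * x**e
    return total
```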

In order to compute the right-hand side of (20), we further lift the problem into another one with integer coefficients: let and be the canonical lifts of with coefficients in and set

where is a positive integer that will be specified below. Note that are still lifts of , so

We finally rescale the polynomials, so that we can compute using a spiroid:

Note that

and

We will evaluate

numerically using fixed point arithmetic. Setting , we note that

Let be the smallest power of two satisfying , let , and consider the standard primitive -th root of unity . We wish to determine from its values at . For , we let

so we have

(21)

and

Choosing sufficiently large, we will be able to apply Algorithm 3 in order to evaluate at the spiroid . We are now ready to prove the main theorem of this paper, from which Theorem 1 follows.

Theorem 24. The cost of degree modular composition over is bounded by

Proof. Using binary powering, the computation of takes time

Now we take

whence . More precisely, we have

, and . We distinguish the following cases in order to construct the sequence :

In this way we achieve and

(22)

whence

(23)

and

(24)

Recall that . Now we take and minimal such that

so, using (24), we have

(25)

With this choice for , the inequality (21) implies

Let be the smallest power of two that satisfies , so we have . We set

From (22) and (25) we obtain that

(26)

and that

(27)

Since , using Lemma 5 with , we compute a truncation of with error in time

(using (23) and (26))
(using (24) and (27))
(28)

Since , Proposition 23 allows us to compute an evaluation tree for with error in time

(using (23) and (26))
(using (24) and (27))
(29)

Following Proposition 19, this bound dominates the cost for obtaining an approximation of with error , where , so we have

hence

(30)

thanks to Lemma 10. Via Lemma 5, we compute such that

(31)

in time , using (28). Finally, we get

(by (31) and (30))

so rounding the coefficients of to the nearest integers yields . Since

(using (22) and (25))

we compute in time

(using (23))

The final division of by over has cost

(using (23) and (24))

The total cost of the method is dominated by (29).

Bibliography

[1]

V. Bhargava, S. Ghosh, Z. Guo, M. Kumar, and C. Umans. Fast multivariate multipoint evaluation over all finite fields. In 2022 IEEE 63rd Annual Symposium on Foundations of Computer Science (FOCS), pages 221–232. New York, NY, USA, 2022. IEEE.

[2]

V. Bhargava, S. Ghosh, M. Kumar, and C. K. Mohapatra. Fast, algebraic multivariate multipoint evaluation in small characteristic and applications. J. ACM, 70(6):1–46, 2023. Article No. 42.

[3]

R. P. Brent and H. T. Kung. Fast algorithms for manipulating formal power series. J. ACM, 25(4):581–595, 1978.

[4]

P. Bürgisser, M. Clausen, and M. A. Shokrollahi. Algebraic Complexity Theory, volume 315 of Grundlehren der Mathematischen Wissenschaften. Springer-Verlag, 1997.

[5]

J. von zur Gathen and J. Gerhard. Modern computer algebra. Cambridge University Press, New York, 3rd edition, 2013.

[6]

D. Harvey and J. van der Hoeven. Integer multiplication in time O(n log n). Ann. Math., 193(2):563–617, 2021.

[7]

D. Harvey, J. van der Hoeven, and G. Lecerf. Even faster integer multiplication. J. Complexity, 36:1–30, 2016.

[8]

J. van der Hoeven. The Jolly Writer. Your Guide to GNU TeXmacs. Scypress, 2020.

[9]

J. van der Hoeven and G. Lecerf. Composition modulo powers of polynomials. In Proceedings of the 2017 ACM on International Symposium on Symbolic and Algebraic Computation, ISSAC '17, pages 445–452. New York, NY, USA, 2017. ACM.

[10]

J. van der Hoeven and G. Lecerf. Modular composition via factorization. J. Complexity, 48:36–68, 2018.

[11]

J. van der Hoeven and G. Lecerf. Fast multivariate multi-point evaluation revisited. J. Complexity, 56:101405, 2020.

[12]

J. van der Hoeven and G. Lecerf. Univariate polynomial factorization over finite fields with large extension degree. Appl. Algebra Eng. Commun. Comput., 35:121–149, 2024.

[13]

E. Kaltofen and V. Shoup. Fast polynomial factorization over high algebraic extensions of finite fields. In Proceedings of the 1997 International Symposium on Symbolic and Algebraic Computation, ISSAC '97, pages 184–188. New York, NY, USA, 1997. ACM.

[14]

K. S. Kedlaya and C. Umans. Fast polynomial factorization and modular composition. SIAM J. Comput., 40(6):1767–1802, 2011.

[15]

Y. Kinoshita and B. Li. Power series composition in near-linear time. In 2024 IEEE 65th Annual Symposium on Foundations of Computer Science (FOCS), pages 2180–2185. IEEE, 2024.

[16]

V. Neiger, B. Salvy, É. Schost, and G. Villard. Faster modular composition. J. ACM, 71(2):1–79, 2023. Article No. 11.

[17]

M. S. Paterson and L. J. Stockmeyer. On the number of nonscalar multiplications necessary to evaluate polynomials. SIAM J. Comput., 2(1):60–66, 1973.

[18]

A. Schönhage. Asymptotically fast algorithms for the numerical multiplication and division of polynomials with complex coefficients. In J. Calmet, editor, Computer Algebra. EUROCAM '82, European Computer Algebra Conference, Marseilles, France, April 5–7, 1982, volume 144 of Lect. Notes Comput. Sci., pages 3–15. Berlin, Heidelberg, 1982. Springer-Verlag.

[19]

V. V. Williams, Y. Xu, Z. Xu, and R. Zhou. New bounds for matrix multiplication: from alpha to omega. In D. P. Woodruff, editor, Proceedings of the 2024 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 3792–3835. Philadelphia, PA, USA, 2024. SIAM.