Optimizing the half-gcd algorithm

Joris van der Hoeven

CNRS, École polytechnique, Institut Polytechnique de Paris

Laboratoire d'informatique de l'École polytechnique (LIX, UMR 7161)

1, rue Honoré d'Estienne d'Orves

Bâtiment Alan Turing, CS35003

91120 Palaiseau, France

Email: vdhoeven@lix.polytechnique.fr

November 3, 2023

This article has been written using GNU TeXmacs [18].

In this paper, we propose a carefully optimized “half-gcd” algorithm for polynomials. We achieve a constant speed-up with respect to previous work for the asymptotic time complexity. We also discuss special optimizations that are possible when polynomial multiplication is done using radix two FFTs.

1. Introduction

The computation of greatest common divisors is the key operation to be optimized when implementing a package for rational number arithmetic. For integers of small bit-length n, one may use Euclid's algorithm, which has a quadratic time complexity O(n^2). Asymptotically faster algorithms were first proposed by Knuth [20] and Schönhage [29], based on earlier ideas by Lehmer [22]. Schönhage's algorithm has a logarithmic overhead with respect to integer multiplication, which is believed to be asymptotically optimal. Many variants and improvements have been proposed since, mainly aiming at faster practical implementations [36, 32, 23, 26, 4, 27]. All these subquadratic algorithms are based on a recursive reduction to half of the precision; for this reason, it is convenient to regroup them under the name “half-gcd algorithms”.

An analogous story can be told about the history of polynomial gcd computations. The polynomial counterpart of Euclid's algorithm was first described by Stevin [33, p. 241]. The first algorithms for polynomial half-gcds are due to Moenck [25] and Brent–Gustavson–Yun [5]; several variants have been developed since [35, 36, 4].

We refer to [10, chapter 11] for a gentle exposition of the polynomial half-gcd algorithm. This exposition also contains a careful analysis of the constant factor in the asymptotic time complexity. More precisely, let M(n) denote the complexity of multiplying two polynomials of degree n over an abstract effective field K. Then the gcd of two polynomials of degree n can be computed using O(M(n) log n) operations in K, with an explicit constant factor in front of M(n) log n. This constant further drops if the corresponding Euclidean remainder sequence is “normal” (all quotients in the ordinary Euclidean algorithm are of degree one). The authors declare that they made no particular efforts to optimize these constants; in [10, Research problem 11.11], they ask how far these constants can be lowered. The main goal of the present paper is to make progress on this question.

One major motivation for optimizing constant factors in time complexity bounds is the development of faster implementations. Now practical algorithms for multiplying large polynomials usually rely on fast Fourier techniques. We will also investigate optimizations that are only possible when multiplications are done in this way. Indeed, this so-called FFT model allows for various specific optimizations that typically help to reduce constant factors. We refer to [3] for a nice survey of such tricks and mention NTL [31] as a software library that was early to exploit them in a systematic way.

A particularly favorable case is when the ground field K has primitive 2^k-th roots of unity for all sufficiently large k. We will call this the “binary FFT model”. A good example is the finite field F_p for a prime p of the form p = σ·2^τ + 1, where τ is large. Such finite fields arise naturally after modular reduction, provided that the prime p can be chosen freely. The binary FFT model allows for additional tricks and is particularly useful for dichotomic algorithms such as the half-gcd. We refer to section 2 for those results that will be required in this paper. Note that large polynomial multiplications over general finite fields are fastest using FFT-based techniques, although one may not necessarily use radix two FFTs; see [14, 15] and references therein for the fastest current algorithms.

For the convenience of the reader, we will first describe our new half-gcd algorithms in the normal case (see section 4). This will allow us to avoid various technical complications and focus on the new optimizations. The first idea behind our new half-gcd is to make the update step after the first recursive call particularly efficient, by using a matrix variant of the middle product algorithm [12]. This leads to improved constant factors, both for general multiplication schemes and in the FFT model (Proposition 6). The second idea is to combine this with known optimizations for the (binary) FFT model, such as FFT doubling, FFT caching, and Harvey's variant of the middle product [13]. In the binary FFT model, this allows us to compute half-gcds particularly fast (Proposition 7 and Corollary 9).

When dropping the normality assumption, the algorithm becomes more technical, but it turns out that the middle product trick can still be generalized. This is explained in section 5 and leads to analogous bounds for general gcds, both for general multiplication schemes and in the FFT model (Theorem 15). In the binary FFT model, special efforts are needed in order to enforce the use of DFTs of power of two lengths. We prove a corresponding bound in this last case (Theorem 17). See Table 1 for a summary of the new constant factors.

                MCA [10]    General    FFT model    Binary FFT model
Normal case
General case

Table 1. Summary of the constant factors in various cases.


It is well known that polynomial gcds have many applications: fast arithmetic on rational functions, fast reconstruction of rational functions [10, section 5.7], polynomial factorization [7], fast decoding of error-correcting codes [2, 24, 9], sparse interpolation [1, 28], etc. Personally, we were mainly motivated by the last application to sparse interpolation. After the introduction of the tangent Graeffe algorithm [11, 17], gcd computations have often become one of the main practical bottlenecks. For this application, it is typically possible to work modulo a prime p of the form σ·2^τ + 1, which allows us to exploit our optimizations for the binary FFT model.

We made a first implementation of the new algorithm and also programmed the customary extension to the computation of subresultants (see [10, section 11.2] and also [21] for this extension). Work is in progress on HPC implementations of our algorithms and all required subalgorithms.

A few questions remain for future work. It is likely that the ideas in this paper can be applied to integer gcds as well, except for the optimizations that are specific to the binary FFT model. It would also be nice to have counterparts for the “ternary FFT model”, which could be used for fields of characteristic two (see [17, section 3.4] for an algorithm of this kind in the case of iterated Graeffe transforms). Finally, in view of Bernstein and Yang's recent ideas from [4], as well as Remark 16, one may wonder whether the bounds for the normal case can actually be extended to the general case.

Acknowledgments. We are grateful to Michael Monagan for triggering this study and suggesting some early ideas.

2. Preliminaries

2.1. The binary FFT model

The best bounds in this paper will be proved for the so-called “binary FFT model”, which requires special assumptions on the ground field K. First of all, this model requires 2 to be invertible in K. Secondly, for any polynomial that occurs during the computations, we assume that K has a primitive 2^k-th root of unity ω_{2^k} for all sufficiently large k. For convenience, we also assume that these roots are chosen compatibly, i.e. that ω_{2^j} = ω_{2^k}^{2^{k-j}} whenever 2^j divides 2^k. We measure computational complexity in terms of the number of required field operations in K.¹ Given a, b ∈ K[x] with b ≠ 0, we write a quo b and a rem b for the quotient and the remainder of the Euclidean division of a by b, respectively.

¹ Using an easy refinement of the analysis, it turns out that the main half-gcd algorithms in this paper involve only a linear number of divisions, so the bulk of the operations are actually ring operations in K.

Let K[x]_{<n} denote the space of polynomials of degree < n. Given P ∈ K[x]_{<n} with n = 2^k, we define its discrete Fourier transform by

DFT_ω(P) := (P(1), P(ω), ..., P(ω^{n-1})) ∈ K^n, where ω = ω_n.

We will write F(n) for the maximum cost of computing a discrete Fourier transform DFT_ω of order n or the inverse transformation DFT_ω^{-1}. It is well known [8] that F(n) = O(n log n). In what follows, we will always assume that F(n)/n is non-decreasing, and we make similar assumptions for the other cost functions M(n) and MP(n) introduced below. Therefore, we might just as well replace F(n) by an explicit multiple of n log n, but it is instructive to keep the notation F(n) in order to indicate where we use Fourier transforms.
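
To make the binary FFT model concrete, the following Python sketch implements a radix-two DFT (a number-theoretic transform) over a prime field. The prime p = 119·2^23 + 1 and the primitive root 3 are illustrative choices of ours; the paper only requires some prime of the form σ·2^τ + 1 with τ large.

```python
# Minimal sketch of the binary FFT model over F_p, for the (assumed)
# NTT-friendly prime p = 119 * 2^23 + 1 with primitive root 3.
p = 119 * 2**23 + 1  # = 998244353

def root_of_unity(n):
    """A primitive n-th root of unity in F_p, for a power of two n | p - 1."""
    assert (p - 1) % n == 0
    return pow(3, (p - 1) // n, p)

def dft(f, omega):
    """Radix-two DFT of the coefficient list f (length a power of two)."""
    n = len(f)
    if n == 1:
        return list(f)
    even = dft(f[0::2], omega * omega % p)
    odd = dft(f[1::2], omega * omega % p)
    out, w = [0] * n, 1
    for k in range(n // 2):
        t = w * odd[k] % p
        out[k] = (even[k] + t) % p
        out[k + n // 2] = (even[k] - t) % p
        w = w * omega % p
    return out

def idft(v, omega):
    """Inverse DFT: transform with omega^{-1}, then divide by n."""
    n = len(v)
    inv_n = pow(n, p - 2, p)
    return [c * inv_n % p for c in dft(v, pow(omega, p - 2, p))]
```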

If P ∈ K[x]_{<n} and the discrete Fourier transform of P at order n with respect to ω = ω_n is known, then it costs F(n) + O(n) to compute the transform of P at order 2n with respect to ω_{2n}. Indeed, we already know the values of P at the even powers of ω_{2n}, so it remains to compute its values at the odd powers. But these form the DFT of P(ω_{2n} x) at order n, and P(ω_{2n} x) takes a linear number of operations to compute in terms of P. We call this trick FFT doubling.
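
In code, FFT doubling looks as follows; this sketch reuses dft and p from the previous block, and the name dft_double is ours.

```python
def dft_double(f, dft_n, omega_2n):
    """Extend the order-n DFT of f (with deg f < n) to order 2n.

    The values at the even powers of omega_2n are exactly dft_n; the values
    at the odd powers form the order-n DFT of the twisted polynomial
    f(omega_2n * x), so only one extra transform of length n is needed.
    """
    n = len(dft_n)
    twisted, w = [], 1
    for c in f:                                  # compute f(omega_2n * x)
        twisted.append(c * w % p)
        w = w * omega_2n % p
    twisted += [0] * (n - len(twisted))
    odd = dft(twisted, omega_2n * omega_2n % p)  # order-n DFT
    out = [0] * (2 * n)
    out[0::2] = dft_n
    out[1::2] = odd
    return out
```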

Given P, Q ∈ K[x] with deg(PQ) < n and n = 2^k, we may use the discrete Fourier transform to compute the product PQ using

PQ = DFT_ω^{-1}(DFT_ω(P) · DFT_ω(Q)).    (1)

This is called FFT multiplication. If we fix Q of degree m < n, then we may consider the multiplication map mul_Q: P ↦ PQ with deg P < n - m as a linear map K[x]_{<n-m} → K[x]_{<n}. We have

mul_Q = DFT_ω^{-1} ∘ Δ_{DFT_ω(Q)} ∘ DFT_ω ∘ ι,

where Δ_v stands for componentwise multiplication by v in K^n and ι for the injection of K[x]_{<n-m} into K[x]_{<n}.
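
FFT multiplication (1) then takes a few lines on top of dft and idft; the padding strategy and the name fft_mul are our choices.

```python
def fft_mul(f, g):
    """Product of two coefficient lists over F_p via formula (1)."""
    m = len(f) + len(g) - 1        # number of coefficients of the product
    N = 1
    while N < m:                   # pad to a power-of-two transform length
        N *= 2
    w = root_of_unity(N)
    vf = dft(f + [0] * (N - len(f)), w)
    vg = dft(g + [0] * (N - len(g)), w)
    return idft([a * b % p for a, b in zip(vf, vg)], w)[:m]
```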

Given P ∈ K[x] of degree < 2n - 1 and Q ∈ K[x]_{<n}, we define their middle product by

MP(P, Q) := (PQ quo x^{n-1}) rem x^n ∈ K[x]_{<n}.    (2)

Let us recall how the middle product MP(P, Q) can be computed efficiently using FFT multiplication.

Let Q be of degree n - 1 and regard it as fixed. Then P ↦ MP(P, Q) is a linear map K[x]_{<2n-1} → K[x]_{<n}, and we may think of its matrix, with n rows and 2n - 1 columns, as the transpose of the matrix of the multiplication map mul_Q: K[x]_{<n} → K[x]_{<2n-1}. In other words, MP(P, Q) = mul_Q^T(P), where mul_Q^T stands for the transpose of mul_Q. Since mul_Q can be computed using FFT multiplication, transposing each step of that computation yields an algorithm of the same cost for mul_Q^T. Working this out explicitly, and regarding P and Q as elements of K[x]_{<2n}, we obtain

MP(P, Q) = (DFT_ω^{-1}(DFT_ω(P) · DFT_ω(Q)) quo x^{n-1}) rem x^n,    (3)

where ω = ω_{2n}, so that the product inside the parentheses is computed modulo x^{2n} - 1. This is the alternative formula from [13] that we will use.
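
To fix ideas, the sketch below adopts the convention that the middle product of f (at most 2n - 1 coefficients) and g (at most n coefficients) consists of the coefficients n - 1, ..., 2n - 2 of fg. The FFT variant implements the idea behind (3): a cyclic convolution of length 2n leaves exactly this window of coefficients untouched. Both function names are ours.

```python
def middle_product_naive(f, g, n):
    """Coefficients n-1 .. 2n-2 of f*g (quadratic-time reference)."""
    full = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            full[i + j] = (full[i + j] + a * b) % p
    return (full + [0] * (2 * n))[n - 1:2 * n - 1]

def middle_product_fft(f, g, n):
    """Same window via one cyclic convolution of length 2n (n a power of two).

    Wrap-around modulo x^(2n) - 1 only affects coefficients of degree
    < n - 1, so the middle window n-1 .. 2n-2 is recovered exactly.
    """
    N = 2 * n
    w = root_of_unity(N)
    vf = dft((f + [0] * N)[:N], w)
    vg = dft((g + [0] * N)[:N], w)
    c = idft([a * b % p for a, b in zip(vf, vg)], w)
    return c[n - 1:2 * n - 1]
```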

From the complexity perspective, let M(n) denote the cost of multiplying two polynomials in K[x]_{<n}. Using FFT multiplication, we see that M(n) ≤ 3 F(2n) + O(n) in the binary FFT model. More generally, if n is even, then any two polynomials P, Q with deg(PQ) < 2n can be multiplied in time 3 F(2n) + O(n) using this method. Similarly, the cost MP(n) of computing a middle product (2) satisfies MP(n) ≤ 3 F(2n) + O(n), and if n is even, then a general middle product of this kind can again be computed in time 3 F(2n) + O(n).

It is important to note that the above results all generalize to the case when the coefficient field K is replaced by the algebra K^{2×2} of 2 × 2 matrices with entries in K. For instance, assume that P, Q ∈ K^{2×2}[x] with deg(PQ) < n. Then (1) allows us to compute PQ using 8 direct DFTs, 4 inverse DFTs, and n multiplications of matrices in K^{2×2}, for a total cost of 12 F(n) + O(n).

2.2. Other models

Although the main focus of this paper is on the binary FFT model, we will also consider other types of polynomial multiplication. In that case, we will still denote by M(n) and MP(n) the cost of multiplying two polynomials in K[x]_{<n} and the cost of computing the middle product (2) of P of degree < 2n - 1 and Q ∈ K[x]_{<n}, respectively. Moreover, we make regularity assumptions on these cost functions that are analogous to those from section 2.1; in particular, M(n)/n and MP(n)/n are assumed to be non-decreasing.

We also make the same assumptions for the analogous cost functions when taking coefficients in K^{2×2} instead of K. In the binary FFT model, we have seen that we may take M(n) and MP(n) of the form 3 F(2n) + O(n), after which the above assumptions are satisfied (if n is not a power of two, then one may use the truncated Fourier transform [16], for instance). They are also verified for all other commonly used types of multiplication, such as Karatsuba's and Toom's methods [19, 37], or FFT-based methods for arbitrary radices [30, 6, 15].

In addition, we define ρ to be a constant such that two 2 × 2 matrix polynomials in K^{2×2}[x]_{<n} can be multiplied in time ρ M(n) + O(n). We have seen that we may take ρ = 4 in the binary FFT model; this generalizes to arbitrary FFT models. Using Strassen's method for matrix multiplication [34], we may always take ρ = 7. From a practical point of view, we usually have ρ < 7.

Let us examine the constant ρ a bit more closely in the case of Karatsuba multiplication. The complexity K(n) of this method satisfies a recurrence of the form

K(2n) ≤ 3 K(n) + c n

for a certain constant c, which yields K(n) = O(n^{log 3/log 2}). Similarly, when using Karatsuba's method to multiply 2 × 2 matrix polynomials, the complexity K'(n) satisfies a recurrence of the same shape,

K'(2n) ≤ 3 K'(n) + c' n,

for a certain constant c', which leads to K'(n) = O(n^{log 3/log 2}) and to a corresponding value of ρ.

This analysis can be generalized to general lengths and to the case when we only use Karatsuba's method for degrees above a certain threshold. In the latter case, it is usually favorable to choose the threshold over K^{2×2} to be slightly lower than the threshold over K.
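
To make these recurrences concrete, here is a compact Karatsuba sketch over F_p, reusing p from above; the base-case threshold 16 is an arbitrary illustrative choice. The same skeleton applies verbatim when the coefficients are 2 × 2 matrices, which is exactly where the constant ρ above comes from.

```python
def add(f, g):
    """Coefficientwise sum of two coefficient lists over F_p."""
    if len(f) < len(g):
        f, g = g, f
    return [(a + (g[i] if i < len(g) else 0)) % p for i, a in enumerate(f)]

def schoolbook(f, g):
    """Quadratic-time product, used below the recursion threshold."""
    out = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] = (out[i + j] + a * b) % p
    return out

def karatsuba(f, g):
    """Karatsuba product: three half-size products per recursion step."""
    lf, lg = len(f), len(g)
    n = max(lf, lg)
    if n <= 16:                                  # illustrative threshold
        return schoolbook(f, g)
    f = f + [0] * (n - lf)
    g = g + [0] * (n - lg)
    h = n // 2
    f0, f1 = f[:h], f[h:]
    g0, g1 = g[:h], g[h:]
    lo, hi = karatsuba(f0, g0), karatsuba(f1, g1)
    mid = karatsuba(add(f0, f1), add(g0, g1))
    mid = [(m - a - b) % p for m, a, b in
           zip(mid, lo + [0] * len(mid), hi + [0] * len(mid))]
    out = [0] * (2 * n - 1)
    for i, c in enumerate(lo):
        out[i] = (out[i] + c) % p
    for i, c in enumerate(mid):
        out[i + h] = (out[i + h] + c) % p
    for i, c in enumerate(hi):
        out[i + 2 * h] = (out[i + 2 * h] + c) % p
    return out[:lf + lg - 1]
```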

3. Euclidean remainder sequences

3.1. Definition and basic properties

Let R_0, R_1 ∈ K[x] be such that n := deg R_0 > deg R_1. The Euclidean remainder sequence R_0, R_1, R_2, ... is defined by

R_{k+1} := R_{k-1} rem R_k    (k ≥ 1).

The length of the sequence is the smallest index ℓ for which R_{ℓ+1} = 0. For k = 1, ..., ℓ, we set

Q_k := R_{k-1} quo R_k.

We also define the sequence of Bezout matrices by

B_k := ( 0 1 ; 1 -Q_k ),

so that

( R_k ; R_{k+1} ) = B_k ( R_{k-1} ; R_k )

for k = 1, ..., ℓ. We have det B_k = -1.

We regard each B_k as a matrix polynomial in K[x]^{2×2} and say that R_0 and R_1 are normal if deg B_k = 1 for all k. This is the case if and only if deg R_1 = n - 1 and deg Q_k = 1 for all k. For k = 0, ..., ℓ, we also define

M_k := B_k ⋯ B_1,

so that

( R_k ; R_{k+1} ) = M_k ( R_0 ; R_1 ).

(We understand that M_k = Id if k = 0.) In particular,

( R_ℓ ; 0 ) = M_ℓ ( R_0 ; R_1 ) with R_ℓ = gcd(R_0, R_1),

so an extended gcd computation essentially boils down to the computation of the matrix product M_ℓ = B_ℓ ⋯ B_1. In the case of a normal remainder sequence, this is done most efficiently using binary splitting:

M_ℓ = (B_ℓ ⋯ B_{⌈ℓ/2⌉+1}) · (B_{⌈ℓ/2⌉} ⋯ B_1).

In essence, this is also how the half-gcd algorithm works.
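
These definitions translate directly into a quadratic-time reference implementation, which is handy for testing the fast algorithms of the following sections. The sketch below is ours: coefficients lie in F_p as in the earlier sketches, lists are ordered by increasing degree, and bezout_matrix builds B_k from the quotient Q_k.

```python
def trim(f):
    """Drop trailing zero coefficients (the zero polynomial becomes [0])."""
    while len(f) > 1 and f[-1] == 0:
        f = f[:-1]
    return f

def poly_divmod(a, b):
    """Quotient and remainder of a by b over F_p (b nonzero)."""
    a, b = trim(a), trim(b)
    if len(a) < len(b):
        return [0], a
    inv = pow(b[-1], p - 2, p)        # inverse of the leading coefficient
    q, r = [0] * (len(a) - len(b) + 1), a[:]
    for i in range(len(a) - len(b), -1, -1):
        c = r[i + len(b) - 1] * inv % p
        q[i] = c
        for j, bj in enumerate(b):
            r[i + j] = (r[i + j] - c * bj) % p
    return trim(q), trim(r)

def remainder_sequence(r0, r1):
    """The Euclidean remainder sequence of (r0, r1), with its quotients."""
    remainders, quotients = [trim(r0), trim(r1)], []
    while remainders[-1] != [0]:
        qk, rk = poly_divmod(remainders[-2], remainders[-1])
        quotients.append(qk)
        remainders.append(rk)
    return remainders, quotients

def bezout_matrix(qk):
    """The Bezout matrix B_k = ( 0 1 ; 1 -Q_k ) for the quotient qk."""
    return [[[0], [1]], [[1], [(-c) % p for c in qk]]]
```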

Lemma 1. For any k ∈ {1, ..., ℓ}, we have

deg M_k = deg Q_1 + ⋯ + deg Q_k = n - deg R_k.

In particular, if (R_0, R_1) is normal, then deg M_k = k and deg R_k = n - k.

Proof. Let us show by induction on that

for any . This is clear if , so assume that and let . Then

so the induction hypothesis yields

whence . In a similar way, the induction hypothesis yields for all .

As to the second relation, we note that

whence

Example 2. Taking and , we obtain the following normal remainder sequence:

3.2. Re-indexation of abnormal Euclidean remainder sequences

In the abnormal case, it is convenient to work with an alternative indexation of remainder sequences for which the degrees drop by one at every step, as in the normal case. Note that we will not consider abnormal remainder sequences until section 5 below, so the reader may safely skip this subsection until then.

Let us now explain our reindexation in detail. For any , let

We also take

Then we set

Moreover, for any , we define

For , we also define

so that we still have

By construction, for , we have

As before, we will sometimes write instead of in order to emphasize the dependence on and , and similarly for , etc. Occasionally, when is not clear from the context, we also write , , etc.

Example 3. Taking and , we have

After reindexation , , , , and , we obtain

4. The normal case

4.1. Statement of the non-optimized algorithm

Let R_0, R_1 be as in the previous section, with remainder sequence R_0, R_1, R_2, .... We will write M_k = M_k(R_0, R_1) for the corresponding k-th Bezout matrix in case we wish to make the dependency on R_0 and R_1 clear. Similarly, we define Q_k(R_0, R_1) and R_k(R_0, R_1). Given a polynomial P and indices i ≤ j, it will also be convenient to define

P_{i;j} := (P quo x^i) rem x^{j-i}.

Here we understand that P_{i;j} = 0 whenever i ≥ j.

Lemma 4. Given k such that 2k ≤ n, we have

M_k(R_0, R_1) = M_k(R_0 quo x^{n-2k}, R_1 quo x^{n-2k}).

Proof. We have for all . For , the relation

thus shows that the coefficient of degree in only depends on coefficients and of and with . In particular,

and

since and . This shows that

By induction on , we also obtain . We conclude by taking .

The lemma leads to the following recursive algorithm for the computation of M_k(R_0, R_1):

Algorithm 1

Input: and with for all

Output:

  1. If , then return

  2. Let and

  3. Recursively compute

  4. Compute with

  5. Recursively compute

  6. Return

Proposition 5. Algorithm 1 is correct.

Proof. If , then the result follows from Lemma 4. If , then Lemma 4 implies , whence and . For , we have . This allows us to apply Lemma 4 once more, and obtain . We conclude by noting that .
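
To make the recursion tangible, here is a sketch of Algorithm 1 in the normal case, reusing trim, poly_divmod and schoolbook from the earlier sketches and using plain products rather than the middle product optimization of section 4.2 below. The truncation offsets follow our reading of Lemma 4; the helper names are ours, and the result can be cross-checked against remainder_sequence above.

```python
def quo_x(f, t):
    """f quo x^t for coefficient lists (lowest degree first)."""
    return f[t:] if t < len(f) else [0]

def padd(f, g):
    """Sum of two coefficient lists over F_p."""
    if len(f) < len(g):
        f, g = g, f
    return [(a + (g[i] if i < len(g) else 0)) % p for i, a in enumerate(f)]

def mat_vec(M, v):
    """Apply a 2x2 matrix of polynomials to a pair of polynomials."""
    return [trim(padd(schoolbook(M[i][0], v[0]), schoolbook(M[i][1], v[1])))
            for i in range(2)]

def mat_mul(A, B):
    """Product of two 2x2 matrices of polynomials."""
    return [[padd(schoolbook(A[i][0], B[0][j]), schoolbook(A[i][1], B[1][j]))
             for j in range(2)] for i in range(2)]

def hgcd_normal(r0, r1, k):
    """M_k = B_k ... B_1 for a normal pair (r0, r1), by binary splitting.

    Assumes deg r0 = deg r1 + 1 and that all quotients have degree one, so
    that k Euclidean steps are well defined.  Each recursive call only sees
    the top coefficients of its inputs, in the spirit of Lemma 4.
    """
    if k <= 0:
        return [[[1], [0]], [[0], [1]]]          # identity matrix
    if k == 1:
        q, _ = poly_divmod(r0, r1)               # one Euclidean step
        return [[[0], [1]], [[1], [(-c) % p for c in q]]]
    k1 = (k + 1) // 2
    t = max(len(trim(r0)) - 1 - 2 * k1, 0)
    M1 = hgcd_normal(quo_x(r0, t), quo_x(r1, t), k1)
    s0, s1 = mat_vec(M1, [r0, r1])               # = (R_{k1}, R_{k1+1})
    k2 = k - k1
    t2 = max(len(trim(s0)) - 1 - 2 * k2, 0)
    M2 = hgcd_normal(quo_x(s0, t2), quo_x(s1, t2), k2)
    return mat_mul(M2, M1)
```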

4.2. Exploiting the middle product

Let us now show how to compute the updated remainders efficiently in step 4, using a middle product of matrix polynomials. In order to simplify the exposition, we assume that n is a power of two; in subsection 4.4, we will consider arbitrary lengths. We first decompose all polynomials into blocks. More precisely, let

so that

and

Then we observe that

(4)

where the left hand matrix has degree , where has degree , and where the right hand matrix has degree ; see also Figure 1.

Figure 1. Schematic illustration of the computation of and by taking the middle product of and the matrix with entries , , , and .

The individual term can be recovered in linear time using

(5)
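
As an illustration of how (4) turns the update step into middle products, here is an entrywise sketch, reusing middle_product_fft from section 2.1; the name middle_update is ours and the precise block sizes of (4) are left implicit. It relies on the fact that extracting a fixed window of coefficients commutes with addition, so each component of the updated pair is a sum of two middle products.

```python
def middle_update(M, r0, r1, n):
    """Middle windows of (M00*r0 + M01*r1, M10*r0 + M11*r1).

    Four middle products replace four full products; degree constraints on
    the operands are as in section 2.1.
    """
    def entry(a, b):
        u = middle_product_fft(r0, a, n)
        v = middle_product_fft(r1, b, n)
        return [(x + y) % p for x, y in zip(u, v)]
    return entry(M[0][0], M[0][1]), entry(M[1][0], M[1][1])
```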

Before we discuss further optimizations that are specific to the binary FFT model, let us first consider a general multiplication scheme (satisfying the assumptions from section 2.2) and analyze the complexity of Algorithm 1 with the middle product optimization.

Proposition 6. The cost of Algorithm 1 with the middle product optimization is bounded by

Moreover, for a multiplication with for some , the cost is

Proof. Recall that we assumed n to be a power of two. Then the running time of the algorithm satisfies the recurrence inequality

(6)

Unrolling this relation while using the assumption that is non-decreasing yields the first bound. If for some and , then the relation (6) yields .

4.3. Implementation in the binary FFT model

In order to efficiently implement Algorithm 1 with the optimizations from the previous subsection in the binary FFT model, let us again assume that is a power of two. We essentially have to compute a matrix middle product in step 4 and a matrix product in step 6. We will do all these computations using DFTs of length .

Let us first consider the middle product (4). We have in (4) and the right hand side matrix has degree . Consequently, we may apply (3) and compute the middle product using FFT multiplication:

(7)

We also recall that the individual term can be recovered separately using (5).

As to the matrix product , its degree is , which is just one too large to use FFT multiplication directly. Nevertheless, FFT multiplication still allows us to compute modulo . Then we may simply recover by computing the leading coefficient separately.

The FFT model also allows for “FFT caching” when doing the recursive calls. In addition to the result, we return its DFT at the end of the algorithm. When computing the DFTs of the relevant polynomials at double length, this means that we already know their DFTs at half that length, and thereby save half of the work.

Summarizing, this leads to the following algorithm:

Algorithm 2

Input: and such that and for all

Output: and

  1. If , then return and

  2. Let

  3. Recursively compute and

    Compute using FFT doubling

  4. Compute with using (7) and (5)

  5. Recursively compute and

    Compute using FFT doubling

  6. Compute and

    Return and

Proposition 7. Algorithm 2 is correct and its cost is bounded by

Proof. Since the algorithm is simply an adaptation of Algorithm 1 to the binary FFT model, it is correct by Proposition 5. Let us analyze the costs of the various steps without the recursive calls.

The total cost of the top-level of the algorithm is therefore bounded by . Consequently, the cost of our algorithm satisfies the recurrence inequality

Unrolling this relation while using the assumption that is non-decreasing yields the desired complexity bound.

4.4. Generalization to arbitrary lengths

Let us now generalize Algorithm 2 to the case when n is not necessarily a power of two.

Algorithm 3

Input: and such that and for all

Output:

  1. If , then return

  2. Let be maximal such that and

  3. Compute using Algorithm 2

  4. Compute with

  5. Recursively compute

  6. Return

Proposition 8. Algorithm 3 is correct and its cost is bounded by

Proof. The correctness is proved in the same way as for Algorithm 1. For some universal constant , the cost of the algorithm satisfies the recurrence relation

Writing with and , it follows that

where we used our assumption that is non-decreasing.

Corollary 9. Let and be such that , , and for all . Then we may compute in time

Proof. Modulo multiplication of and with , we may assume without loss of generality that .

Remark 10. Instead of Algorithm 2, we may also use Algorithm 1 with the middle product optimization in step 3. In that case, the complexity bound from Proposition 6 generalizes in a similar way to arbitrary lengths.

5. The general case

Algorithms 1 and 2 generalize to the abnormal case, modulo several technical adjustments. In this section we describe how to do this.

5.1. Statement of the non-optimized algorithm

Let us first show how to adapt Algorithm 1. Lemma 4 now becomes:

Lemma 11. Given , we have

Proof. We have for , whence the relation shows that the coefficient of degree in only depends on coefficients and of and with . We next proceed in a similar way as in the proof of Lemma 4.

Algorithm 4

Input: with and

Output:

  1. If , then return

  2. If , then return

  3. Let and

  4. Recursively compute

  5. Let , so that and

    Compute with

  6. If , then return

  7. If , then

    • Compute and

      We claim that

    • Update and let

    • Compute for the updated

  8. Recursively compute

  9. Return

Proposition 12. Algorithm 4 is correct.

Proof. If , then the result is obvious. If and , then the result follows from Lemma 11. Assume from now on that and . Then Lemma 11 implies , whence and .

Let be largest with . If in step 6, then , so and our algorithm returns the correct answer. Assume from now on that .

We call the degeneracy after steps. Let still be largest with . Then we have and . In particular, we see that , , and . Furthermore, implies that , so . Therefore, we computed the leading terms of as part of . Now , so we only need the leading coefficients of and in order to compute . This proves our claim that .

After step 7, we thus have , , and , where . Moreover, and . In particular, we may indeed retrieve the new value of from and the old values of and at the end of step 7, since . Furthermore, , which allows us to apply Lemma 11 and to obtain . We conclude that

Remark 13. For efficiency, we chose in step 3, but it is interesting to note that the above correctness proof actually works for any choice of with . We will use this property for our FFT version in section 5.3 below, where we will take to be a power of two.

5.2. Exploiting the middle product

Contrary to what we did in section 4.2, we do not require n to be a power of two in this subsection. In fact, it is possible to implement step 5 efficiently in general, using middle products. This time, we break up our input and output polynomials as follows:

Figure 2. Schematic illustration of the matricial middle product in the abnormal case, for even n.

Then we have (see Figure 2)

(8)

As before, the coefficient is computed separately using

(9)

Remark 14. As a continuation of Remark 13, we note that the above formulas again apply for any choice of with .

Let be as in section 4.2. For the multiplication method that we selected, assume that the middle product (8) and the final product can each be computed in time . Then we have the following generalization of Proposition 6:

Theorem 15. Let be as in Proposition 6. Then Algorithm 4 with the middle product optimization runs in time at most

Moreover, for a multiplication with for some , the cost is

Proof. The cost of steps 5 and 9 is bounded by , according to the assumption just above this proposition. In step 7, the computation of can be done in time with , using Newton's method for fast power series inversion; see, e.g. [3, section 6]. The update of amounts to two products of degrees by , which can certainly be computed in time . The update of can be computed efficiently using a middle product that takes additional operations. Altogether, the cost of the algorithm satisfies

for some constant . Note also that and that

is a non-decreasing function. Let and be such that

(10)

for all . Let us show by induction on that (10) holds for all as well. Indeed,

This completes the proof of the first bound. We skip the proof of the second one, which is based on similar arguments.
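
The power series inversion used in step 7 can be realized by a standard Newton iteration, as in [3, section 6]. The sketch below reuses fft_mul from section 2.1; the function name is ours.

```python
def series_inverse(f, m):
    """g with f*g = 1 mod x^m, for f[0] != 0, by Newton iteration.

    Each step lifts an inverse modulo x^l to one modulo x^(2l) via
    g <- g * (2 - f*g), so the precision doubles at every iteration.
    """
    g = [pow(f[0], p - 2, p)]
    l = 1
    while l < m:
        l = min(2 * l, m)
        fg = fft_mul(f[:l], g)[:l]          # f*g mod x^l
        corr = [(-c) % p for c in fg]
        corr[0] = (corr[0] + 2) % p         # 2 - f*g mod x^l
        g = fft_mul(g, corr)[:l]
    return g
```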

Remark 16. The constants in the bounds from Theorem 15 are probably not sharp. The point is the following: if the quotients in the Euclidean remainder sequence are all of bounded degree, then the additional cost with respect to the normal case can actually be bounded by . If, on the contrary, some of the quotients have exceptionally large degrees, then many of the recursive calls will terminate early at step 6. This will actually make the constants decrease instead of increase. The only problematic case therefore seems to be when many of the quotients have moderately large degrees; it would be interesting to know more about the precise worst-case scenario.

5.3. Implementation in the binary FFT model

One important feature of Algorithm 2 is that we use the same length for all our DFTs. If n is a power of two, then we may preserve this property in the abnormal case. Indeed, the final product causes no problem, since its degree is still . As to the middle product (8), we now have and the degree of the right hand side is still . This allows us to apply (3), which yields

(11)

However, there is no reason why should be a power of two for the second recursive call. In order to remedy this problem, we introduce a new parameter that we assume to be a power of two and that we will use for the lengths of our DFTs. This requires a minor adjustment of (11):

(12)

We are now in a position to adapt Algorithm 4 to the binary FFT model: see Algorithm 5.

Algorithm 5

Input: with and with

Output: and

  1. If , then return and

  2. If , then return and

  3. If , then

    • Recursively compute and

    • Return and , which we compute using FFT doubling

  4. Let

  5. Recursively compute and

    Compute using FFT doubling

  6. Let , so that and

    Compute with using (9) and (12)

  7. If , then return and

  8. If , then

    • Compute and

    • Compute and deduce

    • Update and , where

    • Compute for the updated

  9. Recursively compute and

    Compute using FFT doubling

  10. Compute and

    Return and

For the complexity analysis, it will be convenient to assume that satisfies the properties from section 2.2. In particular, the assumption that is non-decreasing implies that for all . Conversely, in the binary FFT model, it will be convenient to also assume that for all , , and some fixed constant .

Theorem 17. Algorithm 5 is correct. Moreover, if , then its cost is bounded by

Proof. The correctness is proved in a similar way as for Proposition 12, while using Remarks 13 and 14.

The total cost of steps 5, 6, 9, and 10 is , as in the proof of Proposition 7. As to the update step 8, the computation of requires operations. The computation of and costs , whereas the multiplication takes linear time.

Now let and before the updates, so that with and . Since , the lowest coefficients of do not matter in step 9. During the update, this means that we essentially need to compute . Now, using FFT multiplication, the computation of

takes one direct and one inverse DFT of length of total cost , since we already know . The remaining coefficients can be computed in time .

If , then and the above analysis shows that the time complexity of the algorithm satisfies

for suitable constants and . If , then the FFT doubling in step 3 can be done in time , so

by increasing if necessary. In the bound for when , the term pollutes the complexity analysis. Our next objective is to reduce to the case when the sum is replaced by .

We start with the definition of an upper bound for as follows. For , we take . For with , we define

For with , we take

(13)

Using an easy induction, we note that is increasing in for fixed . If and , then there exist and with such that

(14)

Given with , it follows that

More generally, for any , we claim that

where is the constant from before the statement of this theorem.

We prove our claim by induction on the smallest with . We already dealt with the case when , so assume that . If , then (13) and the induction hypothesis with in the role of yield

In particular,

If , then we have shown above (with in the role of ) that

whence

as claimed.

Now consider with and let and be such that . Then our claim implies

Plugging this into (14), while setting , we obtain

(15)

for all and . Unrolling this inequality for , we deduce that there exists a constant with

for all . For general with and , combining this bound with (13) and (15) yields

Under the assumption that , we have , whence .

Bibliography

[1]

M. Ben-Or and P. Tiwari. A deterministic algorithm for sparse multivariate polynomial interpolation. In Proc. ACM STOC '88, pages 301–309. New York, NY, USA, 1988.

[2]

E. R. Berlekamp. Algebraic coding theory. McGraw-Hill, 1968.

[3]

D. J. Bernstein. Fast multiplication and its applications, pages 325–384. Mathematical Sciences Research Institute Publications. Cambridge University Press, United Kingdom, 2008.

[4]

D. J. Bernstein and B.-Y. Yang. Fast constant-time gcd computation and modular inversion. IACR Trans. Cryptogr. Hardw. Embed. Syst., 3:340–398, 2019.

[5]

R. P. Brent, F. G. Gustavson, and D. Y. Y. Yun. Fast solution of Toeplitz systems of equations and computation of Padé approximants. J. Algorithms, 1(3):259–295, 1980.

[6]

D. G. Cantor and E. Kaltofen. On fast multiplication of polynomials over arbitrary algebras. Acta Informatica, 28:693–701, 1991.

[7]

D. G. Cantor and H. Zassenhaus. A new algorithm for factoring polynomials over finite fields. Math. Comp., 36(154):587–592, 1981.

[8]

J. W. Cooley and J. W. Tukey. An algorithm for the machine calculation of complex Fourier series. Math. Computat., 19:297–301, 1965.

[9]

J. L. Dornstetter. On the equivalence between Berlekamp's and Euclid's algorithms. IEEE Transactions on Information Theory, 33:428–431, 1987.

[10]

J. von zur Gathen and J. Gerhard. Modern Computer Algebra. Cambridge University Press, New York, NY, USA, 3rd edition, 2013.

[11]

B. Grenet, J. van der Hoeven, and G. Lecerf. Randomized root finding over finite fields using tangent Graeffe transforms. In Proc. ISSAC '15, pages 197–204. New York, NY, USA, 2015. ACM.

[12]

G. Hanrot, M. Quercia, and P. Zimmermann. The middle product algorithm, I. Speeding up the division and square root of power series. AAECC, 14:415–438, 2004.

[13]

D. Harvey. Faster algorithms for the square root and reciprocal of power series. Math. Comp., 80:387–394, 2011.

[14]

D. Harvey and J. van der Hoeven. Faster polynomial multiplication over finite fields using cyclotomic coefficient rings. J. of Complexity, 54, 2019. Article ID 101404, 18 pages.

[15]

D. Harvey and J. van der Hoeven. Polynomial multiplication over finite fields in time O(n log n). Technical Report, HAL, 2019. http://hal.archives-ouvertes.fr/hal-02070816.

[16]

J. van der Hoeven. The truncated Fourier transform and applications. In Proc. ISSAC 2004, pages 290–296. Univ. of Cantabria, Santander, Spain, July 2004.

[17]

J. van der Hoeven and M. Monagan. Computing one billion roots using the tangent Graeffe method. ACM SIGSAM Commun. Comput. Algebra, 54(3):65–85, 2021.

[18]

J. van der Hoeven et al. GNU TeXmacs. https://www.texmacs.org, 1998.

[19]

A. Karatsuba and J. Ofman. Multiplication of multidigit numbers on automata. Soviet Physics Doklady, 7:595–596, 1963.

[20]

D. E. Knuth. The analysis of algorithms. In Actes du congrès international des mathématiciens 1970, volume 3, pages 269–274. Gauthier-Villars, 1971.

[21]

G. Lecerf. On the complexity of the Lickteig–Roy subresultant algorithm. J. Symbolic Comput., 2018. https://doi.org/10.1016/j.jsc.2018.04.017.

[22]

D. H. Lehmer. Euclid's algorithm for large numbers. Amer. Math. Monthly, pages 227–233, 1938.

[23]

D. Lichtblau. Half-GCD and fast rational recovery. In Proc. ISSAC '05, pages 231–236. 2005.

[24]

J. Massey. Shift-register synthesis and BCH decoding. IEEE Transactions on Information Theory, 15:122–127, 1969.

[25]

R. Moenck. Fast computation of GCDs. In Proc. of the 5th ACM Annual Symposium on Theory of Computing, pages 142–171. New York, 1973. ACM Press.

[26]

N. Möller. On Schönhage's algorithm and subquadratic integer gcd computation. Math. Comp., 77(261):589–607, 2008.

[27]

F. Morain. Implementing the Thull-Yap algorithm for computing euclidean remainder sequences. In Proc. ISSAC '22, pages 197–205. 2022.

[28]

R. Prony. Essai expérimental et analytique sur les lois de la dilatabilité des fluides élastiques et sur celles de la force expansive de la vapeur de l'eau et de la vapeur de l'alkool, à différentes températures. J. de l'École Polytechnique Floréal et Prairial, an III, 1:24–76, 1795. Cahier 22.

[29]

A. Schönhage. Schnelle Berechnung von Kettenbruchentwicklungen. Acta Informatica, 1(2):139–144, 1971.

[30]

A. Schönhage. Schnelle Multiplikation von Polynomen über Körpern der Charakteristik 2. Acta Informatica, 7:395–398, 1977.

[31]

V. Shoup. NTL: a library for doing number theory. 1996. www.shoup.net/ntl.

[32]

D. Stehlé and P. Zimmermann. A binary recursive gcd algorithm. In D. Buell, editor, Algorithmic Number Theory, pages 411–425. Springer Berlin Heidelberg, 2004.

[33]

S. Stevin. L'arithmétique. Imprimerie de Christophle Plantin, 1585.

[34]

V. Strassen. Gaussian elimination is not optimal. Numer. Math., 13:352–356, 1969.

[35]

V. Strassen. The computational complexity of continued fractions. In Proc. of the Fourth ACM Symp. on Symbolic and Algebraic Computation, pages 51–67. 1981.

[36]

K. Thull and C. K. Yap. A unified approach to HGCD algorithms for polynomials and integers. https://cs.nyu.edu/yap/papers/SYNOP.htm#hgcd.

[37]

A. L. Toom. The complexity of a scheme of functional elements realizing the multiplication of integers. Soviet Mathematics, 4(2):714–716, 1963.